This is based on https://github.com/WordPress/php-toolkit de13df1465d5b685dbd77ca0337aac017dcf606e
Code Review
This pull request introduces the WP_CSS_Token_Processor class, a pull-based CSS tokenizer compliant with the CSS Syntax Level 3 specification, along with a comprehensive unit test suite. The processor supports efficient stylesheet streaming and lexical updates, such as URL rewriting. Feedback was provided regarding the set_token_value method for TOKEN_URL tokens, which currently converts unquoted URLs into quoted strings, thereby altering their semantic token type. It is recommended to implement a dedicated escaping helper for unquoted URLs to maintain strict specification adherence.
$this->lexical_updates[] = array(
	'start'  => $this->token_value_starts_at,
	'length' => $this->token_value_length,
	'text'   => $this->create_css_string( $new_value ),
);
The set_token_value method, when used with TOKEN_URL, converts an unquoted URL into a quoted string by calling create_css_string. According to the CSS Syntax Level 3 specification, url() with an unquoted value is a url-token, while url("...") or url('...') is a function-token followed by a string-token. This change in tokenization type can lead to unexpected behavior or incorrect parsing by downstream consumers that strictly adhere to the CSS spec.
To maintain spec adherence and avoid altering the token's semantic type, a dedicated helper function should be introduced to escape characters specifically for unquoted URL values, without adding quotes. This new helper should then be used for TOKEN_URL updates.
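To illustrate the distinction, here is a minimal sketch of how the first token is decided for input that starts with `url(`. The function name `classify_url_start` is hypothetical and not part of this pull request, and the sketch ignores escaped spellings of `url` such as `u\72l(`.

```php
<?php
/**
 * Hypothetical helper: report which token a CSS Syntax Level 3 tokenizer
 * produces first for input that begins with `url(`.
 */
function classify_url_start( string $css ): string {
	if ( 0 !== strncasecmp( $css, 'url(', 4 ) ) {
		return 'not-url';
	}

	// Skip any whitespace that follows the opening parenthesis.
	$at   = 4 + strspn( $css, " \t\n\r\f", 4 );
	$next = $css[ $at ] ?? '';

	// A quote means `url(` is a function-token followed by a string-token.
	// Anything else is consumed as a single url-token.
	return ( '"' === $next || "'" === $next ) ? 'function-token' : 'url-token';
}
```

Under these assumptions, `url(a.png)` is classified as `url-token`, while `url("a.png")` and `url( 'a.png')` are classified as `function-token`, which is the change in token type that the current `set_token_value` behavior causes.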
case self::TOKEN_URL:
	$this->lexical_updates[] = array(
		'start'  => $this->token_value_starts_at,
		'length' => $this->token_value_length,
		'text'   => $this->create_css_unquoted_url_value( $new_value ),
	);
	return true;

	);
	return "\"{$escaped}\"";
}
A new private helper method create_css_unquoted_url_value is needed to correctly handle escaping for unquoted URL values, as create_css_string is designed for quoted strings and adds unnecessary quotes for TOKEN_URL.
This method should escape backslashes, parentheses, quotes, and whitespace, and replace null bytes with U+FFFD, because none of these may appear literally in an unquoted URL according to the CSS Syntax Level 3 specification.
/**
 * Create an unquoted CSS URL value from a plain PHP string value.
 *
 * Escapes characters that are not allowed in unquoted URLs.
 *
 * @see https://www.w3.org/TR/css-syntax-3/#consume-url-token
 */
private function create_css_unquoted_url_value( string $value ): string {
	// strtr() with an array tries the longest keys first, so "\r\n" is
	// replaced as a single newline before the lone "\r" and "\n" entries apply.
	$escaped = strtr(
		$value,
		array(
			'\\'   => '\\5C ',
			'('    => '\\28 ',
			')'    => '\\29 ',
			'"'    => '\\22 ',
			"'"    => '\\27 ',
			// Pre-processing replaces NULLs and some newlines. Replace and escape as necessary.
			"\0"   => "\u{FFFD}",
			// Newlines must be escaped in CSS unquoted URLs.
			// Normalize and replace newlines. https://www.w3.org/TR/css-syntax-3/#input-preprocessing
			"\r\n" => '\\A ',
			"\r"   => '\\A ',
			"\f"   => '\\A ',
			"\n"   => '\\A ',
			// Spaces and tabs are whitespace and must be escaped in CSS unquoted URLs.
			' '    => '\\20 ',
			"\t"   => '\\9 ',
		)
	);
	return $escaped;
}
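One way to sanity-check the output of such a helper is to confirm that, once valid escapes are removed, nothing remains that would end or invalidate a url-token. The sketch below is a standalone illustration; `is_safe_unquoted_url_body` is a hypothetical name and not part of this pull request, and it works on bytes rather than full code points.

```php
<?php
/**
 * Hypothetical check: can $body sit between `url(` and `)` as a url-token?
 */
function is_safe_unquoted_url_body( string $body ): bool {
	// Strip valid escapes: a backslash followed by 1-6 hex digits and one
	// optional whitespace character, or a backslash followed by any
	// character that is not a newline.
	$without_escapes = preg_replace(
		'/\\\\(?:[0-9a-fA-F]{1,6}[ \t\n\r\f]?|[^\n\r\f])/',
		'',
		$body
	);

	// What remains must not contain whitespace, quotes, parentheses,
	// a stray backslash, or non-printable characters.
	$forbidden = '/[\s"\'()\\\\\x00-\x08\x0B\x0E-\x1F\x7F]/';

	return 1 !== preg_match( $forbidden, $without_escapes );
}
```

For example, `a\20 b\28 1\29 .png` passes the check, while the unescaped forms `a b.png` and `a(b).png` do not.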
Trac ticket:
Use of AI Tools
This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.