-
Notifications
You must be signed in to change notification settings - Fork 3
Conversation
* | ||
* @return string The token value. | ||
*/ | ||
public function get_value(): string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While I like that using get_value()
is lazy and can generally nicely work for any token type where we need to interpret or normalize any values, I'm wondering how to solve the NO_BACKSLASH_ESCAPES
SQL mode.
It's a very simple IF, but in the token instance, we just know nothing about SQL modes 🤔 The tokenizer knows it, so it could pass in a flag, or use a different token instance, but that makes it a bit less elegant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it be a constructor argument? The mode is already determined when the token is created. If that was a boolean flag baked into the Token instance, we could still keep the get_value()
method argument-less.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Done in 6e5a8f5.
* > of pattern-matching contexts, they evaluate to the strings \% and | ||
* > \_, not to % and _. | ||
*/ | ||
'\%' => '\\\\%', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about using '\\%'
instead of \%
just to make sure PHP won't surprise us by treating the backslash as an escape sequence? Or, alternatively, $backslash . '%'
or '\x5C%'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought maybe we're good since PHP docs says this:
To specify a literal single quote, escape it with a backslash (
\
). To specify a literal backslash, double it (\\
). All other instances of backslash will be treated as a literal backslash
But this gives me trust issues:
// This backslash is not treated as a literal backslash:
php > var_dump("\x5C");
string(1) "\"
// Neither is this:
php > var_dump("\n");
string(1) "
"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah no, that was just me using the wrong quotes. We are good indeed:
php > var_dump('\x5C');
string(4) "\x5C"
php > var_dump('\n');
string(2) "\n"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're good here, but it's a good point to be a bit more explicit. I don't like the maps like '\n' => "\n"
, where at a quick sight, the left and right sides look the same.
I followed the $baskslash
idea in 931c82b, and I think it looks better now.
/* | ||
* Apply the replacements. | ||
* | ||
* It is important to use "strtr()" and not "str_replace()", because |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such a brilliant find ❤️
* A backslash with any other character represents the character itself. | ||
* That is, \x evaluates to x, \\ evaluates to \, and \🙂 evaluates to 🙂. | ||
*/ | ||
$value = preg_replace( '/\\\\(.)/u', '$1', $value ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I feel nostalgic. The first time I've read how a quadruple backslash evaluates to a single backslash in preg_* functions was about 20 years ago in a PHP4 book. I'm old. 👴 Can we either document this or express this in a different way? Perhaps $escaped_backslash = preg_quote("\x5C");
and preg_replace( '/'. $escaped_backslash .'(.)/u', '$1', $value );
? Maybe that's an overkill. Feel free to make any call here, I just want to make sure this gets brought up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I did that together with the other improvements in 931c82b.
wp-includes/sqlite-ast/class-wp-sqlite-information-schema-reconstructor.php
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, thank you Jan! I'm approving provisionally – I'd still like to see a rigorous test case that directly targets the quote_mysql_utf8_string_literal
method. I know it's private. Perhaps it's generic and useful enough to be exposed publicly? And if not, there are ways to test private methods, too (although protected ones are easier).
I left a nitpick about a comment, but the substance of the PR looks great. Thank you for additional tests! |
The translate_string_literal logic in the SQLite driver that handles "unescaping" (interpreting) of MySQL escape sequences is only applied to a
textStringLiteral
AST node. It turns out, the same logic is needed intranslate_pure_identifier
, or, in other words, also forWP_MySQL_Lexer::DOUBLE_QUOTED_TEXT
andWP_MySQL_Lexer::BACK_TICK_QUOTED_ID
tokens.Therefore, it's probably better to move this logic to the tokenization phase and make token instances return the correct unquoted token values.
Surfaced in #42.