Control characters should be allowed in strings as part of control sequences #1

Open · ovidiu-munteanu opened this issue on Feb 11, 2017 · 1 comment

ovidiu-munteanu commented Feb 11, 2017

According to the ECMA-404 standard (see section 9, "String"), control characters may appear within JSON strings as long as they are correctly escaped.

Therefore, the string regex in this implementation is incorrect: it rejects correctly escaped control characters, which should be accepted. It also has other errors; for example, the \ character must itself be escaped within a JSON string.
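
As a quick check of the rule, Python's built-in `json` module can serve as a reference parser. Using it here is an assumption of this sketch, not something the project under discussion depends on. In its default strict mode it rejects a raw control character inside a string but accepts the same character once it is escaped:

```python
import json

def accepts(doc: str) -> bool:
    """Return True if Python's json module (strict mode, the default) parses doc."""
    try:
        json.loads(doc)
        return True
    except json.JSONDecodeError:
        return False

# Backslash followed by 'n': an escaped line feed, valid JSON.
escaped = '"line1\\nline2"'
# A raw, unescaped U+000A inside the string: rejected.
raw = '"line1\nline2"'
# A \u escape can encode any control character, here U+0001.
unicode_escape = '"\\u0001"'

print(accepts(escaped))         # True
print(accepts(raw))             # False
print(accepts(unicode_escape))  # True
```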

A correct definition for JSON strings is given below:

// Exclude the code points for the characters that must be escaped
valid_set_char = [^\u0000-\u001F\"\\]

// But allow them as part of a valid escape sequence
valid_escape_seq = \\\" | \\\\ | \\\/ | \\b | \\f | \\n | \\r | \\t | (\\u([0-9a-fA-F]{4}))

// Therefore, a valid string is a pair of double quotes around any number of valid characters or valid escape sequences
valid_character = {valid_set_char} | {valid_escape_seq}
valid_string = \"{valid_character}*\"
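
The corrected definition translates directly into a Python regular expression. This is a sketch: the names `VALID_SET_CHAR`, `VALID_ESCAPE_SEQ`, `JSON_STRING` and `is_json_string` are hypothetical and simply mirror the macros in the definition, with the enclosing double quotes added so the pattern matches one complete string literal.

```python
import re

# Any character except U+0000-U+001F, the double quote, and the backslash.
VALID_SET_CHAR = r'[^\x00-\x1f"\\]'
# A backslash followed by one of " \ / b f n r t, or by u and exactly four hex digits.
VALID_ESCAPE_SEQ = r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})'
# A JSON string: double quotes around any number of plain characters or escapes.
JSON_STRING = re.compile(rf'"(?:{VALID_SET_CHAR}|{VALID_ESCAPE_SEQ})*"')

def is_json_string(token: str) -> bool:
    """Return True if token is exactly one JSON string literal."""
    return JSON_STRING.fullmatch(token) is not None

print(is_json_string('"tab:\\t"'))   # True: escaped tab
print(is_json_string('"tab:\t"'))    # False: raw U+0009
print(is_json_string('"bad \\x41"')) # False: \x is not a JSON escape
```

Because the backslash is excluded from `VALID_SET_CHAR`, a backslash can only appear as the start of a recognised escape, so inputs such as `"\x41"` or an unterminated `"abc\"` are rejected.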

kimeshan (owner) commented Oct 20, 2017

Nice catch, feel free to submit a PR with the fix!

ovidiu-munteanu added a commit to ovidiu-munteanu/kimeshan-json-parser that referenced this issue Feb 19, 2019