This repository was archived by the owner on May 30, 2025. It is now read-only.

Commit 1b3ceab

XMLProcessor: Support SYSTEM and PUBLIC DOCTYPE sections (#7)
Support DOCTYPE declarations with SYSTEM and PUBLIC sections. The following previously unsupported DOCTYPE declarations are now correctly parsed and validated:

**SYSTEM identifiers**:

```xml
<!DOCTYPE html SYSTEM "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

**PUBLIC identifiers**:

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
```

Here's how the new API can be consumed:

```php
// Double quotes need no escaping inside a single-quoted PHP string.
$processor = XMLProcessor::create_from_string(
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"><root>Content</root>'
);
$processor->next_token();

// Output: html
echo $processor->get_doctype_name();

// Output: -//W3C//DTD XHTML 1.1//EN
echo $processor->get_pubid_literal();

// Output: http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd
echo $processor->get_system_literal();
```

This conforms to the following part of the DOCTYPE syntax:

```
doctypedecl   ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
ExternalID    ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
PubidLiteral  ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
PubidChar     ::= #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
```

### Other changes

XMLProcessor now distinguishes between invalid (malformed) DOCTYPE syntax and unsupported DOCTYPE syntax (e.g. an inline entity declaration), and yields an appropriate error for each.

## Testing instructions

Confirm that all the XMLProcessor unit tests pass (the CI is red overall because of other, unrelated failures).

cc @akirk
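To make the grammar above concrete, here is a minimal standalone sketch that matches a DOCTYPE declaration with an optional `ExternalID` using a regular expression. It is not the XMLProcessor implementation: the function name `parse_doctype_sketch` is hypothetical, `Name` is simplified to an ASCII subset, and an internal subset (`[...]`) is simply rejected rather than reported as unsupported.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of XMLProcessor): parses a leading DOCTYPE
 * declaration and returns its name, public identifier, and system identifier,
 * or null when the input does not match this simplified grammar.
 */
function parse_doctype_sketch(string $input): ?array
{
    // S ::= (#x20 | #x9 | #xD | #xA)+
    $s = '[\x20\x09\x0D\x0A]+';

    // Assumption: ASCII-only simplification of the XML Name production.
    $name = '[A-Za-z_:][A-Za-z0-9_:.\-]*';

    // SystemLiteral ::= ('"' [^"]* '"') | ("'" [^']* "'")
    $system_literal = '"[^"]*"|\'[^\']*\'';

    // PubidChar, with and without the apostrophe.
    $pubid_char          = '[\x20\x0D\x0Aa-zA-Z0-9\-\'()+,./:=?;!*#@$_%]';
    $pubid_char_no_apos  = '[\x20\x0D\x0Aa-zA-Z0-9\-()+,./:=?;!*#@$_%]';

    // PubidLiteral ::= '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
    $pubid_literal = '"' . $pubid_char . '*"|\'' . $pubid_char_no_apos . '*\'';

    $pattern = '~^<!DOCTYPE' . $s . '(?<name>' . $name . ')'
        . '(?:' . $s . '(?:'
        . 'SYSTEM' . $s . '(?<sys1>' . $system_literal . ')'
        . '|PUBLIC' . $s . '(?<pub>' . $pubid_literal . ')' . $s . '(?<sys2>' . $system_literal . ')'
        . '))?'
        . '(?:' . $s . ')?>~';

    if (preg_match($pattern, $input, $m, PREG_UNMATCHED_AS_NULL) !== 1) {
        return null;
    }

    // Drop the surrounding quote characters from a matched literal.
    $strip = fn (?string $literal): ?string =>
        $literal === null ? null : substr($literal, 1, -1);

    return [
        'name'   => $m['name'],
        'pubid'  => $strip($m['pub'] ?? null),
        'system' => $strip($m['sys1'] ?? $m['sys2'] ?? null),
    ];
}
```

Calling `parse_doctype_sketch()` on the XHTML 1.1 document from the example should yield the same three values that the processor getters return. A `PUBLIC` identifier without a following system literal is rejected, as the `ExternalID` production requires.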
1 parent 97e41c6 commit 1b3ceab

2 files changed, +538 -163 lines changed
