This repository was archived by the owner on May 30, 2025. It is now read-only.

XMLProcessor: Support namespaces #23

Draft
wants to merge 6 commits into
base: trunk

Conversation

adamziel
Contributor

A first WIP draft of XML namespaces support in the processor. Built to satisfy https://www.w3.org/TR/2006/REC-xml-names11-20060816/#uniqAttrs.

Remaining work:

  • Merging $stack_of_open_elements and $namespace_stack into a single stack of objects, each carrying the tag's local name, namespace prefix, full namespace reference, and the list of namespaces in scope for that tag.
  • Support querying by namespaces in next_tag()
  • Tests
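
The merged stack described in the first item could look roughly like the sketch below. It is not code from this PR: the class name `OpenElement`, its properties, and the helper `push_element()` are hypothetical, and the sketch ignores the predefined `xml` prefix and prefix undeclaration.

```php
<?php
// Illustrative only: the class, property, and function names are assumptions,
// not taken from the PR. Requires PHP 8.0+ for constructor promotion.
final class OpenElement {
    public function __construct(
        public string $local_name,
        public string $prefix,
        public string $namespace,
        public array $namespaces_in_scope // prefix => namespace; '' is the default
    ) {}
}

/**
 * Pushes an element onto the stack, resolving its qualified name against the
 * inherited bindings plus the xmlns declarations found on the tag itself.
 */
function push_element( array &$stack, string $qname, array $declarations ): OpenElement {
    $parent   = end( $stack );
    $in_scope = $parent ? $parent->namespaces_in_scope : array( '' => '' );
    // Declarations on this tag take precedence over inherited ones.
    $in_scope = $declarations + $in_scope;

    $parts = explode( ':', $qname, 2 );
    if ( 2 === count( $parts ) ) {
        list( $prefix, $local_name ) = $parts;
    } else {
        $prefix     = '';
        $local_name = $qname;
    }
    if ( ! array_key_exists( $prefix, $in_scope ) ) {
        throw new InvalidArgumentException( "Unbound namespace prefix: {$prefix}" );
    }

    $element = new OpenElement( $local_name, $prefix, $in_scope[ $prefix ], $in_scope );
    $stack[] = $element;
    return $element;
}

// <root xmlns="urn:example:a" xmlns:p="urn:example:b"><p:item><plain xmlns="">
$stack = array();
$root  = push_element( $stack, 'root', array( '' => 'urn:example:a', 'p' => 'urn:example:b' ) );
$item  = push_element( $stack, 'p:item', array() );
$plain = push_element( $stack, 'plain', array( '' => '' ) );
```

Because each entry carries its own copy of the bindings in scope, leaving an element would be a single `array_pop()` with no separate namespace stack to keep in sync.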

cc @sirreal @dmsnell


$this->assertTrue( $processor->next_tag(), 'Querying a tag did not return true' );
$this->assertEquals( '&#x94', $processor->get_attribute( 'enabled' ) );
$this->assertEquals( '&#x94', $processor->get_attribute( '', 'enabled' ) );
Member


these are good, and the thing I wanted to review most specifically. it was confusing to me how attributes behave since they are not namespaced without an explicit namespace.

Contributor Author


Yes! I don't know why the spec handles elements and attributes so differently, and I agree this is confusing. I went for requiring an explicit namespace as the first argument to most of these functions after seeing this passage in the spec:

An attribute-based declaration syntax is provided to bind prefixes to namespace names and to bind a default namespace that applies to unprefixed element names; these declarations are scoped by the elements on which they appear so that different bindings may apply in different parts of a document. Processors conforming to this specification MUST recognize and act on these declarations and prefixes.

Interestingly, the existing XML parsers I've checked seem to expose an API that largely ignores namespaces. You have to know about their semantics and explicitly use namespaced functions. I wanted to avoid this here so I made the full namespace a required first argument in most places where it's relevant. It's surely annoying to use, but I also couldn't think of any other way of making namespaces first-class citizens.
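
For contrast, here is how the opt-in style looks in PHP's built-in DOM extension. This is only one example of such an API and is not necessarily one of the parsers checked above; the namespace URI `urn:example:wp` and the attribute names are made up for the illustration.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML( '<root xmlns:wp="urn:example:wp" wp:enabled="yes" enabled="no" />' );
$root = $doc->documentElement;

// The namespace-unaware getter takes a plain name and finds the unprefixed attribute.
$plain = $root->getAttribute( 'enabled' );

// Reaching the namespaced attribute means knowing to call the *NS variant.
$namespaced = $root->getAttributeNS( 'urn:example:wp', 'enabled' );

// The *NS variant reports an absent attribute as an empty string.
$missing = $root->getAttributeNS( 'urn:example:other', 'enabled' );
```

With the namespace as a required first argument, as in the `get_attribute( '', 'enabled' )` call from the test above, the caller has to state "no namespace" explicitly instead of getting it by default.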

Member

@dmsnell dmsnell left a comment


I couldn't find any tests for changing the default namespace, which has two facets:

  • changing the default namespace changes it for the element itself and all of its children, though it does not apply to attributes
  • setting xmlns="" removes the default namespace
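
Both facets can be observed with PHP's DOM extension, used here only as a reference point for the expected semantics and not as part of the processor. The document and the `urn:example:outer` URI are invented for the illustration, and the helper `ns_of()` is hypothetical.

```php
<?php
$xml = '<root xmlns="urn:example:outer" id="r"><child><reset xmlns=""><inner/></reset></child></root>';

$doc = new DOMDocument();
$doc->loadXML( $xml );

// Normalizes "no namespace" to an empty string for easy comparison.
function ns_of( DOMNode $node ): string {
    return (string) $node->namespaceURI;
}

$root  = $doc->documentElement;
$child = $root->firstChild;
$reset = $child->firstChild;
$inner = $reset->firstChild;
$id    = $root->getAttributeNode( 'id' );

$result = array(
    'root'  => ns_of( $root ),  // declares the default namespace
    'child' => ns_of( $child ), // inherits it
    'id'    => ns_of( $id ),    // unprefixed attribute: no namespace
    'reset' => ns_of( $reset ), // xmlns="" removes the default
    'inner' => ns_of( $inner ), // and its children stay in no namespace
);
```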

another thing I pulled from the namespace spec is that we have to be careful not to allow attributes to have the same extended name using different namespace prefixes: these are equivalent to having duplicate copies of the same unprefixed attributes.

<!-- http://www.w3.org is bound to n1 and n2 -->
<x xmlns:n1="http://www.w3.org" 
   xmlns:n2="http://www.w3.org" >
  <bad a="1"     a="2" />
  <bad n1:a="1"  n2:a="2" />
</x>
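
A minimal sketch of that uniqueness check, comparing attributes by expanded name (namespace plus local name) instead of by the name as written. The function `find_duplicate_attributes()` and the `{namespace}local` key format are hypothetical and not taken from the PR.

```php
<?php
/**
 * Returns the expanded names that occur more than once on a single tag.
 *
 * @param string[] $attribute_names Qualified names as written, e.g. "n1:a".
 * @param array    $bindings        Prefix => namespace bindings in scope.
 */
function find_duplicate_attributes( array $attribute_names, array $bindings ): array {
    $seen       = array();
    $duplicates = array();
    foreach ( $attribute_names as $qname ) {
        $parts = explode( ':', $qname, 2 );
        if ( 2 === count( $parts ) ) {
            if ( ! isset( $bindings[ $parts[0] ] ) ) {
                throw new InvalidArgumentException( "Unbound prefix in {$qname}" );
            }
            $namespace  = $bindings[ $parts[0] ];
            $local_name = $parts[1];
        } else {
            // Unprefixed attributes are in no namespace, whatever the default is.
            $namespace  = '';
            $local_name = $qname;
        }
        $expanded = '{' . $namespace . '}' . $local_name;
        if ( isset( $seen[ $expanded ] ) ) {
            $duplicates[ $expanded ] = true;
        }
        $seen[ $expanded ] = true;
    }
    return array_keys( $duplicates );
}

$bindings = array( 'n1' => 'http://www.w3.org', 'n2' => 'http://www.w3.org' );

$same_name   = find_duplicate_attributes( array( 'a', 'a' ), $bindings );
$same_ns     = find_duplicate_attributes( array( 'n1:a', 'n2:a' ), $bindings );
$no_conflict = find_duplicate_attributes( array( 'a', 'n1:a' ), $bindings );
```

The last call returns no duplicates because an unprefixed `a` is in no namespace while `n1:a` is in `http://www.w3.org`, so their expanded names differ.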

the interpretation of unprefixed attributes is determined by the element on which they appear.

this is also there, but I feel like there’s nothing we can do at this point, where we cannot make any document-specific interpretation beyond “no namespace” for unprefixed attributes.

Contributor Author

adamziel commented May 28, 2025

@dmsnell good call! I've added a few basic tests for these scenarios in c2940bd.

this is also there, but I feel like there’s nothing we can do at this point, where we cannot make any document-specific interpretation beyond “no namespace” for unprefixed attributes.

Agreed, I think assuming "no namespace" is the best we can do.


2 participants