URL hash is hashing the wrong thing #8

Open
annevk opened this issue May 20, 2025 · 3 comments
Comments

@annevk

annevk commented May 20, 2025

'sha256-SHA256("script.js")' is one of the examples. Presumably this function call is to be replaced by the actual hash? This isn't clear from the explainer.

Either way, the input cannot be script.js as that is the input to the URL parser and that won't be there when Fetch calls into CSP. Requiring that to survive would be a layering violation and a very objectionable one at that.

@meacer
Collaborator

meacer commented May 20, 2025

Sorry for the confusion, 'sha256-SHA256("script.js")' is indeed a placeholder. In practice, it'll look like 'url-sha256-7T7n4L7qJJj/O4yoWXPRIvxvo9WF1itYB+wDTQzwdrM='. I'll update the doc.
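
For illustration, here is a minimal sketch of how a source expression of that shape could be produced. It assumes the hash input is the UTF-8 encoding of the URL string and that the digest is Base64-encoded as in other CSP hash sources; the thread does not pin down either detail, and the function name `url_hash_source` is hypothetical.

```python
import base64
import hashlib


def url_hash_source(url_text: str) -> str:
    """Build a 'url-sha256-<base64>' source expression for url_text.

    Assumption: the bytes being hashed are the UTF-8 encoding of the
    URL string. The thread does not specify the exact hash input.
    """
    digest = hashlib.sha256(url_text.encode("utf-8")).digest()
    encoded = base64.b64encode(digest).decode("ascii")
    return "'url-sha256-" + encoded + "'"


# Prints a quoted source expression for the relative URL "script.js".
print(url_hash_source("script.js"))
```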

One of the use cases for URL hashes is to generate a working CSP from the HTML of the page itself, without knowing where the page will be deployed. Our main goal is to make deploying a reasonably secure CSP easier, and supporting relative URLs in this manner allows that.

The current idea is not to take the relative URL from the page source, but to derive it from the <document URL, resource URL> pair. Assuming both URLs are well formed and normalized, the algorithm to do this is fairly straightforward:

Given two URLs document_url and resource_url:

1. If document_url's origin is not equal to resource_url's origin, return null.
2. Let document_parts = document_url's path split by "/".
3. Let resource_parts = resource_url's path split by "/".
4. Let common_prefix_len = 0 and min_len = min(len(document_parts), len(resource_parts)).
5. While common_prefix_len < min_len and document_parts[common_prefix_len] == resource_parts[common_prefix_len], increment common_prefix_len.
6. Let level_difference = len(document_parts) - common_prefix_len - 1 (the document's final segment is its file name, not a directory level).
7. Let relative_path_parts be a string array consisting of level_difference times "..".
8. Append resource_parts[common_prefix_len:] to relative_path_parts and return relative_path_parts.join("/").

This returns ../script.js for the pair (https://example.com/abc/def/page.html, https://example.com/abc/script.js). (The exact algorithm might have to handle more edge cases.)

This should avoid layering violations since it's not using the attribute directly from the HTML. Would something like this work?
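
The steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the origin is approximated by the (scheme, netloc) pair, query strings and fragments are ignored, and the prefix comparison is limited to directory segments (one less than the shorter path) so that a document path that is a prefix of the resource path does not yield a negative level difference. The function name `relative_url` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def relative_url(document_url: str, resource_url: str) -> Optional[str]:
    """Derive a relative URL for resource_url as seen from document_url."""
    doc = urlsplit(document_url)
    res = urlsplit(resource_url)

    # Assumption: (scheme, netloc) stands in for the origin. A real
    # origin check would also normalize default ports and host case.
    if (doc.scheme, doc.netloc) != (res.scheme, res.netloc):
        return None

    document_parts = doc.path.split("/")
    resource_parts = res.path.split("/")

    # Only directory segments take part in the prefix comparison; the
    # last segment of each path is treated as a file name.
    min_len = min(len(document_parts), len(resource_parts)) - 1
    common_prefix_len = 0
    while (common_prefix_len < min_len
           and document_parts[common_prefix_len] == resource_parts[common_prefix_len]):
        common_prefix_len += 1

    # Directory levels to climb from the document's directory.
    level_difference = len(document_parts) - common_prefix_len - 1
    relative_path_parts = [".."] * level_difference
    return "/".join(relative_path_parts + resource_parts[common_prefix_len:])


print(relative_url("https://example.com/abc/def/page.html",
                   "https://example.com/abc/script.js"))  # ../script.js
```

Paths with an empty or root-only document path, and inputs that still contain dot segments, are not handled here.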

@annevk
Author

annevk commented May 21, 2025

I don't think so. There are endless variations in how you can spell these kinds of paths. E.g., /fakedirectory/../anotherfakedirectory/../script.js.

@meacer
Collaborator

meacer commented May 21, 2025

I think any path will need to be normalized with something like the Remove Dot Segments algorithm (https://datatracker.ietf.org/doc/html/rfc3986#section-5.2.4) before being hashed. Both the browser and the CSP generator tool will need to do this.

I previously looked into HTTP Archive data to see if normalization could be achieved easily. I don't have the exact numbers right now, but there were fewer than 100 resources from the top 1000 URLs that couldn't be normalized this way.
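
As a rough illustration of the normalization step, here is a Python sketch of the Remove Dot Segments procedure from RFC 3986, section 5.2.4. It operates on a path string only; whether the browser and a generator tool would apply exactly this procedure, and at which point, is an assumption rather than something settled in this thread.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments, following RFC 3986 section 5.2.4."""
    output = []  # completed segments, each with its leading "/"
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]           # "/./" collapses to "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]           # "/../" collapses to "/"
            if output:
                output.pop()          # and drops the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # from the input to the output.
            start = 1 if path.startswith("/") else 0
            end = path.find("/", start)
            if end == -1:
                end = len(path)
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


# The spelling from the earlier comment collapses to a single path.
print(remove_dot_segments("/fakedirectory/../anotherfakedirectory/../script.js"))  # /script.js
```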


2 participants