Add HTML Page Title to Element Metadata in partition_html() #3970
prasannaJosium started this conversation in Ideas
Replies: 0 comments
Description:
Currently, when using partition_html(), the element metadata doesn't include the HTML page title, which is useful context for many use cases. The title is available in the HTML document's <title> tag, but it isn't extracted and included in the element metadata.
Proposed Solution:
Add a page_title field to the ElementMetadata class and modify the partition_html() function to extract the page title and include it in the metadata of each element. This would involve adding page_title: Optional[str] = None to the ElementMetadata class and applying the FIRST strategy to the new field.
Benefits:
Example Usage:
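A minimal sketch of the idea, using only the Python standard library rather than unstructured itself. PageMetadata, extract_page_title, and attach_page_title are hypothetical names made up for illustration; PageMetadata merely stands in for ElementMetadata with the proposed page_title field, and the real change would presumably live inside partition_html().

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element of a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is considered; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace; return None when no title text was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_page_title(html: str) -> Optional[str]:
    """Return the document's <title> text, or None if it has none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


@dataclass
class PageMetadata:
    """Hypothetical stand-in for ElementMetadata, NOT the real class."""
    filename: Optional[str] = None
    page_title: Optional[str] = None  # the proposed new field


def attach_page_title(html: str, texts: List[str]) -> List[Tuple[str, PageMetadata]]:
    """Pair each element text with metadata carrying the same page title."""
    title = extract_page_title(html)
    return [(text, PageMetadata(page_title=title)) for text in texts]


html = (
    "<html><head><title>Quarterly Report</title></head>"
    "<body><p>Revenue grew.</p></body></html>"
)
for text, metadata in attach_page_title(html, ["Revenue grew."]):
    print(text, "|", metadata.page_title)
```

Every element from the same page gets the same title, and a page without a <title> tag leaves page_title as None, matching the Optional[str] = None default in the proposal.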
Would you like me to submit a PR with these changes?
If there are other ways to get this done, please do educate me.
Cheers