Improper chunking for pdf #3803
Unanswered
anshulgoyal43 asked this question in Q&A
Replies: 1 comment
-
Did you try strategy="hi_res"?
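For reference, a minimal sketch of what trying that suggestion could look like. The helper name `partition_pdf_hi_res`, the `by_title` chunking choice, and the `max_characters` value are assumptions for illustration, not something recommended in this thread. The `hi_res` strategy also needs the extra PDF dependencies of unstructured installed, and whether it fixes the broken sentences for this particular PDF is untested here.

```python
from typing import Any, List


def partition_pdf_hi_res(path: str, max_characters: int = 1000) -> List[Any]:
    """Partition a PDF with the layout-aware "hi_res" strategy and chunk it.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    # Imported lazily so this module can be loaded without unstructured installed.
    from unstructured.partition.pdf import partition_pdf

    return partition_pdf(
        filename=path,
        strategy="hi_res",             # layout-model based partitioning
        chunking_strategy="by_title",  # assumption: group elements by section
        max_characters=max_characters, # hard cap on the size of each chunk
    )
```

Calling `partition_pdf_hi_res("a1836-10.pdf")` and printing each returned chunk would allow a side-by-side comparison with the `fast` output.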
-
I am trying to chunk a PDF for my RAG pipeline.
Here is the code I ran:
```python
from unstructured.partition.pdf import partition_pdf

file = "/Users/anshulgoyal/work/pdf_files/a1836-10.pdf"
print("Processing file:", file)

# Partition with the "fast" strategy and apply "basic" chunking in the same call
chunks = partition_pdf(filename=file, strategy="fast", chunking_strategy="basic")
for chunk in chunks:
    print(chunk)
    print("-" * 100)
```
Link to the PDF: https://www.indiacode.nic.in/bitstream/123456789/18935/1/a1836-10.pdf
Sentences are broken in the middle across chunks. What am I missing?
Is this an issue with the PDF or with unstructured itself?
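As a possible stopgap while the root cause is unclear, the chunk texts could be post-processed so that each one ends on a sentence boundary. The sketch below is plain Python and does not use any unstructured API. The function name `merge_to_sentence_boundaries`, the `max_chars` limit, and the end-of-sentence regex are all assumptions. The regex is naive: abbreviations and numbered clauses such as "1." (common in legal text) will be treated as sentence ends.

```python
import re
from typing import Iterable, List

# Assumption: a sentence ends with ".", "?" or "!", optionally followed by
# closing brackets. This is a heuristic, not a real sentence tokenizer.
_SENTENCE_END = re.compile(r"[.?!][)\]]*$")


def merge_to_sentence_boundaries(texts: Iterable[str], max_chars: int = 2000) -> List[str]:
    """Join consecutive chunk texts until the buffer ends with a sentence."""
    merged: List[str] = []
    buffer = ""
    for text in texts:
        text = text.strip()
        if not text:
            continue  # skip empty chunks
        buffer = f"{buffer} {text}" if buffer else text
        # Flush when the buffer ends a sentence, or when it grows too large.
        if _SENTENCE_END.search(buffer) or len(buffer) >= max_chars:
            merged.append(buffer)
            buffer = ""
    if buffer:
        merged.append(buffer)  # keep any trailing, unterminated text
    return merged
```

With the code from the question, this could be applied as `merge_to_sentence_boundaries(str(c) for c in chunks)`. Note that the merged chunks can be longer than the originals, so `max_chars` should stay within whatever the embedding model accepts.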