PDFs with graphics instead of text #4412

Closed Answered by dosubot bot
KeenanFernandes2000 asked this question in Q&A

🤖

Hey @KeenanFernandes2000, nice to see you back! Hope you're doing well.

The current implementation of the WebPDFLoader class in LangChainJS only handles text content in PDFs. For each page it calls getTextContent(), a method of the PDF.js build that ships inside the pdf-parse library, and collects the text it returns. That method does not return graphical content, which is why you're seeing an error when trying to process PDFs made up of graphics.

To handle PDFs like these, you would need a different method or library that can extract the graphical content or convert it into text. Unfortunately, the pdf-parse library used in this code does not support that.
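As a rough illustration of how you might detect which pages need that different treatment, here is a minimal TypeScript sketch. It assumes you already have the per-page result of getTextContent(), which in PDF.js is an object with an `items` array whose text entries carry a `str` field. The helper names (`pageText`, `isLikelyImageOnly`, `pagesNeedingOcr`) and the `minChars` threshold are hypothetical and not part of LangChainJS or pdf-parse.

```typescript
// Minimal shape of a getTextContent() result (assumption: only `str` is needed here).
interface TextItemLike {
  str?: string;
}

interface TextContentLike {
  items: TextItemLike[];
}

// Hypothetical helper: join a page's text fragments into one normalized string.
function pageText(content: TextContentLike): string {
  return content.items
    .map((item) => item.str || "")
    .join(" ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: a page with fewer than `minChars` characters of
// extracted text is treated as probably image-only (scan, chart, drawing).
function isLikelyImageOnly(content: TextContentLike, minChars = 1): boolean {
  return pageText(content).length < minChars;
}

// Hypothetical helper: return the 1-based numbers of pages that yielded no
// usable text, so they can be routed to another tool (for example OCR).
function pagesNeedingOcr(pages: TextContentLike[], minChars = 1): number[] {
  const result: number[] = [];
  pages.forEach((content, index) => {
    if (isLikelyImageOnly(content, minChars)) {
      result.push(index + 1);
    }
  });
  return result;
}
```

Pages flagged this way could be skipped or passed to a separate OCR step instead of letting the loader fail on them. The threshold is a judgment call: a scanned page with a small text watermark would need a higher `minChars` to be caught.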

However, based on the …

Answer selected by KeenanFernandes2000