PDFs with graphics instead of text #4412
If I use the `PDFLoader` and the PDF I upload has content, but in the form of graphics, how can I handle this type of document? It currently throws an error when trying to convert it into a vector.
Replies: 1 comment
🤖
Hey @KeenanFernandes2000, nice to see you back! Hope you're doing well.

The current implementation of the `WebPDFLoader` class in LangChainJS is designed to handle text content in PDFs. It uses the `getTextContent()` method from the PDF.js build bundled with the `pdf-parse` library to extract text from each page of the PDF. This method only returns the text objects on a page; it does not handle graphical content. Pages that consist solely of images or vector graphics (for example, scanned pages) therefore yield no text, which is likely why you're seeing an error when trying to convert such PDFs into vectors.

To handle PDFs that contain graphical content, you would need to use a different method or library that can extract or convert graphical content from PDFs. Unfortunately, the `pdf-parse` library used in this code does not support this functionality.

However, based on the provided context, the LangChainJS codebase has several methods that could potentially handle graphical content in PDFs. These methods are part of the …

These methods and classes could potentially handle graphical content in PDFs as part of the rendering process. However, this would require significant changes to the … You can find more information about these classes and methods in the langchain/src/types/pdf-parse.d.ts file. Please note that modifying these classes or methods to handle graphical content in PDFs would require a deep understanding of the LangChainJS codebase and the …

I hope this information helps. If you have any further questions, feel free to ask.
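The explanation comes down to one check: a page drawn only as images or vector paths produces no text items from `getTextContent()`, so nothing usable reaches the embedding step. Below is a minimal sketch of guarding for that, under stated assumptions: the `getTextContent()` result is reduced to `{ items: { str } }`, `PageDocument` is a simplified stand-in for LangChain's `Document` (not the library type), and `PageFallback` is a hypothetical callback (for instance, one that renders the page to an image and runs OCR on it). None of `pageText`, `pagesWithoutText`, or `buildDocuments` is a LangChainJS API.

```typescript
// Reduced shape of what PDF.js `page.getTextContent()` resolves to: only the
// `str` field of each text item is used in this sketch.
interface TextItem {
  str: string;
}

interface TextContent {
  items: TextItem[];
}

// Simplified stand-in for a LangChain `Document` (assumption, not the real type).
interface PageDocument {
  pageContent: string;
  metadata: { page: number; source: "text-layer" | "fallback" };
}

// Hypothetical fallback: given a 1-based page number, return recovered text,
// e.g. by rendering the page to an image and running OCR on it.
type PageFallback = (pageNumber: number) => Promise<string>;

// Join a page's text items with spaces; image-only pages give "".
function pageText(content: TextContent): string {
  return content.items.map((item) => item.str).join(" ").trim();
}

// 1-based numbers of the pages that have no extractable text layer.
function pagesWithoutText(pages: TextContent[]): number[] {
  const missing: number[] = [];
  pages.forEach((content, index) => {
    if (pageText(content) === "") {
      missing.push(index + 1);
    }
  });
  return missing;
}

// Build one document per page, routing text-less pages to the fallback and
// failing with a clear message if nothing at all could be extracted.
async function buildDocuments(
  pages: TextContent[],
  fallback?: PageFallback,
): Promise<PageDocument[]> {
  const docs: PageDocument[] = [];
  for (let index = 0; index < pages.length; index++) {
    const pageNumber = index + 1;
    const text = pageText(pages[index]);
    if (text !== "") {
      docs.push({ pageContent: text, metadata: { page: pageNumber, source: "text-layer" } });
    } else if (fallback) {
      const recovered = (await fallback(pageNumber)).trim();
      if (recovered !== "") {
        docs.push({ pageContent: recovered, metadata: { page: pageNumber, source: "fallback" } });
      }
    }
  }
  if (docs.length === 0) {
    throw new Error(
      "No extractable text found; the PDF is probably image-only. Add an OCR fallback before embedding.",
    );
  }
  return docs;
}
```

Failing early is a deliberate choice in this sketch: an image-only PDF is reported at load time with a specific message, instead of an empty document list flowing into the vector store. `pagesWithoutText` can also be used on its own to decide which pages need OCR before any embedding happens.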