This Python tool compares two or more PDF documents and highlights the differences between them. It extracts the words from each PDF, compares them while ignoring whitespace differences, and highlights the words that were changed, added, or removed.
- Word-based Comparison: Compares text from two or more PDFs, highlighting only added, modified, or deleted words.
- Whitespace Ignored: Ignores any differences in whitespace, focusing only on actual word changes.
- Precise Highlighting: Highlights the differences in the compared PDF files using custom colors (e.g., red for PDF2 and green for PDF3).
- Side-by-Side Merging: Merges the original and highlighted PDFs side by side for easy comparison.
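
The word-based, whitespace-insensitive comparison described above could look roughly like the sketch below. It uses only `difflib` and plain strings. The function name `diff_words` is an assumption made for illustration and is not taken from the tool's source.

```python
import difflib


def diff_words(text_a: str, text_b: str):
    """Return (tag, old_words, new_words) for every differing run of words."""
    # str.split() with no arguments splits on any run of whitespace, so
    # spacing and line-break differences never reach the comparison.
    words_a = text_a.split()
    words_b = text_b.split()

    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is "replace", "delete", or "insert"
            changes.append((tag, words_a[i1:i2], words_b[j1:j2]))
    return changes


if __name__ == "__main__":
    old = "The quick  brown fox\njumps over the lazy dog"
    new = "The quick red fox jumps over the dog today"
    for change in diff_words(old, new):
        print(change)
```

Splitting into words before diffing is what keeps reflowed paragraphs or extra spaces from showing up as changes; only the word sequence matters.
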
- Provide paths to the PDF files to be compared.
- The tool will extract words from the PDFs, compare them, and highlight the differences.
- It saves the highlighted PDFs and a merged output with the original and highlighted PDFs placed side by side for an easy visual comparison.
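
The extract, compare, and highlight steps above can be sketched with PyMuPDF as follows. This is a minimal illustration under stated assumptions, not the tool's actual code: the function names, the `color` parameter, and the choice to mark only added or modified words in the second document are all hypothetical.

```python
import difflib


def changed_word_indices(words_a, words_b):
    """Indices into words_b of words added or modified relative to words_a."""
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    indices = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "insert"):
            indices.extend(range(j1, j2))
    return indices


def highlight_differences(base_path, other_path, out_path, color=(1, 0, 0)):
    """Highlight words in other_path that differ from base_path; save a copy."""
    import fitz  # PyMuPDF; imported here so the helper above works without it

    base = fitz.open(base_path)
    other = fitz.open(other_path)
    for page_no in range(min(len(base), len(other))):
        # Each entry is (x0, y0, x1, y1, word, block_no, line_no, word_no).
        base_words = base[page_no].get_text("words")
        other_words = other[page_no].get_text("words")
        changed = changed_word_indices(
            [w[4] for w in base_words], [w[4] for w in other_words]
        )
        for idx in changed:
            rect = fitz.Rect(other_words[idx][:4])  # bounding box of the word
            annot = other[page_no].add_highlight_annot(rect)
            annot.set_colors(stroke=color)  # RGB components in the range 0..1
            annot.update()
    other.save(out_path)
    other.close()
    base.close()
```

A call such as `highlight_differences("pdf1.pdf", "pdf2.pdf", "pdf2_highlighted.pdf")` (file names are placeholders) would write a highlighted copy of the second document. Words that exist only in the base document have no position in the other one, so a fuller version would need to mark those on the base document instead.
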
- PyMuPDF (fitz)
- difflib (part of the Python standard library)
- The tool compares the documents page by page. If the PDFs have different page counts, only the pages up to the shortest document's page count are compared.
- Text formatting (e.g., font size, style) is not considered in the comparison; only the raw text content is compared.
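
To make the page-by-page note concrete, here is a small sketch of how pages might be paired when documents differ in length. It models each document as a list of page texts; the names `comparable_pages` and `skipped_page_counts` are assumptions for illustration only.

```python
def comparable_pages(*documents):
    """Yield (page_index, page_texts) for pages present in every document."""
    shared = min(len(doc) for doc in documents)
    for i in range(shared):
        yield i, tuple(doc[i] for doc in documents)


def skipped_page_counts(*documents):
    """Number of trailing pages in each document that are never compared."""
    shared = min(len(doc) for doc in documents)
    return [len(doc) - shared for doc in documents]


if __name__ == "__main__":
    doc_a = ["page one", "page two", "page three"]
    doc_b = ["page one", "page 2"]
    for index, texts in comparable_pages(doc_a, doc_b):
        print(index, texts)
    print(skipped_page_counts(doc_a, doc_b))
```

Reporting the skipped counts is one way to make it visible that trailing pages of the longer document were left out of the comparison.
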
- Original PDF: The untouched source document.
- Highlighted PDFs: The compared documents with added, changed, or missing words highlighted in different colors.
- Combined Output: A single PDF containing the original and highlighted versions side by side.
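
One possible way to build the combined output is to place each pair of pages next to each other on a wider page. The sketch below assumes PyMuPDF's `show_pdf_page` is used for placement; the function names and the left/right layout are hypothetical and may differ from what the tool does.

```python
def side_by_side_layout(left_size, right_size):
    """Return (page_width, page_height, left_box, right_box).

    Sizes are (width, height); boxes are (x0, y0, x1, y1) in points.
    """
    lw, lh = left_size
    rw, rh = right_size
    left_box = (0, 0, lw, lh)
    right_box = (lw, 0, lw + rw, rh)
    return lw + rw, max(lh, rh), left_box, right_box


def merge_side_by_side(original_path, highlighted_path, out_path):
    """Write a PDF whose pages show the original (left) and highlighted (right)."""
    import fitz  # PyMuPDF; imported here so the layout helper works without it

    left = fitz.open(original_path)
    right = fitz.open(highlighted_path)
    combined = fitz.open()  # new, empty PDF
    for page_no in range(min(len(left), len(right))):
        l_rect, r_rect = left[page_no].rect, right[page_no].rect
        width, height, left_box, right_box = side_by_side_layout(
            (l_rect.width, l_rect.height), (r_rect.width, r_rect.height)
        )
        page = combined.new_page(width=width, height=height)
        # show_pdf_page draws the source page's content into the given box.
        # Assumption to verify: annotations may not be carried over.
        page.show_pdf_page(fitz.Rect(left_box), left, page_no)
        page.show_pdf_page(fitz.Rect(right_box), right, page_no)
    combined.save(out_path)
    combined.close()
    left.close()
    right.close()
```

Depending on the PyMuPDF version, highlights stored as annotations may not appear when a page is embedded this way, so it is worth checking the merged file and, if needed, making the highlights part of the page content before merging.
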