pdfsyntax

A Python library to inspect and transform PDF files

🔎

implements document structure management down to the byte level (the PDF syntax) and offers a higher-level API for inspection and transformation use cases (access to metadata, rotation, ...)

🎈

is written from scratch in pure Python as a lightweight, dependency-free package focused on simplicity and immutability

favors non-destructive edits allowed by the PDF Specification: by default, incremental updates are appended to the end of the original file (you may rewind to an earlier revision or squash all revisions into a single one)

allows efficient transformation of files by parsing only the necessary objects/blocks (lazy loading, memoization) and writing only the changes (incremental updates)

is free and open source (MIT licence)
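
To illustrate the non-destructive edits mentioned above, here is a minimal sketch of how incremental updates and rewinding work at the byte level. It does not use the pdfsyntax API: the helper names (`revision_ends`, `append_revision`, `rewind`) are hypothetical, the sample bytes are a toy stand-in rather than a valid PDF, and the sketch assumes that each revision ends with an `%%EOF` marker that never appears inside stream data.

```python
from typing import List

EOF_MARKER = b"%%EOF"


def revision_ends(data: bytes) -> List[int]:
    """Return the offset just past each %%EOF marker (and its line break), oldest first."""
    ends = []
    pos = data.find(EOF_MARKER)
    while pos != -1:
        end = pos + len(EOF_MARKER)
        # Include the end-of-line sequence that follows the marker, if any.
        if data[end:end + 2] == b"\r\n":
            end += 2
        elif data[end:end + 1] in (b"\n", b"\r"):
            end += 1
        ends.append(end)
        pos = data.find(EOF_MARKER, end)
    return ends


def append_revision(data: bytes, update: bytes) -> bytes:
    """Add an incremental update: the original bytes are kept untouched."""
    return data + update


def rewind(data: bytes, steps: int = 1) -> bytes:
    """Drop the last `steps` revisions by truncating at an earlier %%EOF."""
    ends = revision_ends(data)
    if not 0 <= steps < len(ends):
        raise ValueError("not enough revisions to rewind")
    return data[:ends[len(ends) - 1 - steps]]


# Toy revisions (placeholders, not a parseable PDF).
original = b"%PDF-1.7\n1 0 obj\n<< /Rotate 0 >>\nendobj\nstartxref\n9\n%%EOF\n"
update = b"1 0 obj\n<< /Rotate 90 >>\nendobj\nstartxref\n60\n%%EOF\n"

updated = append_revision(original, update)
restored = rewind(updated, 1)
```

Because an update only ever appends bytes, the earlier revision stays intact at the start of the file, and rewinding amounts to truncating the file at the end of that revision. A real implementation would locate revisions through the cross-reference sections and the `/Prev` entries of the trailers rather than by scanning for markers.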