PythonPDF Library Source Code License

PythonPDF Library is a tool for extracting information from PDF documents. Unlike other PDF-related tools, it focuses entirely on getting and analyzing text contents and PDF construction. PythonPDF Library lets you obtain the exact location of text on a page, as well as other information such as fonts or lines. It includes a PDF converter that can transform PDF files into other text formats (such as HTML). It has an extensible PDF parser that can be used for purposes other than text analysis.

PythonPDF Library is a Python library and utility that reads and writes PDF files:

  • Tested on Python 2.6, 2.7, 3.3, 3.4, 3.5, and 3.6.
  • Operations include subsetting, merging, rotating, modifying metadata, etc.
  • The fastest pure Python PDF parser available.
  • Has been used for years by a printer in pre-press production.
  • Can be used with rst2pdf to faithfully reproduce vector images.
  • Can be used either standalone, or in conjunction with your application to reuse existing PDFs in new ones.
  • Written entirely in Python.
  • Parse, analyze, and convert PDF documents.
  • Near-complete PDF-1.7 specification support.
  • CJK languages and vertical writing scripts support.
  • Various font types (Type1, TrueType, Type3, and CID) support.
  • Basic encryption (RC4) support.
  • PDF to HTML conversion (with a sample converter web app).
  • Outline (TOC) extraction.
  • Tagged contents extraction.
  • Reconstruct the original layout by grouping text chunks.
  • Extracting document information (title, author, …).
  • Splitting documents page by page.
  • Merging documents page by page.
  • Cropping pages.
  • Merging multiple pages into a single page.
  • Encrypting and decrypting PDF files.
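As background for the "Basic encryption (RC4)" item above, here is a minimal sketch of the RC4 stream cipher itself in pure Python. It is not PythonPDF Library code and does not cover how a PDF derives its per-object keys; the function name `rc4` is chosen for illustration only.

```python
def rc4(key, data):
    """Encrypt or decrypt `data` with RC4 (the operation is symmetric).

    Illustrative only: `rc4` is a hypothetical helper, not a library call.
    """
    key = bytearray(key)
    if not key:
        raise ValueError("key must not be empty")

    # Key-scheduling: permute the 256-entry state using the key bytes.
    state = list(range(256))
    j = 0
    for i in range(256):
        j = (j + state[i] + key[i % len(key)]) % 256
        state[i], state[j] = state[j], state[i]

    # Keystream generation: XOR each input byte with one keystream byte.
    out = bytearray()
    i = j = 0
    for byte in bytearray(data):
        i = (i + 1) % 256
        j = (j + state[i]) % 256
        state[i], state[j] = state[j], state[i]
        out.append(byte ^ state[(state[i] + state[j]) % 256])
    return bytes(out)
```

Because RC4 XORs the data with a keystream, calling `rc4` twice with the same key returns the original bytes.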

PythonPDF will faithfully reproduce vector formats without rasterization, so the rst2pdf package has used PythonPDF for PDF and SVG images by default since March 2010.

PythonPDF can also be used in conjunction with your application, in order to re-use portions of existing PDFs in new PDFs created with any PDF software.

The PythonPDF Library comes with several examples that demonstrate further operations on PDF files.

All examples
The examples directory has a few scripts that use the library. If these examples do not work with your PDF, feel free to send the file to us; we will analyze it and get back to you as soon as possible.

  • 4up.py will shrink pages down and place 4 of them on each output page.
  • alter.py shows an example of modifying metadata, without altering the structure of the PDF.
  • booklet.py shows an example of creating a 2-up output suitable for printing and folding (e.g. on tabloid-size paper).
  • cat.py shows an example of concatenating multiple PDFs together.
  • extract.py will extract images and Form XObjects (embedded pages) from existing PDFs to make them easier to use and refer to from new PDFs (e.g. with rst2pdf).
  • poster.py increases the size of a PDF so it can be printed as a poster.
  • print_two.py Allows creation of 8.5 X 5.5" booklets by slicing 8.5 X 11" paper apart after printing.
  • rotate.py Rotates all or selected pages in a PDF.
  • subset.py Creates a new PDF with only a subset of pages from the original.
  • unspread.py Takes a 2-up PDF, and splits out pages.
  • watermark.py Adds a watermark PDF image over or under all the pages of a PDF.
  • rl1/4up.py Another 4up example, using canvas for output.
  • rl1/booklet.py Another booklet example, using canvas for output.
  • rl1/subset.py Another subsetting example, using canvas for output.
  • rl1/platypus_pdf_template.py Another watermarking example, using canvas and generated output for the document. Contributed by user asannes.
  • rl2 Experimental code for parsing graphics. Needs work.
  • subset_booklets.py shows an example of creating a full printable PDF version in a more professional and practical way.
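
The 2-up booklet layout mentioned for booklet.py depends on a particular page ordering so that the sheets read correctly once folded. The sketch below shows one common way to compute that ordering; it is not taken from the shipped scripts, and the function name `booklet_order` is hypothetical.

```python
def booklet_order(page_count):
    """Return (left, right) page pairs for a folded 2-up booklet.

    Pages are numbered from 1. None marks a blank page added as padding.
    Pairs alternate front side, back side for each sheet, outermost first.
    """
    if page_count < 1:
        raise ValueError("page_count must be positive")

    # Each folded sheet carries four pages, so round up to a multiple of 4.
    padded = -(-page_count // 4) * 4

    def real(number):
        # Page numbers beyond the document become blank padding pages.
        return number if number <= page_count else None

    sides = []
    for sheet in range(padded // 4):
        low = 2 * sheet + 1        # lowest page number on this sheet
        high = padded - 2 * sheet  # highest page number on this sheet
        sides.append((real(high), real(low)))          # front side
        sides.append((real(low + 1), real(high - 1)))  # back side
    return sides
```

For an 8-page document this yields the pairs (8, 1), (2, 7), (6, 3), (4, 5): two sheets, each printed on both sides.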

Being pure Python, it should run on any Python platform without dependencies on external libraries. It can also work entirely on StringIO objects rather than file streams, allowing PDF manipulation in memory, which makes it a useful tool for websites that manage or manipulate PDFs.
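
To illustrate the in-memory workflow, here is a small sketch that uses only the standard library, not the PythonPDF API. It reads the `%PDF-x.y` header from an in-memory binary stream (`io.BytesIO`, the binary counterpart of StringIO on Python 3). The helper name `pdf_version` and the 1024-byte search window are assumptions made for this example.

```python
import io
import re

# A PDF file starts with a header such as "%PDF-1.7".
_HEADER = re.compile(br"%PDF-(\d)\.(\d)")


def pdf_version(stream):
    """Return (major, minor) from a binary PDF stream's header, or None.

    Hypothetical helper: looks only at the first 1024 bytes and restores
    the stream position so the caller can keep using the stream.
    """
    start = stream.tell()
    head = stream.read(1024)
    stream.seek(start)
    match = _HEADER.search(head)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The bytes could come from an upload or a database instead of a file on disk.
buffer = io.BytesIO(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n")
print(pdf_version(buffer))
```

Any object with `read`, `seek`, and `tell` works here, so the same function accepts an open file or an in-memory buffer.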

PythonPDF Library Source Code License

  • Product Code: MOD190303211522
  • Availability: In Stock
  • Sold By: eDoc Software
  • Price: $299.00


Tags: merge pdf by python, pdf library, pdf sdk, pypdf, python, python pdf, python pdf library, python watermark pdf, source code
