PDF Software

How to extract text and text coordinates from a PDF file? PDF Parsing with Text and Coordinates. PDF Text Extraction with Coordinates.

I want to extract all the text boxes and their coordinates from a PDF file. I would also like to extract text from a portion of a PDF page (selected by coordinates). Can anyone help me out?

Given a PDF file, output should look something like:

   489, 41,  "Signature"
   500, 52,  "b"
   630, 202, "a_g_i_r"

Customer #1  
-----------------------------------------------
Hi,

I was wondering if anyone could recommend a program which can extract the starting (top left) coordinates (x,y) of each word in a PDF file (and the end if possible). Ideally output would be in a format that could be easily inserted into a database.

Customer #2
-----------------------------------------------

Some of our customers want to extract text contents and their positions from PDF pages. The text positions are then used to locate and parse values, for example to read invoice numbers from PDF files or to look up other information.

PDF Extractor SDK (PDF Parser SDK and Command Line) extracts various kinds of information from PDF files, including text contents and text coordinates.

1. Download the trial version of PDF Extractor SDK (PDF Parser SDK and Command Line) from this web page:

https://veryutils.com/pdf-extractor-sdk-pdf-parser-sdk-and-command-line

2. Unzip the downloaded package to a folder.

3. Open a CMD window. If you don't know how to open one, please see the following web page:

https://veryutils.com/blog/top-10-methods-to-run-a-command-line-window-in-windows-10/

4. pdfextract.exe is a command line application. It supports the following command line options:

D:\VeryPDF_PDFExtractTool>pdfextract.exe
pdfextract.exe version 3.0
Copyright 1996-2017 VeryPDF.com Inc.
Product Name: VeryPDF PDF Extract Tool Command Line
http://www.verypdf.com
http://www.verydoc.com
http://support.verypdf.com
Email: support@verypdf.com
Usage: pdfextract.exe [options] <PDF-file>
  -f <int>           : first page to print
  -l <int>           : last page to print
  -opw <string>      : owner password (for encrypted files)
  -upw <string>      : user password (for encrypted files)
  -outfolder <string>: Set a folder to store extracted files
  -layout            : maintain original physical layout
  -textfile          : Extract text contents from PDF file
  -textpos           : Extract text and coordinates from PDF file
  -nopgbrk           : don't insert page breaks between pages
  -h                 : print usage information
  -help              : print usage information
  --help             : print usage information
  -?                 : print usage information
  -$ <string>        : input your license key
Example:
   pdfextract.exe D:\in.pdf
   pdfextract.exe -outfolder D:\out\ D:\in.pdf
   pdfextract.exe -outfolder D:\out\ D:\in.pdf
   pdfextract.exe -opw 123 -upw 456 -outfolder D:\out\ D:\in.pdf
   pdfextract.exe -outfolder D:\out\ D:\in.pdf > out.log
   pdfextract.exe -outfolder D:\out\ D:\in.pdf out.log
   pdfextract.exe D:\in.pdf out.log
   pdfextract.exe -textpos D:\in.pdf D:\out.txt
   pdfextract.exe -textpos -nopgbrk D:\in.pdf D:\out.txt
   pdfextract.exe -textfile D:\in.pdf D:\out.txt
   pdfextract.exe -layout -textfile D:\in.pdf D:\out.txt

5. You can simply run the following command line to extract all information from your PDF file:

pdfextract.exe -outfolder D:\VeryUtils\test\ D:\downloads\Test_in.pdf
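
If you would rather drive the tool from a script than type commands by hand, the command line can be assembled programmatically. The following is a minimal Python sketch, not an official API. It only uses options that appear in the usage text above (-textpos, -nopgbrk, -f, -l). The function names are our own, and combining -f/-l with -textpos is an assumption, since the listed examples do not show that combination.

```python
import subprocess


def build_textpos_command(exe, pdf_path, out_txt, first_page=None,
                          last_page=None, no_page_breaks=False):
    """Build the argument list for `pdfextract.exe -textpos ...`.

    The helper name and its parameters are illustrative only.
    """
    args = [str(exe), "-textpos"]
    if no_page_breaks:
        args.append("-nopgbrk")
    # Assumption: -f/-l (first/last page) can be combined with -textpos.
    if first_page is not None:
        args += ["-f", str(first_page)]
    if last_page is not None:
        args += ["-l", str(last_page)]
    args += [str(pdf_path), str(out_txt)]
    return args


def run_textpos(exe, pdf_path, out_txt, **options):
    """Run the tool and return its console output (stdout plus stderr)."""
    cmd = build_textpos_command(exe, pdf_path, out_txt, **options)
    # Passing a list avoids shell quoting problems with spaces in paths.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(build_textpos_command("pdfextract.exe", r"D:\in.pdf", r"D:\out.txt",
                                no_page_breaks=True))
```

Calling run_textpos(r"D:\VeryPDF_PDFExtractTool\pdfextract.exe", r"D:\in.pdf", r"D:\out.txt") would then correspond to the "-textpos" example in the usage text.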

6. You will find a "TextFileWithPosition.txt" file in the "D:\VeryUtils\test" folder. This text file contains the text contents and the coordinates of each word.

7. "PageContents.xml" is an XML file that contains the coordinates of each character.

8. Now you can write a simple PHP or Python application that reads and parses the X/Y positions from these output files, so the extracted text can be processed easily.
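
As an illustration of that last step, here is a minimal Python sketch. The exact layout of "TextFileWithPosition.txt" is not reproduced in this article, so the sketch assumes lines in the `x, y, "text"` shape shown in the customer's question at the top. Adjust the pattern to the real file. The names Word, parse_words and words_in_region are our own. The region filter covers the "extract text from a portion of the page" request.

```python
import re
from typing import NamedTuple


class Word(NamedTuple):
    x: float
    y: float
    text: str


# Assumed line shape:   489, 41,  "Signature"
LINE_RE = re.compile(
    r'^\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,\s*"(.*)"\s*$'
)


def parse_words(lines):
    """Turn `x, y, "text"` lines into Word records, skipping other lines."""
    words = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            words.append(Word(float(match.group(1)),
                              float(match.group(2)),
                              match.group(3)))
    return words


def words_in_region(words, left, top, right, bottom):
    """Keep only the words whose (x, y) lies inside the rectangle."""
    return [w for w in words
            if left <= w.x <= right and top <= w.y <= bottom]


sample = [
    '   489, 41,  "Signature"',
    '   500, 52,  "b"',
    '   630, 202, "a_g_i_r"',
]

if __name__ == "__main__":
    for word in words_in_region(parse_words(sample), 480, 0, 510, 100):
        print(word.x, word.y, word.text)
```

Each Word is a plain tuple, so the records can be passed directly to a database insert such as a parameterized INSERT statement.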

If you wish to extract more information from PDF files, such as hyperlinks, colorspaces, attachments, bookmarks, pictures, embedded fonts or forms, please feel free to contact us. We are glad to assist you.

https://veryutils.com/contact

PDF Software, Utilities

How to integrate an EMF/PDF/Image Virtual Printer Driver into your own applications?

VeryUtils EMF Printer Driver is a virtual printer driver for Windows 2000, Windows XP and later systems. It lets you create EMF (Enhanced Metafile) and WMF (Windows Metafile, the older predecessor of EMF) vector images from any Windows application that supports printing. The EMF Printer Driver also supports PDF, PostScript and more than 100 raster formats (PNG, JPG, TIFF, BMP, etc.) in case you don't need vector images.

With VeryUtils EMF Printer Driver, you can export EMF files (Enhanced-Format Metafiles) from any application that can print, and you can easily develop your own application based on the VeryUtils EMF Virtual Printer SDK. VeryUtils EMF Printer Driver is designed for Win 9X/2K/XP/2003/Vista/7 32 and 64 bit systems.

How does VeryUtils EMF Printer Driver work?
Simply print any file to the VeryUtils Virtual Printer (you can ask us to customize the printer name). The EMF file(s) and an INI file are then created automatically. Your application can read the full names of the exported file(s), the job title and the page size from the INI file.
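
To give an idea of how an application might pick up that INI file, here is a minimal Python sketch using the standard configparser module. The layout of the real INI file is not documented in this article, so the section name "PrintJob" and the keys "JobTitle", "PageWidth", "PageHeight" and "File1", "File2", ... are purely hypothetical placeholders. Replace them with the names found in the file the driver actually writes.

```python
import configparser

# Hypothetical INI content; the real section and key names may differ.
SAMPLE_INI = """\
[PrintJob]
JobTitle = Test page
PageWidth = 595
PageHeight = 842
File1 = C:\\Temp\\page0001.emf
File2 = C:\\Temp\\page0002.emf
"""


def read_job(ini_text):
    """Return the job title, page size and exported file names."""
    # interpolation=None keeps '%' characters in paths or titles literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep the original key case
    parser.read_string(ini_text)
    job = parser["PrintJob"]
    files = [value for key, value in job.items() if key.startswith("File")]
    return {
        "title": job.get("JobTitle", ""),
        "page_size": (job.getfloat("PageWidth"), job.getfloat("PageHeight")),
        "files": files,
    }


if __name__ == "__main__":
    info = read_job(SAMPLE_INI)
    print(info["title"], info["page_size"])
    for path in info["files"]:
        print(path)
```

In a real integration you would call parser.read() on the INI file path instead of parsing an in-memory sample.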

How to embed VeryUtils EMF Printer Driver SDK into your product?
You'll get the full version after purchase, with no limitations and no message box about our company. You can also ask us to add a custom notice message (optional).

1. Embed the SDK in your installation package.
2. During installation, extract the SDK exe to a temporary folder and execute it with the command line below.

"x:\temp folder\setupx64.exe" /VERYSILENT

Please follow these steps to test the VeryUtils EMF Printer Driver:

1. Download VeryUtils EMF Printer Driver from this web page:

https://veryutils.com/emf-pdf-image-virtual-printer-driver-sdk

2. After you download and unzip it to a folder, run setupx64.exe to install the EMF Printer on your system. After a few seconds, a "VeryPDF EMF Printer" or "VeryPDF PDF Printer" appears in the Printers and Faxes folder.

3. You may run "swaprun.exe" to monitor the printer queue before printing.

4. Now run a Windows application, such as Chrome, and print an HTML page to "VeryPDF EMF Printer".

5. After a few seconds you will see a message box. This message box is suppressed in the purchased version.

The generated EMF files are stored in the temporary folder.

6. After a few more seconds, you will also get the final PDF file.

7. You can integrate the EMF Printer into your product to extend its printing capabilities. With it, you can print any document to EMF, PDF, PS and image formats, then import these files into your software for further processing.

VeryUtils EMF Virtual Printer Driver SDK allows you to integrate Virtual Printer and document conversion features into your own application. Print any document, then export it to PDF, TIFF, JPG, PNG, GIF, BMP, TGA, PCX, TXT, EMF or SPL (.SPL, print spooling file) format from VeryUtils EMF Virtual Printer Driver.

https://veryutils.com/emf-pdf-image-virtual-printer-driver-sdk

PDF Software, Utilities

VeryUtils Virtual Metafile EMF Printer Driver SDK for Windows Royalty Free

VeryUtils Metafile EMF Printer Driver is Windows virtual printer software that converts any printable file into Enhanced Metafile (EMF) files, PDF files and image files. Because EMF is a vector format, metafiles are used primarily by applications that require further processing of the printed documents. The Metafile EMF printer driver can also extract ASCII text from a printed file in addition to generating EMF output.

VeryUtils Metafile EMF Printer Driver can be purchased from the VeryUtils platform on this web page:

https://veryutils.com/emf-pdf-image-virtual-printer-driver-sdk

VeryUtils Metafile EMF Printer Driver is a royalty-free product: developers can bundle and distribute the EMF Printer Driver as part of their own application with no per-user fees. We can also provide a custom-build service for the Metafile EMF Virtual Printer, such as changing the default printer name or adding more PDF-related functions, all royalty free.

Virtual Metafile EMF Printer Driver Features:
* Print to searchable PDF;
* Print to image (BMP, TIFF, JPEG, PNG);
* Print to text (ANSI, UTF-8 or Unicode);
* Can act as a print server with shared printing, supports terminal services and works in a domain;
* Print job redirection to hardware printer;
* Print job management: document modification, cancel printing;
* Add watermarks to documents with many configuration options;
* Upload files using FTP/FTPS/SFTP;
* ESC/POS receipt parser (virtual POS printer);
* Early Access: obtain converted files right after the user starts printing a document;
* N-Up feature: print 2, 4, 6, 9 or 16 pages per sheet;
* MSI installer with full source code;
* Supported OS (both x86 and x64): Windows XP, Windows Server 2003, Windows Server 2008R2, Windows Server 2012, Windows Vista, Windows 7, Windows 8/8.1, Windows 10.

If you are a software developer, our Metafile EMF Virtual Printer SDK will help you to:

1. Generate output in standard raster or vector formats from your program (or from any other software application that produces printable forms).
Supported output formats:
* EMF
* PDF
* TIFF with various compressions, including CCITT fax compression. The virtual printer also supports the special fax resolutions, such as 204×98 and 204×196 DPI.
* JPEG, BMP, PNG
* Plain text in different encodings (ANSI, UTF-8 or Unicode)
* PostScript (without converters from PS to other formats).

2. Redirect the print job to another printer. When sending the document to the printer, you can save it in the chosen format (PDF, BMP, JPEG, TIFF, PNG, TXT) and print it on paper on the physical printer at the same time.

3. Modify a virtual-printed document before sending it for actual printing. For example, you can add a demo watermark, a company copyright message, a company logo, etc.

4. Import documents from other applications. Imported documents can be converted to your own format by way of the EMF format.

EMF (Enhanced MetaFile)
EMF (Enhanced MetaFile) and raw are terms for spool file formats used in printing by the Windows operating system. When a print job is sent to a printer that is already printing another file, the computer reads the new file and stores it, usually on the hard disk or in memory, for printing at a later time. Spooling allows multiple print jobs to be given to the printer at one time.

The EMF format is the 32-bit version of the original Windows metafile (WMF) format. The EMF format was created to solve the deficiencies of the WMF format in printing graphics from sophisticated graphics programs. The EMF format is device-independent. This means that the dimensions of a graphic are maintained on the printed copy regardless of the resolution in dots per inch of the printer. In a network, the smaller file size of the EMF format reduces network traffic. EMF is the spool file used by the Windows operating system.

A raw spool file is one that is sent to the Windows spooler unprocessed (which is why it's called "raw"). A raw file is used, for example, to send PostScript commands to a PostScript printer: the printer understands the commands, but to the Windows spooler they are just plain data. The raw format is device-dependent and slower. If printing problems occur while using the EMF format, they can sometimes be fixed by simply changing the format to "raw" in the printer Properties.

PDF Software, Photo Software

Convert TIFF to PDF files in batch using VeryUtils TIFF to PDF Converter Command Line software

VeryUtils TIFF to PDF Converter Command Line is a Windows application for batch converting large numbers of TIFF files to PDF files. It is a standalone application and does not require Adobe Acrobat or Adobe Reader to be installed on your system.

The TIFF file format is undoubtedly used widely in industries related to graphics editing, faxing and printing. But there are times when you might want to access the image in a more portable and standard format like PDF.

So how can you convert your TIFF files to PDF? It is not a difficult task: VeryUtils TIFF to PDF Converter Command Line converts TIFF files to PDF in batch.

1. First, download VeryUtils TIFF to PDF Converter Command Line from this web page:

https://veryutils.com/tiff-to-pdf-converter-command-line

2. After you download and unzip the tiff-to-pdf-cmd.zip package to a folder, open a Command Line window. If you don't know how to open one, please see the following article:

https://veryutils.com/blog/top-10-methods-to-run-a-command-line-window-in-windows-10/

3. In the cmd window, go to the folder that contains tiff2pdf.exe. You can run the following command lines to convert a multipage TIFF file to a multipage PDF file:

tiff2pdf.exe -$ XXXX-XXXX-XXXX-XXXX -p A4 -o _out_A4.pdf multipage.tif
tiff2pdf.exe -$ XXXX-XXXX-XXXX-XXXX -p A3 -o _out_A3.pdf multipage.tif
tiff2pdf.exe -$ XXXX-XXXX-XXXX-XXXX -p A4 -F -f -o _out_A4_fit.pdf multipage.tif

tiff2pdf.exe supports the following command line options:

TIFF Tools Command Line Software
Copyright (c) 1988-2029 VeryUtils, Inc.
https://veryutils.com

usage:  tiff2pdf [options] input.tiff
options:
-$: set your license key
-o: output to file name
-j: compress with JPEG
-q: compression quality
-n: no compressed data passthrough
-d: do not compress (decompress)
-i: invert colors
-u: set distance unit, 'i' for inch, 'm' for centimeter
-x: set x resolution default in dots per unit
-y: set y resolution default in dots per unit
-w: width in units
-l: length in units
-r: 'd' for resolution default, 'o' for resolution override
-p: paper size, e.g. "letter", "legal", "A4"
-F: make the tiff fill the PDF page
-f: set PDF "Fit Window" user preference
-e: date, overrides image or current date/time default, YYYYMMDDHHMMSS
-c: sets document creator, overrides image software default
-a: sets document author, overrides image artist default
-t: sets document title, overrides image document name default
-s: sets document subject, overrides image image description default
-k: sets document keywords
-b: set PDF "Interpolate" user preference
-h: usage

You can call tiff2pdf.exe from PHP on your server to convert user-uploaded TIFF files to PDF files, for example:

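<?php
// Escape every path: uploaded file names must never reach the shell unquoted.
$in  = escapeshellarg('D:\\VeryUtils\\multipage.tif');
$out = escapeshellarg('D:\\VeryUtils\\_out_A4.pdf');
$output = shell_exec("D:\\VeryUtils\\tiff2pdf.exe -\$ XXXX-XXXX-XXXX-XXXX -p A4 -o $out $in 2>&1");
echo '<pre>' . htmlspecialchars((string) $output) . '</pre>';
?>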
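
For converting a whole folder of TIFF files, the same command line can be generated once per input file. The following is a minimal Python sketch under stated assumptions: the helper names are our own, only the -$, -p and -o options from the option list above are used, and each PDF is simply named after its source TIFF.

```python
import subprocess
from pathlib import Path


def build_tiff2pdf_commands(exe, tiff_paths, out_dir, paper="A4",
                            license_key=None):
    """Return one tiff2pdf argument list per TIFF file in tiff_paths."""
    commands = []
    for tiff in tiff_paths:
        tiff = Path(tiff)
        if tiff.suffix.lower() not in {".tif", ".tiff"}:
            continue  # ignore anything that is not a TIFF file
        out_pdf = Path(out_dir) / (tiff.stem + ".pdf")
        cmd = [str(exe)]
        if license_key:
            cmd += ["-$", license_key]
        cmd += ["-p", paper, "-o", str(out_pdf), str(tiff)]
        commands.append(cmd)
    return commands


def convert_folder(exe, in_dir, out_dir, **options):
    """Convert every TIFF in in_dir; raises if a conversion fails."""
    tiffs = sorted(p for p in Path(in_dir).iterdir() if p.is_file())
    for cmd in build_tiff2pdf_commands(exe, tiffs, out_dir, **options):
        # A list of arguments avoids shell quoting issues with file names.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in build_tiff2pdf_commands("tiff2pdf.exe",
                                       ["scan1.tif", "fax.TIFF"], "out"):
        print(cmd)
```

Building the argument lists separately from running them makes it easy to log or review the commands before any file is converted.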

PDF Software

VeryUtils PDF Viewer OCX is a standalone embeddable PDF Viewer OCX for Windows developers

PDF Viewer SDK ActiveX v4.0
Platform: Windows 10, Windows 8, Windows 7, Vista, XP

PDF Viewer SDK ActiveX is a PDF viewer SDK that opens PDF files quickly and supports PDF printing and text searching from C++, C#, VB.Net, VB6, Delphi, VFP and MS Access. PDF Viewer OCX helps you view, print and search for keywords on PDF pages securely. You can zoom in, zoom out and rotate PDF pages, and navigation panes such as Bookmarks and Thumbnails make it easy to use.

https://veryutils.com/pdf-viewer-ocx-component

VeryUtils PDF Viewer OCX is a standalone embeddable PDF Viewer OCX for Windows developers. Developers can build a custom interface for viewing and printing PDF documents using Visual Basic, VC, Delphi or any other programming language that supports ActiveX controls. PDF documents can be loaded from streams or disk files. If you need to embed a PDF viewer control into your Windows application, VeryUtils PDF Viewer ActiveX control is a good choice.

PDF Viewer SDK ActiveX Highlight Features:
* Opens PDF files very quickly.
* Goes to a specific page when opening a multipage PDF file.
* Opens protected PDF files.
* Zooms in and out of the PDF file with good quality.
* Searches text in multipage PDF files.
* Exports all pages or a specific page to bitmap files and scales them to a specific size.
* Prints a PDF with the Print dialog.
* Prints a PDF programmatically, including selecting the printer and setting the page range and page orientation.
* Gets the default printer name.
* Raises a print event, so you know how many pages have been printed and when printing has finished.
* Rotates the PDF by a specific angle.
* Displays Unicode content, including Chinese, Japanese, Arabic and Hebrew.
* Provides a user-defined display area.
* Supports user-defined zoom-in and zoom-out values.
* Includes sample code for C# 2019, C# 2010, VB.NET 2019, VB.Net 2010, Visual Basic, Visual Basic Script (vbs), Visual C/C++, Visual Foxpro, Delphi, Access and web pages.
* Compatible with any programming language that supports ActiveX (Access, Visual C/C++, Visual Basic, Visual Foxpro, Delphi, .Net, etc.).
* Royalty-free distribution of the OCX file.
* Standard or full-screen PDF viewing.
* Select a certain page or range of pages for viewing.
* View PDF files inside Mozilla Firefox, Internet Explorer or Google Chrome, as long as the browser supports OCX/ActiveX controls.
* Zoom in and out of any PDF file with handy keyboard shortcuts or typical mouse-based controls.
* Quickly and simply rotate any page by 90 degrees.
* Simply use the keyboard shortcut or click in the Find box to search for any word or phrase.
* Provide intelligent display panes, including Pages, Bookmarks, etc.

Welcome to PDF Viewer SDK ActiveX Control. The control can be used from C#, Visual Basic, VB.Net, Visual FoxPro, Delphi, Visual C++ and .Net, and is compatible with any programming language that supports ActiveX. The PDF Viewer ActiveX control requires the following minimum configuration:

Windows 98/ME/NT/2000/XP or Windows Vista, Windows 7, Windows 8, Windows 10.

With PDF Viewer SDK ActiveX Control, your application can display multipage PDF files, go to a specific page, rotate the view, zoom in, zoom out and print the PDF. It can also export each page to a bitmap file scaled to a specific size, and search the text and highlight the matches.

PDF Viewer SDK ActiveX Control Reference:
------------------------------
BOOL OpenPDF(LPCTSTR lpszPDFFile, LPCTSTR lpszUserPwd, LPCTSTR lpszOwnerPwd);
void ClosePDF();
long RunCommand(long nCode, long nPara1, long nPara2, long nPara3);
void SetFindText(LPCTSTR lpszFindText);
void SetViewMode(long nViewMode);
void RotateViewLeft();
void RotateViewRight();
void ViewNextPage();
void ViewPreviousPage();
void ViewFirstPage();
void ViewLastPage();
void ViewPage();
void FindPreviousText();
void FindNextText();
void ZoomFitPage();
void ZoomActualPage();
void ZoomFitWidth();
void Zoom(float nZoom);
void ViewModeSinglePage();
void ViewModeFacing();
void ViewModeContinuous();
void ViewModeContinuousFacing();
void ShowHideBookmarks();
void SetRegCode(LPCTSTR lpszRegCode);
void ZoomIn();
void ZoomOut();
void ViewGotoPage(long nPageIndex);
long SetGotoPageNumber(long nPageIndex);
long GetCurrentPage();
long GetPageTotalCount();
float GetCurrentZoom();
void SetMsgCallbackWnd(long hMsgWnd);
long FlattenPDF(LPCTSTR lpszInPDF, LPCTSTR lpszOutPDF);
BOOL EnableAnnotations(BOOL bEnable);
BOOL OpenPDFFromMem(long lpPDFData, long nPDFDataLen, LPCTSTR lpszUserPwd, LPCTSTR lpszOwnerPwd);
long EnableLaunchLink(long bEnable);
long EnableMouseWhellInFacingMode(long bEnable);
long GetScrollBar();
void ViewModeBook();
long ShowScrollBar(BOOL bShow);
BOOL IsScrollBarShown();
long ShowContextMenu(BOOL bShow);
BOOL IsContextMenuShown();
void AboutBox();
------------------------------

If you have any questions about this product, please feel free to let us know. We are glad to assist you.

https://veryutils.com/contact