
PDF to Text OCR Converter Command Line

PDF to Text OCR Converter Command Line is a utility that uses Optical Character Recognition (OCR) technology to convert PDF files and image files into fully text-searchable PDF files and plain text files. It is well suited to adding OCR data to existing scanned images or existing PDF files.

PDF Full Text OCR is fast and affordable, and it is designed for batch processing with desktop or network scanners.

PDF to Text OCR Converter Command Line uses the best OCR technology to batch convert scanned documents to plain text files and searchable PDF files.

We also offer the OCR to Any Converter Command Line software. With this software, you can convert scanned documents to plain text, MS Word, Excel, HTML, or searchable PDF (Image + Hidden Text) files. OCR to Any Converter Command Line can also extract data from documents using zone OCR or by searching the full page text for matching patterns or a list of values. This data can be used to organize scanned documents automatically, and it can be exported to CSV, XML, or any database. It also works with MS Office documents and media files.

https://veryutils.com/components/ocr/ocr-to-any-converter-command-line

PDF to Text OCR Converter Command Line features:

* Automatic rotation of pages based on the text content.
* Remove blank pages from existing files during conversion so they do not get included in the finished PDF.
* Convert existing image files (TIFF, JPG, BMP, PNG etc.) into Fully Text Searchable PDF files.
* Convert existing scanned PDF documents into Fully Text Searchable PDF files.
* Can be set up as a scheduled task for automated processing.
* Create text file output of OCR data and extract text from text-based PDF files.
* Set PDF Dublin Core Metadata properties such as Author, Title, Subject and Keywords.
* Control PDF view preferences and security (user/master password).
* Enable/Disable print for PDF files.
* Easy to integrate: no programming skills are required, just simple command line switches.
* Batch scan with any TWAIN or ISIS scanner (available on request).
* Create searchable Image + Text PDF files using OCR.
* Extract data from OCR or existing text in Office/PDF files with pattern matching.
* Recognize 36 1D and 2D barcode formats (available on request).
* Export captured data to XML, CSV, or any database (included in the OCR to Any Converter Command Line software).
* Process files from network scanners & copiers on an unattended server.
* Perform simple application integration via the command line.
* Intelligent OCR Processing. This can be controlled per input folder via the OCR profile and is available both for the "PDF to PDF" and the "PDF to TXT" processing.
* Advanced OCR (Optical Character Recognition) Engine.
* Support all Windows systems (XP/Vista/7/8/10 and later systems).
* Support PDF, JPEG, GIF, PNG, PICT, BMP, and most common image formats as input.
* Can convert to both editable text and searchable PDF.
* Support over 60 languages including English, German, French, Chinese, Japanese, and Spanish.
* Batch Conversion (convert multiple files as a batch).
* Silent Mode for Automation Scripting (with PHP Folder Watcher software).

PDF to Text OCR Converter Command Line is a good choice for web services. With a command line invocation behind a web service interface, PDF documents and image documents can be sent from any workstation to a central PDF to Text OCR Converter Command Line server (on the local network or the Internet) and converted to searchable PDF or PDF/A. It is also possible to write only the recognized text to a file. The OCR server profile, the OCR engine, the processing parameters, and the language can all be controlled and selected.

OCR processor - Generates searchable PDF and PDF/A documents. PDF to Text OCR Converter Command Line can act as an OCR processor that watches predefined folders and automatically converts newly added or changed image documents to full-text searchable PDF or PDF/A documents. This function requires the PHP Folder Watcher software, which can be purchased from this web page:

https://veryutils.com/php-folder-watcher

PDF to Text OCR Converter Command Line works like a Windows Service: it has no user interface, and the OCR processing runs in the background.
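
The "newly added or changed" check described above can be approximated without a folder watcher by comparing file timestamps from a scheduled task. The following Python sketch is only an illustration under stated assumptions: `pending_files` and `INPUT_SUFFIXES` are hypothetical names, the input and output folders are assumed to be separate, and this is not how the PHP Folder Watcher software itself works.

```python
from pathlib import Path

# Hypothetical helper; not part of pdf2txtocr.exe or PHP Folder Watcher.
# The suffix list is an assumption based on the input formats named on this page.
INPUT_SUFFIXES = {".pdf", ".tif", ".tiff", ".jpg", ".png", ".bmp"}


def pending_files(watch_dir, out_dir):
    """Yield (source, target) pairs whose searchable PDF is missing or older than the source."""
    watch_dir, out_dir = Path(watch_dir), Path(out_dir)
    for src in sorted(watch_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in INPUT_SUFFIXES:
            continue
        dst = out_dir / (src.stem + ".pdf")
        # Convert when there is no output yet, or the source changed after the last run.
        if not dst.exists() or dst.stat().st_mtime < src.stat().st_mtime:
            yield src, dst
```

Each yielded pair could then be passed to pdf2txtocr.exe with `-ocr` and a suitable `-ocrmode`, for example from a Windows scheduled task that runs every few minutes.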

The following OCR languages are supported:
Afrikaans (afr)                  Greek (ell)                       Odiya (ori)
Albanian (sqi)                   Gujarati (guj)                    Panjabi (pan)
Amharic (amh)                    Haitian (hat)                     Persian (fas)
Ancient Greek (grc)              Hebrew (heb)                      Polish (pol)
Arabic (ara)                     Hindi (hin)                       Portuguese (por)
Assamese (asm)                   Hungarian (hun)                   Pushto (pus)
Azerbaijani (aze)                Icelandic (isl)                   Romanian (ron)
Basque (eus)                     Indic (inc)                       Russian (rus)
Belarusian (bel)                 Indonesian (ind)                  Sanskrit (san)
Bengali (ben)                    Inuktitut (iku)                   Serbian (srp)
Bosnian (bos)                    Irish (gle)                       Sinhala (sin)
Bulgarian (bul)                  Italian (ita)                     Slovak (slk)
Burmese (mya)                    Japanese (jpn)                    Slovenian (slv)
Catalan (cat)                    Javanese (jav)                    Spanish (spa)
Cebuano (ceb)                    Kannada (kan)                     Swahili (swa)
Central Khmer (khm)              Kazakh (kaz)                      Swedish (swe)
Cherokee (chr)                   Kirghiz (kir)                     Syriac (syr)
Chinese - Simplified (chi_sim)   Korean (kor)                      Tagalog (tgl)
Chinese - Traditional (chi_tra)  Kurukh (kru)                      Tajik (tgk)
Croatian (hrv)                   Lao (lao)                         Tamil (tam)
Czech (ces)                      Latin (lat)                       Telugu (tel)
Danish (dan)                     Latvian (lav)                     Thai (tha)
Dutch (nld)                      Lithuanian (lit)                  Tibetan (bod)
Dzongkha (dzo)                   Macedonian (mkd)                  Tigrinya (tir)
English (eng)                    Malay (msa)                       Turkish (tur)
Esperanto (epo)                  Malayalam (mal)                   Uighur (uig)
Estonian (est)                   Maltese (mlt)                     Ukrainian (ukr)
Finnish (fin)                    Marathi (mar)                     Urdu (urd)
Frankish (frk)                   Math/Equations (equ)              Uzbek (uzb)
French (fra)                     Middle English (1100-1500) (enm)  Vietnamese (vie)
Galician (glg)                   Middle French (1400-1600) (frm)   Welsh (cym)
Georgian (kat)                   Nepali (nep)                      Yiddish (yid)
German (deu)                     Norwegian (nor)


System requirements

  • Windows 2000 / XP / Server 2003 / Vista / Server 2008 / 7 / 8 and later systems, both 32-bit and 64-bit.

PDF to Text OCR Converter Command Line Options

PDF to Text OCR Converter Command Line
-------------------------------------------------------
Description:
  1. Convert text based PDF files to plain text files.
  2. Convert scanned PDF files and image files to plain text files and searchable PDF files by OCR technology.
  3. Convert embedded fonts in PDF file to a new searchable PDF file.
  4. Keep color when converting PDF, TIFF and other image formats to searchable PDF files.
  5. Deskew, Despeckle and Noise Removal, Auto-Orientation, Dithering, Black Border Removal.
Input formats:
  1. Text based PDF files
  2. Scanned PDF files
  3. Scanned single page and multi-page TIFF files
  4. Scanned JPEG, PNG, BMP, GIF, PCX, TGA, PBM, PNM, PPM files
Output formats:
  1. Plain text files without layout
  2. Plain text files with layout
  3. Plain text based PDF files (the PDF contains text only)
  4. Attach OCRed text layer to original PDF file
  5. OCRed BW PDF files with hidden text layer
  6. OCRed Color PDF files with hidden text layer
  7. OCRed Grayscale PDF files with hidden text layer
  8. Output to TIFF, PNG, BMP, TGA, GIF with Deskew, Despeckle, etc. options
-------------------------------------------------------
Usage: pdf2txtocr.exe [options] [PDF-file] [Text-file]
  -firstpage [int]      : first PDF page to convert
  -lastpage [int]       : last PDF page to convert
  -res [int]            : set resolution, the unit is DPI (default is 300 dpi)
  -ownerpwd [string]    : set owner password for encrypted PDF file
  -userpwd [string]     : set user password for encrypted PDF file
  -layout               : maintain original physical layout
  -layout2              : pdf to table conversion with Best Column Alignment
  -table                : same as -layout2
  -pdf2table            : same as -layout2
  -noc                  : don't insert page breaks 0x0C between pages in text file
  -bitcount [int]       : set color depth used when rendering PDF pages to image data, it can be set to 1, 8 or 24, default is 8 bit
  -rotate [int]         : rotate pages before OCR
  -threshold [int]      : lightness threshold used to convert image to B&W, from 1 to 255, 0 is auto, default is -1
  -imageopt             : deskew and despeckle images automatically
  -dither [int]         : convert the color image to B&W using the desired method:
    -dither 0: Floyd-Steinberg
    -dither 1: Ordered-Dithering (4x4)
    -dither 2: Burkes
    -dither 3: Stucki
    -dither 4: Jarvis-Judice-Ninke
    -dither 5: Sierra
    -dither 6: Stevenson-Arce
    -dither 7: Bayer (4x4 ordered dithering)
  -resizewidth [int]    : resize the image's width, only available when -resizeheight is used
  -resizeheight [int]   : resize the image's height, only available when -resizewidth is used
  -flip                 : flip the image vertically
  -mirror               : mirror the image horizontally
  -ocr                  : enable OCR function for scanned PDF file
  -lang [string]        : choose the language for OCR engine
  -ocrmode [int]        : set OCR mode
    -ocrmode 0: output to text file
    -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
    -ocrmode 2: output to plain text based PDF file
    -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
    -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
  -text [string]        : add additional text at end of each text page, this parameter supports the following variables:
    %PageNumber%: current page number
    %PageCount% : total page count of PDF file
  -outboxfile           : output [X, Y, Width, Height] information for each word during OCR
  -producer [string]    : Set 'producer' to output PDF file
  -creator [string]     : Set 'creator' to output PDF file
  -subject [string]     : Set 'subject' to output PDF file
  -title [string]       : Set 'title' to output PDF file
  -author [string]      : Set 'author' to output PDF file
  -keywords [string]    : Set 'keywords' to output PDF file
  -ownerpwdout [string] : Set 'owner password' to output PDF file
  -openpwdout [string]  : Set 'open password' to output PDF file
  -keylen [int]         : Key length (40 or 128 bit)
        -keylen 0:  40 bit RC4 encryption (Acrobat 3 or higher)
        -keylen 1: 128 bit RC4 encryption (Acrobat 5 or higher)
        -keylen 2: 128 bit RC4 encryption (Acrobat 6 or higher)
  -encryption [int]     : Restrictions
        -encryption    0: Encrypt the file only
        -encryption 3900: Deny anything
        -encryption    4: Deny printing
        -encryption    8: Deny modification of contents
        -encryption   16: Deny copying of contents
        -encryption   32: No commenting
        === 128 bit encryption only -> ignored if 40 bit encryption is used ===
        -encryption  256: Deny FillInFormFields
        -encryption  512: Deny ExtractObj
        -encryption 1024: Deny Assemble
        -encryption 2048: Disable high res. printing
        -encryption 4096: Do not encrypt metadata
  -$ [string]           : input your License Key
Examples:
  pdf2txtocr.exe C:\in.pdf C:\out.txt
  pdf2txtocr.exe -firstpage 1 -lastpage 1 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ocr -res 300 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ownerpwd 123 -userpwd 456 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -layout C:\in.pdf C:\out.txt
  pdf2txtocr.exe -layout2 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -table C:\in.pdf C:\out.txt
  pdf2txtocr.exe -pdf2table C:\in.pdf C:\out.txt
  pdf2txtocr.exe -noc C:\in.pdf C:\out.txt
  pdf2txtocr.exe C:\in.tif C:\out.txt
  pdf2txtocr.exe C:\in.jpg C:\out.txt
  pdf2txtocr.exe C:\in.bmp C:\out.txt
  pdf2txtocr.exe C:\in.png C:\out.txt
  pdf2txtocr.exe -ocr -lang eng C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ocr -bitcount 1 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ocr -bitcount 8 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ocr -bitcount 24 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ocr -lang deu C:\in.pdf C:\out.txt
  pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
  pdf2txtocr.exe -text "PageText %PageNumber% of %PageCount%" C:\in.pdf C:\out.txt
  pdf2txtocr.exe -subject "subject" C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ownerpwdout 123 -keylen 2 -encryption 3900 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -subject "subject" -title "title" C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang eng -ocrmode 0 C:\in.pdf C:\out.txt
  pdf2txtocr.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang eng -ocrmode 2 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang eng -ocrmode 3 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang eng -ocrmode 2 -outboxfile C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang fra -ocrmode 1 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang ita -ocrmode 1 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang nld -ocrmode 1 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocr -lang spa -ocrmode 1 C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -bitcount 24 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -bitcount 8 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
  pdf2txtocr.exe -ocrmode 4 -ocr C:\in.tif C:\out.pdf
  pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.tif C:\out.pdf
  pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.tif C:\out.pdf
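
The value 3900 passed to `-encryption` in the examples above ("Deny anything") equals the sum of the individual restriction values 4, 8, 16, 32, 256, 512, 1024 and 2048, which suggests the values are bit flags that can be combined. The Python sketch below works on that assumption; the `Deny` enum and `encryption_value` function are hypothetical helper names, and whether every intermediate combination is accepted by the tool is not confirmed by the option list.

```python
from enum import IntFlag


class Deny(IntFlag):
    """Restriction values copied from the -encryption option list (names are illustrative)."""
    PRINTING = 4
    MODIFY_CONTENTS = 8
    COPY_CONTENTS = 16
    COMMENTING = 32
    FILL_IN_FORM_FIELDS = 256
    EXTRACT_OBJ = 512
    ASSEMBLE = 1024
    HIGH_RES_PRINTING = 2048


def encryption_value(*flags):
    """Combine restrictions into a single integer for -encryption (assumes bit-flag semantics)."""
    value = 0
    for flag in flags:
        value |= int(flag)
    return value


# All eight restrictions together reproduce the documented "Deny anything" value.
print(encryption_value(*Deny))                            # 3900
# Assumed combination: deny printing and copying only.
print(encryption_value(Deny.PRINTING, Deny.COPY_CONTENTS))  # 20
```

The 4096 value ("Do not encrypt metadata") is left out of the enum because it is not a restriction and is not part of the 3900 total.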

Process image files with Deskew, Despeckle and Noise Removal, and Black Border Removal options:
  pdf2txtocr.exe -imageopt C:\in.tif C:\out.tif
  pdf2txtocr.exe -imageopt -rotate 45 C:\in.png C:\out.tif
  pdf2txtocr.exe -imageopt -rotate 90 C:\in.png C:\out.tif
  pdf2txtocr.exe -imageopt -threshold 0 C:\in.tif C:\out.bmp
  pdf2txtocr.exe -threshold 240 C:\in.tif C:\out.bmp
  pdf2txtocr.exe -dither 0 C:\in.bmp C:\out.png
  pdf2txtocr.exe -dither 7 C:\in.bmp C:\out.png
  pdf2txtocr.exe -imageopt -resizewidth 800 -resizeheight 600 C:\in.gif C:\out.tga
  pdf2txtocr.exe -imageopt -flip C:\in.png C:\out.gif
  pdf2txtocr.exe -imageopt -mirror C:\in.tif C:\out.pcx
  pdf2txtocr.exe -imageopt C:\in.bmp C:\out.tif
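
When the tool is called from another program rather than typed by hand, building the argument list in code avoids quoting problems with paths that contain spaces. The Python sketch below assembles command lines in the same shape as the examples above. `build_command` and `run_command` are hypothetical helper names, and only switches that appear in the option list are used. The meaning of the tool's exit code is not documented on this page, so it is returned without interpretation.

```python
import subprocess


def build_command(src, dst, *, ocr=False, lang=None, ocrmode=None,
                  extra=(), exe="pdf2txtocr.exe"):
    """Return an argument list: pdf2txtocr.exe [options] input output."""
    cmd = [exe]
    if ocr:
        cmd.append("-ocr")
    if lang:
        cmd += ["-lang", lang]
    if ocrmode is not None:
        cmd += ["-ocrmode", str(ocrmode)]
    cmd += list(extra)            # any other documented switches, e.g. ["-imageopt"]
    cmd += [str(src), str(dst)]   # input and output paths always come last
    return cmd


def run_command(cmd):
    """Run the command and return the process exit code (its meaning is an open question)."""
    return subprocess.run(cmd, check=False).returncode


# Same invocation as: pdf2txtocr.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
print(build_command(r"C:\in.pdf", r"C:\out.pdf", ocr=True, lang="deu", ocrmode=1))
```

Passing a list to `subprocess.run` lets Python handle the quoting, so a path such as `C:\My Scans\in.pdf` needs no manual escaping.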

The following command line will OCR all PDF files in the D:\temp\ folder to text files (when used inside a .bat file, write %%F instead of %F):
  for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr -lang deu "%F" "%~dpnF.txt"

The following command line will OCR all PDF files in the D:\temp\ folder and its subdirectories to text files:
  for /r D:\temp %F in (*.pdf) do pdf2txtocr.exe -ocr "%F" "%~dpnF.txt"

The following command line will OCR all PDF files from the D:\temp\ folder and output the text files to the C:\test folder:
  for %F in (D:\temp\*.pdf) do pdf2txtocr.exe -ocr "%F" "C:\test\%~nF.txt"
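
The three `for` loops above can also be expressed in a script, which is convenient when the file list needs filtering or logging. The Python sketch below mirrors them under stated assumptions: `batch_commands` is a hypothetical helper name, it only builds the command lines and does not run them, and with `recursive=True` plus a separate output folder, files with the same name in different subfolders would map to the same text file.

```python
from pathlib import Path


def batch_commands(in_dir, out_dir=None, recursive=False,
                   options=("-ocr",), exe="pdf2txtocr.exe"):
    """Yield one pdf2txtocr.exe argument list per PDF file found in in_dir."""
    in_dir = Path(in_dir)
    pattern = "**/*.pdf" if recursive else "*.pdf"   # like "for /r" versus plain "for"
    for pdf in sorted(in_dir.glob(pattern)):
        if out_dir is None:
            txt = pdf.with_suffix(".txt")            # same idea as "%~dpnF.txt"
        else:
            txt = Path(out_dir) / (pdf.stem + ".txt")  # same idea as "C:\test\%~nF.txt"
        yield [exe, *options, str(pdf), str(txt)]


if __name__ == "__main__":
    # Print the commands that would be run for D:\temp, German OCR, recursive.
    for cmd in batch_commands(r"D:\temp", recursive=True, options=("-ocr", "-lang", "deu")):
        print(cmd)
```

Each yielded list can be handed to `subprocess.run` to perform the actual conversion.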

PDF to Text OCR Converter Command Line

  • Brand: VeryPDF
  • Product Code: MOD190220223326
  • Price: $195.00

