VeryUtils ScanOCR is a simple OCR program for Windows, Mac and Linux that recognizes text in common image formats, multi-page images and PDF files. Its postprocessing step corrects errors commonly introduced by the OCR process, which improves the accuracy of the result. The program can also run as a console application from the command line.

Batch processing is now supported. The program monitors a watch folder for new image files, automatically processes them through the OCR engine, and outputs recognition results to an output folder.
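
The watch-folder workflow can be pictured with a minimal Python sketch. This is not ScanOCR's own code: the function `process_watch_folder`, the `recognize` callback (a stand-in for whatever OCR engine is used), the list of file suffixes and the choice of `.txt` output are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, List, Set

# Assumed set of input types; the real product's list may differ.
IMAGE_SUFFIXES = {".tif", ".tiff", ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".pdf"}


def process_watch_folder(watch_dir: Path, output_dir: Path,
                         recognize: Callable[[Path], str],
                         seen: Set[str]) -> List[Path]:
    """Run one polling pass: OCR each new file and write a .txt result."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(watch_dir.iterdir()):
        # Skip sub-folders and file types we do not treat as OCR input.
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Skip files already handled in an earlier pass.
        if path.name in seen:
            continue
        text = recognize(path)  # hypothetical OCR callback
        target = output_dir / (path.stem + ".txt")
        target.write_text(text, encoding="utf-8")
        seen.add(path.name)
        written.append(target)
    return written
```

Calling this function in a loop with a short sleep between passes would approximate a folder monitor; the `seen` set keeps files from being processed twice.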

ScanOCR can capture actionable data from many kinds of documents, from structured forms and surveys to unstructured, text-heavy papers. If you have a scanner and want to avoid retyping your documents, ScanOCR is a fast and convenient way to do it.

ScanOCR is also available as a royalty-free OCR SDK for developers to use in their custom applications. Please feel free to contact us if you are interested in the ScanOCR SDK.

ScanOCR Feature Highlights:

  • Multi-platform support (Java version only), including Windows, Solaris, Linux/Unix, Mac OS X and others.
  • Supports many input formats, including PDF, TIFF, JPEG, GIF, PNG and BMP.
  • Supports multi-page TIFF images.
  • OCRs single-page and multi-page PDF files.
  • OCRs both scanned PDF files and normal PDF files.
  • Supports screenshots.
  • OCRs the characters inside a selection box.
  • File drag-and-drop.
  • Paste image from clipboard.
  • Postprocessing of OCRed characters to boost the accuracy rate.
  • Localized user interface for many languages (Localization project).
  • Integrated scanning support.
  • Watch folder monitor to support batch processing.
  • Custom text replacement in postprocessing.
  • Spellcheck with Hunspell.
  • Support for downloading and installing language data packs and appropriate spell dictionaries.

Do you dread having to retype that document you are holding in your hand? If only you had the electronic file, your life would be so much easier. With ScanOCR software, you could easily and accurately convert that paper document into editable electronic text for use in any application including Word and WordPerfect.

Despeckle -- For documents that are not particularly clear (e.g. faxes, copies of copies, …), ScanOCR provides a despeckle or "noisy document" option that increases its accuracy.

Plain Text Extraction -- Just need the plain text from the original document? No problem. ScanOCR can be set to recognize the characters and words but ignore the formatting. The resulting file is ready for your word processor or your HTML/web editor and your own custom formatting.

Simplified Error Correction -- Our text editor highlights suspected errors in the recognized text for easier correction. This simplifies the otherwise time-consuming task of proofreading the recognized text.

Batch OCR -- Do you have several documents to OCR? Just point ScanOCR to them and it will OCR them from start to finish without delay.

Zone OCR -- Sometimes all you need is to extract the text from a certain area of a document. Maybe one column. Maybe a footnote. Maybe just one paragraph. Unlike other OCR applications, ScanOCR can limit recognition to a user-defined area. There is no need to OCR an entire document only to use a small portion of it. With ScanOCR, OCR only what you need.

Input Formats -- ScanOCR works with all fully compliant TWAIN scanners and also accepts input from TIFF files.

Output Formats -- ScanOCR can save the documents it acquires in text formats (TXT and RTF) that can be imported into almost any program, such as Word, WordPerfect, HTML editors and e-mail programs, either fully formatted or as plain text. Additionally, it can save scanned documents in the industry-standard TIFF format, which is as widely accepted as PDF.

Multiple Language Recognition -- ScanOCR supports recognition of more than 20 languages.

Images to be OCRed should be scanned at a resolution between 200 DPI (dots per inch) and 400 DPI, in monochrome (black & white) or grayscale. Scanning at higher resolutions will not necessarily improve recognition accuracy, which can currently exceed 97% for characters. Even so, actual rates depend greatly on the quality of the scanned image. Typical scan settings are 300 DPI at 1 bpp (bit per pixel) black & white or 8 bpp grayscale, saved as uncompressed TIFF or PNG.

The Screenshot Mode offers better recognition rates for low-resolution images, such as screen prints, by rescaling them to 300 DPI.
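
As a rough illustration of the arithmetic behind rescaling, the sketch below computes the pixel size an image would have after being scaled up to 300 DPI. The function name and the 96 DPI screen resolution are assumptions; the resampling method ScanOCR actually uses is not described here.

```python
from typing import Tuple


def rescale_to_target_dpi(width_px: int, height_px: int,
                          source_dpi: float,
                          target_dpi: float = 300.0) -> Tuple[int, int]:
    """Return the pixel size of an image after scaling it to target_dpi."""
    if source_dpi <= 0 or target_dpi <= 0:
        raise ValueError("DPI values must be positive")
    scale = target_dpi / source_dpi
    return round(width_px * scale), round(height_px * scale)


# A 1920x1080 screenshot, assuming the screen reports 96 DPI:
# the scale factor is 300 / 96 = 3.125.
new_size = rescale_to_target_dpi(1920, 1080, source_dpi=96)
```

The new size could then be passed to any image library's resize function before running OCR.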

In addition to the built-in text postprocessing algorithm, you can add your own custom text replacement scheme via a UTF-8-encoded, tab-delimited text file named x.DangAmbigs.txt, where x is the ISO 639-3 language code. Both plain and regex text replacements are supported.
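
The sketch below shows how plain and regex replacements from a tab-delimited file might be applied. The exact column layout of x.DangAmbigs.txt is not described here, so the three-column layout used in the sketch (kind, pattern, replacement) is purely hypothetical, as are the function names.

```python
import re
from typing import List, Tuple

Rule = Tuple[str, str, str]  # (kind, pattern, replacement)


def parse_rules(text: str) -> List[Rule]:
    """Parse tab-delimited rules. ASSUMED layout: kind<TAB>pattern<TAB>replacement."""
    rules = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 3 or parts[0] not in ("plain", "regex"):
            continue  # ignore lines that do not match the assumed layout
        rules.append((parts[0], parts[1], parts[2]))
    return rules


def apply_rules(text: str, rules: List[Rule]) -> str:
    """Apply each rule in order, as a literal or a regex substitution."""
    for kind, pattern, replacement in rules:
        if kind == "regex":
            text = re.sub(pattern, replacement, text)
        else:
            text = text.replace(pattern, replacement)
    return text


# Two example rules: a literal fix and a whole-word regex fix.
rules = parse_rules("plain\th0a\thoa\nregex\t\\b1a\\b\tla\n")
fixed = apply_rules("h0a 1a", rules)
```

Rules are applied in file order, so a later rule sees the output of earlier ones; this ordering is a design choice of the sketch, not a documented behavior of the product.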

Some built-in tools are provided to merge several images or PDF files into a single one for convenient OCR operations, or to split a TIFF or PDF file into smaller ones if it contains too many pages, which can cause out-of-memory exceptions.
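
To illustrate the splitting idea, the following sketch computes the page ranges for breaking a long document into smaller parts. The function name and the 10-page limit in the example are assumptions; the built-in tools may choose ranges differently.

```python
from typing import List, Tuple


def page_chunks(total_pages: int, max_pages: int) -> List[Tuple[int, int]]:
    """Return inclusive, 1-based (first, last) page ranges of at most max_pages each."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages >= 1")
    return [(start, min(start + max_pages - 1, total_pages))
            for start in range(1, total_pages + 1, max_pages)]


# A 25-page file split into parts of at most 10 pages each.
ranges = page_chunks(25, 10)
```

Each range could then be written to its own TIFF or PDF file and processed separately, which keeps memory use bounded by the chunk size.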

How does postprocessing work in VeryUtils ScanOCR?
Recognition errors can generally be classified into three categories. Many errors are related to letter case, for example hOa and nhắC, and can easily be corrected by popular Unicode text editors. Many other errors result from the OCR process itself, such as missing diacritical marks or letters confused with similarly shaped ones: huu for hưu, marg for mang, h0a for hoa, 1a for la, uhìu for nhìn. These can usually be fixed by spell checker programs. The built-in postprocessing function helps correct many of these errors.

The last category of errors is the most difficult to detect because the errors are semantic: the words are valid dictionary entries but wrong in context, e.g. tinh vs. tình, vân vs. vấn. These errors require the editor to read through the text and correct them manually against the original image.

Following are instructions on how to correct the first two categories of OCR errors using the built-in functionality:

  • Group lines. After OCR, each line becomes a separate one-line paragraph, so the lines need to be regrouped into the paragraphs they belong to. Use the Remove Line Breaks function under the Format menu. Note that this operation may not be needed for poems.
  • Select Change Case, also under the Format menu, and choose Sentence case to correct most of the letter case errors. Locate and fix any remaining letter case errors.
  • Correct misspellings using the integrated Spell Check.
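
The first two steps above can be approximated in a few lines of Python. This is only a sketch of the idea, not the program's own implementation: both function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the simple sentence-case rule also lowercases proper nouns, which would still need manual review.

```python
import re


def remove_line_breaks(text: str) -> str:
    """Join the lines of each paragraph; blank lines are assumed to separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for paragraph in paragraphs:
        lines = [line.strip() for line in paragraph.splitlines() if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


def sentence_case(text: str) -> str:
    """Lowercase everything, then capitalize the first letter of each sentence."""
    lowered = text.lower()
    return re.sub(r"(^|[.!?]\s+)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(),
                  lowered)


grouped = remove_line_breaks("first line\nsecond line\n\nnext para\n")
cased = sentence_case("hOa nhắC. tHe end")
```

Spell checking, the third step, is left out of the sketch because it depends on the dictionary and spell checker in use.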

Through the above process, most common errors can be eliminated. The remaining semantic errors are few, but they require a human editor to read through the text and make the edits needed to match the original scanned document.

System Requirements

  • ScanOCR works on any version of Windows, from Windows 95 through Windows 10 and beyond!
  • Your scanner needs only a TWAIN driver, which comes with the majority of scanners sold. In short, ScanOCR will most likely work with the PC and scanner you already have.
  • ScanOCR works on Mac and Linux systems.

If you have any questions, please feel free to contact us. We will be glad to assist you as soon as possible.

ScanOCR

  • Product Code: MOD191107201229
  • Availability: In Stock
  • Sold By: eDoc Software
  • Price: $29.95 (Ex Tax: $29.95)
