Why Every Business Needs OCR

Your boss gives you a hardcopy of a company document that needs updating. Your client hands you a printed magazine article and asks you to create an editable text version. You receive an electronic image of a brochure and need to update the text.

What do all these situations have in common? They could all involve you spending hours retyping manually and correcting typos. Or you could take a more modern approach and convert any and all of them into a digital format with fully editable text in a matter of minutes.

All you need is a scanner or digital camera (to create an image file of any printed document) or an electronic image (if you’ve already got a .PDF, .jpg, .eps, .png or similar file, you’re in business), and Optical Character Recognition (OCR) software, like the OCR software that comes standard in Foxit PhantomPDF software.

What is OCR?

OCR is a software technology that enables you to convert scanned documents into documents with “live text,” that is, readable, searchable text that you can change, copy, edit and otherwise treat like any other text.

How does OCR work?

There are two methods used for OCR: matrix matching (the simpler and more common of the two) and feature extraction.

Matrix Matching compares what your OCR software detects as a character with a library of character templates. When it finds a match, bingo! The OCR software matches that image to its corresponding ASCII character.
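To make the idea concrete, here is a minimal sketch of matrix matching in Python. It is an illustration only, not how PhantomPDF or any particular OCR engine is implemented: the tiny 3x5 templates, the `TEMPLATES` name and the pixel-agreement score are all assumptions made for this example.

```python
# Hypothetical 3x5 character templates ("#" = ink, "." = background).
# A real OCR template library would be far larger and more detailed.
TEMPLATES = {
    "I": ("###",
          ".#.",
          ".#.",
          ".#.",
          "###"),
    "L": ("#..",
          "#..",
          "#..",
          "#..",
          "###"),
    "O": ("###",
          "#.#",
          "#.#",
          "#.#",
          "###"),
}


def match_score(glyph, template):
    """Count the pixels where the scanned glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character with the highest pixel agreement."""
    return max(TEMPLATES, key=lambda ch: match_score(glyph, TEMPLATES[ch]))


# A scanned "L" with one stray pixel of noise in the middle row.
noisy_l = ("#..",
           "#..",
           "##.",
           "#..",
           "###")

print(recognize(noisy_l))
```

Because the score counts agreeing pixels rather than demanding an exact match, the noisy glyph is still closest to the "L" template, which is the sense in which a template comparison can tolerate small imperfections.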

Feature Extraction doesn’t compare against fixed templates. Instead, it looks for the general features of each character, such as open areas, closed shapes, diagonal lines and line intersections. It’s a much more versatile method, but it has more requirements for a successful outcome, such as a clean, straight image and a resolution of at least 300 dpi. Matrix matching can still work well on less-than-ideal images, and it’s what’s most common in PDF software like PhantomPDF.
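As a rough illustration of one such feature, the sketch below counts the closed shapes (enclosed holes) in a small glyph. It is a simplified example under stated assumptions: the glyphs are hand-made 3x5 grids, the `count_holes` function is hypothetical, and a real feature-extraction engine would combine many features rather than rely on one.

```python
def count_holes(glyph):
    """Count enclosed background regions (closed shapes) in a glyph.

    The glyph is a sequence of equal-length strings, "#" for ink and
    "." for background.
    """
    # Pad with a background border so all outside background is connected.
    width = len(glyph[0]) + 2
    grid = [["."] * width]
    grid += [["."] + list(row) + ["."] for row in glyph]
    grid += [["."] * width]

    def flood(r, c):
        """Mark every background pixel 4-connected to (r, c) as visited."""
        stack = [(r, c)]
        while stack:
            y, x = stack.pop()
            if 0 <= y < len(grid) and 0 <= x < width and grid[y][x] == ".":
                grid[y][x] = "v"
                stack.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])

    flood(0, 0)  # mark the outside background first

    # Any background still unvisited is enclosed by ink: one hole per region.
    holes = 0
    for r in range(len(grid)):
        for c in range(width):
            if grid[r][c] == ".":
                holes += 1
                flood(r, c)
    return holes


letter_l = ("#..", "#..", "#..", "#..", "###")
letter_o = ("###", "#.#", "#.#", "#.#", "###")
letter_b = ("###", "#.#", "###", "#.#", "###")

for name, glyph in (("L", letter_l), ("O", letter_o), ("B", letter_b)):
    print(name, count_holes(glyph))
```

A feature like this does not depend on the exact font the character was printed in, which hints at why the approach is more versatile than comparing against fixed templates.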

Advantages of OCR

From faster searches and easier editing to saving digital and physical storage space, you’ll find many benefits to using OCR software to turn document images into searchable, editable text:

  • Au revoir retyping – Unless you’re a fan of extra time at the keyboard recreating documents that exist in printed or scanned format, you’ll love the time savings you get when converting those image files into searchable, editable text via OCR.
  • Speedy digital searches – By converting scanned text into a word processing file, OCR lets you search through documents using keywords or phrases. Got a few hundred invoices? Let your PC search for the client name you need faster than you can say “coffee break.”
  • Typing new text – If you need that image of a document to function like real text, where you can add new paragraphs, copy and paste, edit out an old reference, etc., OCR lets you do it. It’s ideal for everything from updating contracts to making changes to your archive of family recipes.
  • Saving space – If you’ve got reams of paper documents taking up space in your office, you can scan them into PDF files with the confidence that your OCR software will let you retrieve any of the text you need to work with, whenever you may need it. Goodbye big file cabinets, hello tidy little CDs of archived documents.
  • Accessibility – If you or someone you know is vision-impaired, OCR software can help turn books, magazines and other printed documents into accessible files that can be listened to using word processing software together with a computer voice-over utility.

So why not use the power of OCR in your PDF software to increase efficiency in your office? Once you start using it, you’re guaranteed to find numerous ways to use it. And you’ll wonder how you ever worked without it.

To learn more about how to scan and OCR documents, visit the Foxit PhantomPDF product page.

PDF/A Makes Long-term Document Archiving a Snap

For most documents, PDF is usually all you need to enable everyday readability and sharing. If you need to ensure those documents are readable over the long term, however, PDF/A should be your go-to format in your PDF software. Here’s why.

What is PDF/A?

PDF/A is an ISO-standardized version of the Portable Document Format (PDF) created specifically to enable you to preserve electronic documents over the long haul. PDF/A ensures that any documents you archive will keep their appearance and readability, regardless of which PDF software applications or systems you use to create them. You’ll get more predictable, consistent results from PDF/A, because of its ISO-standard compliance, than you will from plain PDF. That’s exceptionally useful if you’re expecting those documents to be accessed far into the future.

How is PDF/A different from PDF?

PDF/A eliminates a few features that don’t work well for long-term archiving.

While standard PDF doesn’t require fonts to be embedded in the file, PDF/A does, to ensure that you’ll be able to read the text wherever and whenever the file is opened. That means if you’re planning to archive your document, it’s important to choose fonts that can be embedded into PDF, such as Times, Helvetica and Arial, and save fancy fonts that can’t be embedded for other uses.

Likewise, audio, video, encryption, JavaScript and executable file launches are a no-go in PDF/A because they aren’t suited to long-term archiving, and the original PDF/A-1 standard also rules out transparency. PDF/A files need to be 100% self-contained, so every component needed to make them readable and viewable must be embedded within them. This helps ensure that all the information you need to display the document is intact, so that it looks the same any time it’s opened. That means text, raster images, vector graphics, fonts, and color information go where your PDF/A file goes, so documents look the way they’re supposed to look no matter which PDF software you use to create or open them.
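The restrictions described above can be thought of as a simple checklist. The sketch below is purely illustrative and is not a PDF/A validator: the feature names, the `DISALLOWED` set and the `archive_problems` function are hypothetical, and real validation needs a dedicated tool that inspects the file itself.

```python
# Features this article lists as unsuitable for PDF/A (names are hypothetical).
DISALLOWED = {
    "audio",
    "video",
    "transparency",
    "encryption",
    "javascript",
    "executable_launch",
}


def archive_problems(features, fonts_embedded):
    """Return the reasons a document would not suit PDF/A archiving.

    `features` is a collection of feature names the document uses;
    `fonts_embedded` says whether every font is embedded in the file.
    """
    problems = sorted(f"uses {name}" for name in DISALLOWED & set(features))
    if not fonts_embedded:
        problems.append("fonts are not embedded")
    return problems


# A hypothetical brochure with a video clip, a script and unembedded fonts.
brochure = {"images", "javascript", "video"}
for problem in archive_problems(brochure, fonts_embedded=False):
    print(problem)
```

An empty list means the document passes this toy checklist; it says nothing about the many other requirements in the actual standard.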

Also, while it can be handy to have a lot of different software applications that can create PDFs today, that variety can lead to inconsistent results. With its ISO standardization, PDF/A helps ensure that a wider variety of PDF readers will be able to read your file in the distant future. It also provides a user interface for reading embedded annotations, ensuring that notes are retained, so you can use PDF annotation (commenting) features. And ISO-standard PDF/A adheres to color management guidelines that PDF doesn’t always follow, which helps ensure that documents opened in 2053 look the way they looked in 2013.

If you need to sign documents, PDF/A does support digital signatures, but there’s a caveat: they have to conform to PDF/A requirements for visual appearance. That means that when you sign documents, you have to use embeddable fonts and device-independent color in the signature. Not all commercial digital signature tools follow these requirements, so make sure you check the documentation first.

When should you use PDF/A?

Any time you anticipate that the document you’re creating is going to be archived for a long time, and you want to ensure its look and feel is intact when it’s opened, you should use PDF/A.

That means everyday documents, such as spreadsheets, presentations and reports, that you use and change frequently are probably fine saved as PDF. But electronic documents that you’re preserving for posterity—such as historical, legal, government and court records—could benefit from the PDF/A archival format.

How does PDF/A compare to PDF/E and PDF/X? That’s a topic for upcoming articles, so stay tuned.

To learn more about how to create and validate PDF/A documents, visit the Foxit PhantomPDF product page.