Imaging and Document-Management

With recent improvements in scanning and imaging equipment, some offices are starting to look at the possibility of archiving documents electronically rather than in paper form. Even if this is not envisaged, there is still a frequent need to convert "legacy" paper documents and forms for use with modern media such as wordprocessors or email.

Hardware for basic imaging of documents need not be expensive. Scanners are available from as little as £50 which will do a perfectly adequate job for small batches. For larger batches, automatic sheet-feeding is a great benefit, and this will involve a more costly scanner. When reviewing scanners, don't be led on by claims for silly DPI resolution figures. 600 DPI will meet all normal needs; any more is just "specmanship." With larger batches, speed of scanning also becomes a key factor, and this should be taken into account when choosing a model.
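To see why speed matters for larger batches, a rough estimate of scanning time can help. The sketch below is illustrative only: the batch size and the pages-per-minute figures are hypothetical values chosen for the example, not the ratings of any particular scanner, and it ignores time spent loading paper or clearing jams.

```python
def batch_minutes(pages, pages_per_minute):
    """Return the pure scanning time, in minutes, for a batch of pages."""
    return pages / pages_per_minute


# Hypothetical batch size and scanner speeds, for illustration only.
BATCH_PAGES = 5000
for ppm in (2, 20, 60):
    hours = batch_minutes(BATCH_PAGES, ppm) / 60
    print(f"{ppm:>2} pages/min: about {hours:.1f} hours for {BATCH_PAGES} pages")
```

Running the same sum with your own batch size and a candidate scanner's rated speed gives a quick feel for whether the job is a matter of hours or of weeks.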

A scanner will produce its output as an image (picture), usually in a graphic-file format such as TIFF, JPG, BMP or GIF. This is not the same thing as a document, a fact which is often not properly understood, and which leads to all sorts of misunderstandings and errors. This "bitmap" copy of the document will be much larger than the equivalent wordprocessor file, usually several megabytes. It can be printed as-is, but it cannot be edited in a wordprocessor. Furthermore, because of the sheer size of such raw scans, attempting to store them on a fileserver may soon clog that server's storage capacity.
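The size of a raw scan follows directly from the page dimensions, the resolution and the colour depth. The sketch below estimates the uncompressed size of a single page. It assumes an A4 sheet (about 8.27 x 11.69 inches) and ignores file headers, row padding and any compression the file format may apply, so real files will differ; the function name and the chosen settings are illustrative only.

```python
def raw_scan_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Approximate uncompressed size, in bytes, of one scanned page."""
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bits_per_pixel // 8


# Assumed page size: A4, in inches.
A4 = (8.27, 11.69)

# A few common scan settings: (description, DPI, bits per pixel).
SETTINGS = [
    ("300 DPI black-and-white", 300, 1),
    ("300 DPI greyscale", 300, 8),
    ("300 DPI colour", 300, 24),
    ("600 DPI colour", 600, 24),
]

for label, dpi, bpp in SETTINGS:
    megabytes = raw_scan_bytes(*A4, dpi, bpp) / 1_000_000
    print(f"{label}: about {megabytes:.1f} MB per page")
```

Multiplying the per-page figure by the number of pages to be archived gives a first estimate of the storage a fileserver would need for raw scans.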

To convert this "bitmap" image of the paper form into an editable electronic document calls for OCR (Optical Character Recognition) software. Tip: Forget the OCR packages that come bundled with scanners; they're mostly a waste of time. Buy a professional-grade OCR solution if you're at all serious about electronic document storage.

One highly recommended OCR package is Abbyy FineReader. This new package has taken the market by storm, causing a major upset for the established OCR suppliers. It's a real revelation to use, being able to recognise even rough faxes with very few errors. What's more, in most cases it can preserve the original spatial layout of the text on the page, something that was previously viewed as impossible.

If buying a packaged document-management system or specialised scanner, always ask for a demonstration in a real working environment, and ensure that it can actually do what you need. It's all too easy to get caught in the trap of buying equipment which has a very impressive specification on paper, but which cannot actually perform the functions you need. As a cautionary tale, let's take the firm which ordered a £12,000 combined scanner/printer/fax machine, with a view to using this for electronic document archival. (The machine in question had, inevitably, been recommended by a salesman with an eye to a commission, and who presumably had not the foggiest idea what electronic filing actually involved.)

On examining the newly-uncrated machine, it was realised that whilst it did everything claimed of it, the only means of saving scans electronically was... on a floppy disk. So, were the staff to transfer thousands of scanned images to floppies, and take those floppies one by one to a computer for OCR conversion? You bet they weren't. It was a total waste of £12,000, purely because the feasibility of the solution hadn't been properly checked before purchase. Not only that, it was a prodigious waste of money on a project which could have been comfortably completed with gear costing no more than £1,500-£3,000.
 


Synopsis:
 

  • Review your objectives carefully before ordering anything.

  • Ask for a live demo before committing yourself to a packaged system.

  • For scanners, speed is more important than specification.

  • There is currently only one OCR package worth considering.

  • Document management software can be of assistance, but choose carefully, and remember that programming support will almost certainly be needed.