A little over eleven years ago I had a need for an OCR library for a project at work. The commercial ones we had available weren’t doing what we needed (such as “not crash”, “read characters accurately”, and “be reasonably priced”) so I did a little digging around and found something called ‘OCRchie’, a Computer Science project at Berkeley.
The source code was available, so I took a copy and gave it a try. It was a mix of (as I recall) C++ and Tcl/Tk (for the graphical front end).
I didn’t need a front end, and saw some easy optimizations (allocating one block of memory for the entire image instead of a separate block for each raster, for example). In ripping out the front end and adding the optimizations, I ended up reimplementing the engine.
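To illustrate the allocation change, here is a minimal sketch of an image that keeps every raster in one contiguous block. It is not the original code: the `Bitmap` class, its member names, and the one-byte-per-pixel layout are assumptions made for clarity (a real bitonal reader would more likely pack eight pixels per byte).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitonal image, one byte per pixel for readability.
class Bitmap {
public:
    Bitmap(std::size_t width, std::size_t height)
        : width_(width), height_(height),
          pixels_(width * height, 0) {}  // one allocation covers all rasters

    std::size_t width() const { return width_; }
    std::size_t height() const { return height_; }

    // Pointer to the start of raster (row) y inside the shared block.
    std::uint8_t* row(std::size_t y) {
        assert(y < height_);
        return pixels_.data() + y * width_;
    }

    // Access to a single pixel; rows are simply offsets into the block.
    std::uint8_t& at(std::size_t x, std::size_t y) {
        assert(x < width_);
        return row(y)[x];
    }

private:
    std::size_t width_;
    std::size_t height_;
    std::vector<std::uint8_t> pixels_;
};
```

Because each row is just an offset into the same buffer, creating the image costs one allocation instead of one per raster, and neighbouring rasters sit next to each other in memory.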
In the end, I had the tool I needed. I rolled it into a library (Windows DLL) that I could connect to from my Delphi program. Given the small dictionary I needed (I only had to read card numbers in a single font), this was pretty workable. I sent my work back to the professor who had directed the original project. He made it available to other people via the project web site, along with my notes.
If I were to do it over today, I can see a number of things I would do differently… but I no longer have a need for this particular tool.
However, someone else does. He managed to track me down via this website (persistent!) and sent me a note today. I had the fairly distinct pleasure of being reminded of something I’d done that long ago.
I’ve reviewed the code and was reminded that when I wrote it, it wasn’t necessary to specify the std:: namespace when using the standard library. Apart from that, the entire thing builds cleanly. There are a few things I would do differently code-wise now (namely how I was handling inline functions and class members, and I might look for a replacement for my macro-based ‘properties’ hack), but I’m still reasonably satisfied with the code.
Thanks Faliakis, this was fun.
I did document the use of the libraries created by this project, but that was associated with the project at work and I no longer have access to it. I was allowed to keep the code since it was written largely on my own time, but the documentation was written on contracted time.
Okay, this is an area I would certainly do differently now. I think the code is quite clear, and the obvious documentation would just reflect what is evident (to me) in the code, but I’d still do a better job of documenting it. However, in the interest of posterity (and of getting back to what I was planning to do tonight 🙂) I’m going to let it stand as it is. If anyone has any questions they’re welcome to contact me via the contact form, and if I find there’s enough interest I’ll flesh out the documentation.
In short, though, the ITK library provides the ability to read, write, and perform limited transformations on TIFF images. Very limited: I was working only with TIFF images, and only with bitonal ones. Greyscale, color, and paletted images have no real support in this library. There is groundwork in some places to support broader color depths, but since I had no need for them I never implemented them.
The code in this directory is primarily to create a library. There is a test program that will take an input bitonal TIFF file and rotate it in steps of a tenth of a degree from -19.9 through 19.9 degrees. This was used primarily to generate test files to be used by the OCR library in the other directory.
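As a rough sketch of how such a sweep can be generated (the function name `sweep_angles` is hypothetical, and I am assuming the sweep includes 0 degrees), stepping an integer count of tenths avoids the drift you get from repeatedly adding 0.1 to a floating-point value:

```cpp
#include <vector>

// Angles from -19.9 to 19.9 degrees in 0.1-degree steps.
// The loop counts whole tenths and converts each one once,
// so rounding error does not accumulate across the sweep.
std::vector<double> sweep_angles() {
    std::vector<double> angles;
    for (int tenths = -199; tenths <= 199; ++tenths) {
        angles.push_back(tenths / 10.0);
    }
    return angles;
}
```

Each angle would then be handed to the rotation routine to produce one test image.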
The OCR library here was pretty straightforward.
Given an input file and a dictionary mapping glyph bitmaps (very low-resolution, 5×6) to characters, it would deskew the image, segment it into lines and characters, examine each character and generate a new glyph, then look for the best fit in the loaded dictionary and print that out.
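The glyph-generation step can be sketched roughly as follows. This is an illustration, not the original algorithm: I am assuming the glyph is 5 columns by 6 rows, that a segmented character arrives as a rectangular grid of 0/1 pixels, and that a glyph bit is set when at least half of the pixels mapping onto it are set. The names `Glyph` and `make_glyph` are hypothetical.

```cpp
#include <algorithm>
#include <bitset>
#include <cstddef>
#include <vector>

constexpr std::size_t kGlyphW = 5;  // assumed: 5 columns
constexpr std::size_t kGlyphH = 6;  // assumed: 6 rows
using Glyph = std::bitset<kGlyphW * kGlyphH>;

// `cell` holds one character's bounding box as equal-length rows of 0/1.
Glyph make_glyph(const std::vector<std::vector<int>>& cell) {
    Glyph glyph;
    const std::size_t h = cell.size();
    if (h == 0 || cell[0].empty()) return glyph;
    const std::size_t w = cell[0].size();

    for (std::size_t gy = 0; gy < kGlyphH; ++gy) {
        // Source rows covered by glyph row gy (always at least one row).
        const std::size_t y0 = gy * h / kGlyphH;
        const std::size_t y1 = std::max(y0 + 1, (gy + 1) * h / kGlyphH);
        for (std::size_t gx = 0; gx < kGlyphW; ++gx) {
            // Source columns covered by glyph column gx.
            const std::size_t x0 = gx * w / kGlyphW;
            const std::size_t x1 = std::max(x0 + 1, (gx + 1) * w / kGlyphW);

            std::size_t set = 0;
            for (std::size_t y = y0; y < y1; ++y)
                for (std::size_t x = x0; x < x1; ++x)
                    if (cell[y][x] != 0) ++set;

            const std::size_t total = (y1 - y0) * (x1 - x0);
            if (2 * set >= total) glyph.set(gy * kGlyphW + gx);
        }
    }
    return glyph;
}
```

Reducing every character to the same 30 bits makes the later dictionary comparison independent of the character's size in the scan.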
Given a reference input bitonal image and a file containing the characters matching the glyphs that would be found in the reference image, it would generate the same glyphs, but rather than output the matching text from the dictionary, would add the glyph bitmaps and matching text to the dictionary.
In this case, the reference image was the test.tif image from the ITK directory.
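The two modes described above, recognition and training, can be sketched around one small dictionary type. This is a hedged illustration rather than the original code: `GlyphDictionary`, `learn`, and `best_match` are hypothetical names, and using the number of differing bits (Hamming distance) as the measure of "best fit" is my assumption here.

```cpp
#include <bitset>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

using Glyph = std::bitset<30>;  // 5x6 glyph, one bit per cell

class GlyphDictionary {
public:
    // Training mode: record that this bitmap stands for this character.
    void learn(const Glyph& glyph, char ch) {
        entries_.emplace_back(glyph, ch);
    }

    // Recognition mode: return the character whose stored bitmap differs
    // from `glyph` in the fewest bits, or `fallback` if nothing is loaded.
    char best_match(const Glyph& glyph, char fallback = '?') const {
        char best = fallback;
        std::size_t best_distance = std::numeric_limits<std::size_t>::max();
        for (const auto& entry : entries_) {
            const std::size_t distance = (entry.first ^ glyph).count();
            if (distance < best_distance) {
                best_distance = distance;
                best = entry.second;
            }
        }
        return best;
    }

private:
    std::vector<std::pair<Glyph, char>> entries_;
};
```

With a small, single-font character set like card numbers, a linear scan over the entries is plenty; a slightly noisy glyph still lands on the nearest stored bitmap.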
I hope someone can find this useful, or at least interesting. It was a simple library and OCR engine, far from a complete package, but I didn’t need a complete package. If you do find it useful or have any comments, please share them.