Which OCR toolset is good and why: A comparative study


  • Pooja Jain, Panjab University, Chandigarh
  • Kavita Taneja, Panjab University, Chandigarh
  • Harmunish Taneja, DAV College, Sector-10, Chandigarh




ABBYY FineReader, Calamari, Google Docs, OCR, Tesseract


Optical Character Recognition (OCR) is an active research area spanning pattern recognition, natural language processing (NLP), computer vision, biomedical informatics, machine learning, and artificial intelligence. OCR extracts text from PDF files, scanned or handwritten documents, and images (photographs, advertisements, etc.) into editable formats (MS Word/Excel, text files, etc.) for further processing. It is used in many real-world applications, including banking, education, insurance, finance, healthcare, and keyword-based document search. Many OCR toolsets are available, falling into categories such as open source, proprietary, and online services. This paper presents a comparative study of several OCR toolsets across a variety of parameters.

