Pytesseract: OCR Tool in Python

Prerequisite: Tesseract

Tesseract installation manual

On Linux (Debian/Ubuntu)

sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

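# usage: tesseract IMAGE OUTPUTBASE (Tesseract appends ".txt" to OUTPUTBASE)
tesseract 123.png 123   # writes the recognized text to 123.txt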
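
Before moving on to the Python wrapper, it can help to confirm that the `tesseract` binary is actually on the PATH. The following is a minimal sketch using only the standard library. The helper names `find_tesseract` and `build_ocr_command` are made up for this illustration and are not part of Tesseract or pytesseract.

```python
import shutil
import subprocess


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def build_ocr_command(image_path, output_base, lang=None):
    """Build the argument list for `tesseract IMAGE OUTPUTBASE [-l LANG]`.

    Tesseract appends ".txt" to OUTPUTBASE on its own.
    """
    cmd = ["tesseract", image_path, output_base]
    if lang:
        cmd += ["-l", lang]
    return cmd


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH; install it with the apt commands above")
    else:
        # Print the installed version as a quick sanity check.
        subprocess.run([path, "--version"], check=True)
```

Passing the list returned by `build_ocr_command("123.png", "123")` to `subprocess.run` would be equivalent to the shell command above.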

Pytesseract

Reference: © Sandun Amarathunga

pip install pytesseract opencv-python

import cv2
import pytesseract

img = cv2.imread("images/002.png")  # read an image

text = pytesseract.image_to_string(img) # extract text
print(text)
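
OpenCV loads images in BGR channel order, while pytesseract assumes RGB, so converting before OCR is commonly recommended. The sketch below does that and also passes a page segmentation mode through the `config` string. The helper names `tesseract_config` and `ocr_image` and their default values are assumptions made for illustration, not part of the original snippet.

```python
def tesseract_config(psm=3, oem=3):
    """Build a Tesseract option string (OCR engine mode and page segmentation mode)."""
    return f"--oem {oem} --psm {psm}"


def ocr_image(path, lang="eng", psm=3):
    """Read an image with OpenCV and return the text recognized by Tesseract."""
    # Imported here so tesseract_config() stays usable without the OCR dependencies.
    import cv2
    import pytesseract

    img = cv2.imread(path)
    if img is None:  # cv2.imread returns None instead of raising on an unreadable path
        raise FileNotFoundError(f"could not read image: {path}")
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR -> RGB
    return pytesseract.image_to_string(rgb, lang=lang, config=tesseract_config(psm))
```

For example, `ocr_image("images/002.png", psm=6)` asks Tesseract to treat the page as a single uniform block of text, which may suit simple screenshots better than the default automatic segmentation.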

Support for other languages

GitHub

If the requested language model is not installed, Tesseract fails like this:

tesseract -l chi_sim test.png test.txt
Error opening data file /usr/share/tesseract-ocr/4.00/tessdata/chi_sim.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'chi_sim'
Tesseract couldn't load any languages!
Could not initialize tesseract.
# Fix: back up the bundled tessdata directory and replace it with the full set of models
cd /usr/share/tesseract-ocr/4.00/
sudo mv tessdata tessdata_bc
sudo git clone https://github.com/tesseract-ocr/tessdata.git

Author: Karobben
Posted on: 2022-08-02
Updated on: 2024-01-11
