IndicPhotoOCR Documentation
End-to-End OCR Class
- class IndicPhotoOCR.ocr.OCR(device='cuda:0', identifier_lang='hindi', verbose=False)[source]
Bases:
object
Optical Character Recognition (OCR) pipeline for text detection, script identification, and text recognition.
- Parameters:
device (str) – Device to use for inference (‘cuda:0’ or ‘cpu’).
identifier_lang (str) – Default script identifier model to use. Valid options: [‘hindi’, ‘bengali’, ‘tamil’, ‘telugu’, ‘malayalam’, ‘kannada’, ‘gujarati’, ‘marathi’, ‘punjabi’, ‘odia’, ‘assamese’, ‘urdu’, ‘meitei’, ‘auto’].
verbose (bool) – Whether to print detailed processing information.
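- Example usage (a minimal construction sketch; the argument values simply restate the documented defaults):
from IndicPhotoOCR.ocr import OCR

# Instantiate the pipeline on the GPU with the Hindi script identifier and quiet output.
ocr_system = OCR(device="cuda:0", identifier_lang="hindi", verbose=False)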
- detect(image_path)[source]
Detect text regions in the input image.
- Parameters:
image_path (str) – Path to the image file.
- Returns:
Detected text bounding boxes.
- Return type:
list
- identify(cropped_path)[source]
Identify the script of the text present in a cropped image.
- Parameters:
cropped_path (str) – Path to the cropped image containing text.
- Returns:
Predicted script language of the text in the image.
- Return type:
str
- ocr(image_path)[source]
Perform end-to-end OCR: detect text, identify script, and recognize text.
- Parameters:
image_path (str) – Path to the input image.
- Returns:
Recognized text with corresponding bounding boxes.
- Return type:
dict
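- Example usage (an end-to-end sketch; "test_image.jpg" is a hypothetical input path):
from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(device="cuda:0", identifier_lang="auto")
# Detect text regions, identify their scripts, and recognise the text in one call.
results = ocr_system.ocr("test_image.jpg")
print(results)  # recognised text with corresponding bounding boxes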
- recognise(cropped_image_path, script_lang)[source]
Recognize text in a cropped image using the identified script model.
- Parameters:
cropped_image_path (str) – Path to the cropped image.
script_lang (str) – Identified script language. Valid options: [‘hindi’, ‘bengali’, ‘tamil’, ‘telugu’, ‘malayalam’, ‘kannada’, ‘gujarati’, ‘marathi’, ‘punjabi’, ‘odia’, ‘assamese’]
- Returns:
Recognized text.
- Return type:
str
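- Example usage (a sketch on a single pre-cropped word image; "crop_0.png" is a hypothetical path):
from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(device="cuda:0", identifier_lang="auto")
script = ocr_system.identify("crop_0.png")         # e.g. 'hindi'
text = ocr_system.recognise("crop_0.png", script)  # recognised text for the crop
print(script, text)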
- visualize_detection(image_path, detections, save_path=None, show=False)[source]
Visualize and optionally save the detected text bounding boxes on an image.
- Parameters:
image_path (str) – Path to the image file.
detections (list) – List of bounding boxes.
save_path (str, optional) – Path to save the output image.
show (bool) – Whether to display the image.
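- Example usage (a sketch pairing detect() with visualize_detection(); file paths are illustrative):
from IndicPhotoOCR.ocr import OCR

ocr_system = OCR(device="cuda:0")
detections = ocr_system.detect("test_image.jpg")
# Save an annotated copy of the image rather than displaying it.
ocr_system.visualize_detection("test_image.jpg", detections, save_path="detections.png", show=False)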
Detection
- class IndicPhotoOCR.detection.textbpn.textbpnpp_detector.TextBPNpp_detector(model_name='textbpnpp', backbone='resnet50', device='cpu')[source]
Bases:
object
- detect(image_path)[source]
Detect text regions in the input image.
- Parameters:
image_path (str) – Path to the image file.
- Returns:
- Detected text bounding boxes in the format:
{"detections": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], …]}
- Return type:
dict
- visualize_detections(image_path, bbox_result_dict, output_path='output.png')[source]
Visualize and save detected text bounding boxes on an image.
- Parameters:
image_path (str) – Path to the input image.
bbox_result_dict (dict) – Dictionary containing detected text bounding boxes. Format: {"detections": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], …]}.
output_path (str) – Path to save the visualized image (default: “output.png”).
- Returns:
None
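- Example usage (a standalone detection sketch; the image path is illustrative):
from IndicPhotoOCR.detection.textbpn.textbpnpp_detector import TextBPNpp_detector

detector = TextBPNpp_detector(model_name="textbpnpp", backbone="resnet50", device="cpu")
bbox_result_dict = detector.detect("test_image.jpg")  # {"detections": [[[x1, y1], ...], ...]}
detector.visualize_detections("test_image.jpg", bbox_result_dict, output_path="output.png")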
Recognition
- class IndicPhotoOCR.recognition.parseq_recogniser.PARseqrecogniser[source]
Bases:
object
- bstr(language, image_dir, save_dir)[source]
Runs the OCR model to process images and save the output as a JSON file.
- Parameters:
language (str) – Language code (e.g., ‘hindi’, ‘english’).
image_dir (str) – Directory containing the images to process.
save_dir (str) – Directory where the output JSON file will be saved.
- Example usage:
python your_script.py --checkpoint /path/to/checkpoint.ckpt --language hindi --image_dir /path/to/images --save_dir /path/to/save
- bstr_onImage(language, image_path)[source]
Runs the OCR model on a single image.
- Parameters:
language (str) – Language code (e.g., ‘hindi’, ‘english’).
image_path (str) – Path to the image to process.
- recognise(checkpoint: str, image_path: str, language: str, verbose: bool, device: str) → str[source]
Loads the desired model and returns the recognized word from the specified image.
- Parameters:
checkpoint (str) – Path to the model checkpoint file.
image_path (str) – Path to the image for which text recognition is needed.
language (str) – Language code (e.g., ‘hindi’, ‘english’).
verbose (bool) – Whether to print detailed processing information.
device (str) – Device to use for inference (‘cuda:0’ or ‘cpu’).
- Returns:
The recognized text from the image.
- Return type:
str
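- Example usage (a single-crop recognition sketch; the checkpoint and image paths are hypothetical):
from IndicPhotoOCR.recognition.parseq_recogniser import PARseqrecogniser

recogniser = PARseqrecogniser()
text = recogniser.recognise(
    checkpoint="/path/to/checkpoint.ckpt",
    image_path="crop_0.png",
    language="hindi",
    verbose=False,
    device="cuda:0",
)
print(text)  # recognised text from the cropped image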
Script Identification
- class IndicPhotoOCR.script_identification.vit.vit_infer.VIT_identifier[source]
Bases:
object
A class for script identification using a ViT (Vision Transformer) model.
- identify(image_path, model_name, device)[source]
Identifies the script in a given image using a ViT model.
- Parameters:
image_path (str) – Path to the input image.
model_name (str) – Name of the model to be used.
device (int) – Device to run the model on (e.g., 0 for GPU, -1 for CPU).
- Returns:
The predicted script label.
- Return type:
str
- predict_batch(image_dir, model_name, time_show, output_csv='prediction.csv')[source]
Processes a batch of images in a directory and predicts the script for each image.
- Parameters:
image_dir (str) – Directory containing images.
model_name (str) – Name of the model to be used.
time_show (bool) – Whether to print processing time.
output_csv (str, optional) – Path to save the predictions as a CSV file. Defaults to “prediction.csv”.
- Returns:
The output CSV file path containing predictions.
- Return type:
str
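- Example usage (a sketch of single-image and batch identification; the model_name value and file paths are assumed placeholders, not confirmed by these docs):
from IndicPhotoOCR.script_identification.vit.vit_infer import VIT_identifier

identifier = VIT_identifier()
# Single image on GPU 0; "hindi" is an assumed model name.
script = identifier.identify("crop_0.png", model_name="hindi", device=0)
# Batch over a directory of crops, writing predictions to a CSV file.
csv_path = identifier.predict_batch("crops/", model_name="hindi", time_show=True, output_csv="prediction.csv")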