IndicPhotoOCR Documentation

End-to-End OCR Class

class IndicPhotoOCR.ocr.OCR(device='cuda:0', identifier_lang='hindi', verbose=False)[source]

Bases: object

Optical Character Recognition (OCR) pipeline for text detection, script identification, and text recognition.

Parameters:
  • device (str) – Device to use for inference (‘cuda:0’ or ‘cpu’).

  • identifier_lang (str) – Default script identifier model to use. Valid options: [‘hindi’, ‘bengali’, ‘tamil’, ‘telugu’, ‘malayalam’, ‘kannada’, ‘gujarati’, ‘marathi’, ‘punjabi’, ‘odia’, ‘assamese’, ‘urdu’, ‘meitei’, ‘auto’].

  • verbose (bool) – Whether to print detailed processing information.
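A minimal usage sketch of the class (hedged: "test_image.jpg" is a placeholder path, and the exact key/value layout of the returned dict is not specified here):

```python
# Hypothetical end-to-end usage; "test_image.jpg" is a placeholder path.

def run_pipeline(image_path):
    # Import deferred so the sketch stays self-contained until actually run.
    from IndicPhotoOCR.ocr import OCR

    ocr_system = OCR(device="cpu", identifier_lang="auto", verbose=False)
    return ocr_system.ocr(image_path)  # dict: recognized text with bounding boxes

if __name__ == "__main__":
    print(run_pipeline("test_image.jpg"))
```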

detect(image_path)[source]

Detect text regions in the input image.

Parameters:

image_path (str) – Path to the image file.

Returns:

Detected text bounding boxes.

Return type:

list

identify(cropped_path)[source]

Identify the script of the text present in a cropped image.

Parameters:

cropped_path (str) – Path to the cropped image containing text.

Returns:

Predicted script language of the text in the image.

Return type:

str

ocr(image_path)[source]

Perform end-to-end OCR: detect text, identify script, and recognize text.

Parameters:

image_path (str) – Path to the input image.

Returns:

Recognized text with corresponding bounding boxes.

Return type:

dict

recognise(cropped_image_path, script_lang)[source]

Recognize text in a cropped image using the identified script model.

Parameters:
  • cropped_image_path (str) – Path to the cropped image.

  • script_lang (str) – Identified script language. Valid options: [‘hindi’, ‘bengali’, ‘tamil’, ‘telugu’, ‘malayalam’, ‘kannada’, ‘gujarati’, ‘marathi’, ‘punjabi’, ‘odia’, ‘assamese’]

Returns:

Recognized text.

Return type:

str
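The three stages can also be chained by hand. A sketch under the assumption that each detected region has already been cropped and saved to disk (the cropping step itself is not shown):

```python
# Hypothetical manual pipeline over pre-cropped word images.

def recognise_crops(crop_paths):
    from IndicPhotoOCR.ocr import OCR

    ocr_system = OCR(device="cpu", identifier_lang="auto")
    results = []
    for crop in crop_paths:
        script = ocr_system.identify(crop)           # e.g. "hindi"
        results.append(ocr_system.recognise(crop, script))
    return results
```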

visualize_detection(image_path, detections, save_path=None, show=False)[source]

Visualize and optionally save the detected text bounding boxes on an image.

Parameters:
  • image_path (str) – Path to the image file.

  • detections (list) – List of bounding boxes.

  • save_path (str, optional) – Path to save the output image.

  • show (bool) – Whether to display the image.

Detection

class IndicPhotoOCR.detection.textbpn.textbpnpp_detector.TextBPNpp_detector(model_name='textbpnpp', backbone='resnet50', device='cpu')[source]

Bases: object

detect(image_path)[source]

Detect text regions in the input image.

Parameters:

image_path (str) – Path to the image file.

Returns:

Detected text bounding boxes in the format:

{"detections": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]}

Return type:

dict
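Because the format above is plain nested lists, it can be post-processed without the library. For example, a sketch that collapses each four-point quadrilateral into an axis-aligned rectangle:

```python
def to_axis_aligned(bbox_result_dict):
    """Convert {"detections": [[[x1, y1], ...], ...]} quadrilaterals
    to (x_min, y_min, x_max, y_max) rectangles."""
    boxes = []
    for quad in bbox_result_dict["detections"]:
        xs = [p[0] for p in quad]
        ys = [p[1] for p in quad]
        boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes

sample = {"detections": [[[10, 20], [110, 22], [108, 60], [9, 58]]]}
print(to_axis_aligned(sample))  # [(9, 20, 110, 60)]
```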

visualize_detections(image_path, bbox_result_dict, output_path='output.png')[source]

Visualize and save detected text bounding boxes on an image.

Parameters:
  • image_path (str) – Path to the input image.

  • bbox_result_dict (dict) – Dictionary containing detected text bounding boxes. Format: {"detections": [[[x1, y1], [x2, y2], [x3, y3], [x4, y4]], ...]}.

  • output_path (str) – Path to save the visualized image (default: “output.png”).

Returns:

None
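Putting the detector's two methods together (a sketch; the constructor arguments are the defaults from the signature above, and "output.png" is the documented default output path):

```python
# Hypothetical detect-then-visualize flow for the standalone detector.

def detect_and_save(image_path, out_path="output.png"):
    from IndicPhotoOCR.detection.textbpn.textbpnpp_detector import TextBPNpp_detector

    detector = TextBPNpp_detector(model_name="textbpnpp", backbone="resnet50", device="cpu")
    result = detector.detect(image_path)             # {"detections": [...]}
    detector.visualize_detections(image_path, result, output_path=out_path)
    return result
```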

Recognition

class IndicPhotoOCR.recognition.parseq_recogniser.PARseqrecogniser[source]

Bases: object

bstr(language, image_dir, save_dir)[source]

Runs the OCR model to process images and save the output as a JSON file.

Parameters:
  • language (str) – Language code (e.g., ‘hindi’, ‘english’).

  • image_dir (str) – Directory containing the images to process.

  • save_dir (str) – Directory where the output JSON file will be saved.

Example usage:

python your_script.py --checkpoint /path/to/checkpoint.ckpt --language hindi --image_dir /path/to/images --save_dir /path/to/save

bstr_onImage(language, image_path)[source]

Runs the OCR model on a single image.

Parameters:
  • language (str) – Language code (e.g., ‘hindi’, ‘english’).

  • image_path (str) – Path to the image to process.

recognise(checkpoint: str, image_path: str, language: str, verbose: bool, device: str) → str[source]

Loads the desired model and returns the recognized word from the specified image.

Parameters:
  • checkpoint (str) – Path to the model checkpoint file.

  • image_path (str) – Path to the image for which text recognition is needed.

  • language (str) – Language code (e.g., ‘hindi’, ‘english’).

  • verbose (bool) – Whether to print detailed processing information.

  • device (str) – Device to use for inference (‘cuda:0’ or ‘cpu’).

Returns:

The recognized text from the image.

Return type:

str
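A single-word recognition sketch (the checkpoint path here is hypothetical; a real checkpoint must be obtained separately):

```python
# Hypothetical usage; "parseq_hindi.ckpt" is a placeholder checkpoint path.

def recognise_word(image_path, checkpoint="parseq_hindi.ckpt"):
    from IndicPhotoOCR.recognition.parseq_recogniser import PARseqrecogniser

    recogniser = PARseqrecogniser()
    return recogniser.recognise(
        checkpoint=checkpoint,
        image_path=image_path,
        language="hindi",
        verbose=False,
        device="cpu",
    )
```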

Script Identification

class IndicPhotoOCR.script_identification.vit.vit_infer.VIT_identifier[source]

Bases: object

A class for script identification using a ViT (Vision Transformer) model.

identify(image_path, model_name, device)[source]

Identifies the script in a given image using a ViT model.

Parameters:
  • image_path (str) – Path to the input image.

  • model_name (str) – Name of the model to be used.

  • device (int) – Device to run the model on (e.g., 0 for GPU, -1 for CPU).

Returns:

The predicted script label.

Return type:

str
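A sketch of single-image script identification (the model name passed here is hypothetical; the device value follows the documented 0-for-GPU / -1-for-CPU convention):

```python
# Hypothetical usage; "vit_base" is a placeholder model name.

def identify_script(image_path, model_name="vit_base"):
    from IndicPhotoOCR.script_identification.vit.vit_infer import VIT_identifier

    identifier = VIT_identifier()
    return identifier.identify(image_path, model_name, device=-1)  # -1 = CPU
```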

predict_batch(image_dir, model_name, time_show, output_csv='prediction.csv')[source]

Processes a batch of images in a directory and predicts the script for each image.

Parameters:
  • image_dir (str) – Directory containing images.

  • model_name (str) – Name of the model to be used.

  • time_show (bool) – Whether to print processing time.

  • output_csv (str, optional) – Path to save the predictions as a CSV file. Defaults to “prediction.csv”.

Returns:

The output CSV file path containing predictions.

Return type:

str
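Since predict_batch returns the CSV path, the predictions can be read back with the standard library. A sketch that assumes each row pairs an image name with its predicted script (the exact column layout is not documented):

```python
import csv
import io

def load_predictions(csv_text):
    # Parse predictions CSV text; the two-column layout is an assumption.
    return [tuple(row) for row in csv.reader(io.StringIO(csv_text))]

sample = "image1.jpg,hindi\nimage2.jpg,tamil\n"
print(load_predictions(sample))  # [('image1.jpg', 'hindi'), ('image2.jpg', 'tamil')]
```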