Text Detection and Recognition - PaddleOCR

S100 only

This example is only applicable to RDK S100. The RDK S600 image does not include the corresponding hbm models, and the relevant sample code is only released with the system image on the S100; it is not supported on the S600.

This example runs PaddleOCR text detection and recognition on the BPU inference engine and supports recognizing and visualizing Chinese text. The sample code is located in the /app/cdev_demo/bpu/08_OCR_sample/01_paddleOCR/ directory.

Model Description

  • Introduction:

    This example implements two-stage Chinese OCR with the PP-OCRv3 models from PaddleOCR: a detection model first locates the text regions, then a recognition model reads the text content region by region.

  • HBM Model Names:

    • Detection Model: cn_PP-OCRv3_det_infer-deploy_640x640_nv12.hbm

    • Recognition Model: cn_PP-OCRv3_rec_infer-deploy_48x320_rgb.hbm

  • Input Format:

    • Detection Model: BGR image → resized to 640×640, converted to NV12 format (Y and UV planes stored separately)

    • Recognition Model: Rotated and cropped BGR text block image → resized to 48×320, normalized, converted to RGB format

  • Output:

    • Detection Model: Segmentation probability map (1×1×H×W), post-processed to obtain text box coordinates

    • Recognition Model: Logits of character tokens, decoded via CTC to obtain the recognized text string
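
To make the NV12 layout of the detection input concrete, the following minimal sketch converts an interleaved BGR buffer into a full-resolution Y plane and a half-resolution interleaved UV plane. It is not taken from the sample code: `Nv12Image` and `bgr_to_nv12` are hypothetical names, the BT.601 full-range coefficients are an assumption, and the sample itself may rely on a library routine instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: NV12 stores a width*height Y plane followed by
// a width*height/2 plane of interleaved U,V pairs (one pair per 2x2 block).
struct Nv12Image {
    int width = 0;
    int height = 0;
    std::vector<uint8_t> y;
    std::vector<uint8_t> uv;
};

// Rounds to nearest and clamps to the 8-bit range.
static uint8_t clamp_u8(float v) {
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, v + 0.5f)));
}

// Converts an interleaved 8-bit BGR image to NV12.
// Assumptions: width and height are even; BT.601 full-range coefficients;
// chroma is the average over each 2x2 pixel block.
Nv12Image bgr_to_nv12(const std::vector<uint8_t>& bgr, int width, int height) {
    Nv12Image out;
    out.width = width;
    out.height = height;
    out.y.resize(static_cast<size_t>(width) * height);
    out.uv.resize(static_cast<size_t>(width) * height / 2);
    for (int r = 0; r < height; r += 2) {
        for (int c = 0; c < width; c += 2) {
            float u_sum = 0.0f, v_sum = 0.0f;
            for (int dr = 0; dr < 2; ++dr) {
                for (int dc = 0; dc < 2; ++dc) {
                    const size_t idx = static_cast<size_t>(r + dr) * width + (c + dc);
                    const float blue = bgr[idx * 3];
                    const float green = bgr[idx * 3 + 1];
                    const float red = bgr[idx * 3 + 2];
                    out.y[idx] = clamp_u8(0.299f * red + 0.587f * green + 0.114f * blue);
                    u_sum += -0.168736f * red - 0.331264f * green + 0.5f * blue + 128.0f;
                    v_sum += 0.5f * red - 0.418688f * green - 0.081312f * blue + 128.0f;
                }
            }
            // Each UV row covers two image rows and holds `width` bytes.
            const size_t uv_idx = static_cast<size_t>(r / 2) * width + c;
            out.uv[uv_idx] = clamp_u8(u_sum / 4.0f);
            out.uv[uv_idx + 1] = clamp_u8(v_sum / 4.0f);
        }
    }
    return out;
}
```

For the 640×640 detection input this yields a Y plane of 409,600 bytes and a UV plane of 204,800 bytes.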

Functionality Description

  • Model Loading

    Loads the text detection and recognition models and parses input/output-related information.

  • Input Preprocessing

    • Detection Model: Resizes the original image to 640×640 and converts it to NV12 format (for BPU inference).

    • Recognition Model: Resizes each rotated and cropped text block to 48×320, converts it to RGB format, normalizes it, and finally transforms it into an NCHW structure.
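
The recognition preprocessing above (channel reordering, normalization, NCHW layout) can be sketched as follows for a text block that has already been resized to 48×320. This is an illustration rather than the sample's implementation: `bgr_hwc_to_rgb_chw` is a hypothetical name, and the normalization (x / 255 - 0.5) / 0.5 is an assumption based on the usual PaddleOCR recognition preprocessing.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts an interleaved 8-bit BGR image (HWC) into a planar RGB float
// tensor (CHW, i.e. NCHW with N = 1).
// Assumption: pixels are scaled to [-1, 1] via (x / 255 - 0.5) / 0.5.
std::vector<float> bgr_hwc_to_rgb_chw(const std::vector<uint8_t>& bgr,
                                      int height, int width) {
    const size_t plane = static_cast<size_t>(height) * width;
    std::vector<float> chw(3 * plane);
    for (size_t i = 0; i < plane; ++i) {
        for (int c = 0; c < 3; ++c) {
            // Output channel c (R=0, G=1, B=2) reads input channel 2 - c,
            // because the input order is B, G, R.
            const float v = bgr[i * 3 + (2 - c)] / 255.0f;
            chw[c * plane + i] = (v - 0.5f) / 0.5f;
        }
    }
    return chw;
}
```

For a 48×320 block the resulting tensor holds 3 × 48 × 320 = 46,080 floats.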

  • Inference Execution

    Calls the .infer() method for forward inference, outputting a probability map (detection) and logits (recognition).

  • Result Post-Processing

    • Detection Model:

      • Binarizes the probability map (using a set threshold)

      • Finds contours of text regions and expands them

      • Extracts rotated bounding boxes and crops the image regions

    • Recognition Model:

      • Decodes the logits using CTCLabelDecode to map them to text strings

    Finally, the recognition results are annotated in red text on a blank canvas and visualized alongside the original image.
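
The CTC decoding step can be illustrated with a minimal greedy decoder: pick the highest-scoring class at each time step, collapse consecutive repeats, and drop the blank class. The function name `ctc_greedy_decode` is hypothetical, and the index convention (class 0 is the blank, class k maps to the k-th entry of the label file) is an assumption about how the labels are laid out; the sample's own CTCLabelDecode may differ in detail.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Greedy CTC decoding over a [time][class] score matrix.
// Assumption: class 0 is the blank and class k (k >= 1) maps to labels[k - 1].
std::string ctc_greedy_decode(const std::vector<std::vector<float>>& logits,
                              const std::vector<std::string>& labels) {
    std::string text;
    size_t prev = 0;  // treat the position before the sequence as blank
    for (const auto& step : logits) {
        if (step.empty()) continue;
        // Arg-max over the classes of this time step.
        size_t best = 0;
        for (size_t k = 1; k < step.size(); ++k) {
            if (step[k] > step[best]) best = k;
        }
        // Emit only when the class changes, is not blank, and has a label.
        if (best != prev && best != 0 && best <= labels.size()) {
            text += labels[best - 1];
        }
        prev = best;
    }
    return text;
}
```

Because repeats are collapsed before blanks are removed, a blank between two identical classes keeps both characters, which is how CTC represents doubled letters.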

Environment Dependencies

Before compiling and running, ensure the following dependencies are installed:

sudo apt update
sudo apt install -y libgflags-dev libpolyclipping-dev

Directory Structure

.
|-- CMakeLists.txt        # CMake build script: targets/dependencies/include paths/link libraries
|-- FangSong.ttf          # Chinese font (for rendering recognized text on the visualization canvas)
|-- README.md             # Usage instructions (this file)
|-- inc
|   `-- paddleOCR.hpp     # OCR encapsulation header: detection/recognition class interfaces (load/preprocess/inference/postprocess)
`-- src
    |-- main.cc           # Program entry: parse arguments → detect → crop → recognize → visualize → save
    `-- paddleOCR.cc      # Concrete implementation: polygon box generation, cropping, CTC decoding, text rendering

Compiling the Project

  • Configuration and Compilation
    mkdir build && cd build
    cmake ..
    make -j$(nproc)

Model Download

If the models are not found when running the program, you can download them using the following commands:

# Detection model
wget https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_s100/paddle_ocr/cn_PP-OCRv3_det_infer-deploy_640x640_nv12.hbm
# Recognition model
wget https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_s100/paddle_ocr/cn_PP-OCRv3_rec_infer-deploy_48x320_rgb.hbm

Parameter Description

| Parameter Name | Description | Default Value |
|---|---|---|
| --det_model_path | Path to the text detection model (.hbm) | /opt/hobot/model/s100/basic/cn_PP-OCRv3_det_infer-deploy_640x640_nv12.hbm |
| --rec_model_path | Path to the text recognition model (.hbm) | /opt/hobot/model/s100/basic/cn_PP-OCRv3_rec_infer-deploy_48x320_rgb.hbm |
| --test_image | Path to the input test image | /app/res/assets/gt_2322.jpg |
| --label_file | Path to the recognition label file | /app/res/labels/ppocr_keys_v1.txt |
| --threshold | Binarization threshold for text regions (for detection post-processing) | 0.5 |
| --ratio_prime | Expansion factor for text boxes (for detection post-processing, affects polygon expansion) | 2.7 |
Quick Run

  • Run the model

    • Ensure you are in the build directory
    • Use default parameters:
      ./paddleOCR
    • Run with specified parameters:
      ./paddleOCR \
      --det_model_path /opt/hobot/model/s100/basic/cn_PP-OCRv3_det_infer-deploy_640x640_nv12.hbm \
      --rec_model_path /opt/hobot/model/s100/basic/cn_PP-OCRv3_rec_infer-deploy_48x320_rgb.hbm \
      --test_image /app/res/assets/gt_2322.jpg \
      --label_file /app/res/labels/ppocr_keys_v1.txt \
      --threshold 0.5 \
      --ratio_prime 2.7
  • View the results

    After successful execution, the results will be drawn on the original image and saved as build/result.jpg.

    [Saved] Result saved to: result.jpg

Notes

  • The result image is written to result.jpg in the current working directory, which is build/ when the program is run as shown above.

  • For more information on deployment methods or model support, please refer to the official documentation or contact platform technical support.