4.1.3 RDK X5 Model Zoo Usage Guide

Branch and System Requirements

RDK X5 uses the rdk_x5 branch as its main delivery branch; the recommended system version is RDK OS >= 3.5.0. All Python samples in this branch use the hbm_runtime inference interface.

git clone https://github.com/D-Robotics/rdk_model_zoo.git
cd rdk_model_zoo
git checkout rdk_x5
Tip:

The rdk_x5 branch is the main delivery branch for RDK X5 and is recommended for priority use. The original main branch has been renamed to rdk_x5_legacy and is used only for historical demo archiving.

Repository Directory Structure

The rdk_x5 branch uses a standardized directory structure, organized by domain and model:

rdk_model_zoo/
|-- samples/
|   `-- vision/
|       |-- clip/                 # Image-text multimodal matching
|       |-- convnext/             # Image classification
|       |-- edgenext/             # Image classification
|       |-- efficientformer/      # Image classification
|       |-- efficientformerv2/    # Image classification
|       |-- efficientnet/         # Image classification
|       |-- efficientvit/         # Image classification
|       |-- fasternet/            # Image classification
|       |-- fastvit/              # Image classification
|       |-- fcos/                 # Object detection
|       |-- googlenet/            # Image classification
|       |-- lprnet/               # License plate recognition
|       |-- mobilenetv1/          # Image classification
|       |-- mobilenetv2/          # Image classification
|       |-- mobilenetv3/          # Image classification
|       |-- mobilenetv4/          # Image classification
|       |-- mobileone/            # Image classification
|       |-- modnet/               # Image matting
|       |-- paddleocr/            # OCR text detection and recognition
|       |-- repghost/             # Image classification
|       |-- repvgg/               # Image classification
|       |-- repvit/               # Image classification
|       |-- resnet/               # Image classification
|       |-- resnext/              # Image classification
|       |-- ultralytics_yolo/     # Detection, segmentation, pose, classification
|       |-- ultralytics_yolo26/   # Detection, segmentation, pose, OBB, classification
|       |-- vargconvnet/          # Image classification
|       |-- yoloe/                # Instance segmentation
|       |-- yolov5/               # Object detection
|       `-- yoloworld/            # Open-vocabulary object detection
|-- docs/         # Project specs and reference documentation
|-- datasets/     # Datasets and download scripts
|-- tros/         # TROS integration guides and examples
`-- utils/        # Shared Python / C++ utilities

Individual Sample Structure

Each RDK X5 sample contains the following standardized directories:

sample_name/
|-- README.md       # English documentation
|-- README_cn.md    # Chinese documentation
|-- conversion/     # ONNX → HBM/BIN conversion configs
|-- evaluator/      # Accuracy and performance evaluation
|-- model/          # Pre-compiled .bin models + download scripts
|-- runtime/
|   |-- python/     # Python inference (main.py, <model>.py, run.sh)
|   `-- cpp/        # C++ inference (src/main.cc, CMakeLists.txt, run.sh)
`-- test_data/      # Test images and inference results

Inference Interface

The rdk_x5 branch Python samples uniformly use the hbm_runtime inference interface. For the complete interface reference, see RDK X5 hbm_runtime Python API Documentation.

For the C++ inference interface, see the hb_dnn C/C++ Inference Interface Documentation in the Runtime Development Docs.

hbm_runtime Basic Call Flow

Load Model

import hbm_runtime

# Load a compiled .bin model and query its I/O metadata
model = hbm_runtime.HB_HBMRuntime("../../model/yolo11x_detect_bayese_640x640_nv12.bin")
model_name = model.model_names[0]             # a .bin file may contain several models
input_names = model.input_names[model_name]
output_names = model.output_names[model_name]
input_shapes = model.input_shapes[model_name]
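
Printing these values gives a quick view of the model's I/O layout before wiring up preprocessing:

print(model_name)
print(input_names, input_shapes)
print(output_names)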

Configure Scheduling Parameters

hbm_runtime supports specifying inference priority and BPU core:

model.set_scheduling_params(
    priority={model_name: 0},
    bpu_cores={model_name: [0]},
)

Command-line parameter equivalents:

--priority 0 --bpu-cores 0
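
As a hedged sketch, a sample's main.py might map these flags onto set_scheduling_params() roughly as follows (the argument wiring below is illustrative, not the Model Zoo's actual code; model and model_name come from the Load Model step above):

import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--priority", type=int, default=0)
parser.add_argument("--bpu-cores", type=int, nargs="+", default=[0])
args = parser.parse_args()

# Forward the CLI values into the per-model scheduling dicts
model.set_scheduling_params(
    priority={model_name: args.priority},
    bpu_cores={model_name: args.bpu_cores},
)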

Prepare Inputs

RDK X5 vision samples commonly use packed NV12 format input. The wrapper's pre_process() handles resize, BGR-to-NV12 conversion, data packing, etc.:

inputs = {
    model_name: {
        input_names[0]: input_array,
    }
}
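
For orientation, here is a minimal sketch of the BGR-to-packed-NV12 step using OpenCV and NumPy. It is an illustrative stand-in for the wrapper's actual pre_process(), and the bgr_to_nv12 helper name is hypothetical:

import cv2
import numpy as np

def bgr_to_nv12(bgr, width=640, height=640):
    resized = cv2.resize(bgr, (width, height))
    # OpenCV I420 layout: Y plane (height rows), then U and V quarter planes
    yuv_i420 = cv2.cvtColor(resized, cv2.COLOR_BGR2YUV_I420)
    y = yuv_i420[:height, :].reshape(-1)
    u = yuv_i420[height:height + height // 4, :].reshape(-1)
    v = yuv_i420[height + height // 4:, :].reshape(-1)
    # NV12 layout: Y plane followed by interleaved UV samples
    uv = np.empty(u.size + v.size, dtype=yuv_i420.dtype)
    uv[0::2] = u
    uv[1::2] = v
    return np.concatenate([y, uv])

# e.g. input_array = bgr_to_nv12(cv2.imread("test.jpg"))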

Run Inference

outputs = model.run(inputs)
raw_outputs = outputs[model_name]
output_tensor = raw_outputs[output_names[0]]
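
outputs is keyed by model name, and each per-model entry maps output tensor names to tensors. Assuming the returned tensors behave like NumPy arrays, they can be inspected like this:

for name, tensor in raw_outputs.items():
    print(name, tensor.shape, tensor.dtype)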

Model Zoo Wrapper Flow

RDK X5 samples follow the Config + Model + predict() pattern:

config = YOLOv5Config(
    model_path="../../model/yolov5n_tag_v7.0_detect_640x640_bayese_nv12.bin",
    classes_num=80,
    score_thres=0.25,
    nms_thres=0.45,
)

model = YOLOv5Detect(config)
model.set_scheduling_params(priority=0, bpu_cores=[0])
results = model.predict(image)

The wrapper executes in the following order:

  1. pre_process(): Generate model inputs
  2. forward(): Call hbm_runtime.run()
  3. post_process(): Parse detection boxes, classification results, segmentation masks, or pose keypoints
  4. predict(): Chain the complete flow
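
A skeleton of this pattern, sketched here for orientation (class internals are illustrative assumptions, not the Model Zoo source; method bodies are elided):

import hbm_runtime

class YOLOv5Detect:
    def __init__(self, config):
        self.config = config
        self.model = hbm_runtime.HB_HBMRuntime(config.model_path)
        self.model_name = self.model.model_names[0]

    def set_scheduling_params(self, priority, bpu_cores):
        # Translate plain arguments into the per-model dicts hbm_runtime expects
        self.model.set_scheduling_params(
            priority={self.model_name: priority},
            bpu_cores={self.model_name: bpu_cores},
        )

    def pre_process(self, image):
        ...  # resize, BGR -> NV12, pack into the input dict for run()

    def forward(self, inputs):
        return self.model.run(inputs)

    def post_process(self, outputs):
        ...  # decode boxes, filter by config.score_thres, NMS with config.nms_thres

    def predict(self, image):
        return self.post_process(self.forward(self.pre_process(image)))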

Quick Start

Run the Ultralytics YOLO11x Detection Sample

# Download model
cd samples/vision/ultralytics_yolo/model
wget -nc https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_x5/ultralytics_YOLO/yolo11x_detect_bayese_640x640_nv12.bin

# Run inference
cd ../runtime/python
python3 main.py \
    --task detect \
    --model-path ../../model/yolo11x_detect_bayese_640x640_nv12.bin \
    --test-img ../../../../../datasets/coco/assets/bus.jpg \
    --img-save-path ../../test_data/inference_yolo11x.jpg

Using run.sh for One-Click Execution

Each sample provides a run.sh script in its runtime/python/ and runtime/cpp/ directories for one-click environment setup, model download, and inference:

# Python inference
cd samples/vision/yolov5/runtime/python
bash run.sh

# C++ inference
cd samples/vision/yolov5/runtime/cpp
bash run.sh

Model Coverage

Image Classification

| Model | Sample Directory |
| --- | --- |
| ConvNeXt | samples/vision/convnext |
| EdgeNeXt | samples/vision/edgenext |
| EfficientFormer | samples/vision/efficientformer |
| EfficientFormerV2 | samples/vision/efficientformerv2 |
| EfficientNet | samples/vision/efficientnet |
| EfficientViT | samples/vision/efficientvit |
| FasterNet | samples/vision/fasternet |
| FastViT | samples/vision/fastvit |
| GoogLeNet | samples/vision/googlenet |
| MobileNetV1 | samples/vision/mobilenetv1 |
| MobileNetV2 | samples/vision/mobilenetv2 |
| MobileNetV3 | samples/vision/mobilenetv3 |
| MobileNetV4 | samples/vision/mobilenetv4 |
| MobileOne | samples/vision/mobileone |
| RepGhost | samples/vision/repghost |
| RepVGG | samples/vision/repvgg |
| RepViT | samples/vision/repvit |
| ResNet | samples/vision/resnet |
| ResNeXt | samples/vision/resnext |
| VargConvNet | samples/vision/vargconvnet |

Object Detection

| Model | Sample Directory |
| --- | --- |
| FCOS | samples/vision/fcos |
| YOLOv5 | samples/vision/yolov5 |
| Ultralytics YOLO (YOLOv5u / YOLOv8 / YOLOv9 / YOLOv10 / YOLO11 / YOLO12 / YOLO13) | samples/vision/ultralytics_yolo |
| Ultralytics YOLO26 | samples/vision/ultralytics_yolo26 |

Instance Segmentation / Matting

| Model | Sample Directory |
| --- | --- |
| YOLOE | samples/vision/yoloe |
| MODNet | samples/vision/modnet |

OCR / Recognition

| Model | Sample Directory |
| --- | --- |
| PaddleOCR | samples/vision/paddleocr |
| LPRNet | samples/vision/lprnet |

Multimodal

| Model | Sample Directory |
| --- | --- |
| CLIP | samples/vision/clip |
| YOLOWorld | samples/vision/yoloworld |

rdk_x5_legacy Branch

When using RDK X5 with the rdk_x5_legacy branch:

git clone https://github.com/D-Robotics/rdk_model_zoo.git
cd rdk_model_zoo
git checkout rdk_x5_legacy

After switching to rdk_x5_legacy, enter the target demo directory and read its README first; each demo is run with the commands given there.

For demos that use bpu_infer_lib_x5, install with:

wget -nc https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_x5/bpu_infer_lib_x5-1.0.3-py3-none-any.whl
pip install bpu_infer_lib_x5-1.0.3-py3-none-any.whl

Demos that use hobot_dnn.pyeasy_dnn rely on the interface preinstalled on the board and need no additional installation.

Caution:

rdk_x5_legacy is a legacy archive branch and is no longer actively maintained. New projects should use the rdk_x5 branch.

bpu_infer_lib_x5 and hobot_dnn.pyeasy_dnn have poor support for featuremap input models. If you need to use featuremap input models, please use the hbm_runtime inference interface from the rdk_x5 branch.

Shared Utilities (utils/)

The rdk_x5 branch provides the following shared Python utilities (utils/py_utils/):

| Utility Module | Function |
| --- | --- |
| file_io | Model download, image loading, class name loading |
| preprocess | BGR to NV12, resize (direct / letterbox), NV12 splitting |
| postprocess | NMS; YOLO box / mask / keypoint / OBB decoding; coordinate scaling |
| visualize | Rendering of detection boxes, segmentation masks, rotated boxes, pose keypoints, and classification results |
| inspect | SoC name detection, model info printing |
| nn_math | Sigmoid, z-score normalization |
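
As a purely hypothetical sketch of how these modules might be combined in a sample (the import path and every helper name below are assumptions for illustration; check utils/py_utils for the actual APIs):

# All names below are hypothetical placeholders, not verified APIs
from py_utils import preprocess, postprocess, visualize

nv12 = preprocess.bgr2nv12(image)                    # BGR -> packed NV12
boxes = postprocess.nms(raw_boxes, iou_thres=0.45)   # non-maximum suppression
visualize.draw_detections(image, boxes)              # render detection boxes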