Vision Language Model
Introduction
This section describes how to run an on-device Vision Language Model (VLM) on the RDK platform. Building on the strong results of InternVL and SmolVLM, both models have been quantized and deployed on the RDK platform. The demo combines the KV cache management in llama.cpp with the compute capability of the RDK platform's BPU module to run VLM inference locally.
Code repository: https://github.com/D-Robotics/hobot_llamacpp.git
Supported Platforms
| Platform | OS / Method | Demo Functionality |
|---|---|---|
| RDK X5, RDK X5 Module | Ubuntu 22.04 (Humble) | On-device Vision Language Model |
| RDK S100, RDK S100P | Ubuntu 22.04 (Humble) | On-device Vision Language Model |
Supported Models
| Model Name | Parameters | Platform | Image Encoder | Language Encoder and Decoder |
|---|---|---|---|---|
| InternVL2_5 | 1B | X5 | vit_model_int16_v2.bin | Qwen2.5-0.5B-Instruct-Q4_0.gguf |
| InternVL2_5 | 1B | S100 | vit_model_int16.hbm | Qwen2.5-0.5B-Instruct-Q4_0.gguf |
| InternVL3 | 1B | X5 | vit_model_int16_VL3_1B_Instruct_X5.bin | qwen2_5_q8_0_InternVL3_1B_Instruct.gguf |
| InternVL3 | 1B | S100 | vit_model_int16_VL3_1B_Instruct.hbm | qwen2_5_q8_0_InternVL3_1B_Instruct.gguf |
| InternVL3 | 2B | X5 | vit_model_int16_VL3_2B_Instruct.bin | qwen2_5_1.5b_q8_0_InternVL3_2B_Instruct.gguf |
| InternVL3 | 2B | S100 | vit_model_int16_VL3_2B_Instruct.hbm | qwen2_5_1.5b_q8_0_InternVL3_2B_Instruct.gguf |
| SmolVLM2 | 256M | X5 | SigLip_int16_SmolVLM2_256M_Instruct_MLP_C1_UP_X5.bin | SmolVLM2-256M-Video-Instruct-Q8_0.gguf |
| SmolVLM2 | 256M | S100 | SigLip_int16_SmolVLM2_256M_Instruct_S100.hbm | SmolVLM2-256M-Video-Instruct-Q8_0.gguf |
| SmolVLM2 | 500M | X5 | SigLip_int16_SmolVLM2_500M_Instruct_MLP_C1_UP_X5.bin | SmolVLM2-500M-Video-Instruct-Q8_0.gguf |
| SmolVLM2 | 500M | S100 | SigLip_int16_SmolVLM2_500M_Instruct_S100.hbm | SmolVLM2-500M-Video-Instruct-Q8_0.gguf |
Preparation
RDK Platform
- RDK should be flashed with the Ubuntu 22.04 system image.
- TogetheROS.Bot must be successfully installed on the RDK.
- Install the package:
sudo apt update
sudo apt install tros-humble-hobot-llamacpp
- System Setup
Use the srpi-config command to set the ION memory size to more than 1.6GB, then reboot and set the CPU frequency to its maximum.
- For RDK X5:
  - srpi-config usage: Performance Options
  - CPU frequency: CPU Frequency Setting
- For RDK S100:
  - srpi-config usage: Performance Options
  - CPU frequency: CPU Frequency Setting
Usage
RDK Platform
Use the srpi-config command to set the ION memory size to 1.6GB or more. For details, refer to the Performance Options section in the RDK User Manual.
After rebooting, set the CPU maximum frequency to 1.5GHz and the scheduling mode to performance with the following commands:
sudo bash -c 'echo 1 > /sys/devices/system/cpu/cpufreq/boost'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor'
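The per-core commands above can also be written as a loop, which avoids hard-coding the number of cores. The following is a minimal sketch, not part of hobot_llamacpp: the function name `set_performance_governor` and its optional directory argument are assumptions made for illustration, and the default path is the sysfs location used in the commands above.

```shell
# Sketch: apply the "performance" governor to every CPU core under CPU_ROOT.
# set_performance_governor is a hypothetical helper name.
# CPU_ROOT (first argument) defaults to the sysfs path used in the commands above.
set_performance_governor() {
    local cpu_root="${1:-/sys/devices/system/cpu}"
    local gov
    # Enable frequency boost only when the node exists on this board.
    if [ -e "$cpu_root/cpufreq/boost" ]; then
        echo 1 > "$cpu_root/cpufreq/boost"
    fi
    # Write the governor for each core that exposes a cpufreq directory.
    for gov in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -e "$gov" ] || continue
        echo performance > "$gov"
    done
}

# On the board, call it from a root shell (writing to sysfs needs root):
#   set_performance_governor
```

Because the glob only matches cores that actually exist, the same function should work on boards with a different core count. The settings are written to sysfs and so need to be applied again after each reboot.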
Currently, two demo modes are provided: direct terminal input (image and text), or subscribing to image and text messages and publishing the results as text.
InternVL
Before running the program, download the model files to the working directory with the following commands:
# Download model files
wget https://huggingface.co/D-Robotics/InternVL2_5-1B-GGUF-BPU/resolve/main/Qwen2.5-0.5B-Instruct-Q4_0.gguf
wget https://huggingface.co/D-Robotics/InternVL2_5-1B-GGUF-BPU/resolve/main/rdkx5/vit_model_int16_v2.bin
source /opt/tros/humble/setup.bash
cp -r /opt/tros/${TROS_DISTRO}/lib/hobot_llamacpp/config/ .
ros2 run hobot_llamacpp hobot_llamacpp --ros-args -p feed_type:=0 -p image:=config/image2.jpg -p image_type:=0 -p user_prompt:="Describe the image." -p model_file_name:=vit_model_int16_v2.bin -p llm_model_name:=Qwen2.5-0.5B-Instruct-Q4_0.gguf
After the program starts, it generates output from the local image and the custom prompt you supplied.
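If the model fails to load, one thing worth checking is whether the downloaded language model is really a GGUF file, since a download can silently save a web page or a truncated file instead. GGUF files begin with the four ASCII bytes `GGUF`, so a quick check is possible from the shell. This is a sketch only: `is_gguf_file` is a hypothetical helper name, and the check does not cover the image encoder file, whose format is not described here.

```shell
# is_gguf_file is a hypothetical helper name used for illustration.
# Returns 0 when FILE exists and starts with the 4-byte magic "GGUF",
# and a non-zero status otherwise.
is_gguf_file() {
    [ -f "$1" ] || return 1
    [ "$(head -c 4 "$1")" = "GGUF" ]
}

# Example, run in the working directory that holds the download:
#   is_gguf_file Qwen2.5-0.5B-Instruct-Q4_0.gguf || echo "not a GGUF file, download it again"
```

The check only confirms the file header. It does not validate the rest of the file or the quantization type.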

SmolVLM
Before running the program, download the model files to the working directory with the following commands:
# Download model files
wget https://huggingface.co/D-Robotics/SmolVLM2-256M-Video-Instruct-GGUF-BPU/resolve/main/rdkx5/SigLip_int16_SmolVLM2_256M_Instruct_MLP_C1_UP_X5.bin
wget https://huggingface.co/D-Robotics/SmolVLM2-256M-Video-Instruct-GGUF-BPU/resolve/main/SmolVLM2-256M-Video-Instruct-Q8_0.gguf
source /opt/tros/humble/setup.bash
cp -r /opt/tros/${TROS_DISTRO}/lib/hobot_llamacpp/config/ .
ros2 run hobot_llamacpp hobot_llamacpp --ros-args -p feed_type:=0 -p model_type:=1 -p image:=config/image2.jpg -p image_type:=0 -p user_prompt:="Describe the image." -p model_file_name:=SigLip_int16_SmolVLM2_256M_Instruct_MLP_C1_UP_X5.bin -p llm_model_name:=SmolVLM2-256M-Video-Instruct-Q8_0.gguf
After the program starts, it generates output from the local image and the custom prompt you supplied.
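The InternVL and SmolVLM launch commands differ only in the two model file names and in the `model_type` parameter. The sketch below assembles the command line from those pieces and prints it for review. `vlm_command` is a hypothetical helper, not something shipped with hobot_llamacpp. The parameter names and values are taken from the commands in this section, and `model_type` is passed only when given because this section does not state its default.

```shell
# vlm_command is a hypothetical helper; parameter names come from the commands above.
# Usage: vlm_command ENCODER_FILE GGUF_FILE [MODEL_TYPE]
# Prints the launch command instead of running it.
vlm_command() {
    local encoder="$1"
    local gguf="$2"
    local model_type="${3:-}"
    local cmd="ros2 run hobot_llamacpp hobot_llamacpp --ros-args -p feed_type:=0"
    # model_type is added only when supplied, as in the SmolVLM command (model_type:=1).
    if [ -n "$model_type" ]; then
        cmd="$cmd -p model_type:=$model_type"
    fi
    cmd="$cmd -p image:=config/image2.jpg -p image_type:=0"
    cmd="$cmd -p user_prompt:=\"Describe the image.\""
    cmd="$cmd -p model_file_name:=$encoder -p llm_model_name:=$gguf"
    printf '%s\n' "$cmd"
}

# Example: print the SmolVLM2 256M command for RDK X5.
#   vlm_command SigLip_int16_SmolVLM2_256M_Instruct_MLP_C1_UP_X5.bin SmolVLM2-256M-Video-Instruct-Q8_0.gguf 1
```

Printing the command rather than executing it keeps the helper safe to try on a machine without ROS 2 installed. The printed line can be pasted into a shell on the board once the environment has been sourced.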

Notes
Ensure the development board has more than 1.6GB of ION memory configured; otherwise the model may fail to load.