5.2.10 Vision Language Model
Introduction
This section describes how to run an on-device Vision Language Model (VLM) on the RDK platform. Building on the strong results of InternVL2_5-1B, the model has been quantized and deployed on the RDK platform. The demo combines llama.cpp's KV cache management with the computational advantages of the RDK platform's BPU module to run the VLM entirely on the device.
Code repository: https://github.com/D-Robotics/hobot_llamacpp.git
Supported Platforms
| Platform | OS / Method | Demo Functionality |
|---|---|---|
| RDK X5 (4GB RAM) | Ubuntu 22.04 (Humble) | On-device Vision Language Model |
Note: Only supported on RDK X5 with 4GB RAM.
Preparation
RDK Platform
- RDK must be the 4GB RAM version.
- RDK should be flashed with the Ubuntu 22.04 system image.
- TogetheROS.Bot must be successfully installed on the RDK.
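The prerequisites can be checked quickly before continuing. The commands below are a minimal sketch, assuming the default tros.b install location under /opt/tros used later in this section:
# Check total RAM (should report roughly 4GB)
free -h
# Check that the tros.b Humble distribution is installed
ls /opt/tros/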
Usage
RDK Platform
Before running the program, download the model files to the working directory with the following commands:
# Download model files
wget https://hf-mirror.com/D-Robotics/InternVL2_5-1B-GGUF-BPU/resolve/main/Qwen2.5-0.5B-Instruct-Q4_0.gguf
wget https://hf-mirror.com/D-Robotics/InternVL2_5-1B-GGUF-BPU/resolve/main/rdkx5/vit_model_int16_v2.bin
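After the downloads complete, it is worth confirming that both model files are present in the working directory and have non-trivial sizes (an HTML page instead of a model file usually indicates a wrong download URL). A quick check:
# Confirm both model files were downloaded
ls -lh Qwen2.5-0.5B-Instruct-Q4_0.gguf vit_model_int16_v2.bin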
Use the srpi-config command to set the ION memory size to 2.5GB. For details, refer to the Performance Options section in the RDK User Manual.
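srpi-config is an interactive tool; a typical invocation is shown below (assuming root privileges are required, and that the ION setting lives under the Performance Options menu referenced above):
# Open the configuration tool, navigate to Performance Options -> ION memory,
# set the size to 2.5GB, then reboot for the change to take effect
sudo srpi-config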
After rebooting, set the CPU maximum frequency to 1.5GHz and the scaling governor to performance with the following commands:
sudo bash -c 'echo 1 > /sys/devices/system/cpu/cpufreq/boost'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor'
sudo bash -c 'echo performance >/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor'
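The settings can be verified afterwards; the following read-only checks are a small sketch using the standard cpufreq sysfs interface:
# Confirm the governor and current frequency on each core
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq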
Two demo modes are currently provided: direct terminal input of an image and a text prompt, or subscribing to image and text messages and publishing the result as a text message.
Single Image Inference Demo
# Set up tros.b environment
source /opt/tros/humble/setup.bash
cp -r /opt/tros/${TROS_DISTRO}/lib/hobot_llamacpp/config/ .
ros2 run hobot_llamacpp hobot_llamacpp --ros-args -p feed_type:=0 -p image:=config/image2.jpg -p image_type:=0 -p user_prompt:="Describe this image."
After the program starts, inference runs on the specified local image with the given prompt and outputs the result.
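For the second mode (subscribing to image and text messages and publishing the result), the exact topic names are not listed here; they can be discovered at runtime with standard ROS 2 tooling, for example:
# In another terminal, with the node running
source /opt/tros/humble/setup.bash
ros2 topic list
# Echo the result topic found above (replace <result_topic> with the actual name)
ros2 topic echo <result_topic>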
Notes
Ensure the development board has 4GB of RAM and that the ION memory size is set to 2.5GB; otherwise the model may fail to load.