5.2.5 Data Communication
Zero-Copy
Overview
TogetheROS.Bot provides flexible and efficient zero-copy functionality that can significantly reduce communication latency and CPU usage for large data transfers. tros.b integrates the performance_test tool to conveniently benchmark performance differences before and after enabling zero-copy. The performance_test tool supports configuration of subscriber count, message size, QoS, and other parameters to evaluate communication performance in different scenarios. The main performance metrics are as follows:
- Latency: the transmission time of each message from the publisher to the subscriber
- CPU usage: the percentage of CPU used by communication activity
- Resident memory: includes heap-allocated memory, shared memory, and stack memory used internally by the system
- Sample statistics: includes the number of messages sent, received, and lost in each experiment
Code repositories:
- https://github.com/D-Robotics/rclcpp
- https://github.com/D-Robotics/rcl_interfaces
- https://github.com/D-Robotics/benchmark
- The tros.b Foxy version adds the "zero-copy" feature based on ROS2 Foxy.
- The tros.b Humble version and later versions use the ROS2 "zero-copy" feature.
Supported Platforms
| Platform | Runtime Environment |
|---|---|
| RDK X3, RDK X3 Module | Ubuntu 20.04 (Foxy), Ubuntu 22.04 (Humble) |
| RDK X5, RDK X5 Module, RDK S100 | Ubuntu 22.04 (Humble) |
| RDK S600 | Ubuntu 24.04 (Jazzy) |
Prerequisites
RDK
- Before testing, set the RDK to performance mode to ensure accurate test results. Run the following command:
echo performance > /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
For more configuration details, refer to the System Configuration section.
- The performance_test package must be installed on the RDK. Installation commands:
- Foxy
sudo apt update
sudo apt install tros-performance-test
- Humble
sudo apt update
sudo apt install tros-humble-performance-test
- Jazzy
sudo apt update
sudo apt install tros-jazzy-performance-test
If the sudo apt update command fails or reports an error, refer to the FAQ question "How to handle apt update command failure or error?" (numbered Q10 or Q6 in the FAQ section) for a resolution.
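Both prerequisites can also be scripted. The sketch below is an illustration under stated assumptions, not part of tros.b. set_cpufreq_governor writes the governor to every policyN directory instead of only policy0, on the assumption that SoCs whose CPU clusters have separate cpufreq policies need each one set. tros_distro_for_ubuntu, tros_perf_test_package, and tros_setup_script are hypothetical helper names that only encode the Ubuntu-to-distribution mapping from the Supported Platforms table and the package names and setup scripts used in this section.

```shell
# Hypothetical helpers for the prerequisites above (not shipped with tros.b).

# set_cpufreq_governor GOVERNOR [CPUFREQ_ROOT]
# Writes GOVERNOR to every policyN/scaling_governor under CPUFREQ_ROOT
# (default: the Linux sysfs cpufreq directory) and prints the value read back.
set_cpufreq_governor() {
  local governor="$1"
  local root="${2:-/sys/devices/system/cpu/cpufreq}"
  local policy
  for policy in "$root"/policy*; do
    # If the glob matched nothing, "$policy" is the literal pattern and this check fails.
    if [ ! -w "$policy/scaling_governor" ]; then
      echo "cannot write $policy/scaling_governor (run as root?)" >&2
      return 1
    fi
    echo "$governor" > "$policy/scaling_governor" || return 1
    echo "${policy##*/}: $(cat "$policy/scaling_governor")"
  done
}

# tros_distro_for_ubuntu VERSION_ID -> tros.b distribution, per the Supported Platforms table.
tros_distro_for_ubuntu() {
  case "$1" in
    20.04) echo foxy ;;
    22.04) echo humble ;;
    24.04) echo jazzy ;;
    *) echo "unsupported Ubuntu release: $1" >&2; return 1 ;;
  esac
}

# tros_perf_test_package DISTRO -> apt package that provides performance_test.
tros_perf_test_package() {
  case "$1" in
    foxy) echo tros-performance-test ;;
    humble|jazzy) echo "tros-$1-performance-test" ;;
    *) echo "unknown tros.b distribution: $1" >&2; return 1 ;;
  esac
}

# tros_setup_script DISTRO -> environment setup script to source before running tests.
tros_setup_script() {
  case "$1" in
    foxy) echo /opt/tros/setup.bash ;;
    humble|jazzy) echo "/opt/tros/$1/setup.bash" ;;
    *) echo "unknown tros.b distribution: $1" >&2; return 1 ;;
  esac
}
```

Assuming the functions have been sourced into a root shell on the RDK, usage might look like:
set_cpufreq_governor performance
. /etc/os-release
distro=$(tros_distro_for_ubuntu "$VERSION_ID")
apt update && apt install "$(tros_perf_test_package "$distro")"
source "$(tros_setup_script "$distro")"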
Usage
RDK Platform
- 4M data transfer test without zero-copy enabled. Run the following command:
- Foxy
source /opt/tros/setup.bash
ros2 run performance_test perf_test --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
- Humble
source /opt/tros/humble/setup.bash
ros2 run performance_test perf_test --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
- Jazzy
source /opt/tros/jazzy/setup.bash
ros2 run performance_test perf_test --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
Test results are as follows:
run time
+--------------+-----------+--------+----------+
| T_experiment | 30.982817 | T_loop | 1.000126 |
+--------------+-----------+--------+----------+
samples latency
+------+------+------+-----------+---------------+ +----------+----------+----------+----------+
| recv | sent | lost | data_recv | relative_loss | | min | max | mean | variance |
+------+------+------+-----------+---------------+ +----------+----------+----------+----------+
| 99 | 100 | 0 | 418505326 | 0.000000 | | 0.004327 | 0.005605 | 0.004546 | 0.000000 |
+------+------+------+-----------+---------------+ +----------+----------+----------+----------+
publisher loop subscriber loop
+----------+----------+----------+----------+ +----------+----------+----------+----------+
| min | max | mean | variance | | min | max | mean | variance |
+----------+----------+----------+----------+ +----------+----------+----------+----------+
| 0.007260 | 0.008229 | 0.008057 | 0.000000 | | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
+----------+----------+----------+----------+ +----------+----------+----------+----------+
system usage
+-------------+-----------+---------+--------+--------+----------+--------+--------+
| utime | stime | maxrss | ixrss | idrss | isrss | minflt | majflt |
+-------------+-----------+---------+--------+--------+----------+--------+--------+
| 23120954000 | 121597000 | 65092 | 0 | 0 | 0 | 11578 | 2 |
+-------------+-----------+---------+--------+--------+----------+--------+--------+
| nswap | inblock | oublock | msgsnd | msgrcv | nsignals | nvcsw | nivcsw |
+-------------+-----------+---------+--------+--------+----------+--------+--------+
| 0 | 0 | 0 | 0 | 0 | 0 | 9885 | 7193 |
+-------------+-----------+---------+--------+--------+----------+--------+--------+
Maximum runtime reached. Exiting.
- 4M data transfer test with zero-copy enabled (add the --zero-copy parameter; on Humble and Jazzy, also export the Fast DDS environment variables shown below). Run the following command:
- Foxy
source /opt/tros/setup.bash
ros2 run performance_test perf_test --zero-copy --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
- Humble
source /opt/tros/humble/setup.bash
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
export FASTRTPS_DEFAULT_PROFILES_FILE=/opt/tros/humble/lib/hobot_shm/config/shm_fastdds.xml
export RMW_FASTRTPS_USE_QOS_FROM_XML=1
export ROS_DISABLE_LOANED_MESSAGES=0
ros2 run performance_test perf_test --zero-copy --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
- Jazzy
source /opt/tros/jazzy/setup.bash
export RMW_IMPLEMENTATION=rmw_fastrtps_cpp
export FASTRTPS_DEFAULT_PROFILES_FILE=/opt/tros/jazzy/lib/hobot_shm/config/shm_fastdds.xml
export RMW_FASTRTPS_USE_QOS_FROM_XML=1
export ROS_DISABLE_LOANED_MESSAGES=0
ros2 run performance_test perf_test --zero-copy --reliable --keep-last --history-depth 10 -s 1 -m Array4m -r 100 --max-runtime 30
Test results are as follows:
run time
+--------------+-----------+--------+----------+
| T_experiment | 30.554773 | T_loop | 1.000084 |
+--------------+-----------+--------+----------+
samples latency
+------+------+------+-----------+---------------+ +----------+----------+----------+----------+
| recv | sent | lost | data_recv | relative_loss | | min | max | mean | variance |
+------+------+------+-----------+---------------+ +----------+----------+----------+----------+
| 99 | 99 | 0 | 418701472 | 0.000000 | | 0.000146 | 0.000381 | 0.000195 | 0.000000 |
+------+------+------+-----------+---------------+ +----------+----------+----------+----------+
publisher loop subscriber loop
+----------+----------+----------+----------+ +----------+----------+----------+----------+
| min | max | mean | variance | | min | max | mean | variance |
+----------+----------+----------+----------+ +----------+----------+----------+----------+
| 0.009812 | 0.009895 | 0.009877 | 0.000000 | | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
+----------+----------+----------+----------+ +----------+----------+----------+----------+
system usage
+------------+-----------+---------+--------+--------+----------+--------+--------+
| utime | stime | maxrss | ixrss | idrss | isrss | minflt | majflt |
+------------+-----------+---------+--------+--------+----------+--------+--------+
| 8727113000 | 307920000 | 46224 | 0 | 0 | 0 | 6440 | 0 |
+------------+-----------+---------+--------+--------+----------+--------+--------+
| nswap | inblock | oublock | msgsnd | msgrcv | nsignals | nvcsw | nivcsw |
+------------+-----------+---------+--------+--------+----------+--------+--------+
| 0 | 0 | 0 | 0 | 0 | 0 | 9734 | 2544 |
+------------+-----------+---------+--------+--------+----------+--------+--------+
Maximum runtime reached. Exiting.
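The two runs above differ only in the --zero-copy option and, on Humble and Jazzy, four Fast DDS environment variables. The sketch below factors both out. perf_test_args and tros_zero_copy_env are hypothetical helper names; the option comments are our reading of the standard performance_test flags, and the role of shm_fastdds.xml (configuring Fast DDS shared-memory transport) is an assumption.

```shell
# Hypothetical helpers wrapping the perf_test commands used in this section.

# perf_test_args ZERO_COPY -> prints the perf_test options (ZERO_COPY is 0 or 1).
perf_test_args() {
  local opts="--reliable"                        # RELIABLE reliability QoS
  opts="$opts --keep-last --history-depth 10"    # KEEP_LAST history with depth 10
  opts="$opts -s 1"                              # one subscriber
  opts="$opts -m Array4m"                        # 4M array message type
  opts="$opts -r 100"                            # publish at 100 Hz
  opts="$opts --max-runtime 30"                  # stop after 30 seconds
  if [ "${1:-0}" = 1 ]; then
    opts="--zero-copy $opts"                     # the zero-copy run adds --zero-copy first
  fi
  printf '%s\n' "$opts"
}

# tros_zero_copy_env DISTRO -> exports the Fast DDS settings used on Humble and Jazzy.
tros_zero_copy_env() {
  case "$1" in
    humble|jazzy) ;;
    *) echo "zero-copy variables are only set for humble or jazzy, got: $1" >&2; return 1 ;;
  esac
  export RMW_IMPLEMENTATION=rmw_fastrtps_cpp     # use the Fast DDS RMW layer
  # XML profile shipped with hobot_shm (assumed to configure shared-memory transport).
  export FASTRTPS_DEFAULT_PROFILES_FILE="/opt/tros/$1/lib/hobot_shm/config/shm_fastdds.xml"
  export RMW_FASTRTPS_USE_QOS_FROM_XML=1         # use QoS from the XML profile
  export ROS_DISABLE_LOANED_MESSAGES=0           # keep loaned (zero-copy) messages enabled
}
```

For example, a zero-copy run on Humble could then be started as follows (the unquoted $(perf_test_args 1) is intentional so that the options split into separate arguments):
source /opt/tros/humble/setup.bash
tros_zero_copy_env humble
ros2 run performance_test perf_test $(perf_test_args 1) | tee zero_copy.log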
Result Analysis
The performance_test tool outputs many kinds of statistics. The following compares the latency and system usage results:
latency: The mean communication latency is 4.546 ms with zero-copy disabled and 0.195 ms with it enabled (the latency table reports seconds), about 23 times lower, showing that the zero-copy feature significantly reduces communication latency.
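The mean latencies can also be extracted from saved perf_test logs instead of being read by eye. The sketch below assumes each run's output was saved to a file (for example with | tee) and that the log keeps the table layout shown above; it takes the last samples/latency table in the file, on the assumption that perf_test may print one such table per reporting interval. perf_test_mean_latency and latency_speedup are hypothetical helper names.

```shell
# perf_test_mean_latency LOG_FILE
# Prints the "mean" latency column (seconds) of the last samples/latency table in LOG_FILE.
perf_test_mean_latency() {
  awk -F'|' '
    /[|] *recv *[|]/ { in_table = 1; next }      # header row of a samples/latency table
    in_table && $2 ~ /^ *[0-9]+ *$/ {            # first data row after that header
      # $2 recv, $3 sent, $4 lost, $5 data_recv, $6 relative_loss,
      # $7 gap between the two tables, $8 min, $9 max, $10 mean, $11 variance
      mean = $10
      gsub(/ /, "", mean)
      in_table = 0
    }
    END { if (mean == "") exit 1; print mean }   # report the last table found
  ' "$1"
}

# latency_speedup BASELINE_LOG ZERO_COPY_LOG
# Prints how many times lower the zero-copy mean latency is, with one decimal.
latency_speedup() {
  local base zc
  base=$(perf_test_mean_latency "$1") || return 1
  zc=$(perf_test_mean_latency "$2") || return 1
  awk -v a="$base" -v b="$zc" 'BEGIN { if (b + 0 == 0) exit 1; printf "%.1f\n", a / b }'
}
```

Applied to logs containing the two results shown above, latency_speedup baseline.log zero_copy.log prints 23.3 (0.004546 / 0.000195).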
system usage: The fields of the system usage table have the following meanings:
| Field | Meaning |
|---|---|
| utime | User-space CPU time (ns) |
| stime | System (kernel) CPU time (ns) |
| maxrss | Maximum resident set size (KB) |
| minflt | Minor page fault count |
| majflt | Major page fault count |
| nvcsw | Voluntary context switch count |
| nivcsw | Involuntary context switch count |
| ixrss, idrss, isrss, nswap, inblock, oublock, msgsnd, msgrcv, nsignals | 0 in both runs |
| Communication Mode | Mean latency (s) | utime+stime | maxrss | minflt | majflt | nvcsw | nivcsw |
|---|---|---|---|---|---|---|---|
| Non-zero-copy | 0.004546 | 23242551000 | 65092 | 11578 | 2 | 9885 | 7193 |
| Zero-copy | 0.000195 | 9035033000 | 46224 | 6440 | 0 | 9734 | 2544 |
Comparison shows:
- The sum of utime and stime with zero-copy is well under half that without it (9035033000 vs. 23242551000), indicating that zero-copy consumes far fewer CPU resources
- maxrss with zero-copy is lower (46224 vs. 65092), indicating that zero-copy uses less memory
- minflt and majflt with zero-copy are significantly lower (6440 vs. 11578, and 0 vs. 2), indicating less communication jitter with zero-copy
- nivcsw with zero-copy is significantly lower (2544 vs. 7193), while nvcsw is nearly unchanged (9734 vs. 9885); the drop in involuntary context switches also indicates less communication jitter with zero-copy
Overall, for large-message communication, zero-copy is significantly better than non-zero-copy in CPU consumption, memory usage, and communication latency and jitter.
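The relative differences between the two runs can be computed directly. The sketch below uses awk; pct_reduction is a hypothetical helper name, and the input values are the mean latencies and system usage figures from the two test results above (non-zero-copy first, then zero-copy).

```shell
# pct_reduction BASELINE ZERO_COPY -> percentage by which ZERO_COPY is lower, one decimal.
pct_reduction() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (1 - new / base) * 100 }'
}

# metric, non-zero-copy value, zero-copy value (from the test results above)
while read -r metric base new; do
  printf '%-12s %s%%\n' "$metric" "$(pct_reduction "$base" "$new")"
done <<'EOF'
latency 0.004546 0.000195
utime+stime 23242551000 9035033000
maxrss 65092 46224
minflt 11578 6440
nvcsw 9885 9734
nivcsw 7193 2544
EOF
```

With these inputs the loop reports reductions of 95.7% (latency), 61.1% (utime+stime), 29.0% (maxrss), 44.4% (minflt), 1.5% (nvcsw), and 64.6% (nivcsw).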