CLIP
Introduction
CLIP is a multimodal machine learning model proposed by OpenAI. It is trained with contrastive learning on large-scale image-text pairs and maps both images and text into a shared vector space. This example demonstrates how to use CLIP for image management and text-based image query on the RDK platform.
Code repository: https://github.com/D-Robotics/hobot_clip.git
Application scenario: using the CLIP image feature extractor to manage images, using text or images to query images, etc.
Components
The project consists of four parts:
- clip_encode_image: a DNN node for the CLIP image encoder, currently supporting two modes:
  - Local mode: supports feeding in local input files and outputs the image encoding features.
  - Service mode: based on a ROS Action Server; client nodes send inference requests, and the node computes and returns the image encoding features.
- clip_encode_text: a DNN node for the CLIP text encoder, currently supporting two modes:
  - Local mode: supports feeding in local input and outputs the text encoding features.
  - Service mode: based on a ROS Action Server; client nodes send inference requests, and the node computes and returns the text encoding features.
- clip_manage: the CLIP relay node, which sends requests to the encoder nodes and receives their results. It currently supports two modes:
  - Storage mode: sends encoding requests to the image encoding node clip_encode_image for the images in the target folder, retrieves their encoding features, and stores them in a local SQLite database.
  - Query mode: sends an encoding request to the text encoding node clip_encode_text to obtain the encoding features of the target text, then matches the text features against the image features in the database to obtain the matching results.
- clip_msgs: CLIP application topic definitions and action server control message definitions.
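The storage and query modes of clip_manage described above can be illustrated with a minimal, standard-library-only sketch. It is not the hobot_clip implementation: the table name `features`, its schema, the float32 BLOB encoding, the helper names, and the use of cosine similarity for matching are all assumptions made for illustration, and the three-dimensional vectors are toy stand-ins for real CLIP features.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack a list of floats into float32 bytes for a BLOB column."""
    return array("f", vec).tobytes()


def from_blob(blob):
    """Unpack float32 bytes back into a list of floats."""
    values = array("f")
    values.frombytes(blob)
    return list(values)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def store(conn, path, feature):
    """Storage mode (sketch): save one image feature keyed by its file path."""
    conn.execute(
        "INSERT OR REPLACE INTO features (path, feature) VALUES (?, ?)",
        (path, to_blob(feature)),
    )


def query(conn, text_feature, top_k=3):
    """Query mode (sketch): rank stored image features against a text feature."""
    rows = conn.execute("SELECT path, feature FROM features").fetchall()
    scored = [(cosine(text_feature, from_blob(blob)), path) for path, blob in rows]
    scored.sort(reverse=True)
    return [(path, score) for score, path in scored[:top_k]]


# Hypothetical schema and toy features, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (path TEXT PRIMARY KEY, feature BLOB)")
store(conn, "cat.jpg", [0.9, 0.1, 0.0])
store(conn, "dog.jpg", [0.1, 0.9, 0.0])
store(conn, "car.jpg", [0.0, 0.1, 0.9])

results = query(conn, [1.0, 0.2, 0.0], top_k=2)
for path, score in results:
    print(f"{path}: {score:.3f}")
```

In this sketch every stored feature is scanned on each query, which is adequate for a small local image collection; the actual node may store and match features differently.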
Supported Platforms
| Platform | System | Function |
|---|---|---|
| RDK X5 | Ubuntu 22.04 (Humble) | Start CLIP storage/query mode; the storage database is saved locally and query results are displayed on the Web |
| RDK S100, RDK S100P | Ubuntu 22.04 (Humble) | Start CLIP storage/query mode; the storage database is saved locally and query results are displayed on the Web |
Preparation
RDK
- The RDK has been flashed with the Ubuntu 22.04 system image provided by D-Robotics.
- TogetheROS.Bot has been successfully installed on the RDK.
Dependency Installation
pip3 install onnxruntime
pip3 install ftfy
pip3 install wcwidth
pip3 install regex
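After running the commands above, a quick way to confirm that the four packages are visible to the Python interpreter is to look them up without importing them. The snippet below is only a convenience sketch; the helper name `missing_modules` is hypothetical and not part of hobot_clip.

```python
import importlib.util

# Module names of the packages installed with pip3 above.
REQUIRED = ["onnxruntime", "ftfy", "wcwidth", "regex"]


def missing_modules(names):
    """Return the module names that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If any package is reported missing, rerun the corresponding `pip3 install` command with the same Python interpreter that will run the CLIP nodes.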
Model Download
# Download the model file from the web.
wget http://archive.d-robotics.cc/models/clip_encode_text/text_encoder.tar.gz
sudo tar -xf text_encoder.tar.gz -C config