Automatic Speech Recognition - ASR
This sample runs a speech recognition model using the BPU inference engine to automatically transcribe .wav audio files and output the corresponding text. The sample code is located in /app/cdev_demo/bpu/07_speech_sample/01_asr/ on RDK S100 and in /app/cdev_demo/bpu/speech_sample/asr/ on RDK S600.
The current RDK S100 system image does not include the asr.hbm model. Before running this sample on RDK S100, you must download it manually (see the download command in "Model Download" below) and place it at the default path /opt/hobot/model/s100/basic/asr.hbm, or specify another path with --model_path.
Model Description
- Overview: ASR (Automatic Speech Recognition) models convert audio signals into text. The input is a single-channel speech waveform (after sample rate conversion and standardization), and the output is a character-level token sequence. Combined with a vocabulary (vocab) file, this enables Chinese speech transcription. This sample uses a quantized .hbm model.
- HBM model name: asr.hbm
- Input format: audio waveform, single channel, 16 kHz sample rate, maximum length 30000 samples
- Output: character token probability distribution (logits); the recognized text is obtained by argmax decoding and mapping token ids to characters through the vocabulary
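The input requirements above (single channel, standardized, fixed length) can be illustrated with a minimal C++ sketch. This is not the code from asr.cc: the function names DownmixToMono, ZScoreInPlace and FitLength, the averaging downmix, and the eps guard are all assumptions made for illustration, and resampling is left out because the sample delegates it to libsamplerate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Average interleaved multi-channel frames down to a single channel.
// (Hypothetical helper; the real sample may downmix differently.)
std::vector<float> DownmixToMono(const std::vector<float>& interleaved, int channels) {
  assert(channels > 0);
  const std::size_t ch = static_cast<std::size_t>(channels);
  const std::size_t frames = interleaved.size() / ch;
  std::vector<float> mono(frames, 0.0f);
  for (std::size_t f = 0; f < frames; ++f) {
    float sum = 0.0f;
    for (std::size_t c = 0; c < ch; ++c) sum += interleaved[f * ch + c];
    mono[f] = sum / static_cast<float>(channels);
  }
  return mono;
}

// Z-score standardization: subtract the mean, divide by the standard deviation.
// eps is an assumed guard against division by zero on silent input.
void ZScoreInPlace(std::vector<float>& x, double eps = 1e-7) {
  if (x.empty()) return;
  double mean = 0.0;
  for (float v : x) mean += v;
  mean /= static_cast<double>(x.size());
  double var = 0.0;
  for (float v : x) var += (v - mean) * (v - mean);
  var /= static_cast<double>(x.size());
  const double inv_std = 1.0 / std::sqrt(var + eps);
  for (float& v : x) v = static_cast<float>((v - mean) * inv_std);
}

// Truncate to target_len, or zero-pad at the end up to target_len.
std::vector<float> FitLength(std::vector<float> x, std::size_t target_len) {
  x.resize(target_len, 0.0f);
  return x;
}
```

Calling the three helpers in this order (downmix, standardize, fit to 30000 samples) yields a buffer of the length the model expects; at 16 kHz, 30000 samples correspond to 1.875 s of audio.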
Feature Overview
- Model loading: Load the ASR model and automatically parse the model's input/output shapes and quantization information.
- Input preprocessing: Read the audio file (.wav) with libsndfile. The audio is then:
  - Converted to single channel
  - Resampled to the target sample rate (default 16 kHz)
  - Standardized to zero mean and unit variance (z-score)
  - Padded or truncated to a fixed length (for example, 30000 samples)

  Long audio can be processed chunk by chunk, which allows streaming recognition.
- Inference execution: Run inference with the .infer() method.
- Result post-processing: Take token indices from the output logits, map them to characters using the vocab dictionary file (JSON format), and output the final recognized text.
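The post-processing step can be sketched as a greedy CTC-style decode: take the argmax per frame, merge consecutive repeats, drop the blank token, and look up the remaining ids in the vocabulary. This is an illustrative sketch, not the implementation in asr.cc. The names ArgmaxPerFrame and CtcCollapse, the row-major [frames x vocab] logits layout, and the blank id passed by the caller are assumptions; the vocabulary is shown as an in-memory vector rather than parsed from vocab.json.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Pick the highest-scoring token id for every frame.
// logits is assumed to be row-major with shape [num_frames x vocab_size].
std::vector<int> ArgmaxPerFrame(const std::vector<float>& logits,
                                std::size_t num_frames,
                                std::size_t vocab_size) {
  assert(logits.size() == num_frames * vocab_size);
  std::vector<int> ids(num_frames, 0);
  for (std::size_t t = 0; t < num_frames; ++t) {
    const float* row = logits.data() + t * vocab_size;
    std::size_t best = 0;
    for (std::size_t v = 1; v < vocab_size; ++v) {
      if (row[v] > row[best]) best = v;
    }
    ids[t] = static_cast<int>(best);
  }
  return ids;
}

// Greedy CTC collapse: skip ids equal to the previous frame's id, skip the
// blank id, and append the token string of everything else.
// Tokens are std::string so that multi-byte UTF-8 characters fit.
std::string CtcCollapse(const std::vector<int>& ids,
                        const std::vector<std::string>& id_to_token,
                        int blank_id) {
  std::string text;
  int prev = -1;  // no previous frame yet
  for (int id : ids) {
    const bool in_range =
        id >= 0 && static_cast<std::size_t>(id) < id_to_token.size();
    if (id != prev && id != blank_id && in_range) {
      text += id_to_token[static_cast<std::size_t>(id)];
    }
    prev = id;
  }
  return text;
}
```

Because a blank between two identical ids resets the repeat check, a frame sequence such as a, a, blank, a, b decodes to "aab" rather than "ab". The actual blank id depends on the vocabulary shipped with the model.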
Environment Dependencies
Before building and running, ensure the following dependencies are installed:
sudo apt update
sudo apt install -y libgflags-dev libsndfile1-dev libsamplerate0-dev
Directory Structure
.
|-- CMakeLists.txt                 # CMake build script: target/dependency/include/link configuration
|-- README.md                      # Usage instructions (this file)
|-- inc
|   |-- asr.hpp                    # ASR inference wrapper header (load/preprocess/infer/postprocess interfaces)
|   `-- audio_chunk_reader.hpp     # Audio chunk reader: read file → resample → output chunks
`-- src
    |-- asr.cc                     # ASR inference implementation: input write, forward pass, CTC decode, etc.
    |-- audio_chunk_reader.cc      # Chunk reader implementation: libsndfile + libsamplerate streaming chunks
    `-- main.cc                    # Program entry: parse args → loop over chunks → infer → concatenate transcription
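The flow noted for main.cc (loop over chunks, infer, concatenate the transcription) can be sketched roughly as follows. The function TranscribeInChunks and the InferFn callback are hypothetical stand-ins for the sample's own reader and model wrapper, and zero-padding the last chunk is an assumption; the real program reads chunks through audio_chunk_reader rather than from a vector held in memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical inference callback: takes one fixed-length chunk of samples
// and returns the text recognized for it.
using InferFn = std::function<std::string(const std::vector<float>&)>;

// Split samples into consecutive chunks of chunk_len, zero-pad the last one,
// run the callback on each chunk, and concatenate the returned text.
std::string TranscribeInChunks(const std::vector<float>& samples,
                               std::size_t chunk_len,
                               const InferFn& infer) {
  assert(chunk_len > 0);
  std::string text;
  for (std::size_t start = 0; start < samples.size(); start += chunk_len) {
    const std::size_t end = std::min(start + chunk_len, samples.size());
    std::vector<float> chunk(
        samples.begin() + static_cast<std::ptrdiff_t>(start),
        samples.begin() + static_cast<std::ptrdiff_t>(end));
    chunk.resize(chunk_len, 0.0f);  // pad the final, shorter chunk
    text += infer(chunk);
  }
  return text;
}
```

With a chunk length of 30000 samples, a 70000-sample recording is processed as three chunks, the last of which carries 10000 real samples followed by padding.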
Build the Project
- Configure and build
mkdir build && cd build
cmake ..
make -j$(nproc)
Model Download
If the model is not found at runtime, download it with the command for your platform.
RDK S100:
wget https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_s100/asr/asr.hbm
RDK S600:
wget https://archive.d-robotics.cc/downloads/rdk_model_zoo/rdk_s600/asr/asr.hbm
Parameter Reference
RDK S100:

| Parameter | Description | Default Value |
|---|---|---|
| --model_path | Model file path (.hbm) | /opt/hobot/model/s100/basic/asr.hbm |
| --test_sound | Input audio file path (.wav) | /app/res/assets/chi_sound.wav |
| --vocab_file | Vocabulary (JSON), mapping class id → token | /app/res/labels/vocab.json |

RDK S600:

| Parameter | Description | Default Value |
|---|---|---|
| --model_path | Model file path (.hbm) | /opt/hobot/model/s600/basic/asr.hbm |
| --test_sound | Input audio file path (.wav) | /app/res/assets/chi_sound.wav |
| --vocab_file | Vocabulary (JSON), mapping class id → token | /app/res/labels/vocab.json |
Quick Start
- Run the model
  - Make sure you are in the build directory
  - Use default parameters
    ./asr
  - Run with custom parameters
    RDK S100:
    ./asr \
      --model_path /opt/hobot/model/s100/basic/asr.hbm \
      --test_sound /app/res/assets/chi_sound.wav \
      --vocab_file /app/res/labels/vocab.json
    RDK S600:
    ./asr \
      --model_path /opt/hobot/model/s600/basic/asr.hbm \
      --test_sound /app/res/assets/chi_sound.wav \
      --vocab_file /app/res/labels/vocab.json
- View the results
  After successful execution, the recognized text is printed, for example:
我是来自阿里云的大规模语言磨型过叫通意千问||
Notes
- For more deployment options or model support information, refer to the official documentation or contact platform technical support.