Overview
ENERZAi Optimium 1.58-bit Model Optimizer
Qualcomm-containers_optimium_whisper
The Qualcomm-containers_optimium_whisper container is designed to simplify the deployment and use of speech-to-text models powered by ENERZAi's proprietary inference engine, Optimium, on edge devices. It provides a pre-configured environment for running high-efficiency AI inference tasks out of the box.
Key Features
- Architecture: Built exclusively for Linux Arm64.
- Runtime: Includes full support for the Optimium Runtime.
- Optimization: Models are specifically optimized for the Qualcomm QCS6490 chipset.
- Language Support: Supports English (en) and Chinese (zh).
Supported Host Devices
- Devices based on Qualcomm QCS6490.
Prerequisites
Ensure the following software is installed on the Host OS before deploying:
Software Components
The container image comes pre-installed with:
- Optimium Runtime
- Python 3.10
Quick Start Guide
To set up and run the project, follow the steps in the GitHub repository for the Quick Start.
Best Practices & Known Limitations
- Audio Overlap: The model is designed to process audio in 10-second units. For continuous audio exceeding 10 seconds, a 2-second overlap is recommended to achieve optimal performance. If the audio segments are not continuous, model.overlap_tokens must be reset.
- Model Performance: This model is designed to run exclusively on the CPU.
- For organizations seeking enhanced real-time capabilities, please contact ENERZAi.
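The 10-second-unit / 2-second-overlap recommendation above can be sketched as a simple segmentation loop. The chunk and overlap lengths come from this README; the sample rate is Whisper's standard 16 kHz, and everything else (the function name, the list-based audio stand-in) is illustrative rather than part of the Optimium API.

```python
SAMPLE_RATE = 16_000          # Whisper's expected sample rate (16 kHz)
CHUNK_S, OVERLAP_S = 10, 2    # 10-second units with a 2-second overlap
CHUNK = CHUNK_S * SAMPLE_RATE
STEP = (CHUNK_S - OVERLAP_S) * SAMPLE_RATE  # advance 8 s per chunk

def segment(samples):
    """Yield (start, end) sample ranges covering `samples` with overlap.

    Between segments that are NOT continuous, remember to reset
    model.overlap_tokens as noted above.
    """
    start = 0
    while start < len(samples):
        yield start, min(start + CHUNK, len(samples))
        if start + CHUNK >= len(samples):
            break
        start += STEP

# 25 s of silence as a stand-in for real audio
audio = [0.0] * (25 * SAMPLE_RATE)
ranges = list(segment(audio))
# Three overlapping 10 s windows cover the 25 s clip.
```

Each window shares its first 2 seconds with the previous window's tail, which is what lets the decoder stitch continuous audio without dropping words at chunk boundaries.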
Performance Benchmarking
The following table compares the performance of Optimium against Fast-Whisper on the target hardware.
| AI Backend | Model Size | Audio Duration | Threads | Memory (KB) | Encoder Time (ms) | Decoder Time (tokens / ms per token) | Total Time (ms) | Notes |
|---|---|---|---|---|---|---|---|---|
| Optimium (CPU only) | Small | 10s | 6 | 160,130 | 1040.23 | 34 / 45.11 | 2746.56 | |
| Optimium (CPU only) | Small | 10s | 4 | - | 3067.67 | 34 / 36.33 | 4682.54 | |
| Optimium (CPU only) | Small | 10s | Enc: 6, Dec: 4 | - | 1051.40 | 34 / 36.33 | 2660.36 | Optimized for a 4 performance core + 4 efficiency core architecture |
| Fast-Whisper (CTranslate2) | Small (INT8) | 10s | 6 | 731,796 | 2386 | 33 / 82.1515 | 5098 | |

Table 1. Performance comparison of AI backends on the target hardware.
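One convenient way to read Table 1 is as a real-time factor (RTF), i.e. total processing time divided by audio duration, where values below 1.0 mean faster-than-real-time transcription. The sketch below computes RTF from the Total Time figures in the table; the configuration labels are ours, not Optimium terminology.

```python
AUDIO_MS = 10_000  # all rows use 10-second clips

# Total Time (ms) values copied from Table 1
total_time_ms = {
    "Optimium, 6 threads": 2746.56,
    "Optimium, 4 threads": 4682.54,
    "Optimium, Enc 6 / Dec 4": 2660.36,
    "Fast-Whisper (CTranslate2), 6 threads": 5098.0,
}

# RTF = processing time / audio duration; lower is better
rtf = {name: ms / AUDIO_MS for name, ms in total_time_ms.items()}
for name, value in rtf.items():
    print(f"{name}: RTF = {value:.3f}")
```

By this measure every configuration runs faster than real time on the QCS6490, with the split encoder/decoder thread layout (Enc: 6, Dec: 4) giving the lowest RTF.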
