Catalog

Overview

ENERZAi Optimium 1.58-bit Model Optimizer

Qualcomm-containers_optimium_whisper

The Qualcomm-containers_optimium_whisper container simplifies the deployment and use of speech-to-text models powered by Optimium, ENERZAi's proprietary inference engine, on edge devices. It provides a pre-configured environment for running high-efficiency AI inference tasks seamlessly.


Key Features

  • Architecture: Built exclusively for Linux Arm64.
  • Runtime: Includes full support for the Optimium Runtime.
  • Optimization: Models are specifically optimized for the Qualcomm QCS6490 chipset.
  • Language Support: Supports English (en) and Chinese (zh).

Supported Host Devices

  • Devices based on Qualcomm QCS6490.

Prerequisites

Ensure the following software is installed on the Host OS before deploying:


Software Components

The container image comes pre-installed with:

  • Optimium Runtime
  • Python 3.10

Quick Start Guide

Follow the steps in the Quick Start GitHub repository to set up and run the project.


Best Practices & Known Limitations

  • Audio Overlap: The model is designed to process audio in 10-second units. For continuous audio exceeding 10 seconds, a 2-second overlap is recommended to achieve optimal performance. If the audio segments are not continuous, model.overlap_tokens must be reset.
  • Model Performance: The model is designed to run exclusively on the CPU.
  • Real-Time Capabilities: For organizations seeking enhanced real-time capabilities, please contact ENERZAi.
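The 10-second-unit rule above can be sketched in plain Python. This is an illustrative helper, not part of the Optimium API: it only computes the window boundaries (10-second chunks with a 2-second overlap for continuous audio); feeding the windows to the model, and resetting model.overlap_tokens between non-continuous segments, is left to the caller.

```python
def chunk_spans(total_s, chunk_s=10.0, overlap_s=2.0):
    """Split a continuous recording of total_s seconds into chunk_s-second
    windows that overlap by overlap_s seconds, per the recommendation above.

    Returns a list of (start, end) pairs in seconds.
    """
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        spans.append((start, end))
        if end >= total_s:
            return spans
        # Advance by chunk length minus overlap, so consecutive
        # windows share overlap_s seconds of audio.
        start += chunk_s - overlap_s

# A 25-second recording yields three overlapping windows:
# chunk_spans(25.0) -> [(0.0, 10.0), (8.0, 18.0), (16.0, 25.0)]
```

Each window after the first repeats the last 2 seconds of the previous one; for audio that is not continuous, generate the spans per segment and reset model.overlap_tokens before the new segment, as noted above.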

Performance Benchmarking

The following table compares the performance of Optimium against Fast-Whisper on the target hardware.

| AI Backend | Model Size | Audio Duration | Threads | Memory (KB) | Encoder Time (ms) | Decoder Time (Tokens / ms per token) | Total Time (ms) | Notes |
|---|---|---|---|---|---|---|---|---|
| Optimium (CPU only) | Small | 10 s | 6 | 160,130 | 1040.23 | 34 / 45.11 | 2746.56 | |
| Optimium (CPU only) | Small | 10 s | 4 | - | 3067.67 | 34 / 36.33 | 4682.54 | |
| Optimium (CPU only) | Small | 10 s | Enc: 6, Dec: 4 | - | 1051.40 | 34 / 36.33 | 2660.36 | Optimized for 4 performance cores + 4 efficiency cores architecture |
| Fast-Whisper (CTranslate2) | Small (INT8) | 10 s | 6 | 731,796 | 2386 | 33 / 82.1515 | 5098 | |

Table 1. Performance comparison of AI backends on the target hardware.
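A real-time factor (RTF, processing time divided by audio length) can be derived from the Total Time column of Table 1. The sketch below uses only figures from the table; real_time_factor is an illustrative helper, not part of any shipped tooling:

```python
def real_time_factor(total_time_ms, audio_duration_s):
    """Real-time factor: processing time divided by audio length.
    Values below 1.0 mean faster-than-real-time transcription."""
    return total_time_ms / (audio_duration_s * 1000.0)

# Figures taken from Table 1 (10-second clip):
optimium_rtf = real_time_factor(2660.36, 10)   # Optimium, Enc: 6 / Dec: 4 threads
fast_whisper_rtf = real_time_factor(5098, 10)  # Fast-Whisper, 6 threads, INT8

# optimium_rtf ≈ 0.27 and fast_whisper_rtf ≈ 0.51: both run faster than
# real time, with Optimium roughly 1.9x faster end to end on this hardware.
```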