Overview
ENERZAi Optimium 1.58-bit Model Optimizer
Qualcomm-containers_optimium_whisper
The Qualcomm-containers_optimium_whisper container is designed to simplify the deployment and use of speech-to-text models powered by ENERZAi's proprietary inference engine, Optimium, on edge devices. It provides a pre-configured environment for running high-efficiency AI inference tasks out of the box.
Key Features
- Architecture: Built exclusively for Linux Arm64.
- Runtime: Includes full support for the Optimium Runtime.
- Optimization: Models are specifically optimized for the Qualcomm QCS6490 chipset.
- Language Support: Supports English (en) and Chinese (zh).
Supported Host Devices
- Devices based on Qualcomm QCS6490.
Prerequisites
Ensure the following software is installed on the Host OS before deploying:
Software Components
The container image comes pre-installed with:
- Optimium Runtime
- Python 3.10
Quick Start Guide
To set up and run the project, follow the steps in the GitHub repository for the Quick Start.
Best Practices & Known Limitations
- Audio Overlap: The model is designed to process audio in 10-second units. For continuous audio exceeding 10 seconds, a 2-second overlap is recommended to achieve optimal performance. If the audio segments are not continuous, model.overlap_tokens must be reset.
- Model Performance: This model is designed to run exclusively on the CPU.
- For organizations seeking enhanced real-time capabilities, please contact ENERZAi.
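The 10-second-unit / 2-second-overlap recommendation above can be sketched as a simple segmentation loop. The chunk and overlap lengths come from this README; the sample rate is Whisper's standard 16 kHz, and everything else (the function name, the list-based audio stand-in) is illustrative rather than part of the Optimium API.

```python
SAMPLE_RATE = 16_000          # Whisper's expected sample rate (16 kHz)
CHUNK_S, OVERLAP_S = 10, 2    # 10-second units with a 2-second overlap
CHUNK = CHUNK_S * SAMPLE_RATE
STEP = (CHUNK_S - OVERLAP_S) * SAMPLE_RATE  # advance 8 s per chunk

def segment(samples):
    """Yield (start, end) sample ranges covering `samples` with overlap.

    Between segments that are NOT continuous, remember to reset
    model.overlap_tokens as noted above.
    """
    start = 0
    while start < len(samples):
        yield start, min(start + CHUNK, len(samples))
        if start + CHUNK >= len(samples):
            break
        start += STEP

# 25 s of silence as a stand-in for real audio
audio = [0.0] * (25 * SAMPLE_RATE)
ranges = list(segment(audio))
# Three overlapping 10 s windows cover the 25 s clip.
```

Each window shares its first 2 seconds with the previous window's tail, which is what lets the decoder stitch continuous audio without dropping words at chunk boundaries.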
Performance Benchmarking
The following table compares the performance of Optimium against Fast-Whisper on the target hardware.
| AI Backend | Model Size | Audio Duration | Threads | Memory (KB) | Encoder Time (ms) | Decoder Time (tokens / ms per token) | Total Time (ms) | Notes |
|---|---|---|---|---|---|---|---|---|
| Optimium (CPU only) | Small | 10s | 6 | 160,130 | 1040.23 | 34 / 45.11 | 2746.56 | |
| Optimium (CPU only) | Small | 10s | 4 | - | 3067.67 | 34 / 36.33 | 4682.54 | |
| Optimium (CPU only) | Small | 10s | Enc: 6, Dec: 4 | - | 1051.40 | 34 / 36.33 | 2660.36 | Optimized for a 4 performance core + 4 efficiency core architecture |
| Fast-Whisper (CTranslate2) | Small (INT8) | 10s | 6 | 731,796 | 2386 | 33 / 82.1515 | 5098 | |

Table 1. Performance comparison of AI backends on the target hardware.
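One convenient way to read Table 1 is as a real-time factor (RTF), i.e. total processing time divided by audio duration, where values below 1.0 mean faster-than-real-time transcription. The sketch below computes RTF from the Total Time figures in the table; the configuration labels are ours, not Optimium terminology.

```python
AUDIO_MS = 10_000  # all rows use 10-second clips

# Total Time (ms) values copied from Table 1
total_time_ms = {
    "Optimium, 6 threads": 2746.56,
    "Optimium, 4 threads": 4682.54,
    "Optimium, Enc 6 / Dec 4": 2660.36,
    "Fast-Whisper (CTranslate2), 6 threads": 5098.0,
}

# RTF = processing time / audio duration; lower is better
rtf = {name: ms / AUDIO_MS for name, ms in total_time_ms.items()}
for name, value in rtf.items():
    print(f"{name}: RTF = {value:.3f}")
```

By this measure every configuration runs faster than real time on the QCS6490, with the split encoder/decoder thread layout (Enc: 6, Dec: 4) giving the lowest RTF.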
