NPU Passthrough on NXP i.MX8M Plus
Overview
The NPU Passthrough on NXP i.MX8M Plus container image provides a comprehensive environment for building and deploying AI applications on NXP i.MX 8MP hardware. This container features full hardware acceleration support, optimized AI frameworks, and industrial-grade reliability. With this container, developers can quickly prototype and deploy AI use cases such as computer vision without the burden of solving time-consuming dependency issues or manually setting up complex toolchains. All required runtimes, libraries, and drivers are pre-configured, ensuring seamless integration with the NXP AI acceleration stack.
Key Features
- Complete AI Framework Stack: Pre-integrated runtimes, including LiteRT, for seamless execution of `.tflite` models. Developers can deploy models without worrying about low-level compatibility issues.
- Edge AI Capabilities: Optimized support for computer vision leveraging NXP NPU acceleration.
- Hardware Acceleration: Direct passthrough access to the NPU ensures high-performance, low-latency inference with minimal power consumption.
- Preconfigured Environment: Eliminates time-consuming setup by bundling drivers, toolchains, and AI libraries, so developers can focus directly on building applications.
- Rapid Prototyping & Deployment: Ideal for quickly testing AI models, validating PoCs, and deploying without rebuilding from scratch.
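As a minimal sketch of what deployment looks like in practice, the helper below runs a single `.tflite` inference with the LiteRT/TFLite interpreter. The `tflite_runtime` package name, the model path, and the `top_k` post-processing helper are assumptions for illustration, not part of the container's documented API; the runtime import is kept lazy so the post-processing can be reused independently.

```python
def top_k(scores, k=5):
    """Hypothetical post-processing helper: indices of the k highest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def run_tflite_inference(model_path, input_array, delegate=None):
    """Run one inference with LiteRT/TFLite; an NPU/GPU delegate is optional."""
    # Lazy import: tflite_runtime is assumed to be available inside the container.
    from tflite_runtime.interpreter import Interpreter

    delegates = [delegate] if delegate is not None else None
    interpreter = Interpreter(model_path=str(model_path),
                              experimental_delegates=delegates)
    interpreter.allocate_tensors()

    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]
    interpreter.set_tensor(inp["index"], input_array)  # input must match dtype/shape
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])
```

For a classifier such as MobileNet V1, `top_k(run_tflite_inference(model, img)[0])` would yield the most likely class indices.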
Hardware Specifications
| Component | Specification |
|---|---|
| Target Hardware | Advantech EPC-R3720 |
| SoC | NXP i.MX8MPlus |
| GPU | Vivante GC7000UL |
| NPU | Vivante GC7000UL |
| Memory | 6 GB LPDDR4 |
Operating System
| Environment | Operating System |
|---|---|
| Device Host | Yocto 4.0 (LTS) (5.15-kirkstone) |
| Container | Ubuntu:22.04 |
Software Components
| Component | Version | Description |
|---|---|---|
| LiteRT | 2.9.1 | Provides TFLite Delegate support for GPU and NPU acceleration |
| GStreamer | 1.20.0 | Multimedia framework for building flexible audio/video pipelines |
| NNStreamer | 2.1.1 | Pipeline-centric framework for integrating TFLite and other ML backends into GStreamer audio/video pipelines |
| Python | 3.10 | Python runtime for building applications |
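To illustrate how GStreamer and NNStreamer fit together, the function below assembles a typical `gst-launch`-style camera-to-inference pipeline string. The device node, model path, and the `Delegate:External,ExtDelegateLib` custom option for routing `tensor_filter` through the VX delegate are assumptions based on common NXP BSP setups; verify the exact option syntax against your NNStreamer version.

```python
def nnstreamer_pipeline(model="/opt/models/model.tflite",
                        device="/dev/video0", width=224, height=224):
    """Build a gst-launch style NNStreamer pipeline string for TFLite inference."""
    return (
        f"v4l2src device={device} ! "
        f"videoconvert ! videoscale ! "
        f"video/x-raw,width={width},height={height},format=RGB ! "
        # tensor_converter turns raw video frames into NNStreamer tensors.
        f"tensor_converter ! "
        # tensor_filter invokes the TFLite model; the custom option (assumed
        # syntax) asks for the external VX delegate so inference lands on the NPU.
        f"tensor_filter framework=tensorflow-lite model={model} "
        f"custom=Delegate:External,ExtDelegateLib:/usr/lib/libvx_delegate.so ! "
        f"tensor_sink"
    )
```

The resulting string can be passed to `gst-launch-1.0` or parsed with `Gst.parse_launch` from Python.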
Supported AI Capabilities
Vision Models
| Model | Format | Note |
|---|---|---|
| PoseNet (ResNet50) | TFLite | Provided by NXP Demo Experience |
| MobileNet V1 | TFLite | Provided by NXP Demo Experience |
| SSD MobileNet V2 | TFLite | Provided by NXP Demo Experience |
| FaceNet | TFLite | Provided by NXP Demo Experience |
Note: The above table highlights a subset of commonly used models validated for this environment. Other transformer-based or vision models may also be supported depending on runtime compatibility and hardware resources. For the most detailed and up-to-date list of supported models and runtimes, please refer to the official NXP Demo Experience documentation.
Supported AI Model Formats
| Runtime | Format | Compatible Versions |
|---|---|---|
| LiteRT | .tflite | 2.9.1 |
Hardware Acceleration Support
| Accelerator | Support Level | Compatible Libraries |
|---|---|---|
| NPU | INT8 (primary); limited mixed precision (INT16/FP16 quantized internally) | TensorFlow Lite (VX Delegate), NNStreamer |
| GPU | FP32 / FP16 | TensorFlow Lite (GPU delegate) |
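Selecting an accelerator from Python amounts to loading the matching delegate library before constructing the interpreter. The sketch below locates the Vivante VX delegate and loads it; the candidate paths are assumptions based on typical NXP BSP layouts, and `tflite_runtime` is imported lazily since it only exists inside the container.

```python
import os

# Typical VX delegate locations in NXP BSP images (assumed, verify on device).
VX_DELEGATE_CANDIDATES = (
    "/usr/lib/libvx_delegate.so",
    "/usr/lib/aarch64-linux-gnu/libvx_delegate.so",
)

def find_vx_delegate(candidates=VX_DELEGATE_CANDIDATES):
    """Return the first existing VX delegate library path, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None

def load_npu_delegate():
    """Load the VX delegate for NPU offload; fall back to CPU when absent."""
    from tflite_runtime.interpreter import load_delegate  # lazy import
    path = find_vx_delegate()
    return load_delegate(path) if path else None
```

The returned delegate object can be passed to the interpreter via `experimental_delegates=[delegate]`; a `None` result simply leaves inference on the CPU.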
Precision Support
| Precision | Support Level | Notes |
|---|---|---|
| FP32 | CPU, GPU | Highest accuracy, slower performance |
| FP16 | GPU; NPU (internal) | Not directly exposed on the NPU (converted internally); natively accelerated on the GPU |
| INT16 | NPU (internal mixed ops) | Some operators quantized into 16-bit internally |
| INT8 | NPU, CPU | Primary mode for NPU acceleration, best performance-per-watt |
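The INT8 mode in the table uses the standard affine quantization scheme, where a float value maps to an 8-bit integer via a per-tensor scale and zero point. The two helpers below show the arithmetic (q = round(x / scale) + zero_point, clamped to the INT8 range, and the inverse); they are a worked illustration, not part of any shipped API.

```python
def quantize(x, scale, zero_point):
    """Map a float to INT8 via the affine scheme: q = round(x/scale) + zp."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to the signed 8-bit range

def dequantize(q, scale, zero_point):
    """Recover the approximate float value: x ~= (q - zp) * scale."""
    return (q - zero_point) * scale
```

Values outside the representable range saturate at -128/127, which is one reason aggressive quantization can cost accuracy (see Known Limitations).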
Possible Use Cases
| Domain | Applications |
|---|---|
| Smart Surveillance & Security | Real-time object detection and person tracking using SSD MobileNet V2 models; intrusion detection, face recognition, and abnormal behavior monitoring on edge devices without cloud dependency. |
| Industrial Automation & Robotics | Defect detection in manufacturing lines with computer vision; gesture or pose estimation for human–robot collaboration; autonomous navigation and obstacle avoidance for robots and drones. |
| Healthcare & Wellness | Contactless vital sign monitoring using vision models; fall detection and activity recognition for elderly care; medical imaging assistance with lightweight segmentation models. |
| Retail & Smart Spaces | Customer flow analysis, heatmap generation, and people counting; shelf stock monitoring and automated checkout solutions; emotion detection for personalized customer experiences. |
| Transportation & Mobility | Driver monitoring (drowsiness, distraction detection); traffic analysis and smart signaling; vehicle and license plate recognition at the edge. |
Quick Start Guide
Prerequisites
- Ensure Docker and Docker Compose are installed and accessible on the device host OS.
- The default eMMC boot provides only 16 GB of storage, which is insufficient to run or build the container image; boot the host OS from an SD card of at least 32 GB.
For a container quick start, including the docker-compose file and more, please refer to the Advantech Container GitHub Repository.
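For orientation, a compose file that passes the NPU through to the container typically looks something like the sketch below. The service name, image tag, and device nodes here are placeholders, not the repository's actual file; `/dev/galcore` is the Vivante driver node on i.MX 8MP, and `USE_GPU_INFERENCE=0` is the NXP environment variable that routes OpenVX work to the NPU rather than the GPU.

```yaml
# Hypothetical docker-compose.yml sketch -- see the Advantech Container
# GitHub Repository for the authoritative file.
services:
  npu-demo:
    image: advantech/imx8mp-npu:latest   # placeholder image name
    devices:
      - /dev/galcore:/dev/galcore        # Vivante GPU/NPU driver node
      - /dev/video0:/dev/video0          # camera for vision pipelines
    volumes:
      - ./models:/opt/models             # mount models from the host
    environment:
      - USE_GPU_INFERENCE=0              # 0 routes OpenVX work to the NPU
```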
Best Practices
Precision Selection
| Topic | Description |
|---|---|
| Prefer INT8 for NPU acceleration | The i.MX8MP NPU is optimized for quantized INT8 models. Always convert to INT8 using post-training quantization or quantization-aware training for maximum performance and efficiency. |
| Fallback to FP16/FP32 when INT8 unsupported | If some operators cannot be quantized, LiteRT may run those parts on CPU/GPU in FP32. FP16 is not natively accelerated but can sometimes reduce memory usage. |
| Accuracy validation post-quantization | Benchmark the INT8 model against FP32 baseline on-device using NNStreamer pipelines to ensure accuracy is acceptable before deployment. |
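The INT8 conversion recommended above is typically done on a build host with the full TensorFlow package, using a representative dataset for calibration. The sketch below shows the standard TFLite converter flow; the `samples` argument is assumed to be a list of already-preprocessed float32 input arrays, and the import is lazy because full TensorFlow is a build-host dependency, not a device one.

```python
def representative_dataset(samples, limit=100):
    """Yield calibration inputs (preprocessed float32 arrays) one at a time."""
    for sample in samples[:limit]:
        yield [sample]

def convert_to_int8(saved_model_dir, samples):
    """Full-integer post-training quantization for NPU deployment (sketch)."""
    import tensorflow as tf  # needed on the build host only, not the device

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = lambda: representative_dataset(samples)
    # Force INT8 kernels end to end so the VX delegate can keep ops on the NPU.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # serialized .tflite flatbuffer bytes
```

The returned bytes are written to a `.tflite` file and benchmarked on-device against the FP32 baseline before deployment, as the table advises.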
Model Optimization
| Topic | Description |
|---|---|
| Use lightweight backbones | Models like MobileNetV1/V2, EfficientNet-Lite, SSD-MobileNet, or YOLOv4-Tiny quantized are best suited for real-time workloads on i.MX8MP. |
| Leverage pre-tested models | Start with NXP-provided sample models (mobilenet, ssd_mobilenet_v2, etc.) or Ultralytics-exported YOLO models already verified with TFLite. |
| Prune and compress | Reduce model size via pruning and weight clustering to lower memory footprint and improve NPU throughput without major accuracy loss. |
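Pruning as suggested above is commonly applied with the `tensorflow-model-optimization` toolkit before quantization. The sketch below wraps a Keras model for magnitude pruning under a polynomial sparsity schedule; the schedule parameters are illustrative, the import is lazy (build-host dependency), and `pruned_param_count` is a hypothetical helper for estimating the effect.

```python
def pruned_param_count(total_params, final_sparsity):
    """Approximate non-zero parameters remaining after magnitude pruning."""
    return int(total_params * (1.0 - final_sparsity))

def prune_keras_model(model, final_sparsity=0.5, end_step=1000):
    """Wrap a Keras model for magnitude pruning (tensorflow-model-optimization)."""
    import tensorflow_model_optimization as tfmot  # build-host dependency

    # Ramp sparsity from 0% to the target over end_step training steps.
    schedule = tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0, final_sparsity=final_sparsity,
        begin_step=0, end_step=end_step)
    return tfmot.sparsity.keras.prune_low_magnitude(
        model, pruning_schedule=schedule)
```

The wrapped model is fine-tuned briefly, stripped with `tfmot.sparsity.keras.strip_pruning`, and then quantized as in the precision-selection workflow above.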
Known Limitations
| Topic | Description |
|---|---|
| Minimum storage | A minimum of 32 GB of storage is required to run the Docker containers; the default 16 GB eMMC boot is insufficient. |
| LiteRT delegate coverage | Not all TFLite ops are supported by the NPU; unsupported layers fall back to CPU/GPU, reducing FPS. |
| Operator coverage gaps | Advanced layers (attention, deformable conv, some postprocessing ops) are not supported on NPU. Custom handling is required. |
| Quantization trade-offs | INT8 is mandatory for NPU, but aggressive quantization can degrade accuracy. |
| Resource constraints | Large models (YOLOv8l, transformers >100M params) won’t run in real-time or may not fit in memory. Stick to quantized tiny/small variants. |
| Version sensitivity | BSP release defines supported NNStreamer + LiteRT versions. Using mismatched SDKs (e.g., newer TFLite) can break delegate acceleration. |
Copyright © 2025 Advantech Corporation. All rights reserved.
