NPU Passthrough on NXP i.MX8M Plus

Overview

The NPU Passthrough on NXP i.MX8M Plus container image provides a comprehensive environment for building and deploying AI applications on NXP i.MX 8M Plus hardware. The container features full hardware acceleration support, optimized AI frameworks, and industrial-grade reliability. With it, developers can quickly prototype and deploy AI use cases such as computer vision without spending time on dependency issues or manually setting up complex toolchains. All required runtimes, libraries, and drivers come pre-configured, ensuring seamless integration with the NXP AI acceleration stack.

Key Features

  • Complete AI Framework Stack: Pre-integrated runtimes, including LiteRT, for seamless execution of .tflite models. Developers can deploy models without worrying about low-level compatibility issues.

  • Edge AI Capabilities: Optimized support for computer vision leveraging NXP NPU acceleration.

  • Hardware Acceleration: Direct passthrough access to NPU hardware ensures high-performance and low-latency inference with minimal power consumption.

  • Preconfigured Environment: Eliminates time-consuming setup by bundling drivers, toolchains, and AI libraries, so developers can focus directly on building applications.

  • Rapid Prototyping & Deployment: Ideal for quickly testing AI models, validating PoCs, and deployment without rebuilding from scratch.
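As an illustration of how little setup the pre-configured environment needs, the sketch below creates a LiteRT/TFLite interpreter with the NPU offloaded through an external delegate. The delegate path is an assumption based on typical NXP BSP layouts; adjust it to match the library location inside this container.

```python
# Sketch: loading a .tflite model with NPU offload via an external delegate.
VX_DELEGATE_PATH = "/usr/lib/libvx_delegate.so"  # assumed location of the VX delegate

def make_interpreter(model_path: str, use_npu: bool = True):
    """Create a LiteRT/TFLite interpreter, optionally offloading to the NPU."""
    import tflite_runtime.interpreter as tflite  # available in the container

    delegates = []
    if use_npu:
        # Ops the delegate cannot handle fall back to the CPU automatically.
        delegates.append(tflite.load_delegate(VX_DELEGATE_PATH))
    interpreter = tflite.Interpreter(model_path=model_path,
                                     experimental_delegates=delegates)
    interpreter.allocate_tensors()
    return interpreter
```

Typical usage is `make_interpreter("mobilenet_v1.tflite")`, followed by the standard `set_tensor` / `invoke` / `get_tensor` calls.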

Hardware Specifications

| Component | Specification |
| --- | --- |
| Target Hardware | Advantech EPC-R3720 |
| SoC | NXP i.MX 8M Plus |
| GPU | Vivante GC7000UL |
| NPU | Vivante GC7000UL |
| Memory | 6 GB LPDDR4 |

Operating System

| Environment | Operating System |
| --- | --- |
| Device Host | Yocto 4.0 LTS (Kirkstone, kernel 5.15) |
| Container | Ubuntu 22.04 |

Software Components

| Component | Version | Description |
| --- | --- | --- |
| LiteRT | 2.9.1 | Provides TFLite delegate support for GPU and NPU acceleration |
| GStreamer | 1.20.0 | Multimedia framework for building flexible audio/video pipelines |
| NNStreamer | 2.1.1 | Pipeline-centric framework that integrates TFLite and other ML backends into GStreamer pipelines |
| Python | 3.10 | Python runtime for building applications |
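The components above are designed to work together: NNStreamer's `tensor_filter` element runs TFLite models inside a GStreamer pipeline. The helper below assembles an illustrative `gst-launch-1.0` pipeline string; the model and delegate paths are placeholders, and the `custom=Delegate:External,...` option follows the pattern documented for the NNStreamer tensorflow-lite subplugin.

```python
# Sketch: composing an NNStreamer inference pipeline for gst-launch-1.0.
def build_pipeline(model="mobilenet_v1.tflite",
                   delegate="/usr/lib/libvx_delegate.so",  # assumed path
                   width=224, height=224):
    """Return a camera -> preprocess -> TFLite inference pipeline string."""
    return (
        "v4l2src device=/dev/video0 ! videoconvert ! videoscale ! "
        f"video/x-raw,width={width},height={height},format=RGB ! "
        "tensor_converter ! "
        f"tensor_filter framework=tensorflow-lite model={model} "
        f"custom=Delegate:External,ExtDelegateLib:{delegate} ! "
        "tensor_sink"
    )

print(build_pipeline())
```

The resulting string can be passed to `gst-launch-1.0` inside the container for a quick smoke test before wiring the pipeline into an application.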

Supported AI Capabilities

Vision Models

| Model | Format | Note |
| --- | --- | --- |
| PoseNet (ResNet50) | TFLite | Provided by NXP Demo Experience |
| MobileNet V1 | TFLite | Provided by NXP Demo Experience |
| SSD MobileNet V2 | TFLite | Provided by NXP Demo Experience |
| FaceNet | TFLite | Provided by NXP Demo Experience |

Note: The table above highlights a subset of commonly used models validated for this environment. Other transformer-based or vision models may also be supported, depending on runtime compatibility and hardware resources. For the most detailed and up-to-date list of supported models and runtimes, please refer to the official NXP Demo Experience.

Supported AI Model Formats

| Runtime | Format | Compatible Versions |
| --- | --- | --- |
| LiteRT | .tflite | 2.9.1 |

Hardware Acceleration Support

| Accelerator | Support Level | Compatible Libraries |
| --- | --- | --- |
| NPU | INT8 (primary); limited mixed precision (INT16/FP16 quantized internally) | TensorFlow Lite (VX delegate), NNStreamer |
| GPU | FP32 / FP16 | TensorFlow Lite (GPU delegate) |

Precision Support

| Precision | Support Level | Notes |
| --- | --- | --- |
| FP32 | CPU, GPU | Highest accuracy, slower performance |
| FP16 | GPU | Accelerated on the GPU; not directly exposed via the NPU |
| INT16 | NPU (internal mixed ops) | Some operators quantized to 16-bit internally |
| INT8 | NPU, CPU | Primary mode for NPU acceleration; best performance per watt |
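To make the precision trade-offs concrete, the snippet below walks through the affine (asymmetric) quantization scheme TFLite uses for INT8 tensors, where `real_value = scale * (int8_value - zero_point)`. The scale and zero-point values are illustrative, standing in for parameters produced by model calibration.

```python
# Sketch of TFLite-style affine INT8 quantization:
#   real_value = scale * (int8_value - zero_point)
def quantize(x, scale, zero_point):
    """Map a float to the int8 range [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))  # clamp to int8

def dequantize(q, scale, zero_point):
    """Recover the approximate float value from an int8 code."""
    return scale * (q - zero_point)

scale, zero_point = 0.05, 10        # example calibration parameters
q = quantize(1.0, scale, zero_point)      # -> 30
x = dequantize(q, scale, zero_point)      # -> 1.0 (exactly representable here)
```

Values outside `scale * ([-128, 127] - zero_point)` saturate at the clamp, which is one source of the accuracy loss that aggressive quantization can introduce.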

Possible Use Cases

| Domain | Applications |
| --- | --- |
| Smart Surveillance & Security | Real-time object detection and person tracking using SSD MobileNet V2 models; intrusion detection, face recognition, and abnormal behavior monitoring on edge devices without cloud dependency. |
| Industrial Automation & Robotics | Defect detection in manufacturing lines with computer vision; gesture or pose estimation for human–robot collaboration; autonomous navigation and obstacle avoidance for robots and drones. |
| Healthcare & Wellness | Contactless vital sign monitoring using vision models; fall detection and activity recognition for elderly care; medical imaging assistance with lightweight segmentation models. |
| Retail & Smart Spaces | Customer flow analysis, heatmap generation, and people counting; shelf stock monitoring and automated checkout solutions; emotion detection for personalized customer experiences. |
| Transportation & Mobility | Driver monitoring (drowsiness, distraction detection); traffic analysis and smart signaling; vehicle and license plate recognition at the edge. |

Quick Start Guide

Prerequisites

  • Ensure Docker and Docker Compose are installed and accessible on the device host OS.
  • The default eMMC boot provides only 16 GB of storage, which is insufficient to run or build the container image; boot the host OS from a 32 GB (minimum) SD card.
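A quick way to confirm the second prerequisite before pulling the image is to check the host's total storage. This is a small stdlib-only sketch; the 32 GB threshold comes from the requirement above.

```python
# Sketch: verify the host has enough storage before pulling/building the image.
import shutil

MIN_GIB = 32  # minimum recommended for this container (see prerequisites)

def has_enough_storage(path="/", min_gib=MIN_GIB):
    """Return True if the filesystem holding `path` is at least `min_gib` GiB."""
    total_gib = shutil.disk_usage(path).total / (1024 ** 3)
    return total_gib >= min_gib
```

Run `has_enough_storage()` on the device host; if it returns `False`, switch to SD-card boot before proceeding.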

For a container quick start, including the docker-compose file and more, please refer to the Advantech Container GitHub Repository.
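For orientation, a compose file for this kind of container typically looks like the hypothetical sketch below; the authoritative version lives in the Advantech Container GitHub Repository. The image name is a placeholder, and the device nodes are assumptions (`/dev/galcore` is the usual Vivante GPU/NPU node on i.MX 8M Plus).

```yaml
# Hypothetical docker-compose sketch; use the official file from the
# Advantech Container GitHub Repository for actual deployments.
services:
  npu-demo:
    image: <advantech-npu-image>      # placeholder: replace with the published image tag
    devices:
      - /dev/galcore:/dev/galcore     # assumed NPU/GPU passthrough node
      - /dev/video0:/dev/video0       # optional: camera for vision pipelines
    volumes:
      - ./models:/workspace/models    # mount local .tflite models
    stdin_open: true
    tty: true
```

The key element is the `devices` passthrough, which is what gives the container direct access to the NPU hardware described above.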

Best Practices

Precision Selection

| Topic | Description |
| --- | --- |
| Prefer INT8 for NPU acceleration | The i.MX 8M Plus NPU is optimized for quantized INT8 models. Always convert to INT8 using post-training quantization or quantization-aware training for maximum performance and efficiency. |
| Fall back to FP16/FP32 when INT8 is unsupported | If some operators cannot be quantized, LiteRT may run those parts on the CPU/GPU in FP32. FP16 is not natively accelerated on the NPU but can sometimes reduce memory usage. |
| Validate accuracy post-quantization | Benchmark the INT8 model against the FP32 baseline on-device using NNStreamer pipelines to ensure accuracy is acceptable before deployment. |
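The post-quantization validation step can be sketched as a simple top-1 agreement check between the FP32 baseline and the INT8 model. The two output lists below are stand-ins for per-image scores collected from the two interpreters on the same inputs.

```python
# Sketch: top-1 agreement between FP32 baseline and INT8 model outputs.
def top1_agreement(fp32_outputs, int8_outputs):
    """Fraction of inputs where both models predict the same class."""
    def argmax(v):
        return max(range(len(v)), key=v.__getitem__)
    matches = sum(argmax(a) == argmax(b)
                  for a, b in zip(fp32_outputs, int8_outputs))
    return matches / len(fp32_outputs)

# Stand-in per-image class scores from the two models:
fp32 = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
int8 = [[0.2, 0.7, 0.1], [0.1, 0.8, 0.1]]
print(top1_agreement(fp32, int8))  # 0.5: the second image flipped class
```

In practice the outputs would come from two interpreters (one with the NPU delegate, one without) run over a representative validation set; deploy only if the agreement and task metrics stay within your accuracy budget.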

Model Optimization

| Topic | Description |
| --- | --- |
| Use lightweight backbones | Models like MobileNetV1/V2, EfficientNet-Lite, SSD-MobileNet, or quantized YOLOv4-Tiny are best suited for real-time workloads on the i.MX 8M Plus. |
| Leverage pre-tested models | Start with NXP-provided sample models (mobilenet, ssd_mobilenet_v2, etc.) or Ultralytics-exported YOLO models already verified with TFLite. |
| Prune and compress | Reduce model size via pruning and weight clustering to lower the memory footprint and improve NPU throughput without major accuracy loss. |

Known Limitations

| Topic | Description |
| --- | --- |
| Minimum storage | Running Docker containers requires at least 32 GB of storage. |
| LiteRT delegate coverage | Not all TFLite ops are supported by the NPU; unsupported layers fall back to the CPU/GPU, reducing FPS. |
| Operator coverage gaps | Advanced layers (attention, deformable convolution, some postprocessing ops) are not supported on the NPU; custom handling is required. |
| Quantization trade-offs | INT8 is mandatory for NPU acceleration, but aggressive quantization can degrade accuracy. |
| Resource constraints | Large models (e.g., YOLOv8l, transformers with >100M parameters) will not run in real time or may not fit in memory; stick to quantized tiny/small variants. |
| Version sensitivity | The BSP release defines the supported NNStreamer and LiteRT versions; mismatched SDKs (e.g., a newer TFLite) can break delegate acceleration. |

Copyright © 2025 Advantech Corporation. All rights reserved.