Overview

The Qwen2.5 3B AI Agent on NVIDIA Jetson™ provides a plug-and-play AI runtime for NVIDIA Jetson™ devices. It integrates the Qwen 2.5 3B model (served via Ollama) with a FastAPI-based LangChain AI agent, the EdgeSync Device Library, and an OpenWebUI interface. This container offers:

  • Offline, on-device LLM inference using Qwen 2.5 3B via Ollama (no internet required post-setup)
  • LangChain middleware with FastAPI for orchestrating modular pipelines
  • Built-in FAISS vector database for efficient semantic search and RAG use cases
  • Agent support to enable autonomous, multi-step task execution and decision-making
  • Prompt memory and context handling for smarter conversations
  • Streaming chat UI via OpenWebUI
  • OpenAI-compatible API endpoints for seamless integration
  • Customizable model parameters via modelfile & environment variables
  • AI Agent integrated with EdgeSync Device Library for calling various peripheral functions via natural language prompts
  • Predefined LangChain tools (functions) registered for the agent to call hardware APIs
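Because the container exposes OpenAI-compatible endpoints, any OpenAI-style client can talk to it. A minimal sketch using only the standard library; the host, port, path, and model tag below are assumptions for illustration, not values taken from the container's configuration:

```python
import json
import urllib.request

# Assumed endpoint; adjust host/port to match your deployment.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "qwen2.5:3b") -> dict:
    """Build an OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Setting `"stream": True` instead would return incremental chunks, which is how OpenWebUI renders the streaming chat experience.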

Use Cases

  • Predictive Maintenance Chatbots: Integrate with edge telemetry or logs to summarize anomalies, explain error codes, or recommend corrective actions using historical context.
  • Compliance and Audit Q&A: Run offline LLMs trained on local policy or compliance data to assist with audits or generate summaries of regulatory alignment—ensuring data never leaves the premises.
  • Safety Manual Conversational Agents: Deploy LLMs to provide instant answers from on-site safety manuals or procedures, reducing downtime and improving adherence to protocols.
  • Technician Support Bots: Field service engineers can interact with the bot to troubleshoot equipment based on past repair logs, parts catalogs, and service manuals.
  • Smart Edge Controllers: LLMs can translate human intent (e.g., “reduce line 2 speed by 10%”) into control commands for industrial PLCs or middleware using AI agents.
  • Conversational Retrieval (RAG): Integrate with vector databases (like FAISS and ChromaDB) to retrieve relevant context from local documents and enable conversational Q&A over your custom data.
  • Tool-Enabled Agents: Create intelligent agents that use calculators, APIs, or search tools as part of their reasoning process—LangChain handles the logic and LLM interface.
  • Factory Incident Reporting: Ingest logs or voice input → extract incident type → summarize → trigger automated alerts or next steps.
  • Custom Tool-Driven Agents: Expand the system with new LangChain tools to call additional hardware functions, fetch local metrics, or trigger external workflows—all via natural language.
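The tool-driven use cases above all reduce to the same pattern: named functions are registered so the agent can invoke them from parsed natural-language intent. A minimal, dependency-free sketch of that registry pattern (the `set_fan_speed` tool and the `dispatch` helper are hypothetical; in the container this role is played by LangChain's tool registration):

```python
from typing import Callable, Dict

# Registry mapping tool names to callables, as an agent loop would use.
TOOLS: Dict[str, Callable[..., str]] = {}

def tool(name: str):
    """Decorator that registers a function under a tool name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = fn
        return fn
    return register

@tool("set_fan_speed")  # hypothetical hardware tool
def set_fan_speed(percent: int) -> str:
    # A real tool would call into the EdgeSync Device Library here.
    return f"fan speed set to {percent}%"

def dispatch(name: str, **kwargs) -> str:
    """Invoke a registered tool by name, as the agent does after
    the LLM decides which tool to call and with which arguments."""
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](**kwargs)
```

For example, `dispatch("set_fan_speed", percent=40)` returns `"fan speed set to 40%"`; the agent's job is to turn a prompt like "slow the fan to 40 percent" into exactly that call.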

Key Features

  • LangChain Middleware: Agent logic with memory and modular chains
  • Ollama Integration: Lightweight inference engine for quantized models
  • Complete AI Framework Stack: PyTorch, TensorFlow, ONNX Runtime, and TensorRT™
  • Industrial Vision Support: Accelerated OpenCV and GStreamer pipelines
  • Edge AI Capabilities: Support for computer vision, LLMs, and time-series analysis
  • Performance Optimized: Tuned specifically for NVIDIA® Jetson Orin™ NX 8GB
  • EdgeSync Agent Integration: The EdgeSync Device Library is integrated with the agent to interact with low-level edge hardware components via natural language

Host Device Prerequisites

| Item | Specification |
|---|---|
| Compatible Hardware | Advantech devices accelerated by NVIDIA Jetson™ (refer to Compatible Hardware) |
| NVIDIA Jetson™ Version | 5.x |
| Host OS | Ubuntu 20.04 |
| Required Software Packages | Refer to below |
| Software Installation | NVIDIA Jetson™ Software Package Installation |

Container Environment Overview

Software Components on Container Image

| Component | Version | Description |
|---|---|---|
| CUDA® | 11.4.315 | GPU computing platform |
| cuDNN | 8.6.0 | Deep neural network library |
| TensorRT™ | 8.5.2.2 | Inference optimizer and runtime |
| PyTorch | 2.0.0+nv23.02 | Deep learning framework |
| TensorFlow | 2.12.0 | Machine learning framework |
| ONNX Runtime | 1.16.3 | Cross-platform inference engine |
| OpenCV | 4.5.0 | Computer vision library with CUDA® support |
| GStreamer | 1.16.2 | Multimedia framework |
| Ollama | 0.5.7 | LLM inference engine |
| LangChain | 0.2.17 | Orchestration layer for memory, RAG, and agent workflows |
| FastAPI | 0.115.12 | API service exposing the LangChain interface |
| OpenWebUI | 0.6.5 | Web interface for chat interactions |
| FAISS | 1.8.0.post1 | Vector store for RAG pipelines |
| EdgeSync | 1.0.0 | Device library enabling the AI Agent to interact with low-level edge hardware components |

Quick Start Guide

For container quick start, including the docker-compose file and more, please refer to README.


Supported AI Capabilities

Language Models Recommendation

| Model Family | Parameters | Quantization | Size | Performance |
|---|---|---|---|---|
| DeepSeek R1 | 1.5 B | Q4_K_M | 1.1 GB | ~15-17 tokens/sec |
| DeepSeek R1 | 7 B | Q4_K_M | 4.7 GB | ~5-7 tokens/sec |
| DeepSeek Coder | 1.3 B | Q4_0 | 776 MB | ~20-25 tokens/sec |
| Llama 3.2 | 1 B | Q8_0 | 1.3 GB | ~17-20 tokens/sec |
| Llama 3.2 Instruct | 1 B | Q4_0 | ~0.8 GB | ~17-20 tokens/sec |
| Llama 3.2 | 3 B | Q4_K_M | 2 GB | ~10-12 tokens/sec |
| Llama 2 | 7 B | Q4_0 | 3.8 GB | ~5-7 tokens/sec |
| TinyLlama | 1.1 B | Q4_0 | 637 MB | ~22-27 tokens/sec |
| Qwen 2.5 | 0.5 B | Q4_K_M | 398 MB | ~25-30 tokens/sec |
| Qwen 2.5 | 1.5 B | Q4_K_M | 986 MB | ~15-17 tokens/sec |
| Qwen 2.5 Coder | 0.5 B | Q8_0 | 531 MB | ~25-30 tokens/sec |
| Qwen 2.5 Coder | 1.5 B | Q4_K_M | 986 MB | ~15-17 tokens/sec |
| Qwen | 0.5 B | Q4_0 | 395 MB | ~25-30 tokens/sec |
| Qwen | 1.8 B | Q4_0 | 1.1 GB | ~15-20 tokens/sec |
| Gemma 2 | 2 B | Q4_0 | 1.6 GB | ~10-12 tokens/sec |
| Mistral | 7 B | Q4_0 | 4.1 GB | ~5-7 tokens/sec |
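When choosing from the table above for an 8 GB device, the quantized file size plus KV cache and runtime overhead must fit in available memory. A rough helper for screening candidates; the 1.5x overhead factor, the available-memory figure, and the sample entries are illustrative assumptions, not measured values:

```python
# Approximate quantized sizes (GB) taken from the table above.
MODEL_SIZES_GB = {
    "qwen2.5:0.5b": 0.4,
    "qwen2.5:1.5b": 1.0,
    "llama3.2:3b": 2.0,
    "mistral:7b": 4.1,
}

def fits_in_memory(model: str, available_gb: float, overhead: float = 1.5) -> bool:
    """Rough check: quantized size times an assumed overhead factor
    (KV cache, runtime buffers) must fit in available memory."""
    return MODEL_SIZES_GB[model] * overhead <= available_gb

# e.g. on a Jetson Orin NX 8GB with roughly 6 GB free for the LLM:
candidates = [m for m in MODEL_SIZES_GB if fits_in_memory(m, 6.0)]
```

Under these assumptions the 7B Q4_0 models fall outside the budget, which matches the general guidance that 0.5B-3B quantized models are the comfortable range for this device class.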

Tuning Tips for Efficient RAG and Agent Workflows

  • Use asynchronous chains and streaming response handlers to reduce latency in FastAPI endpoints.
  • For RAG pipelines, use efficient vector stores (e.g., FAISS with cosine or inner product) and pre-filter data when possible.
  • Avoid long chain dependencies; break workflows into smaller composable components.
  • Cache prompt templates and tool results when applicable to reduce unnecessary recomputation.
  • For agent-based flows, limit tool calls per loop to avoid runaway execution or high memory usage.
  • Log intermediate steps (using LangChain’s callbacks) for better debugging and observability.
  • Use models with ≥3B parameters (e.g., Llama 3.2 3B or larger) for agent development to ensure better reasoning depth and tool usage reliability.

Copyright © Advantech Corporation. All rights reserved.