Edge RAG AI Agent on AMD Ryzen™ with ROCm™

Short summary: Enterprise-grade LLM inference with Retrieval-Augmented Generation (RAG) for building intelligent AI agents on AMD Ryzen systems, featuring offline capability, document QA, and hardware acceleration.

About Advantech Container Catalog (ACC)

Advantech Container Catalog (ACC) is a collection of ready-to-use, containerized software packages designed to accelerate the development and deployment of Edge AI applications. Its pre-integrated solutions are optimized for embedded hardware, which reduces the software and hardware compatibility problems common in GPU/NPU-accelerated environments.

Feature / Benefit | Description
Accelerated Edge AI Development | Ready-to-use containerized solutions for faster prototyping and deployment
Hardware Compatible | Reduces hardware and package incompatibility issues
GPU/NPU Access Ready | Supports passthrough for efficient hardware acceleration
Model Conversion & Optimization | Built-in model conversion and quantization recommendations
Optimized for CV & LLM Applications | Optimized stacks for vision and language workloads

Container Overview

This container delivers a modular AI agent stack for AMD Ryzen devices with ROCm. It combines Ollama for on-device LLM inference, LangChain for orchestration and tool integration, FastAPI middleware for API exposure, and OpenWebUI for the chat interface. The architecture supports Retrieval-Augmented Generation (RAG), multi-turn conversations, custom tool integration, and offline-first operation, which makes it suitable for context-aware agents that process PDF documents and answer with retrieved information.

Demo

RAG Query Demo:

Use Cases

  • Legal document assistants with offline data privacy
  • Internal SOP (Standard Operating Procedure) assistants
  • Medical protocol access for healthcare professionals
  • Compliance and audit Q&A systems
  • Safety manual conversational agents
  • Technician support and troubleshooting bots
  • Industrial edge controllers using AI agents
  • Retrieval-augmented generation for domain-specific QA
  • Tool-enabled intelligent agents with reasoning capabilities
  • Corporate knowledge base chatbots

Key Features

  • DeepSeek-R1 1.5B Inference: Lightweight on-device LLM with a small memory footprint (model size ~1.1 GB at Q4_K_M quantization)
  • LangChain Integration: Modular framework for building complex AI workflows and agents
  • RAG Capability: Retrieval-Augmented Generation for document-based question answering
  • FastAPI Middleware: RESTful APIs for seamless integration with frontends and services
  • OpenWebUI Interface: User-friendly chat interface for real-time LLM interaction
  • Offline Operation: Fully functional after initial setup; no internet required
  • Tool Integration: Support for external tools, calculators, and search capabilities
  • Conversational Memory: Multi-turn conversations with context retention
  • Hardware Acceleration: Optimized for AMD Ryzen ROCm GPUs (Radeon 780M)
  • Model Customization: Easy model switching via Ollama, configured in the .env file
  • Streaming Support: Real-time response streaming for interactive UX

Host Device Prerequisites

Item | Specification
Compatible Hardware | AMD Ryzen systems with ROCm support (e.g., Ryzen 7 PRO 8845HS with Radeon 780M)
GPU | AMD Radeon 780M or compatible ROCm-supported GPU
Memory | 4 GB minimum; 8 GB or more recommended
Host OS | Linux (Ubuntu 22.04+ recommended)
Required Packages | Docker, Docker Compose, ROCm Runtime

Required Software Packages on Host Device

Component | Version | Description
Docker | 28.1.1+ | Container runtime platform
Docker Compose | 2.39.1+ | Multi-container orchestration
ROCm Runtime | Latest | AMD GPU acceleration framework
ROCm Driver | Latest | GPU driver for hardware support
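
A quick pre-flight check on the host can catch version mismatches before the container is started. The following is a minimal sketch, not part of the Advantech tooling: the function names are hypothetical, the minimum versions are copied from the table above, and the `/dev/kfd` and `/dev/dri` device nodes are an assumption based on a standard ROCm setup, where containers normally need both passed through.

```python
import os
import re
import shutil
import subprocess

# Minimum versions taken from the table above.
MIN_DOCKER = (28, 1, 1)
MIN_COMPOSE = (2, 39, 1)

# Device nodes a ROCm container normally needs (assumption: standard amdgpu/ROCm host).
ROCM_DEVICES = ("/dev/kfd", "/dev/dri")


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def meets_minimum(text, minimum):
    """True if the version embedded in text is at least `minimum`."""
    version = parse_version(text)
    return version is not None and version >= minimum


def command_output(args):
    """Run a command and return its stdout, or an empty string if it is not installed."""
    if shutil.which(args[0]) is None:
        return ""
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    return result.stdout


def check_host():
    """Collect a pass/fail result for each prerequisite."""
    return {
        "docker": meets_minimum(command_output(["docker", "--version"]), MIN_DOCKER),
        "compose": meets_minimum(command_output(["docker", "compose", "version"]), MIN_COMPOSE),
        "rocm_devices": all(os.path.exists(path) for path in ROCM_DEVICES),
    }

# Example: print(check_host())
```

A result such as `{"docker": True, "compose": True, "rocm_devices": False}` would suggest that the ROCm driver is not loaded on the host, which is worth resolving before debugging anything inside the container.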

Container Environment Overview

Software Components in the Image

Component | Version | Description
Ollama | 0.17.5 | LLM inference engine
LangChain | 0.2.17 | LLM orchestration and agent framework
FastAPI | 0.115.12 | REST API framework for middleware
OpenWebUI | 0.6.5 | Web-based chat interface
DeepSeek-R1 | 1.5B | Lightweight language model
ONNX Runtime | 1.16.3 | Cross-platform inference engine
GStreamer | 1.24.2 | Multimedia framework
FAISS | 1.8.0+ | Vector store for RAG and similarity search
Sentence-T5-Base | Latest | Embedding model from Hugging Face

Container Quick Start Guide

For installation, setup, build scripts, and detailed usage instructions, please refer to the Advantech Containers GitHub repository.


Supported AI Capabilities

Model Information

Item | Description
Primary Model | DeepSeek-R1 1.5B
Model Architecture | Qwen2-based
Quantization | Q4_K_M
Ollama Command | ollama pull deepseek-r1:1.5b
Parameters | ~1.78 billion
Model Size | ~1.1 GB
Context Window | 2048 tokens (configurable)
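
To illustrate how the model name and context window from the table map onto a request, here is a minimal sketch that talks to Ollama's `/api/generate` endpoint directly. It assumes Ollama is reachable on its default port 11434, which may differ from how this container exposes it; the helper names `build_payload` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama's default port is exposed on the host.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, model="deepseek-r1:1.5b", num_ctx=2048):
    """Build a non-streaming /api/generate request body."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }


def generate(prompt, url=OLLAMA_URL, **kwargs):
    """Send the prompt to Ollama and return the generated text."""
    body = json.dumps(build_payload(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]

# Example (requires a running Ollama server with the model pulled):
# print(generate("Summarize what RAG is in one sentence.", num_ctx=4096))
```

Raising `num_ctx` lets more retrieved text fit into the prompt at the cost of additional memory, which matters on hosts close to the 4 GB minimum.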

Model Customization Options

Users can switch models within the .env file:

  • deepseek-r1:1.5b (default)
  • qwen2.5:0.5b (ultra-lightweight)
  • qwen2.5:1.5b (similar size)
  • qwen3:1.7b (larger context)

Document Types

Attribute | Details
Supported Format | PDF (text-based)
Maximum File Size | 170 MB (~4,300 pages)
Unsupported | Scanned/image PDFs, encrypted files, Word/CSV documents
Multi-Document | Multiple PDFs supported simultaneously
Language | English-language documents
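
The limits in the table can be checked before a file is handed to the container. The sketch below is a hypothetical helper, not part of the container: it assumes 1 MB means 1024 × 1024 bytes, and it detects encryption with a simple heuristic (the presence of an `/Encrypt` entry). It does not attempt to detect scanned or image-only PDFs, which would need a real PDF parser.

```python
import os

# 170 MB limit from the table above (assumption: 1 MB = 1024 * 1024 bytes).
MAX_BYTES = 170 * 1024 * 1024


def preflight_pdf(path, max_bytes=MAX_BYTES):
    """Return a list of problems found; an empty list means the file looks acceptable."""
    size = os.path.getsize(path)
    if size > max_bytes:
        # Skip further checks so an oversized file is never read into memory.
        return [f"file is {size} bytes, above the {max_bytes}-byte limit"]

    with open(path, "rb") as handle:
        data = handle.read()

    problems = []
    # PDF files start with a "%PDF-" header; allow for a little leading junk.
    if b"%PDF-" not in data[:1024]:
        problems.append("missing %PDF- header; not a PDF file")
    # Heuristic: encrypted PDFs reference an /Encrypt dictionary.
    if b"/Encrypt" in data:
        problems.append("appears to be encrypted")
    return problems

# Example: print(preflight_pdf("manual.pdf"))
```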

RAG & Retrieval Features

Feature | Support
Vector Store | FAISS with pre-computed embeddings
Embedding Model | Sentence-T5-Base
Similarity Search | Cosine and inner product similarity
Score Thresholding | Configurable via SCORE_THRESHOLD environment variable
Persistent Storage | FAISS saved index for container restarts
Chunk Processing | Automatic document chunking and splitting
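
The relationship between cosine similarity, inner product, and score thresholding can be shown in a few lines. This is a pure-Python sketch for illustration only; the container itself uses FAISS, and how it interprets `SCORE_THRESHOLD` is not documented here. The function names, the 0.5 default, and the toy two-dimensional vectors are all assumptions.

```python
import math
import os


def inner_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(vector):
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(inner_product(vector, vector))
    return [x / norm for x in vector] if norm else list(vector)


def cosine_similarity(a, b):
    """Cosine similarity is the inner product of the L2-normalized vectors."""
    return inner_product(normalize(a), normalize(b))


def threshold_from_env(default=0.5):
    """Read SCORE_THRESHOLD from the environment; the 0.5 default is hypothetical."""
    return float(os.environ.get("SCORE_THRESHOLD", default))


def retrieve(query_vec, chunks, score_threshold, top_k=3):
    """Rank (text, embedding) pairs by cosine similarity and drop scores below the threshold."""
    scored = [(cosine_similarity(query_vec, emb), text) for text, emb in chunks]
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    return sorted(kept, reverse=True)[:top_k]

# Example:
# chunks = [("pump manual", [1.0, 0.0]), ("HR policy", [0.0, 1.0])]
# print(retrieve([1.0, 0.1], chunks, threshold_from_env()))
```

Because cosine similarity equals the inner product of normalized vectors, an inner-product index over pre-normalized embeddings returns cosine scores. A higher threshold returns fewer but more relevant chunks; a lower one risks passing unrelated text to the model.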

AI Agent Capabilities

Capability | Description
Tool Integration | Custom tools and function calling
Memory Types | Buffer, summary, and vector-based recall
Streaming | Real-time response streaming
Async Support | Non-blocking pipeline execution
Conversation Chains | Multi-turn dialogue with context
Reasoning | Step-by-step reasoning with tool usage
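
As an illustration of the streaming capability, the sketch below parses the newline-delimited JSON that Ollama's `/api/generate` endpoint emits when `stream` is true: each line carries a `response` fragment, and the final line has `done` set to true. Whether the FastAPI middleware in this container relays the same format is an assumption; `iter_tokens` is a hypothetical helper name.

```python
import json


def iter_tokens(lines):
    """Yield text fragments from an iterable of NDJSON lines until "done" is true."""
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # skip keep-alive blank lines
        event = json.loads(line)
        if event.get("response"):
            yield event["response"]
        if event.get("done"):
            break

# Example: the response object returned by urllib.request.urlopen() iterates
# line by line, so it can be passed in directly:
# for token in iter_tokens(response):
#     print(token, end="", flush=True)
```

Printing each fragment as it arrives is what gives the chat interface its incremental, typewriter-style output instead of a single delayed answer.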

Hardware Acceleration Support

Accelerator | Support Level | Compatible Libraries | Notes
AMD ROCm GPU | Full | Ollama, PyTorch, ONNX Runtime | Primary acceleration target (Radeon 780M)
CPU Fallback | Full | LangChain, FastAPI, FAISS | Configurable via OLLAMA_LLM_LIBRARY
Quantized Models | Full | Ollama, llama.cpp | Q4_K_M quantization standard

Architecture

Component Stack

  1. Ollama: Local LLM inference engine with model management
  2. DeepSeek-R1 1.5B: Efficient language model for reasoning and generation
  3. LangChain: Framework for chains, agents, and workflows
  4. FAISS: Vector database for RAG semantic search
  5. FastAPI: REST middleware between OpenWebUI and LangChain
  6. OpenWebUI: User-facing chat interface

Data Flow

Document Upload → PDF Parsing → Chunking → Embedding → Vector Store → Query Processing → Retrieval → LLM Response → Streaming UI
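
The chunking and query-processing stages of this flow can be sketched as follows. This is an illustrative approximation, not the container's implementation: the character-based splitter, the 500/50 size and overlap values, and the prompt wording are all assumptions, and the embedding and vector-store stages are omitted.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the last window is not just a repeat of the overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


def build_prompt(question, retrieved_chunks):
    """Combine retrieved chunks and the user question into a single LLM prompt."""
    context = "\n\n".join(
        f"[{index}] {chunk}" for index, chunk in enumerate(retrieved_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Example:
# chunks = chunk_text(open("manual.txt").read())
# print(build_prompt("What is the maximum operating pressure?", chunks[:3]))
```

Overlapping windows reduce the chance that a sentence split across a chunk boundary is lost to retrieval. With a 2048-token context window, the number and size of chunks placed in the prompt need to stay small.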


Best Practices for Document Preparation

  • Ensure documents are topically consistent and logically structured
  • Remove irrelevant sections (watermarks, repeated headers)
  • Prefer clean metadata and minimal formatting clutter
  • Avoid heavily stylized layouts (multi-column text, embedded visuals)
  • Don't mix multiple unrelated domains in the same document set
  • Use focused, document-specific prompts (e.g., "What are the features of X?")
  • Reference document structure explicitly in queries
  • Restart services after adding/removing/changing PDF files
  • Increase swap size if the host has less than 8 GB of RAM

Copyright © Advantech Corporation. All rights reserved.