Building an Intelligent Agent with Local LLMs and Azure OpenAI
Author: Htunn Thu Thu
Date: June 1, 2025
Tags: #MachineLearning #AzureOpenAI #LocalLLM #AgenticAI
Introduction
In this post, I'll walk through how I built an intelligent agent that combines local LLMs with web search capabilities and Azure OpenAI integration. The project, called "Agentic LLM Search," gives users well-researched answers with proper citations while offering flexibility in model choice, from running entirely locally to leveraging Azure's powerful cloud models.
Key Features
Dual Model Support: Run with local TinyLlama models or Azure OpenAI
Internet Research Capabilities: Searches the web for up-to-date information
Proper Citations: All answers include sources and references
Multiple Interfaces: Command-line, Streamlit web UI, and API options
Hardware Acceleration: Optimized for Apple Silicon with Metal GPU support
Python 3.12 Optimized: Takes advantage of the latest Python features
Technical Architecture
The system uses a modular architecture with several key components:
Agent Layer: Coordinates model inference and search operations
Search Tool: Connects to DuckDuckGo for real-time web results
Model Layer: Provides a unified interface to local and cloud models
User Interfaces: Multiple ways to interact with the system
This modular design makes it easy to swap out components or extend functionality without major refactoring.
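As one concrete example of a swappable component, the search tool can be sketched with the `duckduckgo_search` package. The raw `title`/`href`/`body` keys come from that package's documented result format; the normalized field names are my own choice for illustration:

```python
def normalize_results(raw: list[dict]) -> list[dict]:
    """Map duckduckgo_search's raw result keys onto the fields the agent uses."""
    return [
        {"title": r.get("title", ""), "url": r.get("href", ""), "snippet": r.get("body", "")}
        for r in raw
    ]

def web_search(query: str, max_results: int = 5) -> list[dict]:
    # Imported lazily so the rest of the module works without the package installed.
    from duckduckgo_search import DDGS  # pip install duckduckgo-search
    with DDGS() as ddgs:
        return normalize_results(ddgs.text(query, max_results=max_results))
```

Because the agent only sees the normalized shape, swapping DuckDuckGo for another search backend means rewriting one function, not the whole pipeline.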
System Flow
When a user asks a question, here's how the system processes it:
The query is received through one of the interfaces (CLI, Web UI, API)
The agent optimizes the query for search
The search tool retrieves relevant results from the web
The LLM model (local TinyLlama or Azure OpenAI) processes the search results
The model generates a comprehensive answer with citations
The response is returned to the user through the original interface
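The steps above can be sketched in code. This is a simplified illustration rather than the project's actual module: the class and function names are my own, and real query optimization involves more than a `strip()`:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str

def build_prompt(question: str, results: list[SearchResult]) -> str:
    """Fold search results into a prompt that asks the model to cite sources."""
    sources = "\n".join(
        f"[{i}] {r.title} ({r.url}): {r.snippet}" for i, r in enumerate(results, 1)
    )
    return (
        "Answer the question using only the sources below, citing them as [n].\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question: str, search: Callable, model: Callable) -> str:
    query = question.strip()               # step 2: optimize the query (simplified)
    results = search(query)                # step 3: retrieve web results
    prompt = build_prompt(query, results)  # step 4: ground the model in the results
    return model(prompt)                   # steps 5-6: generate the cited answer
```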
Setting Up the Project
Let's look at how to set up and run the project:
The run.sh script handles everything:
Checks Python version requirements
Sets up a virtual environment
Installs dependencies
Downloads the local TinyLlama model if needed
Lets you choose between local and Azure OpenAI models
Provides options for CLI or web interface
Make sure you have Python 3.12+ installed for the best experience. The script will warn you if your Python version is older.
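A hedged sketch of what `run.sh` does, mirroring the list above; the real script lives in the repo and may differ in details, so the heavier setup steps are left as comments here:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of run.sh, not the project's actual script.
set -euo pipefail

# Succeeds when a "major.minor" Python version meets the 3.12 minimum.
meets_minimum() {
  local major="${1%%.*}" minor="${1#*.}"
  [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 12 ]; }
}

if command -v python3 >/dev/null 2>&1; then
  version="$(python3 -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]}")')"
  meets_minimum "$version" || echo "Warning: Python 3.12+ recommended (found $version)" >&2
fi

# The remaining steps, left as comments to keep this sketch side-effect free:
#   python3 -m venv .venv && source .venv/bin/activate
#   pip install -r requirements.txt
#   download the TinyLlama weights if not already present
#   prompt for local vs Azure OpenAI, then launch the CLI or Streamlit UI
```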
Configuring Model Options
One of the most powerful aspects of this project is its flexibility in model choice. You can configure it to use either:
Option 1: Local TinyLlama Model
For privacy-conscious users or those without cloud API access, the local TinyLlama model runs entirely on your machine:
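A minimal loading sketch using `llama-cpp-python`, which supports Metal offload on Apple Silicon. The model path and settings below are assumptions for illustration, not the project's exact configuration:

```python
def local_model_config(model_path: str, use_metal: bool = True) -> dict:
    """Build keyword arguments for llama_cpp.Llama."""
    return {
        "model_path": model_path,
        "n_ctx": 2048,                           # context window size
        "n_gpu_layers": -1 if use_metal else 0,  # -1 offloads every layer to the GPU
        "verbose": False,
    }

def load_local_model(model_path: str):
    # Imported lazily so the config helper works without the package installed.
    from llama_cpp import Llama  # pip install llama-cpp-python
    return Llama(**local_model_config(model_path))
```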
The local model works great on Apple Silicon Macs with Metal GPU acceleration, making inference reasonably fast even on consumer hardware.
Option 2: Azure OpenAI Integration
For more powerful responses, you can connect to Azure OpenAI:
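The Azure path is driven by environment variables. A minimal `.env` sketch might look like the following; the variable names and API version are assumptions for illustration, so match them to your own Azure OpenAI resource and the project's README:

```
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
AZURE_OPENAI_API_KEY=<your-key>
AZURE_OPENAI_DEPLOYMENT=<your-deployment-name>
AZURE_OPENAI_API_VERSION=2024-02-01
```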
Using Azure OpenAI provides higher quality responses at the cost of API usage. The local model provides a great alternative when offline or for cost-sensitive applications.
Behind the Scenes: Azure OpenAI Integration
Integrating with Azure OpenAI required some additional work beyond standard OpenAI API access. The key considerations were:
Deployment Name Handling: Azure OpenAI requires a deployment name that might differ from the model name
Endpoint Configuration: Each Azure OpenAI resource has its own unique endpoint
API Version Management: Ensuring compatibility with the right API version
Here's a look at how this is implemented in the code:
The key difference when using Azure OpenAI vs standard OpenAI is that requests need to use the deployment name rather than the model name:
User Interfaces
The project provides multiple ways to interact with the agent:
Command Line Interface
Perfect for quick questions or scripting:
Streamlit Web UI
A user-friendly interface with additional settings:
The Streamlit UI allows:
Selecting between model providers (local or Azure OpenAI)
Adjusting search settings
Viewing results with formatted citations
Accessing query history
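Those features can be sketched in a short Streamlit app. The widget labels and the history helper below are illustrative, not the project's actual code:

```python
def remember(history: list[str], question: str, limit: int = 20) -> list[str]:
    """Append a query to the session history, keeping only the most recent entries."""
    return (history + [question])[-limit:]

def render_app() -> None:
    # Imported lazily; launch with `streamlit run app.py`.
    import streamlit as st  # pip install streamlit
    st.title("Agentic LLM Search")
    provider = st.sidebar.selectbox("Model provider", ["Local TinyLlama", "Azure OpenAI"])
    max_results = st.sidebar.slider("Max search results", 1, 10, 5)
    question = st.text_input("Ask a question")
    if st.button("Search") and question:
        st.session_state["history"] = remember(st.session_state.get("history", []), question)
        st.write(f"Researching with {provider} ({max_results} results)")
```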
Performance Considerations
Performance varies significantly between the local TinyLlama model and Azure OpenAI:
| Aspect | Local TinyLlama | Azure OpenAI |
| --- | --- | --- |
| Speed | Moderate (depends on hardware) | Fast |
| Quality | Good for basic queries | Excellent for complex questions |
| Cost | Free (runs locally) | Pay per API call |
| Privacy | Full privacy (all local) | Sends data to Azure |
| Offline Use | ✅ Fully functional offline | ❌ Requires internet |
Future Improvements
This project is still evolving, with several enhancements planned:
More Models: Support for additional local models like Llama 3
Memory System: Add conversation memory for follow-up questions
Tool Usage: Enable the models to use additional tools beyond search
Embedding Search: Implement vector search for more relevant results
Multi-Modal Support: Add image understanding capabilities
Lessons Learned
Developing this project taught me several valuable lessons:
Model Flexibility Matters: Having both local and cloud options provides maximum versatility
API Differences: Working with Azure OpenAI has subtle but important differences from standard OpenAI
Local Inference Optimizations: Getting good performance from local models requires hardware-specific optimizations
Search Integration: Combining search with LLMs dramatically improves the quality of responses
Conclusion
Building an intelligent agent that bridges local LLMs and Azure OpenAI with web search capabilities has been an exciting journey. The system provides remarkable flexibility, allowing users to choose between privacy-focused local inference or powerful cloud-based models.
The modular architecture makes it easy to extend with new features, and the multiple interfaces ensure it's accessible for various use cases, from quick command-line queries to rich web interactions.
Feel free to check out the project on GitHub and contribute to its development!
Have questions about this project? Open an issue on GitHub or reach out directly!
About the Author
I'm a technology enthusiast with a passion for AI and machine learning systems. My background spans software development, automation, and cloud architecture, with particular expertise in Python development, DevSecOps, and AWS and Azure cloud services. I enjoy building systems that bridge the gap between cutting-edge AI research and practical applications, especially those that leverage both local and cloud-based models. When I'm not coding, you can find me exploring the latest advancements in AI/ML and contributing to open-source projects.