Building an Intelligent Agent with Local LLMs and Azure OpenAI
Author: Htunn Thu Thu
Date: June 1, 2025
Tags: #MachineLearning #AzureOpenAI #LocalLLM #AgenticAI
Introduction
In this post, I'll walk through how I built an intelligent agent that combines local LLMs with web search capabilities and Azure OpenAI integration. The project, called "Agentic LLM Search," gives users well-researched answers with proper citations while offering flexibility in model choice, from running entirely locally to leveraging Azure's powerful cloud models.
Key Features
Dual Model Support: Run with local TinyLlama models or Azure OpenAI
Internet Research Capabilities: Searches the web for up-to-date information
Proper Citations: All answers include sources and references
Multiple Interfaces: Command-line, Streamlit web UI, and API options
Hardware Acceleration: Optimized for Apple Silicon with Metal GPU support
Python 3.12 Optimized: Takes advantage of the latest Python features
Technical Architecture
The system uses a modular architecture with several key components:
Agent Layer: Coordinates model inference and search operations
Search Tool: Connects to DuckDuckGo for real-time web results
Model Layer: Provides a unified interface to local and cloud models
User Interfaces: Multiple ways to interact with the system
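To make this layering concrete, here is a minimal sketch of the kind of interfaces these components might expose to each other. The class and field names below (SearchResult, SearchTool, LLMModel) are illustrative assumptions, not the repository's actual API.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class SearchResult:
    """One web result handed from the search tool to the model layer (illustrative)."""
    title: str
    url: str
    snippet: str


class SearchTool(Protocol):
    """What the agent expects from the search component (illustrative)."""
    def search(self, query: str, max_results: int = 5) -> List[SearchResult]: ...


class LLMModel(Protocol):
    """Unified interface shared by the local TinyLlama wrapper and the Azure OpenAI wrapper (illustrative)."""
    def generate(self, question: str, context: List[SearchResult]) -> str: ...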
System Flow
When a user asks a question, here's how the system processes it:
The query is received through one of the interfaces (CLI, Web UI, API)
The agent optimizes the query for search
The search tool retrieves relevant results from the web
The LLM model (local TinyLlama or Azure OpenAI) processes the search results
The model generates a comprehensive answer with citations
The response is returned to the user through the original interface
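Using the illustrative interfaces sketched above, the whole flow reduces to a short pipeline. The Agent class and helper names below are simplified assumptions, not the project's exact implementation.
class Agent:
    """Coordinates query optimization, web search, and answer generation (simplified sketch)."""

    def __init__(self, model, search_tool):
        self.model = model              # anything matching the LLMModel interface above
        self.search_tool = search_tool  # anything matching the SearchTool interface above

    def answer(self, question: str) -> str:
        # 1. Rewrite the user's question into a search-friendly query
        query = self.optimize_query(question)
        # 2. Pull fresh results from the web
        results = self.search_tool.search(query, max_results=5)
        # 3. Let the model synthesize an answer grounded in those results
        answer = self.model.generate(question, context=results)
        # 4. Append numbered sources so every claim can be traced back
        sources = "\n".join(f"[{i}] {r.title} - {r.url}" for i, r in enumerate(results, start=1))
        return f"{answer}\n\nSources:\n{sources}"

    def optimize_query(self, question: str) -> str:
        # Placeholder: a real implementation might strip filler words or ask
        # the model itself to rewrite the question for search.
        return question.strip()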
Setting Up the Project
Let's look at how to set up and run the project:
# Clone the repository
git clone https://github.com/Htunn/agentic-llm-search.git
cd agentic-llm-search
# The easiest way is to use the provided setup script
chmod +x run.sh
./run.sh
The run.sh script handles everything:
Checks Python version requirements
Sets up a virtual environment
Installs dependencies
Downloads the local TinyLlama model if needed
Lets you choose between local and Azure OpenAI models
Provides options for CLI or web interface
Make sure you have Python 3.12+ installed for the best experience. The script will warn you if your Python version is older.
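For reference, the kind of check the script performs can be expressed in a couple of lines of Python (an illustrative equivalent, not the script's actual code):
import sys

# Warn rather than fail on interpreters older than 3.12 (illustrative version check)
if sys.version_info < (3, 12):
    print(f"Warning: Python 3.12+ is recommended; found {sys.version.split()[0]}")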
Configuring Model Options
One of the most powerful aspects of this project is its flexibility in model choice. You can configure it to use either:
Option 1: Local TinyLlama Model
For privacy-conscious users or those without cloud API access, the local TinyLlama model runs entirely on your machine:
# .env file configuration for local model
DEFAULT_MODEL=./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
MODEL_PROVIDER=huggingface
The local model works great on Apple Silicon Macs with Metal GPU acceleration, making inference reasonably fast even on consumer hardware.
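If you prefer to load the GGUF file yourself rather than going through the project's wrapper, llama-cpp-python is one way to do it. The snippet below is an assumed sketch, not the project's loader; with a Metal-enabled build of llama.cpp, n_gpu_layers=-1 offloads all layers to the Apple Silicon GPU.
from llama_cpp import Llama

# Load the quantized TinyLlama model; n_gpu_layers=-1 offloads every layer
# to the GPU on Metal-enabled builds (illustrative, not the project's loader).
llm = Llama(
    model_path="./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=2048,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is Metal GPU acceleration?"}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])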
Option 2: Azure OpenAI Integration
For more powerful responses, you can connect to Azure OpenAI:
# .env file configuration for Azure OpenAI
DEFAULT_MODEL=gpt-35-turbo
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2023-05-15
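Whichever option you pick, the application only has to read these variables once at startup and construct the matching backend. A minimal, assumed version of that selection logic might look like the following; create_model and LocalLlamaModel are illustrative names, and AzureOpenAIModel is the wrapper shown in the next section.
import os

from dotenv import load_dotenv


def create_model():
    """Build the configured model backend from .env settings (illustrative factory)."""
    load_dotenv()
    provider = os.getenv("MODEL_PROVIDER", "huggingface")
    model_name = os.getenv("DEFAULT_MODEL", "./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")

    if provider == "azure-openai":
        # Cloud path: deployment name, API key, and endpoint are read from the environment
        return AzureOpenAIModel(model_name=model_name)
    # Local path: load the GGUF model from disk (LocalLlamaModel is a hypothetical wrapper)
    return LocalLlamaModel(model_path=model_name)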
Behind the Scenes: Azure OpenAI Integration
Integrating with Azure OpenAI required some additional work beyond standard OpenAI API access. The key considerations were:
Deployment Name Handling: Azure OpenAI requires a deployment name that might differ from the model name
Endpoint Configuration: Each Azure OpenAI resource has its own unique endpoint
API Version Management: Ensuring compatibility with the right API version
Here's a look at how this is implemented in the code:
import os
from typing import Optional

from openai import AzureOpenAI  # Azure client from the openai 1.x SDK


class AzureOpenAIModel(OpenAIModel):  # OpenAIModel is the project's base OpenAI wrapper
    """Azure OpenAI model wrapper"""

    def __init__(
        self,
        model_name: str = "gpt-35-turbo",
        deployment_name: Optional[str] = None,
        api_key: Optional[str] = None,
        api_version: Optional[str] = None,
        endpoint: Optional[str] = None,
        **kwargs,
    ):
        # Use environment variables if parameters are not provided
        api_key = api_key or os.getenv("AZURE_OPENAI_API_KEY")
        api_version = api_version or os.getenv("AZURE_OPENAI_API_VERSION", "2023-05-15")
        endpoint = endpoint or os.getenv("AZURE_OPENAI_ENDPOINT")

        # If no deployment name is given, try the environment, then fall back to the model name
        deployment_name = deployment_name or os.getenv("AZURE_OPENAI_DEPLOYMENT", model_name)

        super().__init__(model_name, api_key, **kwargs)
        self.deployment_name = deployment_name

        # Create an Azure OpenAI client instead of the standard OpenAI client
        self.client = AzureOpenAI(
            api_key=self.api_key,
            api_version=api_version,
            azure_endpoint=endpoint,
        )
The key difference when using Azure OpenAI vs standard OpenAI is that requests need to use the deployment name rather than the model name:
# When generating responses
response = self.client.chat.completions.create(
    model=self.deployment_name,  # Use the deployment name for Azure
    messages=messages,
    temperature=self.temperature,
    max_tokens=self.max_tokens,
)
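With the wrapper in place, callers never have to think about deployments; they simply instantiate the class and let the environment fill in the Azure-specific details. A short usage sketch, assuming the AZURE_OPENAI_* variables from the earlier .env example are set:
# Assumes the AZURE_OPENAI_* variables from the .env example are exported
model = AzureOpenAIModel(model_name="gpt-35-turbo")
print(model.deployment_name)  # AZURE_OPENAI_DEPLOYMENT if set, otherwise "gpt-35-turbo"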
User Interfaces
The project provides multiple ways to interact with the agent:
Command Line Interface
Perfect for quick questions or scripting:
python3 test_agentic_search.py
Streamlit Web UI
A user-friendly interface with additional settings:
streamlit run app.py
The Streamlit UI allows:
Selecting between model providers (local or Azure OpenAI)
Adjusting search settings
Viewing results with formatted citations
Accessing query history
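For a sense of how little code that takes, here is a stripped-down, assumed sketch of such a Streamlit page (not the project's actual app.py); the real app wires the sidebar selections into the agent pipeline described earlier.
import streamlit as st

st.title("Agentic LLM Search")

# Sidebar: pick the model provider and tune the search settings
provider = st.sidebar.selectbox("Model provider", ["Local TinyLlama", "Azure OpenAI"])
max_results = st.sidebar.slider("Search results to use", 1, 10, 5)

question = st.text_input("Ask a question")
if question:
    with st.spinner("Searching and generating..."):
        # The real app would call the agent pipeline here; this sketch just echoes
        # the inputs so the example stays self-contained.
        answer = f"({provider}, {max_results} results) Answer to: {question}"
    st.markdown(answer)

    # Keep a simple per-session query history
    if "history" not in st.session_state:
        st.session_state["history"] = []
    st.session_state["history"].append(question)
    with st.expander("Query history"):
        st.write(st.session_state["history"])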
Performance Considerations
Performance varies significantly between the local TinyLlama model and Azure OpenAI:
Aspect | Local TinyLlama | Azure OpenAI
Speed | Moderate (depends on hardware) | Fast
Quality | Good for basic queries | Excellent for complex questions
Cost | Free (runs locally) | Pay per API call
Privacy | Full privacy (all local) | Sends data to Azure
Offline Use | Yes (fully functional offline) | No (requires internet)
Future Improvements
This project is still evolving, with several enhancements planned:
More Models: Support for additional local models like Llama 3
Memory System: Add conversation memory for follow-up questions
Tool Usage: Enable the models to use additional tools beyond search
Embedding Search: Implement vector search for more relevant results
Multi-Modal Support: Add image understanding capabilities
Lessons Learned
Developing this project taught me several valuable lessons:
Model Flexibility Matters: Having both local and cloud options provides maximum versatility
API Differences: Working with Azure OpenAI has subtle but important differences from standard OpenAI
Local Inference Optimizations: Getting good performance from local models requires hardware-specific optimizations
Search Integration: Combining search with LLMs dramatically improves the quality of responses
Conclusion
Building an intelligent agent that bridges local LLMs and Azure OpenAI with web search capabilities has been an exciting journey. The system provides remarkable flexibility, allowing users to choose between privacy-focused local inference or powerful cloud-based models.
The modular architecture makes it easy to extend with new features, and the multiple interfaces ensure it's accessible for various use cases - from quick command-line queries to rich web interactions.
Feel free to check out the project on GitHub and contribute to its development!
About the Author
I'm a technology enthusiast with a passion for AI and machine learning systems. My background spans software development, automation, and cloud architecture, with particular expertise in Python development, DevSecOps, and AWS and Azure cloud services. I enjoy building systems that bridge the gap between cutting-edge AI research and practical applications, especially those that leverage both local and cloud-based models. When I'm not coding, you can find me exploring the latest advancements in AI/ML and contributing to open-source projects.