Building an Intelligent Agent with Local LLMs and Azure OpenAI
Author: Htunn Thu Thu
Date: June 1, 2025
Tags: #MachineLearning #AzureOpenAI #LocalLLM #AgenticAI
Introduction
In this post, I'll walk through how I built an intelligent agent that combines local LLMs with web search capabilities and Azure OpenAI integration. The project, called "Agentic LLM Search," gives users well-researched answers with proper citations while offering flexibility in model choice, from running entirely locally to leveraging Azure's powerful cloud models.
Key Features
Dual Model Support: Run with local TinyLlama models or Azure OpenAI
Internet Research Capabilities: Searches the web for up-to-date information
Proper Citations: All answers include sources and references
Multiple Interfaces: Command-line, Streamlit web UI, and API options
Hardware Acceleration: Optimized for Apple Silicon with Metal GPU support
Python 3.12 Optimized: Takes advantage of the latest Python features
Technical Architecture
The system uses a modular architecture with several key components:
Agent Layer: Coordinates model inference and search operations
Search Tool: Connects to DuckDuckGo for real-time web results
Model Layer: Provides a unified interface to local and cloud models
User Interfaces: Multiple ways to interact with the system
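To make this layering concrete, here is a minimal sketch of the kind of interfaces these components might expose to each other. The class and field names below (SearchResult, SearchTool, LLMModel) are illustrative assumptions, not the repository's actual API.
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class SearchResult:
    """One web result handed from the search tool to the model layer (illustrative)."""
    title: str
    url: str
    snippet: str


class SearchTool(Protocol):
    """What the agent expects from the search component (illustrative)."""
    def search(self, query: str, max_results: int = 5) -> List[SearchResult]: ...


class LLMModel(Protocol):
    """Unified interface shared by the local TinyLlama wrapper and the Azure OpenAI wrapper (illustrative)."""
    def generate(self, question: str, context: List[SearchResult]) -> str: ...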
System Flow
When a user asks a question, here's how the system processes it:
The query is received through one of the interfaces (CLI, Web UI, API)
The agent optimizes the query for search
The search tool retrieves relevant results from the web
The LLM model (local TinyLlama or Azure OpenAI) processes the search results
The model generates a comprehensive answer with citations
The response is returned to the user through the original interface
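Using the illustrative interfaces sketched above, the whole flow reduces to a short pipeline. The Agent class and helper names below are simplified assumptions, not the project's exact implementation.
class Agent:
    """Coordinates query optimization, web search, and answer generation (simplified sketch)."""

    def __init__(self, model, search_tool):
        self.model = model              # anything matching the LLMModel interface above
        self.search_tool = search_tool  # anything matching the SearchTool interface above

    def answer(self, question: str) -> str:
        # 1. Rewrite the user's question into a search-friendly query
        query = self.optimize_query(question)
        # 2. Pull fresh results from the web
        results = self.search_tool.search(query, max_results=5)
        # 3. Let the model synthesize an answer grounded in those results
        answer = self.model.generate(question, context=results)
        # 4. Append numbered sources so every claim can be traced back
        sources = "\n".join(f"[{i}] {r.title} - {r.url}" for i, r in enumerate(results, start=1))
        return f"{answer}\n\nSources:\n{sources}"

    def optimize_query(self, question: str) -> str:
        # Placeholder: a real implementation might strip filler words or ask
        # the model itself to rewrite the question for search.
        return question.strip()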
Setting Up the Project
Let's look at how to set up and run the project:
# Clone the repository
git clone https://github.com/Htunn/agentic-llm-search.git
cd agentic-llm-search
# The easiest way is to use the provided setup script
chmod +x run.sh
./run.sh
The run.sh script handles everything:
Checks Python version requirements
Sets up a virtual environment
Installs dependencies
Downloads the local TinyLlama model if needed
Lets you choose between local and Azure OpenAI models
Provides options for CLI or web interface
Make sure you have Python 3.12+ installed for the best experience. The script will warn you if your Python version is older.
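For reference, the kind of check the script performs can be expressed in a couple of lines of Python (an illustrative equivalent, not the script's actual code):
import sys

# Warn rather than fail on interpreters older than 3.12 (illustrative version check)
if sys.version_info < (3, 12):
    print(f"Warning: Python 3.12+ is recommended; found {sys.version.split()[0]}")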
Configuring Model Options
One of the most powerful aspects of this project is its flexibility in model choice. You can configure it to use either:
Option 1: Local TinyLlama Model
For privacy-conscious users or those without cloud API access, the local TinyLlama model runs entirely on your machine:
# .env file configuration for local model
DEFAULT_MODEL=./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
MODEL_PROVIDER=huggingface
The local model works great on Apple Silicon Macs with Metal GPU acceleration, making inference reasonably fast even on consumer hardware.
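If you prefer to load the GGUF file yourself rather than going through the project's wrapper, llama-cpp-python is one way to do it. The snippet below is an assumed sketch, not the project's loader; with a Metal-enabled build of llama.cpp, n_gpu_layers=-1 offloads all layers to the Apple Silicon GPU.
from llama_cpp import Llama

# Load the quantized TinyLlama model; n_gpu_layers=-1 offloads every layer
# to the GPU on Metal-enabled builds (illustrative, not the project's loader).
llm = Llama(
    model_path="./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_gpu_layers=-1,
    n_ctx=2048,
)

output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is Metal GPU acceleration?"}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])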
Option 2: Azure OpenAI Integration
For more powerful responses, you can connect to Azure OpenAI:
# .env file configuration for Azure OpenAI
DEFAULT_MODEL=gpt-35-turbo
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2023-05-15
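Whichever option you pick, the application only has to read these variables once at startup and construct the matching backend. A minimal, assumed version of that selection logic might look like the following; create_model and LocalLlamaModel are illustrative names, and AzureOpenAIModel is the wrapper shown in the next section.
import os

from dotenv import load_dotenv


def create_model():
    """Build the configured model backend from .env settings (illustrative factory)."""
    load_dotenv()
    provider = os.getenv("MODEL_PROVIDER", "huggingface")
    model_name = os.getenv("DEFAULT_MODEL", "./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf")

    if provider == "azure-openai":
        # Cloud path: deployment name, API key, and endpoint are read from the environment
        return AzureOpenAIModel(model_name=model_name)
    # Local path: load the GGUF model from disk (LocalLlamaModel is a hypothetical wrapper)
    return LocalLlamaModel(model_path=model_name)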
Behind the Scenes: Azure OpenAI Integration
Integrating with Azure OpenAI required some additional work beyond standard OpenAI API access. The key considerations were:
Deployment Name Handling: Azure OpenAI requires a deployment name that might differ from the model name
Endpoint Configuration: Each Azure OpenAI resource has its own unique endpoint
API Version Management: Ensuring compatibility with the right API version
Here's a look at how this is implemented in the code:
import os
from typing import Optional

from openai import AzureOpenAI  # Azure client from the openai 1.x SDK


class AzureOpenAIModel(OpenAIModel):  # OpenAIModel is the project's base OpenAI wrapper
    """Azure OpenAI model wrapper"""

    def __init__(
        self,
        model_name: str = "gpt-35-turbo",
        deployment_name: Optional[str] = None,
        api_key: Optional[str] = None,
        api_version: Optional[str] = None,
        endpoint: Optional[str] = None,
        **kwargs,
    ):
        # Use environment variables if parameters are not provided
        api_key = api_key or os.getenv("AZURE_OPENAI_API_KEY")
        api_version = api_version or os.getenv("AZURE_OPENAI_API_VERSION", "2023-05-15")
        endpoint = endpoint or os.getenv("AZURE_OPENAI_ENDPOINT")

        # If no deployment name is given, try the environment, then fall back to the model name
        deployment_name = deployment_name or os.getenv("AZURE_OPENAI_DEPLOYMENT", model_name)

        super().__init__(model_name, api_key, **kwargs)
        self.deployment_name = deployment_name

        # Create an Azure OpenAI client instead of the standard OpenAI client
        self.client = AzureOpenAI(
            api_key=self.api_key,
            api_version=api_version,
            azure_endpoint=endpoint,
        )
The key difference when using Azure OpenAI vs standard OpenAI is that requests need to use the deployment name rather than the model name:
# When generating responses
response = self.client.chat.completions.create(
    model=self.deployment_name,  # Use the deployment name for Azure
    messages=messages,
    temperature=self.temperature,
    max_tokens=self.max_tokens,
)
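With the wrapper in place, callers never have to think about deployments; they simply instantiate the class and let the environment fill in the Azure-specific details. A short usage sketch, assuming the AZURE_OPENAI_* variables from the earlier .env example are set:
# Assumes the AZURE_OPENAI_* variables from the .env example are exported
model = AzureOpenAIModel(model_name="gpt-35-turbo")
print(model.deployment_name)  # AZURE_OPENAI_DEPLOYMENT if set, otherwise "gpt-35-turbo"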
User Interfaces
The project provides multiple ways to interact with the agent:
Command Line Interface
Perfect for quick questions or scripting:
python3 test_agentic_search.py
Streamlit Web UI
A user-friendly interface with additional settings:
streamlit run app.py
The Streamlit UI allows:
Selecting between model providers (local or Azure OpenAI)
Adjusting search settings
Viewing results with formatted citations
Accessing query history
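For a sense of how little code that takes, here is a stripped-down, assumed sketch of such a Streamlit page (not the project's actual app.py); the real app wires the sidebar selections into the agent pipeline described earlier.
import streamlit as st

st.title("Agentic LLM Search")

# Sidebar: pick the model provider and tune the search settings
provider = st.sidebar.selectbox("Model provider", ["Local TinyLlama", "Azure OpenAI"])
max_results = st.sidebar.slider("Search results to use", 1, 10, 5)

question = st.text_input("Ask a question")
if question:
    with st.spinner("Searching and generating..."):
        # The real app would call the agent pipeline here; this sketch just echoes
        # the inputs so the example stays self-contained.
        answer = f"({provider}, {max_results} results) Answer to: {question}"
    st.markdown(answer)

    # Keep a simple per-session query history
    if "history" not in st.session_state:
        st.session_state["history"] = []
    st.session_state["history"].append(question)
    with st.expander("Query history"):
        st.write(st.session_state["history"])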
Performance Considerations
Performance varies significantly between the local TinyLlama model and Azure OpenAI:
Aspect | Local TinyLlama | Azure OpenAI
Speed | Moderate (depends on hardware) | Fast
Quality | Good for basic queries | Excellent for complex questions
Cost | Free (runs locally) | Pay per API call
Privacy | Full privacy (all local) | Sends data to Azure
Offline Use | Yes (fully functional offline) | No (requires internet)
Future Improvements
This project is still evolving, with several enhancements planned:
More Models: Support for additional local models like Llama 3
Memory System: Add conversation memory for follow-up questions
Tool Usage: Enable the models to use additional tools beyond search
Embedding Search: Implement vector search for more relevant results
Multi-Modal Support: Add image understanding capabilities
Lessons Learned
Developing this project taught me several valuable lessons:
Model Flexibility Matters: Having both local and cloud options provides maximum versatility
API Differences: Working with Azure OpenAI has subtle but important differences from standard OpenAI
Local Inference Optimizations: Getting good performance from local models requires hardware-specific optimizations
Search Integration: Combining search with LLMs dramatically improves the quality of responses
Conclusion
Building an intelligent agent that bridges local LLMs and Azure OpenAI with web search capabilities has been an exciting journey. The system provides remarkable flexibility, allowing users to choose between privacy-focused local inference or powerful cloud-based models.
The modular architecture makes it easy to extend with new features, and the multiple interfaces ensure it's accessible for various use cases - from quick command-line queries to rich web interactions.
Feel free to check out the project on GitHub and contribute to its development!
About the Author
I'm a technology enthusiast with a passion for AI and machine learning systems. My background spans software development, automation, and cloud architecture, with particular expertise in Python development, DevSecOps, and AWS and Azure cloud services. I enjoy building systems that bridge the gap between cutting-edge AI research and practical applications, especially those that leverage both local and cloud-based models. When I'm not coding, you can find me exploring the latest advancements in AI/ML and contributing to open-source projects.