Building an Intelligent Agent with Local LLMs and Azure OpenAI

Author: Htunn Thu Thu

Date: June 1, 2025

Tags: #MachineLearning #AzureOpenAI #LocalLLM #AgenticAI

Introduction

In this post, I'll walk through how I built an intelligent agent that combines the power of local LLMs with web search capabilities and Azure OpenAI integration. The project, called "Agentic LLM Search," enables users to get well-researched answers with proper citations, all while providing flexibility in model choice - from running entirely locally to leveraging Azure's powerful cloud models.

🌟 Key Features

  • Dual Model Support: Run with local TinyLlama models or Azure OpenAI

  • Internet Research Capabilities: Searches the web for up-to-date information

  • Proper Citations: All answers include sources and references

  • Multiple Interfaces: Command-line, Streamlit web UI, and API options

  • Hardware Acceleration: Optimized for Apple Silicon with Metal GPU support

  • Python 3.12 Optimized: Takes advantage of the latest Python features

๐Ÿ› ๏ธ Technical Architecture

The system uses a modular architecture with several key components:

  1. Agent Layer: Coordinates model inference and search operations

  2. Search Tool: Connects to DuckDuckGo for real-time web results

  3. Model Layer: Provides a unified interface to local and cloud models

  4. User Interfaces: Multiple ways to interact with the system

This modular design makes it easy to swap out components or extend functionality without major refactoring.
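
For instance, the search tool is a thin wrapper around DuckDuckGo. Here's a minimal sketch, assuming the duckduckgo_search package (the project's actual wrapper may differ):

```python
from duckduckgo_search import DDGS

def web_search(query: str, max_results: int = 5) -> list[dict]:
    """Return web results as dicts with 'title', 'href', and 'body' keys."""
    with DDGS() as ddgs:
        return list(ddgs.text(query, max_results=max_results))
```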

📊 System Flow

When a user asks a question, here's how the system processes it (a condensed code sketch follows these steps):

  1. The query is received through one of the interfaces (CLI, Web UI, API)

  2. The agent optimizes the query for search

  3. The search tool retrieves relevant results from the web

  4. The LLM model (local TinyLlama or Azure OpenAI) processes the search results

  5. The model generates a comprehensive answer with citations

  6. The response is returned to the user through the original interface
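
Condensed into code, the flow looks roughly like the sketch below. Every helper name here is illustrative, not the project's actual API:

```python
def answer(query: str) -> str:
    search_query = optimize_query(query)      # step 2: rewrite the query for search
    results = web_search(search_query)        # step 3: fetch DuckDuckGo results
    context = format_with_sources(results)    # number the sources for citation
    return generate_answer(query, context)    # steps 4-5: LLM answer with citations
```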

💻 Setting Up the Project

Let's look at how to set up and run the project:
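
Assuming you've already cloned the repository, a first run is just two commands from the project root:

```bash
chmod +x run.sh   # make the setup script executable on first use
./run.sh          # interactive setup and launch
```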

The run.sh script handles everything:

  • Checks Python version requirements

  • Sets up a virtual environment

  • Installs dependencies

  • Downloads the local TinyLlama model if needed

  • Lets you choose between local and Azure OpenAI models

  • Provides options for CLI or web interface

🔄 Configuring Model Options

One of the most powerful aspects of this project is its flexibility in model choice. You can configure it to use either:

Option 1: Local TinyLlama Model

For privacy-conscious users or those without cloud API access, the local TinyLlama model runs entirely on your machine:
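
Here's a minimal sketch of what local inference can look like, assuming a GGUF TinyLlama build served through llama-cpp-python (the model path and parameters are illustrative):

```python
from llama_cpp import Llama

# n_gpu_layers=-1 offloads all layers to Metal on Apple Silicon;
# it falls back to CPU on machines without GPU support.
llm = Llama(
    model_path="models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",  # illustrative path
    n_ctx=2048,        # context window size
    n_gpu_layers=-1,   # enable hardware acceleration where available
)

reply = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is agentic search?"}],
    max_tokens=256,
)
print(reply["choices"][0]["message"]["content"])
```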

The local model works great on Apple Silicon Macs with Metal GPU acceleration, making inference reasonably fast even on consumer hardware.

Option 2: Azure OpenAI Integration

For more powerful responses, you can connect to Azure OpenAI:
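
Configuration typically lives in environment variables. The names below are illustrative and should match whatever the project actually reads:

```bash
# .env — values are placeholders
AZURE_OPENAI_API_KEY="<your-api-key>"
AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
AZURE_OPENAI_DEPLOYMENT="<your-deployment-name>"
AZURE_OPENAI_API_VERSION="2024-02-01"
```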

Using Azure OpenAI provides higher quality responses at the cost of API usage. The local model provides a great alternative when offline or for cost-sensitive applications.

🔍 Behind the Scenes: Azure OpenAI Integration

Integrating with Azure OpenAI required some additional work beyond standard OpenAI API access. The key considerations were:

  1. Deployment Name Handling: Azure OpenAI requires a deployment name that might differ from the model name

  2. Endpoint Configuration: Each Azure OpenAI resource has its own unique endpoint

  3. API Version Management: Ensuring compatibility with the right API version

Here's how this looks in code. The key difference between Azure OpenAI and standard OpenAI is that requests must reference the deployment name rather than the model name:
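
This is a sketch using the official openai SDK's AzureOpenAI client; the environment variable names match the illustrative config above:

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

response = client.chat.completions.create(
    model=os.environ["AZURE_OPENAI_DEPLOYMENT"],  # deployment name, not model name
    messages=[{"role": "user", "content": "Answer with citations: ..."}],
)
print(response.choices[0].message.content)
```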

🖥️ User Interfaces

The project provides multiple ways to interact with the agent:

Command Line Interface

Perfect for quick questions or scripting:
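
A typical invocation might look like this (the entry-point name and flags are assumptions, not the project's documented CLI):

```bash
python main.py --query "What changed in Python 3.12?" --provider local
```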

Streamlit Web UI

A user-friendly interface with additional settings:
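
Launching it is one command (the app file name is an assumption):

```bash
streamlit run app.py
```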

The Streamlit UI allows (a small code sketch follows this list):

  • Selecting between model providers (local or Azure OpenAI)

  • Adjusting search settings

  • Viewing results with formatted citations

  • Accessing query history
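
To give a feel for the settings panel, here's a minimal Streamlit sketch. It is not the project's actual UI code, and `answer` is the hypothetical pipeline function sketched earlier:

```python
import streamlit as st

# Sidebar settings: provider choice and search depth
provider = st.sidebar.selectbox("Model provider", ["Local TinyLlama", "Azure OpenAI"])
max_results = st.sidebar.slider("Max search results", 1, 10, 5)

query = st.text_input("Ask a question")
if st.button("Search") and query:
    st.markdown(answer(query))  # hypothetical pipeline function from earlier
```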

📈 Performance Considerations

Performance varies significantly between the local TinyLlama model and Azure OpenAI:

| Feature | Local TinyLlama | Azure OpenAI |
| --- | --- | --- |
| Speed | Moderate (depends on hardware) | Fast |
| Quality | Good for basic queries | Excellent for complex questions |
| Cost | Free (runs locally) | Pay per API call |
| Privacy | Full privacy (all local) | Sends data to Azure |
| Offline Use | ✅ Fully functional offline | ❌ Requires internet |

🔮 Future Improvements

This project is still evolving, with several enhancements planned:

  1. More Models: Support for additional local models like Llama 3

  2. Memory System: Add conversation memory for follow-up questions

  3. Tool Usage: Enable the models to use additional tools beyond search

  4. Embedding Search: Implement vector search for more relevant results

  5. Multi-Modal Support: Add image understanding capabilities

🧠 Lessons Learned

Developing this project taught me several valuable lessons:

  1. Model Flexibility Matters: Having both local and cloud options provides maximum versatility

  2. API Differences: Working with Azure OpenAI has subtle but important differences from standard OpenAI

  3. Local Inference Optimizations: Getting good performance from local models requires hardware-specific optimizations

  4. Search Integration: Combining search with LLMs dramatically improves the quality of responses

📋 Conclusion

Building an intelligent agent that bridges local LLMs and Azure OpenAI with web search capabilities has been an exciting journey. The system provides remarkable flexibility, allowing users to choose between privacy-focused local inference or powerful cloud-based models.

The modular architecture makes it easy to extend with new features, and the multiple interfaces ensure it's accessible for various use cases - from quick command-line queries to rich web interactions.

Feel free to check out the project on GitHub and contribute to its development!

Have questions about this project? Feel free to open an issue on GitHub or reach out directly!

๐Ÿ‘ฉโ€๐Ÿ’ป About the Author

I'm a technology enthusiast with a passion for AI and machine learning systems. My background spans software development, automation, and cloud architecture, with particular expertise in Python development, DevSecOps, and AWS and Azure cloud services. I enjoy building systems that bridge the gap between cutting-edge AI research and practical applications, especially those that leverage both local and cloud-based models. When I'm not coding, you can find me exploring the latest advancements in AI/ML and contributing to open-source projects.

