
Building an Intelligent Agent with Local LLMs and Azure OpenAI

Author: Htunn Thu Thu

Date: June 1, 2025
Tags: #MachineLearning #AzureOpenAI #LocalLLM #AgenticAI

Introduction

In this post, I'll walk through how I built an intelligent agent that combines the power of local LLMs with web search capabilities and Azure OpenAI integration. The project, called "Agentic LLM Search," enables users to get well-researched answers with proper citations, all while providing flexibility in model choice - from running entirely locally to leveraging Azure's powerful cloud models.

🌟 Key Features

  • Dual Model Support: Run with local TinyLlama models or Azure OpenAI

  • Internet Research Capabilities: Searches the web for up-to-date information

  • Proper Citations: All answers include sources and references

  • Multiple Interfaces: Command-line, Streamlit web UI, and API options

  • Hardware Acceleration: Optimized for Apple Silicon with Metal GPU support

  • Python 3.12 Optimized: Takes advantage of the latest Python features

🛠️ Technical Architecture

The system uses a modular architecture with several key components:

  1. Agent Layer: Coordinates model inference and search operations

  2. Search Tool: Connects to DuckDuckGo for real-time web results

  3. Model Layer: Provides a unified interface to local and cloud models

  4. User Interfaces: Multiple ways to interact with the system

This modular design makes it easy to swap out components or extend functionality without major refactoring.
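
To make the model layer's unified interface concrete, here is a minimal sketch of what such an abstraction could look like (class and method names are illustrative, not the project's actual code):

from abc import ABC, abstractmethod
from typing import Dict, List


class LLMModel(ABC):
    """Unified interface that the agent layer codes against."""

    @abstractmethod
    def generate(self, messages: List[Dict[str, str]]) -> str:
        """Return a completion for a list of chat messages."""


# Concrete wrappers for the local TinyLlama model and Azure OpenAI would
# subclass LLMModel, so the agent never needs to know which backend answers.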

📊 System Flow

When a user asks a question, here's how the system processes it (a rough Python sketch of this loop follows the list):

  1. The query is received through one of the interfaces (CLI, Web UI, API)

  2. The agent optimizes the query for search

  3. The search tool retrieves relevant results from the web

  4. The LLM model (local TinyLlama or Azure OpenAI) processes the search results

  5. The model generates a comprehensive answer with citations

  6. The response is returned to the user through the original interface
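
Put together, the core of this loop can be sketched in a few lines of Python (function and attribute names below are illustrative, not the project's actual API):

def answer(model, search_tool, question: str, max_results: int = 5) -> str:
    """Sketch of the agent loop: optimize the query, search, then answer with citations."""
    # 1. Rewrite the user's question into a search-friendly query
    search_query = model.generate(
        [{"role": "user", "content": f"Rewrite as a concise web search query: {question}"}]
    )

    # 2. Retrieve relevant results from the web (e.g. DuckDuckGo)
    results = search_tool.search(search_query, max_results=max_results)

    # 3. Ask the LLM to answer from the results, citing each source as [n]
    sources = "\n".join(
        f"[{i + 1}] {r.title} - {r.url}: {r.snippet}" for i, r in enumerate(results)
    )
    prompt = (
        "Answer the question using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}"
    )
    return model.generate([{"role": "user", "content": prompt}])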

💻 Setting Up the Project

Let's look at how to set up and run the project:

# Clone the repository
git clone https://github.com/Htunn/agentic-llm-search.git
cd agentic-llm-search

# The easiest way is to use the provided setup script
chmod +x run.sh
./run.sh

The run.sh script handles everything:

  • Checks Python version requirements

  • Sets up a virtual environment

  • Installs dependencies

  • Downloads the local TinyLlama model if needed

  • Lets you choose between local and Azure OpenAI models

  • Provides options for CLI or web interface

Make sure you have Python 3.12+ installed for the best experience. The script will warn you if your Python version is older.
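
If you want to confirm your interpreter before running anything, the same check the script performs is easy to reproduce in Python:

import sys

if sys.version_info < (3, 12):
    print(f"Warning: Python 3.12+ is recommended, found {sys.version.split()[0]}")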

🔄 Configuring Model Options

One of the most powerful aspects of this project is its flexibility in model choice. You can configure it to use either:

Option 1: Local TinyLlama Model

For privacy-conscious users or those without cloud API access, the local TinyLlama model runs entirely on your machine:

# .env file configuration for local model
DEFAULT_MODEL=./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf
MODEL_PROVIDER=huggingface

The local model works great on Apple Silicon Macs with Metal GPU acceleration, making inference reasonably fast even on consumer hardware.
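
Under the hood, GGUF models like this are typically served with llama-cpp-python; assuming that backend (the project may wire it up differently), enabling Metal offload on Apple Silicon looks roughly like this:

from llama_cpp import Llama

llm = Llama(
    model_path="./src/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload all layers to the Metal GPU on Apple Silicon
    n_ctx=4096,       # context window size
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is Site Reliability Engineering?"}]
)
print(response["choices"][0]["message"]["content"])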

Option 2: Azure OpenAI Integration

For more powerful responses, you can connect to Azure OpenAI:

# .env file configuration for Azure OpenAI
DEFAULT_MODEL=gpt-35-turbo
MODEL_PROVIDER=azure-openai
AZURE_OPENAI_DEPLOYMENT=your-deployment-name
AZURE_OPENAI_API_KEY=your-api-key
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_API_VERSION=2023-05-15

Using Azure OpenAI provides higher quality responses at the cost of API usage. The local model provides a great alternative when offline or for cost-sensitive applications.
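
These settings are ordinary environment variables; assuming the project loads its .env with python-dotenv (a common pattern, not confirmed here), picking them up looks like this:

import os

from dotenv import load_dotenv

load_dotenv()  # read the .env file into the process environment

provider = os.getenv("MODEL_PROVIDER", "huggingface")
model = os.getenv("DEFAULT_MODEL")
print(f"Provider: {provider}, model: {model}")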

🔍 Behind the Scenes: Azure OpenAI Integration

Integrating with Azure OpenAI required some additional work beyond standard OpenAI API access. The key considerations were:

  1. Deployment Name Handling: Azure OpenAI requires a deployment name that might differ from the model name

  2. Endpoint Configuration: Each Azure OpenAI resource has its own unique endpoint

  3. API Version Management: Ensuring compatibility with the right API version

Here's a look at how this is implemented in the code:

# Imports used by this excerpt
import os
from typing import Optional

from openai import AzureOpenAI


class AzureOpenAIModel(OpenAIModel):
    """Azure OpenAI model wrapper"""
    
    def __init__(self, 
                 model_name: str = "gpt-35-turbo", 
                 deployment_name: Optional[str] = None,
                 api_key: Optional[str] = None, 
                 api_version: Optional[str] = None,
                 endpoint: Optional[str] = None,
                 **kwargs):
        # Use environment variables if parameters not provided
        api_key = api_key or os.getenv("AZURE_OPENAI_API_KEY") 
        api_version = api_version or os.getenv("AZURE_OPENAI_API_VERSION", "2023-05-15")
        endpoint = endpoint or os.getenv("AZURE_OPENAI_ENDPOINT")
        # If deployment name not provided, try to get from env or use model_name
        deployment_name = deployment_name or os.getenv("AZURE_OPENAI_DEPLOYMENT", model_name)
        
        super().__init__(model_name, api_key, **kwargs)
        self.deployment_name = deployment_name
        
        # Create Azure OpenAI client instead of standard OpenAI client
        self.client = AzureOpenAI(
            api_key=self.api_key,
            api_version=api_version,
            azure_endpoint=endpoint
        )

The key difference when using Azure OpenAI vs standard OpenAI is that requests need to use the deployment name rather than the model name:

# When generating responses
response = self.client.chat.completions.create(
    model=self.deployment_name,  # Use deployment name for Azure
    messages=messages,
    temperature=self.temperature,
    max_tokens=self.max_tokens
)
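
To tie this back to the .env configuration, a small factory can choose the backend from MODEL_PROVIDER; AzureOpenAIModel is the wrapper shown above, while LocalTinyLlamaModel stands in for whatever the local wrapper is actually called:

import os


def create_model():
    """Return the model backend selected by the MODEL_PROVIDER env var."""
    provider = os.getenv("MODEL_PROVIDER", "huggingface")
    model_name = os.getenv("DEFAULT_MODEL", "gpt-35-turbo")

    if provider == "azure-openai":
        return AzureOpenAIModel(model_name=model_name)
    # Default to the local GGUF model (class name is illustrative)
    return LocalTinyLlamaModel(model_path=model_name)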

🖥️ User Interfaces

The project provides multiple ways to interact with the agent:

Command Line Interface

Perfect for quick questions or scripting:

python3 test_agentic_search.py

Streamlit Web UI

A user-friendly interface with additional settings:

streamlit run app.py

The Streamlit UI allows (see the minimal sketch after this list):

  • Selecting between model providers (local or Azure OpenAI)

  • Adjusting search settings

  • Viewing results with formatted citations

  • Accessing query history
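
For a sense of how little code such a front end takes, here is a minimal Streamlit sketch (not the project's actual app.py):

import streamlit as st

st.title("Agentic LLM Search")

provider = st.sidebar.selectbox("Model provider", ["Local TinyLlama", "Azure OpenAI"])
max_results = st.sidebar.slider("Search results to use", 1, 10, 5)

question = st.text_input("Ask a question")
if question:
    # Call the agent here and render its cited answer
    st.markdown(f"_{provider} answer for:_ {question}")  # placeholder for the real agent call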

📈 Performance Considerations

Performance varies significantly between the local TinyLlama model and Azure OpenAI:

Feature      | Local TinyLlama                | Azure OpenAI
Speed        | Moderate (depends on hardware) | Fast
Quality      | Good for basic queries         | Excellent for complex questions
Cost         | Free (runs locally)            | Pay per API call
Privacy      | Full privacy (all local)       | Sends data to Azure
Offline Use  | ✅ Fully functional offline    | ❌ Requires internet

🔮 Future Improvements

This project is still evolving, with several enhancements planned:

  1. More Models: Support for additional local models like Llama 3

  2. Memory System: Add conversation memory for follow-up questions

  3. Tool Usage: Enable the models to use additional tools beyond search

  4. Embedding Search: Implement vector search for more relevant results

  5. Multi-Modal Support: Add image understanding capabilities

🧠 Lessons Learned

Developing this project taught me several valuable lessons:

  1. Model Flexibility Matters: Having both local and cloud options provides maximum versatility

  2. API Differences: Working with Azure OpenAI has subtle but important differences from standard OpenAI

  3. Local Inference Optimizations: Getting good performance from local models requires hardware-specific optimizations

  4. Search Integration: Combining search with LLMs dramatically improves the quality of responses

📋 Conclusion

Building an intelligent agent that bridges local LLMs and Azure OpenAI with web search capabilities has been an exciting journey. The system provides remarkable flexibility, allowing users to choose between privacy-focused local inference or powerful cloud-based models.

The modular architecture makes it easy to extend with new features, and the multiple interfaces ensure it's accessible for various use cases - from quick command-line queries to rich web interactions.

Feel free to check out the project on GitHub and contribute to its development!

Have questions about this project? Feel free to open an issue on GitHub or reach out directly!

👩‍💻 About the Author

I'm a technology enthusiast with a passion for AI and machine learning systems. My background spans software development, automation, and cloud architecture, with particular expertise in Python development, DevSecOps, and AWS and Azure cloud services. I enjoy building systems that bridge the gap between cutting-edge AI research and practical applications, especially those that leverage both local and cloud-based models. When I'm not coding, you can find me exploring the latest advancements in AI/ML or contributing to open-source projects.

