Part 1: Introduction to LLM APIs and Claude

Part of the LLM API Development 101 Series

My First Claude API Call

I remember my first time calling an LLM API. OpenAI's GPT-3 had just become available. I built a simple script, made the call, and got back gibberish.

Not because the model was bad, but because I had set temperature=2.5, thinking higher was better. I learned quickly: temperature controls randomness, and values well above 1.0 produce nonsense.

Fast forward to Claude: same mistakes, different API. I sent a 150K-token document, hit rate limits immediately, and burned through $50 in API credits in an hour of testing.

Now I know better. Let me save you the expensive lessons.

Understanding LLM APIs

What is an LLM API?

An LLM API is a REST endpoint in front of a powerful language model: you send text (a prompt) and get back generated text (a completion).
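
Concretely, it's just HTTP. Here's a minimal sketch of a raw call to Anthropic's Messages endpoint using the `requests` library (the SDK used later in this article wraps exactly this request):

```python
import os
import requests

# Every SDK call ultimately becomes an HTTP POST like this one.
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello, Claude!"}],
    },
)
print(response.json()["content"][0]["text"])
```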

Key providers:

  • OpenAI - GPT-4, GPT-3.5 (most popular)

  • Anthropic - Claude (best for long context)

  • Google - PaLM, Gemini

  • Cohere - Command models

  • Open source - Llama, Mistral (self-hosted)

Why Claude?

I've used them all in production. Claude stands out for:

1. Massive context window - 200K tokens (≈150K words)

  • Entire codebases in one prompt

  • Full document analysis

  • Long conversation histories

2. Instruction following - Best at complex, multi-step tasks

  • Structured output generation

  • Following specific formatting rules

  • Consistent behavior

3. Safety and reliability - Constitutional AI approach

  • Reduces harmful outputs

  • More predictable responses

  • Better at refusing inappropriate requests

4. Price/performance - Competitive pricing

  • Claude 3 Opus: $15/1M input tokens, $75/1M output

  • Claude 3.5 Sonnet: $3/1M input, $15/1M output

  • Claude 3 Haiku: $0.25/1M input, $1.25/1M output

I use Claude for production applications because of this combination.

Setting Up Claude API

Get API Key

1. Create an Anthropic account:

  • Sign up at console.anthropic.com

2. Generate API key:

  • Go to Settings → API Keys

  • Click "Create Key"

  • Copy and save securely (shown once!)

3. Set up billing:

  • Add payment method

  • Set usage limits (protect against accidents!)

Install Python SDK

I always use virtual environments:
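
A typical setup (`anthropic` is Anthropic's official Python SDK; `python-dotenv` is used for the .env loading shown below):

```bash
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install anthropic python-dotenv
```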

Environment Setup

Never hardcode API keys! Use environment variables.

Create .env file:
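
For example (the key shown is a placeholder, not a real key):

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
```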

Add to .gitignore:
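
So the key never lands in version control:

```
.env
venv/
```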

Load in Python:
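
A minimal loader, assuming `python-dotenv` was installed above:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.environ["ANTHROPIC_API_KEY"]  # raises KeyError if missing
```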

Your First API Call

Basic Example
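
A minimal first call with the official SDK (the prompt text is just an example; `anthropic.Anthropic()` picks up ANTHROPIC_API_KEY from the environment automatically):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in one sentence."}
    ],
)
print(message.content[0].text)
```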

Output:
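
Responses vary from run to run; an illustrative example of what comes back:

```
An API (Application Programming Interface) is a set of rules that lets
one piece of software request data or services from another.
```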

This worked on my first try - Claude is reliable.

Understanding the Parameters

Key parameters:

model - Which Claude version:

  • claude-3-opus-20240229 - Most capable, expensive

  • claude-3-5-sonnet-20241022 - Best balance (I use this)

  • claude-3-haiku-20240307 - Fastest, cheapest

max_tokens - Maximum response length:

  • 1 token ≈ 0.75 words

  • 1024 tokens ≈ 750 words

  • You pay for tokens used, not max_tokens

  • Set high enough but not wasteful

temperature - Response randomness:

  • 0.0 = Deterministic, focused

  • 1.0 = Creative, varied (default)

  • I use 0.0 for code generation, 0.7 for writing

system - Sets behavior and context:

  • Define role and expertise

  • Set output format requirements

  • Provide background knowledge

  • Critical for consistent behavior

Working with Messages

Claude uses message format:
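
Each request carries a list of messages, each a dict with a role and content:

```python
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]
```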

Roles:

  • user - Your inputs

  • assistant - Claude's responses

  • Messages alternate between user and assistant, starting with a user message

Complete example with conversation:
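
A sketch of a multi-turn exchange (the helper function and prompts are illustrative):

```python
import anthropic

client = anthropic.Anthropic()
conversation = []  # the full history is resent with every request

def chat(user_input: str) -> str:
    conversation.append({"role": "user", "content": user_input})
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=conversation,
    )
    reply = message.content[0].text
    # Append the assistant turn so the next call sees the full history
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(chat("Recommend a Python book for beginners."))
print(chat("Summarize its main topics in three bullets."))
```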

This pattern maintains context across multiple exchanges.

System Prompts

System prompts are powerful - they set Claude's behavior.

Basic System Prompt
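
With the Anthropic API, the system prompt is a top-level parameter rather than a message (the prompt text here is an example):

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a senior Python developer. Answer concisely, with code examples.",
    messages=[{"role": "user", "content": "How do I read a JSON file?"}],
)
print(message.content[0].text)
```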

Advanced System Prompt

From my production chatbot:
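
The original prompt isn't reproduced here; below is an illustrative sketch of the layered structure such a prompt tends to have (role, rules, output format - the product name and all details are hypothetical):

```python
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a customer support assistant for Acme Cloud (a hypothetical product).

Role:
- Answer questions about billing, accounts, and product features.

Rules:
- If you are not sure of an answer, say so and offer to escalate.
- Never ask for passwords or full credit card numbers.

Output format:
- Start with a one-sentence direct answer.
- Follow with numbered steps when the user needs to do something.
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "How do I update my billing email?"}],
)
```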

This dramatically improved response quality in my support chatbot.

Understanding Costs

LLM APIs charge per token. Understanding costs prevents surprises.

Token Counting
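
Every response reports exact usage, and newer SDK versions also expose a counting endpoint for estimating before you send (the `count_tokens` call may require a recent `anthropic` release):

```python
import anthropic

client = anthropic.Anthropic()

# Exact usage comes back on every response
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

# Estimate before sending (newer SDK versions)
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
print(f"Estimated input tokens: {count.input_tokens}")
```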

Output:
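
Exact numbers depend on the model's tokenizer; something like:

```
Input tokens:  15
Output tokens: 23
Estimated input tokens: 15
```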

Cost Calculation

My production app processes ~1M tokens/day:

  • Input: 600K tokens = $1.80

  • Output: 400K tokens = $6.00

  • Daily cost: ~$7.80
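
Those numbers use Claude 3.5 Sonnet's rates from the pricing list above ($3/1M input, $15/1M output). A small helper makes such estimates easy to reproduce:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Estimate cost in USD. Rates are dollars per 1M tokens
    (defaults are Claude 3.5 Sonnet's rates)."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# The daily numbers from above: 600K tokens in, 400K tokens out
print(f"${estimate_cost(600_000, 400_000):.2f}")  # $7.80
```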

Cost Optimization Tips

From my experience:

1. Choose the right model - Haiku for simple tasks, Sonnet for most work, Opus only when you need it (see the sketch after this list)

2. Limit max_tokens appropriately - cap output at what the task actually needs (also in the sketch below)

3. Use caching (covered in Part 4)

4. Batch similar requests
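
Tips 1 and 2 translate directly into code. A sketch under stated assumptions - the task categories and token limits are illustrative, not a rule:

```python
# Illustrative routing: cheap model for simple tasks, stronger model otherwise
MODEL_FOR_TASK = {
    "classify": ("claude-3-haiku-20240307", 50),       # short labels
    "summarize": ("claude-3-5-sonnet-20241022", 500),  # medium output
    "analyze": ("claude-3-5-sonnet-20241022", 2000),   # long-form output
}

def create_for_task(client, task: str, prompt: str):
    model, max_tokens = MODEL_FOR_TASK[task]
    return client.messages.create(
        model=model,
        max_tokens=max_tokens,  # caps worst-case output spend
        messages=[{"role": "user", "content": prompt}],
    )
```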

Error Handling

APIs fail. Always handle errors.

Basic Error Handling
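
The SDK raises typed exceptions; a minimal handler:

```python
import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(message.content[0].text)
except anthropic.RateLimitError:
    print("Rate limited - slow down and retry later")
except anthropic.APIConnectionError:
    print("Network problem - check connectivity")
except anthropic.APIStatusError as e:
    print(f"API returned an error: {e.status_code}")
```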

Production Error Handling

From my API wrapper:
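
The original wrapper isn't shown; here is an illustrative retry-with-exponential-backoff sketch of the same idea:

```python
import time
import anthropic

client = anthropic.Anthropic()

def call_with_retries(messages, max_retries=3,
                      model="claude-3-5-sonnet-20241022", max_tokens=1024):
    """Retry transient failures with exponential backoff (illustrative)."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model, max_tokens=max_tokens, messages=messages
            )
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_retries - 1:
                raise  # out of retries - surface the error
            time.sleep(2 ** attempt)  # 1s, 2s, 4s ...
```

Note that recent SDK versions also retry some failures automatically (a `max_retries` option on the client); the sketch just makes the behavior explicit.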

This saved me during a rate limit spike - automatic recovery instead of crashes.

Complete Example

Putting it all together - production-ready script:
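
One way to put the pieces together - the file name, helper names, and defaults below are illustrative, not the author's original:

```python
"""claude_demo.py - an illustrative starter template."""
import os
import sys
import time

import anthropic
from dotenv import load_dotenv

MODEL = "claude-3-5-sonnet-20241022"

def get_client() -> anthropic.Anthropic:
    load_dotenv()
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        sys.exit("ANTHROPIC_API_KEY is not set - see Environment Setup above")
    return anthropic.Anthropic(api_key=key)

def ask(client, prompt: str, system: str = "You are a concise assistant.",
        max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            message = client.messages.create(
                model=MODEL,
                max_tokens=1024,
                system=system,
                messages=[{"role": "user", "content": prompt}],
            )
            # Log usage so costs stay visible
            usage = message.usage
            print(f"[tokens] in={usage.input_tokens} out={usage.output_tokens}",
                  file=sys.stderr)
            return message.content[0].text
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

if __name__ == "__main__":
    client = get_client()
    print(ask(client, " ".join(sys.argv[1:]) or "Say hello in five words."))
```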

Run it:
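
Assuming the script above is saved as claude_demo.py:

```bash
python claude_demo.py "What is a REST API?"
```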

This is my template for every new Claude integration.

Best Practices

From building production LLM applications:

1. Always use environment variables - keep keys out of code and out of Git (see Environment Setup above)

2. Set appropriate max_tokens - enough for the task, no more

3. Use system prompts consistently - the same system prompt for the same feature keeps behavior predictable

4. Monitor token usage - log message.usage on every call (sketch below)

5. Handle errors gracefully - retry transient failures, surface the rest (see Error Handling above)
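
A minimal usage logger for practice #4 (illustrative; `message` is any response from messages.create):

```python
import logging

logger = logging.getLogger("claude.usage")

def log_usage(message) -> None:
    """Log per-request token usage so daily costs stay traceable."""
    logger.info(
        "model=%s input_tokens=%d output_tokens=%d",
        message.model,
        message.usage.input_tokens,
        message.usage.output_tokens,
    )
```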

Common Mistakes

Mistakes I made (and you'll avoid):

1. Hardcoding API keys

2. No error handling

3. Setting temperature too high

4. Not tracking costs

5. Blocking calls in async code (see the async sketch after this list)
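
For #5: the SDK ships an async client, so an async app never has to block the event loop with a synchronous call. A minimal sketch:

```python
import asyncio
import anthropic

async def main() -> None:
    client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from env
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(message.content[0].text)

asyncio.run(main())
```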

What's Next?

You now know how to use Claude API. In Part 2, we'll build a complete FastAPI application that wraps Claude with proper async handling, request validation, and error handling.

Next: Part 2 - Building FastAPI Applications with Claude


Series Home: LLM API Development 101

This article is part of the LLM API Development 101 series. All examples use Python 3 and are based on real production applications.
