Part 1: Introduction to LLM APIs and Claude
Part of the LLM API Development 101 Series
My First Claude API Call
I remember my first time calling an LLM API. OpenAI's GPT-3 had just become available. I built a simple script, made the call, and got back gibberish.
Not because the model was bad, but because I had set temperature=2.5 thinking higher was better. I learned quickly: temperature controls randomness, and values that far above 1.0 produce nonsense.
Fast forward to Claude: same mistakes, different API. I sent a 150K-token document, hit rate limits immediately, and burned through $50 in API credits in an hour of testing.
Now I know better. Let me save you the expensive lessons.
Understanding LLM APIs
What is an LLM API?
An LLM API is a REST endpoint in front of a powerful language model: you send text (a prompt) and get back generated text (a completion).
Key providers:
OpenAI - GPT-4, GPT-3.5 (most popular)
Anthropic - Claude (best for long context)
Google - PaLM, Gemini
Cohere - Command models
Open source - Llama, Mistral (self-hosted)
Why Claude?
I've used them all in production. Claude stands out for:
1. Massive context window - 200K tokens (≈150K words)
Entire codebases in one prompt
Full document analysis
Long conversation histories
2. Instruction following - Best at complex, multi-step tasks
Structured output generation
Following specific formatting rules
Consistent behavior
3. Safety and reliability - Constitutional AI approach
Reduces harmful outputs
More predictable responses
Better at refusing inappropriate requests
4. Price/performance - Competitive pricing
Claude 3 Opus: $15/1M input tokens, $75/1M output
Claude 3.5 Sonnet: $3/1M input, $15/1M output
Claude 3 Haiku: $0.25/1M input, $1.25/1M output
I use Claude for production applications because of this combination.
Setting Up Claude API
Get API Key
1. Create an Anthropic account at console.anthropic.com
2. Generate API key:
Go to Settings → API Keys
Click "Create Key"
Copy and save securely (shown once!)
3. Set up billing:
Add payment method
Set usage limits (protect against accidents!)
Install Python SDK
I always use virtual environments:
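```bash
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install anthropic python-dotenv
```

The anthropic package is the official SDK; python-dotenv is optional but makes loading the .env file (next section) painless.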
Environment Setup
Never hardcode API keys! Use environment variables.
Create .env file:
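ANTHROPIC_API_KEY is the variable name the SDK looks for by default:

```bash
# .env - never commit this file
ANTHROPIC_API_KEY=sk-ant-your-key-here
```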
Add to .gitignore:
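```
.env
venv/
```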
Load in Python:
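```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env into the process environment
api_key = os.environ["ANTHROPIC_API_KEY"]  # KeyError if missing - fail fast
```

The Anthropic client also reads ANTHROPIC_API_KEY from the environment automatically, so you rarely need to pass the key explicitly.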
Your First API Call
Basic Example
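A minimal call with the official SDK:

```python
from anthropic import Anthropic
from dotenv import load_dotenv

load_dotenv()
client = Anthropic()  # picks up ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in one sentence."}
    ],
)
print(message.content[0].text)
```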
Output (exact wording varies between runs):
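```
An API is a set of rules that lets one piece of software request services or data from another.
```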
This worked on my first try - Claude is reliable.
Understanding the Parameters
Key parameters:
model - Which Claude version:
claude-3-opus-20240229 - Most capable, most expensive
claude-3-5-sonnet-20241022 - Best balance (I use this)
claude-3-haiku-20240307 - Fastest, cheapest
max_tokens - Maximum response length:
1 token ≈ 0.75 words
1024 tokens ≈ 750 words
You pay for tokens used, not max_tokens
Set high enough but not wasteful
temperature - Response randomness:
0.0 = Deterministic, focused
1.0 = Creative, varied (default)
I use 0.0 for code generation, 0.7 for writing
system - Sets behavior and context:
Define role and expertise
Set output format requirements
Provide background knowledge
Critical for consistent behavior
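All four parameters together (the values here are just illustrative):

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",  # best balance of cost and quality
    max_tokens=500,                      # cap the response length
    temperature=0.0,                     # deterministic - good for code
    system="You are a senior Python developer. Answer with code first.",
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
)
```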
Working with Messages
Claude uses message format:
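The messages parameter is a list of role/content dictionaries:

```python
# Each message is a dict with "role" and "content"
messages = [
    {"role": "user", "content": "Hello, Claude!"},
]
```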
Roles:
user - Your inputs
assistant - Claude's responses
Messages alternate between user and assistant
Complete example with conversation:
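A sketch of a multi-turn exchange (the dialogue itself is made up):

```python
# Multi-turn: resend the full history on every call - the API is stateless
conversation = [
    {"role": "user", "content": "My name is Sam. Please remember it."},
    {"role": "assistant", "content": "Got it - your name is Sam."},
    {"role": "user", "content": "What's my name?"},
]

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
    messages=conversation,
)
print(message.content[0].text)  # should mention "Sam"
```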
This pattern maintains context across multiple exchanges.
System Prompts
System prompts are powerful - they set Claude's behavior.
Basic System Prompt
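A minimal example; the prompt text is illustrative:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=500,
    system="You are a helpful assistant. Explain things in plain English "
           "and avoid jargon unless the user asks for technical detail.",
    messages=[{"role": "user", "content": "What is a REST API?"}],
)
print(message.content[0].text)
```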
Advanced System Prompt
From my production chatbot:
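A representative sketch of the shape that works well for support bots - role, scope, format rules, and an escalation policy. "Acme Cloud" is a made-up product; adapt the rules to your own bot:

```python
SYSTEM_PROMPT = """You are a customer support agent for Acme Cloud (a hypothetical product).

Rules:
- Answer only questions about Acme Cloud products and billing.
- If you don't know the answer, say so and offer to escalate to a human.
- Keep answers under 150 words.
- Always end with: "Is there anything else I can help you with?"

Format:
- Use short paragraphs, no headers.
- Give step-by-step instructions as numbered lists."""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=400,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "How do I reset my password?"}],
)
```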
This dramatically improved response quality in my support chatbot.
Understanding Costs
LLM APIs charge per token. Understanding costs prevents surprises.
Token Counting
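Every response includes exact token counts in its usage field:

```python
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=100,
    messages=[{"role": "user", "content": "Say hello in exactly five words."}],
)
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")
```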
Output (your counts will differ):
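```
Input tokens:  16
Output tokens: 12
```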
Cost Calculation
My production app processes ~1M tokens/day:
Input: 600K tokens = $1.80
Output: 400K tokens = $6.00
Daily cost: ~$7.80
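The same math as a quick Python sanity check, using the Claude 3.5 Sonnet rates listed above:

```python
INPUT_PER_M = 3.00    # $ per 1M input tokens (Claude 3.5 Sonnet)
OUTPUT_PER_M = 15.00  # $ per 1M output tokens

def daily_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

print(f"${daily_cost(600_000, 400_000):.2f}")  # -> $7.80
```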
Cost Optimization Tips
From my experience:
1. Choose the right model for each task (see the sketch after this list)
2. Limit max_tokens to what the task actually needs
3. Use caching (covered in Part 4)
4. Batch similar requests
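Tips 1 and 2 in code - the routing rules and token budgets below are illustrative, not universal:

```python
# Route cheap tasks to cheap models (thresholds are illustrative)
def pick_model(task: str) -> str:
    if task in ("classification", "extraction"):
        return "claude-3-haiku-20240307"     # fastest, cheapest
    if task in ("summarization", "chat"):
        return "claude-3-5-sonnet-20241022"  # best balance
    return "claude-3-opus-20240229"          # reserve for the hardest work

# Match max_tokens to the task instead of one blanket high value
MAX_TOKENS_BY_TASK = {"classification": 10, "summarization": 300, "chat": 1000}
```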
Error Handling
APIs fail. Always handle errors.
Basic Error Handling
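The SDK raises typed exceptions, so you can catch the common failure modes directly:

```python
import anthropic

try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],
    )
except anthropic.RateLimitError:
    print("Rate limited - back off and retry")
except anthropic.APIConnectionError:
    print("Network problem - check your connection")
except anthropic.APIStatusError as e:
    print(f"API returned an error: {e.status_code}")
```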
Production Error Handling
From my API wrapper:
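A minimal sketch of the same idea - exponential backoff on transient failures (your retry counts and waits may differ):

```python
import time
import anthropic

def create_with_retry(client: anthropic.Anthropic, max_retries: int = 3, **kwargs):
    """Retry rate-limit and connection errors with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_retries - 1:
                raise                 # out of retries - let the caller decide
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
```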
This saved me during a rate limit spike - automatic recovery instead of crashes.
Complete Example
Putting it all together - production-ready script:
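A minimal version of that template; the filename and prompt are illustrative:

```python
"""claude_demo.py - small end-to-end Claude script (illustrative)."""
import os
import sys
import time

import anthropic
from dotenv import load_dotenv

SYSTEM = "You are a concise technical assistant."

def ask(client: anthropic.Anthropic, prompt: str, retries: int = 3) -> str:
    """One question in, one answer out, with retries and usage logging."""
    for attempt in range(retries):
        try:
            message = client.messages.create(
                model="claude-3-5-sonnet-20241022",
                max_tokens=1024,
                temperature=0.0,
                system=SYSTEM,
                messages=[{"role": "user", "content": prompt}],
            )
            print(f"[tokens] in={message.usage.input_tokens} "
                  f"out={message.usage.output_tokens}", file=sys.stderr)
            return message.content[0].text
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)

def main() -> None:
    load_dotenv()
    if not os.environ.get("ANTHROPIC_API_KEY"):
        sys.exit("Set ANTHROPIC_API_KEY first (see Environment Setup).")
    client = anthropic.Anthropic()
    print(ask(client, "Summarize what an LLM API is in two sentences."))

if __name__ == "__main__":
    main()
```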
Run it:
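Assuming you saved the script above as claude_demo.py:

```bash
python claude_demo.py
```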
This is my template for every new Claude integration.
Best Practices
From building production LLM applications:
1. Always use environment variables - keep API keys out of source control
2. Set appropriate max_tokens - enough for the task, no more
3. Use system prompts consistently - same role and rules on every call
4. Monitor token usage - log the usage field from every response
5. Handle errors gracefully - retry transient failures, surface everything else
Common Mistakes
Mistakes I made (and you'll avoid):
1. Hardcoding API keys ❌
2. No error handling ❌
3. Setting temperature too high ❌
4. Not tracking costs ❌
5. Blocking calls in async code ❌
What's Next?
You now know how to use Claude API. In Part 2, we'll build a complete FastAPI application that wraps Claude with proper async handling, request validation, and error handling.
Next: Part 2 - Building FastAPI Applications with Claude
Series Home: LLM API Development 101
This article is part of the LLM API Development 101 series. All examples use Python 3 and are based on real production applications.