Part 1: Introduction to LLM APIs and Claude

Part of the LLM API Development 101 Series

My First Claude API Call

I remember my first time calling an LLM API. OpenAI's GPT-3 had just become available. I built a simple script, made the call, and got back gibberish.

Not because the model was bad, but because I had set temperature=2.5, thinking higher was better. I learned quickly: temperature controls randomness, and values well above 1.0 produce nonsense.

Fast forward to Claude: same mistakes, different API. I sent a 150K-token document, hit rate limits immediately, and burned through $50 in API credits in an hour of testing.

Now I know better. Let me save you the expensive lessons.

Understanding LLM APIs

What is an LLM API?

An LLM API is a REST endpoint in front of a powerful language model: you send text (a prompt) and get back generated text (a completion).
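
Concretely, it's just HTTP. Here's a minimal sketch of a raw call to Anthropic's Messages endpoint using the `requests` library (the SDK used later in this article wraps exactly this request):

```python
import os
import requests

# Every SDK call ultimately becomes an HTTP POST like this one.
response = requests.post(
    "https://api.anthropic.com/v1/messages",
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-version": "2023-06-01",
        "content-type": "application/json",
    },
    json={
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 256,
        "messages": [{"role": "user", "content": "Hello, Claude!"}],
    },
)
print(response.json()["content"][0]["text"])
```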

Key providers:

  • OpenAI - GPT-4, GPT-3.5 (most popular)

  • Anthropic - Claude (best for long context)

  • Google - PaLM, Gemini

  • Cohere - Command models

  • Open source - Llama, Mistral (self-hosted)

Why Claude?

I've used them all in production. Claude stands out for:

1. Massive context window - 200K tokens (≈150K words)

  • Entire codebases in one prompt

  • Full document analysis

  • Long conversation histories

2. Instruction following - Best at complex, multi-step tasks

  • Structured output generation

  • Following specific formatting rules

  • Consistent behavior

3. Safety and reliability - Constitutional AI approach

  • Reduces harmful outputs

  • More predictable responses

  • Better at refusing inappropriate requests

4. Price/performance - Competitive pricing

  • Claude 3 Opus: $15/1M input tokens, $75/1M output

  • Claude 3.5 Sonnet: $3/1M input, $15/1M output

  • Claude 3 Haiku: $0.25/1M input, $1.25/1M output

I use Claude for production applications because of this combination.

Setting Up Claude API

Get API Key

1. Create an Anthropic account:

  • Sign up at console.anthropic.com

2. Generate API key:

  • Go to Settings → API Keys

  • Click "Create Key"

  • Copy and save securely (shown once!)

3. Set up billing:

  • Add payment method

  • Set usage limits (protect against accidents!)

Install Python SDK

I always use virtual environments:
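
A typical setup (`anthropic` is Anthropic's official Python SDK; `python-dotenv` is used for the .env loading shown below):

```bash
python -m venv venv
source venv/bin/activate   # on Windows: venv\Scripts\activate
pip install anthropic python-dotenv
```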

Environment Setup

Never hardcode API keys! Use environment variables.

Create .env file:
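
For example (the key shown is a placeholder, not a real key):

```
ANTHROPIC_API_KEY=sk-ant-your-key-here
```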

Add to .gitignore:
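
So the key never lands in version control:

```
.env
venv/
```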

Load in Python:
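
A minimal loader, assuming `python-dotenv` was installed above:

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current directory
api_key = os.environ["ANTHROPIC_API_KEY"]  # raises KeyError if missing
```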

Your First API Call

Basic Example
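
A minimal first call with the official SDK (the prompt text is just an example; `anthropic.Anthropic()` picks up ANTHROPIC_API_KEY from the environment automatically):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain what an API is in one sentence."}
    ],
)
print(message.content[0].text)
```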

Output:
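
Responses vary from run to run; an illustrative example of what comes back:

```
An API (Application Programming Interface) is a set of rules that lets
one piece of software request data or services from another.
```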

This worked on my first try - Claude is reliable.

Understanding the Parameters

Key parameters:

model - Which Claude version:

  • claude-3-opus-20240229 - Most capable, expensive

  • claude-3-5-sonnet-20241022 - Best balance (I use this)

  • claude-3-haiku-20240307 - Fastest, cheapest

max_tokens - Maximum response length:

  • 1 token ≈ 0.75 words

  • 1024 tokens ≈ 750 words

  • You pay for tokens used, not max_tokens

  • Set high enough but not wasteful

temperature - Response randomness:

  • 0.0 = Deterministic, focused

  • 1.0 = Creative, varied (default)

  • I use 0.0 for code generation, 0.7 for writing

system - Sets behavior and context:

  • Define role and expertise

  • Set output format requirements

  • Provide background knowledge

  • Critical for consistent behavior

Working with Messages

Claude uses message format:
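
Each request carries a list of messages, each a dict with a role and content:

```python
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},
]
```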

Roles:

  • user - Your inputs

  • assistant - Claude's responses

  • Messages alternate between user and assistant, starting with a user message

Complete example with conversation:
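
A sketch of a multi-turn exchange (the helper function and prompts are illustrative):

```python
import anthropic

client = anthropic.Anthropic()
conversation = []  # the full history is resent with every request

def chat(user_input: str) -> str:
    conversation.append({"role": "user", "content": user_input})
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=conversation,
    )
    reply = message.content[0].text
    # Append the assistant turn so the next call sees the full history
    conversation.append({"role": "assistant", "content": reply})
    return reply

print(chat("Recommend a Python book for beginners."))
print(chat("Summarize its main topics in three bullets."))
```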

This pattern maintains context across multiple exchanges.

System Prompts

System prompts are powerful - they set Claude's behavior.

Basic System Prompt
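
With the Anthropic API, the system prompt is a top-level parameter rather than a message (the prompt text here is an example):

```python
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a senior Python developer. Answer concisely, with code examples.",
    messages=[{"role": "user", "content": "How do I read a JSON file?"}],
)
print(message.content[0].text)
```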

Advanced System Prompt

From my production chatbot:
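
The original prompt isn't reproduced here; below is an illustrative sketch of the layered structure such a prompt tends to have (role, rules, output format - the product name and all details are hypothetical):

```python
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a customer support assistant for Acme Cloud (a hypothetical product).

Role:
- Answer questions about billing, accounts, and product features.

Rules:
- If you are not sure of an answer, say so and offer to escalate.
- Never ask for passwords or full credit card numbers.

Output format:
- Start with a one-sentence direct answer.
- Follow with numbered steps when the user needs to do something.
"""

message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=SYSTEM_PROMPT,
    messages=[{"role": "user", "content": "How do I update my billing email?"}],
)
```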

This dramatically improved response quality in my support chatbot.

Understanding Costs

LLM APIs charge per token. Understanding costs prevents surprises.

Token Counting
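
Every response reports exact usage, and newer SDK versions also expose a counting endpoint for estimating before you send (the `count_tokens` call may require a recent `anthropic` release):

```python
import anthropic

client = anthropic.Anthropic()

# Exact usage comes back on every response
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
print(f"Input tokens:  {message.usage.input_tokens}")
print(f"Output tokens: {message.usage.output_tokens}")

# Estimate before sending (newer SDK versions)
count = client.messages.count_tokens(
    model="claude-3-5-sonnet-20241022",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
print(f"Estimated input tokens: {count.input_tokens}")
```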

Output:
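
Exact numbers depend on the model's tokenizer; something like:

```
Input tokens:  15
Output tokens: 23
Estimated input tokens: 15
```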

Cost Calculation

My production app processes ~1M tokens/day:

  • Input: 600K tokens = $1.80

  • Output: 400K tokens = $6.00

  • Daily cost: ~$7.80
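
Those numbers use Claude 3.5 Sonnet's rates from the pricing list above ($3/1M input, $15/1M output). A small helper makes such estimates easy to reproduce:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_rate: float = 3.00, output_rate: float = 15.00) -> float:
    """Estimate cost in USD. Rates are dollars per 1M tokens
    (defaults are Claude 3.5 Sonnet's rates)."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# The daily numbers from above: 600K tokens in, 400K tokens out
print(f"${estimate_cost(600_000, 400_000):.2f}")  # $7.80
```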

Cost Optimization Tips

From my experience:

1. Choose the right model - Haiku for simple tasks, Sonnet for most work, Opus only when you need it (see the sketch after this list)

2. Limit max_tokens appropriately - cap output at what the task actually needs (also in the sketch below)

3. Use caching (covered in Part 4)

4. Batch similar requests
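
Tips 1 and 2 translate directly into code. A sketch under stated assumptions - the task categories and token limits are illustrative, not a rule:

```python
# Illustrative routing: cheap model for simple tasks, stronger model otherwise
MODEL_FOR_TASK = {
    "classify": ("claude-3-haiku-20240307", 50),       # short labels
    "summarize": ("claude-3-5-sonnet-20241022", 500),  # medium output
    "analyze": ("claude-3-5-sonnet-20241022", 2000),   # long-form output
}

def create_for_task(client, task: str, prompt: str):
    model, max_tokens = MODEL_FOR_TASK[task]
    return client.messages.create(
        model=model,
        max_tokens=max_tokens,  # caps worst-case output spend
        messages=[{"role": "user", "content": prompt}],
    )
```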

Error Handling

APIs fail. Always handle errors.

Basic Error Handling
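
The SDK raises typed exceptions; a minimal handler:

```python
import anthropic

client = anthropic.Anthropic()

try:
    message = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(message.content[0].text)
except anthropic.RateLimitError:
    print("Rate limited - slow down and retry later")
except anthropic.APIConnectionError:
    print("Network problem - check connectivity")
except anthropic.APIStatusError as e:
    print(f"API returned an error: {e.status_code}")
```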

Production Error Handling

From my API wrapper:
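
The original wrapper isn't shown; here is an illustrative retry-with-exponential-backoff sketch of the same idea:

```python
import time
import anthropic

client = anthropic.Anthropic()

def call_with_retries(messages, max_retries=3,
                      model="claude-3-5-sonnet-20241022", max_tokens=1024):
    """Retry transient failures with exponential backoff (illustrative)."""
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model=model, max_tokens=max_tokens, messages=messages
            )
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_retries - 1:
                raise  # out of retries - surface the error
            time.sleep(2 ** attempt)  # 1s, 2s, 4s ...
```

Note that recent SDK versions also retry some failures automatically (a `max_retries` option on the client); the sketch just makes the behavior explicit.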

This saved me during a rate limit spike - automatic recovery instead of crashes.

Complete Example

Putting it all together - production-ready script:
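
One way to put the pieces together - the file name, helper names, and defaults below are illustrative, not the author's original:

```python
"""claude_demo.py - an illustrative starter template."""
import os
import sys
import time

import anthropic
from dotenv import load_dotenv

MODEL = "claude-3-5-sonnet-20241022"

def get_client() -> anthropic.Anthropic:
    load_dotenv()
    key = os.environ.get("ANTHROPIC_API_KEY")
    if not key:
        sys.exit("ANTHROPIC_API_KEY is not set - see Environment Setup above")
    return anthropic.Anthropic(api_key=key)

def ask(client, prompt: str, system: str = "You are a concise assistant.",
        max_retries: int = 3) -> str:
    for attempt in range(max_retries):
        try:
            message = client.messages.create(
                model=MODEL,
                max_tokens=1024,
                system=system,
                messages=[{"role": "user", "content": prompt}],
            )
            # Log usage so costs stay visible
            usage = message.usage
            print(f"[tokens] in={usage.input_tokens} out={usage.output_tokens}",
                  file=sys.stderr)
            return message.content[0].text
        except (anthropic.RateLimitError, anthropic.APIConnectionError):
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)

if __name__ == "__main__":
    client = get_client()
    print(ask(client, " ".join(sys.argv[1:]) or "Say hello in five words."))
```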

Run it:
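
Assuming the script above is saved as claude_demo.py:

```bash
python claude_demo.py "What is a REST API?"
```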

This is my template for every new Claude integration.

Best Practices

From building production LLM applications:

1. Always use environment variables - keep keys out of code and out of Git (see Environment Setup above)

2. Set appropriate max_tokens - enough for the task, no more

3. Use system prompts consistently - the same system prompt for the same feature keeps behavior predictable

4. Monitor token usage - log message.usage on every call (sketch below)

5. Handle errors gracefully - retry transient failures, surface the rest (see Error Handling above)
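
A minimal usage logger for practice #4 (illustrative; `message` is any response from messages.create):

```python
import logging

logger = logging.getLogger("claude.usage")

def log_usage(message) -> None:
    """Log per-request token usage so daily costs stay traceable."""
    logger.info(
        "model=%s input_tokens=%d output_tokens=%d",
        message.model,
        message.usage.input_tokens,
        message.usage.output_tokens,
    )
```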

Common Mistakes

Mistakes I made (and you'll avoid):

1. Hardcoding API keys

2. No error handling

3. Setting temperature too high

4. Not tracking costs

5. Blocking calls in async code (see the async sketch after this list)
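
For #5: the SDK ships an async client, so an async app never has to block the event loop with a synchronous call. A minimal sketch:

```python
import asyncio
import anthropic

async def main() -> None:
    client = anthropic.AsyncAnthropic()  # reads ANTHROPIC_API_KEY from env
    message = await client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=256,
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(message.content[0].text)

asyncio.run(main())
```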

What's Next?

You now know how to use Claude API. In Part 2, we'll build a complete FastAPI application that wraps Claude with proper async handling, request validation, and error handling.

Next: Part 2 - Building FastAPI Applications with Claude


Series Home: LLM API Development 101

This article is part of the LLM API Development 101 series. All examples use Python 3 and are based on real production applications.
