Part 2: Vector Embeddings Fundamentals

← Part 1: Introduction | Part 3: pgvector Setup →

The Product Recommendation Disaster

I was tasked with building a "Similar Products" feature for an e-commerce site. My first attempt was embarrassingly naive:

// My terrible first attempt
function getSimilarProducts(product: Product) {
  return products.filter(p =>
    p.id !== product.id &&           // at least don't recommend the product itself
    p.category === product.category &&
    p.price >= product.price * 0.8 &&
    p.price <= product.price * 1.2
  );
}

The problem? This recommended a $1,200 MacBook Pro alongside a $1,000 Windows laptop because they were "similar price." Customers complained. The recommendations made no sense.

I needed a way to capture "similarity" beyond category and price. That's when I discovered embeddings.

After implementing vector embeddings:

// Using semantic similarity
const similar = await prisma.$queryRaw`
  SELECT * FROM products
  WHERE category = ${product.category}
    AND id != ${product.id}
  ORDER BY embedding <=> ${product.embedding}::vector
  LIMIT 6
`;

Click-through rate: 1.8% → 9.2%. Recommendations finally made sense.

This article explains what embeddings are, how they work, and how to use them effectively.

What Are Vector Embeddings?

Simple definition: Embeddings are numerical representations (arrays of numbers) that capture the semantic meaning of data.

Text as Numbers

Words and sentences are converted into vectors (arrays of numbers) where similar meanings have similar numbers.

The magic: Words with similar meanings have similar vectors → we can use math to find similar text.

Why Embeddings Work

Embeddings place words in multi-dimensional space where:

  • Distance = similarity

  • Close vectors = similar meaning

  • Far vectors = different meaning

In reality, embeddings use 384, 768, 1536, or more dimensions. More dimensions = more nuanced understanding.

How I Use Embeddings in Production

1. Generate Embeddings with OpenAI API
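A minimal sketch using the official openai Node SDK (it reads OPENAI_API_KEY from the environment):

// Minimal embedding call with the openai SDK
import OpenAI from "openai";

const openai = new OpenAI(); // picks up OPENAI_API_KEY from the environment

async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return response.data[0].embedding; // 1536-dimensional vector
}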

2. Batch Processing for Efficiency

The OpenAI embeddings API accepts up to 2048 inputs per request. Always batch for production:
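A sketch of batched generation, reusing the client from the previous snippet (the 2048 figure is the per-request input limit mentioned above):

// Generate embeddings in batches instead of one request per text
async function embedInBatches(texts: string[], batchSize = 2048): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (let i = 0; i < texts.length; i += batchSize) {
    const batch = texts.slice(i, i + batchSize);
    const response = await openai.embeddings.create({
      model: "text-embedding-3-small",
      input: batch, // the API accepts an array of inputs
    });
    // One embedding per input, returned in the same order
    embeddings.push(...response.data.map((d) => d.embedding));
  }
  return embeddings;
}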

Cost calculation:

  • 10,000 products × 50 tokens average = 500,000 tokens

  • At $0.02 per 1M tokens: 500,000 / 1,000,000 × $0.02 = $0.01 total

Embeddings are incredibly cheap.

3. Embeddings for Different Data Types

Product Descriptions
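One reasonable approach is to concatenate the fields that actually carry semantic meaning into a single string before embedding, reusing the generateEmbedding sketch from above. The field names here are illustrative; adapt them to your own model:

// Build the text to embed from the fields that carry meaning
// (name/category/description are illustrative field names)
function productToEmbeddingText(product: {
  name: string;
  category: string;
  description: string;
}): string {
  return `${product.name}\nCategory: ${product.category}\n${product.description}`;
}

const productEmbedding = await generateEmbedding(productToEmbeddingText(product));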

User Queries
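Queries are embedded as-is, with the same model used for the product embeddings. Mixing models puts the vectors in different spaces and makes the distances meaningless:

// Embed the raw user query with the SAME model used for the products
const queryEmbedding = await generateEmbedding("lightweight laptop for travel");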

Understanding Similarity Metrics

How do we measure if two vectors are "similar"? Math.

Cosine Similarity

Measures the angle between vectors, ignoring magnitude.

Range: -1 (opposite) to 1 (identical)
PostgreSQL pgvector operator: <=> (this returns cosine distance, 1 - cosine similarity, so smaller means more similar)

Euclidean Distance (L2)

Measures straight-line distance in vector space.

Range: 0 (identical) to ∞ (very different)
PostgreSQL pgvector operator: <->

Dot Product

Combines magnitude and angle. Use when vectors are normalized.

PostgreSQL pgvector operator: <#> (negative dot product)
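To make the definitions concrete, here are plain TypeScript versions of the three metrics. pgvector computes these in the database; this is only for illustration:

// Reference implementations of the three metrics
function dotProduct(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

function cosineSimilarity(a: number[], b: number[]): number {
  const magnitudeA = Math.sqrt(dotProduct(a, a));
  const magnitudeB = Math.sqrt(dotProduct(b, b));
  return dotProduct(a, b) / (magnitudeA * magnitudeB); // -1 .. 1
}

function euclideanDistance(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0)); // 0 .. ∞
}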

Which One to Use?

Similarity Metric   | Best For                                         | pgvector Operator
Cosine Similarity   | Text embeddings (OpenAI, Sentence Transformers)  | <=>
Euclidean Distance  | Spatial data, coordinates, image embeddings      | <->
Dot Product         | Normalized vectors, some ML models               | <#>

For text with OpenAI embeddings, use cosine similarity (<=>).

Embedding Models Comparison

OpenAI Embeddings (What I Use)

Pros:

  • State-of-the-art quality

  • Easy API, no infrastructure

  • Multilingual support

  • Consistent updates

Cons:

  • Costs money (though cheap)

  • API dependency

  • Data leaves your infrastructure

Local Embedding Models (Free, Private)

Pros:

  • Free

  • No API calls / rate limits

  • Data stays private

  • Works offline

Cons:

  • Lower quality than OpenAI

  • Requires more setup

  • Slower (CPU inference)

  • Model management overhead
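If you do go local, a setup with Transformers.js and a Sentence Transformers model looks roughly like this; treat the package and model names below as assumptions to verify rather than a recommendation from this article:

// Local embeddings sketch (assumes the @xenova/transformers package and the
// Xenova/all-MiniLM-L6-v2 model -- verify both before relying on this)
import { pipeline } from "@xenova/transformers";

const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function localEmbedding(text: string): Promise<number[]> {
  // Mean pooling + normalization yields one 384-dimensional sentence vector
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}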

My Recommendation

Start with OpenAI text-embedding-3-small:

  • Production-ready immediately

  • Excellent quality

  • Very cheap ($0.02/1M tokens)

  • No infrastructure needed

Switch to local models if:

  • High volume (>100M embeddings)

  • Privacy requirements (healthcare, finance)

  • No internet access

  • Want to avoid API dependencies

Visualizing Embeddings (2D Projection)

These OpenAI embeddings are 1536-dimensional, which is impossible to visualize directly. We can use dimensionality reduction to project them to 2D:
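One rough way to do it is PCA. The ml-pca package below is an assumption (t-SNE or UMAP in Python tooling are more common choices), but the idea is the same:

// Project 1536-dimensional embeddings down to 2 dimensions for plotting
// (assumes the ml-pca npm package)
import { PCA } from "ml-pca";

function projectTo2D(embeddings: number[][]): number[][] {
  const pca = new PCA(embeddings);
  // Keep only the first two principal components
  return pca.predict(embeddings, { nComponents: 2 }).to2DArray();
}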

In the resulting 2D plot, similar items cluster together in vector space: related products land near each other and far from unrelated ones.

Practical Embedding Tips from Production

1. Text Preprocessing
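One reasonable light-touch version: strip markup, collapse whitespace, and truncate so the text stays under the model's input limit:

// Light preprocessing before embedding
function preprocessForEmbedding(text: string, maxChars = 8000): string {
  return text
    .replace(/<[^>]+>/g, " ") // drop HTML tags
    .replace(/\s+/g, " ")     // collapse whitespace
    .trim()
    .slice(0, maxChars);      // rough guard against the token limit
}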

2. Handle Empty or Short Text
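Embedding an empty string wastes a request and produces a meaningless vector, so filter those rows out up front. For example:

// Skip rows that have nothing meaningful to embed
// (field names are illustrative)
function prepareText(product: { name: string; description?: string | null }): string | null {
  const text = `${product.name} ${product.description ?? ""}`.trim();
  return text.length < 3 ? null : text;
}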

3. Cache Embeddings (They Don't Change)
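The same text always maps to the same vector, so an in-memory cache is the simplest win:

// Never pay to embed the same text twice within a process
const embeddingCache = new Map<string, number[]>();

async function getEmbeddingCached(text: string): Promise<number[]> {
  const cached = embeddingCache.get(text);
  if (cached) return cached;
  const embedding = await generateEmbedding(text);
  embeddingCache.set(text, embedding);
  return embedding;
}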

For persistent caching, store in database:
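A sketch of that, assuming an "embedding vector(1536)" pgvector column on the products table (Prisma has no native vector type, so raw SQL does the write):

// Persist the embedding on the row it describes
// (assumes an "embedding vector(1536)" column on products)
async function storeProductEmbedding(productId: string, embedding: number[]) {
  await prisma.$executeRaw`
    UPDATE products
    SET embedding = ${JSON.stringify(embedding)}::vector
    WHERE id = ${productId}
  `;
}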

4. Chunk Long Documents
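A simple character-based sketch with overlapping windows (token-aware splitters are better, but this shows the idea):

// Split a long document into overlapping chunks before embedding
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    start += chunkSize - overlap; // overlap preserves context across boundaries
  }
  return chunks;
}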

Why chunk?

  • OpenAI has ~8K token limit

  • Smaller chunks = more precise search

  • Overlap preserves context between chunks

5. Retry Logic for API Calls
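Embedding calls occasionally fail with rate limits or timeouts, so wrap them in retries with exponential backoff. A minimal sketch:

// Retry with exponential backoff for transient API failures
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, ...
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage
const safeEmbedding = await withRetry(() => generateEmbedding("wireless headphones"));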

Complete Example: Product Search with Embeddings
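Putting the pieces together: a minimal end-to-end sketch under the same assumptions as above (a products table with a pgvector "embedding vector(1536)" column, Prisma, and the openai SDK). Adapt the names to your schema:

// End-to-end: embed the query, then rank products by cosine distance with pgvector
import OpenAI from "openai";
import { PrismaClient } from "@prisma/client";

const openai = new OpenAI();
const prisma = new PrismaClient();

type ProductHit = {
  id: string;
  name: string;
  description: string;
  distance: number; // cosine distance: smaller = more similar
};

async function searchProducts(query: string, limit = 10): Promise<ProductHit[]> {
  // 1. Embed the query with the same model used for the stored product embeddings
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const queryEmbedding = response.data[0].embedding;

  // 2. Nearest-neighbour search with the <=> operator, smallest distance first
  return prisma.$queryRaw<ProductHit[]>`
    SELECT id, name, description,
           embedding <=> ${JSON.stringify(queryEmbedding)}::vector AS distance
    FROM products
    ORDER BY distance
    LIMIT ${limit}
  `;
}

// Usage
const hits = await searchProducts("lightweight laptop for travel", 5);
console.log(hits.map((h) => `${h.name} (distance ${h.distance.toFixed(3)})`));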

What's Next

In this article, you learned:

  • ✅ What embeddings are and why they work

  • ✅ How to generate embeddings with OpenAI API

  • ✅ Similarity metrics (cosine, euclidean, dot product)

  • ✅ Embedding models comparison

  • ✅ Production tips (batching, caching, chunking, retries)

  • ✅ Complete product search example

Next: We'll set up PostgreSQL with pgvector extension, create vector columns, and implement indexes for fast similarity search.


← Part 1: Introduction | Part 3: pgvector Setup →
