Threat Modeling and Risk Assessment

The $2.4M Security Breach We Didn't See Coming

October 15, 2020. 2:47 AM. My phone exploded with alerts.

"User data exposed. Elasticsearch cluster publicly accessible. 2.4 million customer records compromised."

The post-mortem was brutal. The vulnerability was embarrassingly simple: an Elasticsearch cluster deployed without authentication. The security team had never reviewed the architecture. Developers didn't know it was a risk. Nobody had asked: "What could go wrong with this design?"

The regulatory fines: $2.4 million. The customer trust lost: Incalculable. The lesson: Painful.

We had all the security tools—SAST, DAST, container scanning. But we never asked the most important question: "What are we protecting, and from whom?"

This breach taught me that security tools catch vulnerabilities in code, but threat modeling catches vulnerabilities in thinking. You can scan code all day, but if your architecture is fundamentally insecure, no amount of tooling will save you.

This article documents the threat modeling framework we built after that breach—a systematic approach to identifying security risks before writing code. In the 3 years since implementing it, we haven't had a single architecture-level breach.

What You'll Learn

STRIDE threat modeling framework
Risk assessment and prioritization
Threat modeling for microservices
Data flow diagram analysis
Security requirements from threats
Integrating threat modeling into SDLC

What is Threat Modeling?

Threat modeling is structured thinking about security:

Threat Modeling:
  
  Definition:
    "Systematic approach to identifying, evaluating, and mitigating
     security threats in system design"
  
  Core Questions:
    1. What are we building?
    2. What can go wrong?
    3. What are we going to do about it?
    4. Did we do a good job?
  
  When to Do It:
    - Design phase (before coding)
    - Architecture changes
    - New feature development
    - After security incidents
    - Quarterly for existing systems
  
  Who Should Participate:
    - Product Owner (business context)
    - Architect (system design)
    - Developers (implementation details)
    - Security Engineer (threat expertise)
    - SRE (operational risks)

Why Threat Modeling Matters

Traditional security testing catches implementation bugs. Threat modeling catches design flaws.

Example:

SAST finds: SQL injection vulnerability in login code
Threat modeling finds: Login service lacks rate limiting, enabling brute force attacks

The STRIDE Threat Modeling Framework

STRIDE is a mnemonic for six threat categories, created by Microsoft:

STRIDE Framework:

  S - Spoofing Identity:
    Description: "Pretending to be someone/something else"
    Examples:
      - Stolen authentication tokens
      - Forged JWT tokens
      - IP spoofing
    Mitigation: Strong authentication, MFA, token validation
  
  T - Tampering with Data:
    Description: "Unauthorized modification of data"
    Examples:
      - Man-in-the-middle attacks
      - Database injection
      - Message replay attacks
    Mitigation: Encryption in transit/rest, integrity checks, signing
  
  R - Repudiation:
    Description: "Denying having performed an action when no evidence proves otherwise"
    Examples:
      - No audit logs
      - Non-attributable actions
      - Log tampering
    Mitigation: Comprehensive logging, immutable logs, digital signatures
  
  I - Information Disclosure:
    Description: "Exposing information to unauthorized users"
    Examples:
      - Data leaks
      - Verbose error messages
      - Unencrypted data
    Mitigation: Encryption, access controls, data classification
  
  D - Denial of Service:
    Description: "Making system unavailable"
    Examples:
      - Resource exhaustion
      - Infinite loops
      - Unbounded request or payload sizes
    Mitigation: Rate limiting, input validation, resource quotas
  
  E - Elevation of Privilege:
    Description: "Gaining unauthorized access/permissions"
    Examples:
      - Privilege escalation
      - RBAC bypass
      - Container breakout
    Mitigation: Principle of least privilege, RBAC, sandboxing

Threat Modeling Process

Step 1: Diagram the System

Create a Data Flow Diagram (DFD) showing:

External entities (users, external systems)
Processes (services, functions)
Data stores (databases, caches)
Data flows (API calls, events)
Trust boundaries (security zones)

Example: E-Commerce Microservices Architecture
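A data flow diagram can also be kept as plain data next to the code, which makes trust-boundary crossings easy to list. The sketch below is a minimal illustration, not the diagram we actually drew: the zone names, the zone assigned to each element, and the flow labels are all assumptions.

```typescript
// Hypothetical trust zones; names are assumptions for illustration.
type Zone = "internet" | "dmz" | "application" | "data" | "third-party";
type ElementKind = "external-entity" | "process" | "data-store";

interface DfdElement {
  name: string;
  kind: ElementKind;
  zone: Zone;
}

interface DataFlow {
  from: string; // name of the source element
  to: string;   // name of the destination element
  label: string;
}

interface Dfd {
  elements: DfdElement[];
  flows: DataFlow[];
}

// A small, assumed subset of an e-commerce system.
const dfd: Dfd = {
  elements: [
    { name: "customer", kind: "external-entity", zone: "internet" },
    { name: "api-gateway", kind: "process", zone: "dmz" },
    { name: "order-service", kind: "process", zone: "application" },
    { name: "payment-service", kind: "process", zone: "application" },
    { name: "postgresql", kind: "data-store", zone: "data" },
    { name: "stripe", kind: "external-entity", zone: "third-party" },
  ],
  flows: [
    { from: "customer", to: "api-gateway", label: "HTTPS request" },
    { from: "api-gateway", to: "order-service", label: "create order" },
    { from: "order-service", to: "payment-service", label: "signed order" },
    { from: "order-service", to: "postgresql", label: "order rows" },
    { from: "payment-service", to: "stripe", label: "charge request" },
  ],
};

// Return every flow whose two endpoints sit in different trust zones.
function flowsCrossingBoundaries(model: Dfd): DataFlow[] {
  const zoneOf = new Map<string, Zone>();
  for (const element of model.elements) {
    zoneOf.set(element.name, element.zone);
  }
  return model.flows.filter((flow) => {
    const fromZone = zoneOf.get(flow.from);
    const toZone = zoneOf.get(flow.to);
    if (fromZone === undefined || toZone === undefined) {
      throw new Error(`Flow references unknown element: ${flow.from} -> ${flow.to}`);
    }
    return fromZone !== toZone;
  });
}

for (const flow of flowsCrossingBoundaries(dfd)) {
  console.log(`${flow.from} -> ${flow.to} (${flow.label})`);
}
```

Flows that cross a boundary are the ones to examine first in the next step, since that is where authentication and encryption decisions are made.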

Step 2: Identify Threats Using STRIDE

For each component and each data flow, ask STRIDE questions:

Example: Payment Service Analysis

| Component | STRIDE Category | Threat | Likelihood | Impact | Risk |
|---|---|---|---|---|---|
| Payment Service | Spoofing | Attacker impersonates payment service | Medium | Critical | High |
| Payment Service | Tampering | Order amount modified in transit | Low | Critical | Medium |
| Payment Service | Repudiation | Payment processed without audit trail | Low | High | Medium |
| Payment Service | Info Disclosure | Credit card data logged in plaintext | High | Critical | Critical |
| Payment Service | DoS | Service overwhelmed by requests | Medium | High | High |
| Payment Service | Elevation | Attacker gains admin access to payments | Low | Critical | High |

Example: Data Flow Analysis (API Gateway → Payment Service)

Data Flow: API Gateway → Payment Service

Threats:
  
  Spoofing:
    - Attacker spoofs API Gateway
    - Stolen service-to-service credentials
    Mitigation: Mutual TLS, JWT validation
  
  Tampering:
    - Payment amount modified in transit
    - Message replay attack
    Mitigation: TLS encryption, request signing, nonce/timestamp
  
  Information Disclosure:
    - Payment data intercepted
    - Logs contain sensitive data
    Mitigation: TLS 1.3, tokenization, log scrubbing
  
  Denial of Service:
    - Payment service flooded with requests
    Mitigation: Rate limiting, circuit breakers

Step 3: Assess Risk

Use a risk matrix to prioritize threats:

Risk = Likelihood × Impact

Likelihood:
  - Low: Difficult to exploit, requires insider access
  - Medium: Possible with moderate effort
  - High: Easy to exploit, common attack vector

Impact:
  - Low: Minimal damage, easily recoverable
  - Medium: Moderate damage, customer impact
  - High: Severe damage, regulatory implications
  - Critical: Business-ending, massive breach

Risk Levels:
  - Critical: Immediate action required
  - High: Address within sprint
  - Medium: Address within quarter
  - Low: Accept or defer

Our Risk Matrix:

                        IMPACT
                  Low   Med   High  Crit
                ┌─────┬─────┬─────┬─────┐
           Low  │  L  │  L  │  M  │  M  │
                ├─────┼─────┼─────┼─────┤
LIKELIHOOD Med  │  L  │  M  │  H  │  C  │
                ├─────┼─────┼─────┼─────┤
           High │  M  │  H  │  C  │  C  │
                └─────┴─────┴─────┴─────┘

L = Low Risk (Accept)
M = Medium Risk (Mitigate)
H = High Risk (Mitigate ASAP)
C = Critical Risk (Mitigate Immediately)
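The matrix translates directly into a lookup table, which keeps ratings consistent across sessions. This is a sketch that encodes the cells exactly as drawn above; the names `assessRisk` and `prioritize` are hypothetical.

```typescript
type Likelihood = "Low" | "Medium" | "High";
type Impact = "Low" | "Medium" | "High" | "Critical";
type Risk = "Low" | "Medium" | "High" | "Critical";

// Rows are likelihood, columns are impact, matching the matrix above.
const riskMatrix: Record<Likelihood, Record<Impact, Risk>> = {
  Low:    { Low: "Low",    Medium: "Low",    High: "Medium",   Critical: "Medium" },
  Medium: { Low: "Low",    Medium: "Medium", High: "High",     Critical: "Critical" },
  High:   { Low: "Medium", Medium: "High",   High: "Critical", Critical: "Critical" },
};

function assessRisk(likelihood: Likelihood, impact: Impact): Risk {
  return riskMatrix[likelihood][impact];
}

// Lower rank means "handle first".
const riskRank: Record<Risk, number> = { Critical: 0, High: 1, Medium: 2, Low: 3 };

interface Threat {
  name: string;
  likelihood: Likelihood;
  impact: Impact;
}

// Attach a risk level to each threat and sort the most severe first.
function prioritize(threats: Threat[]): Array<Threat & { risk: Risk }> {
  return threats
    .map((threat) => ({ ...threat, risk: assessRisk(threat.likelihood, threat.impact) }))
    .sort((a, b) => riskRank[a.risk] - riskRank[b.risk]);
}

const ranked = prioritize([
  { name: "Payment processed without audit trail", likelihood: "Low", impact: "High" },
  { name: "Credit card data logged in plaintext", likelihood: "High", impact: "Critical" },
  { name: "Service overwhelmed by requests", likelihood: "Medium", impact: "High" },
]);

for (const threat of ranked) {
  console.log(`${threat.risk}: ${threat.name}`);
}
```

A team can still override a computed rating by judgment, but starting from the lookup makes any override an explicit decision.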

Step 4: Mitigate Threats

For each identified threat, choose a mitigation strategy:

Mitigation Strategies:

  1. Redesign:
    - Change architecture to eliminate threat
    - Example: Remove direct database access, use API
  
  2. Apply Standard Mitigations:
    - Authentication, authorization, encryption
    - Example: Add JWT validation
  
  3. Implement Controls:
    - Add security controls
    - Example: Rate limiting, WAF rules
  
  4. Accept Risk:
    - Document and accept low-risk threats
    - Example: Low-impact DoS on public FAQ

Example Mitigations for Payment Service:

| Threat | Mitigation | Implementation |
|---|---|---|
| Payment data disclosure | Encryption + Tokenization | Use Stripe tokens, never store cards |
| Service spoofing | Mutual TLS | mTLS between services |
| Amount tampering | Request signing | HMAC signature validation |
| DoS attacks | Rate limiting | 10 req/sec per user |
| Missing audit trail | Comprehensive logging | Log all payment events to immutable store |
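To make the rate-limiting row concrete, here is a minimal fixed-window limiter enforcing 10 requests per second per user. It is a sketch under assumptions, not our production implementation: the class name is hypothetical, state lives in process memory, and the clock is injectable only so the behavior can be demonstrated deterministically.

```typescript
interface WindowState {
  windowStart: number; // ms timestamp when the current window opened
  count: number;       // requests seen in the current window
}

class FixedWindowRateLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(limit: number, windowMs: number, now: () => number = () => Date.now()) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true if the request is allowed, false if the user is over the limit.
  allow(userId: string): boolean {
    const currentTime = this.now();
    const state = this.windows.get(userId);

    // First request from this user, or the previous window has elapsed.
    if (state === undefined || currentTime - state.windowStart >= this.windowMs) {
      this.windows.set(userId, { windowStart: currentTime, count: 1 });
      return true;
    }

    if (state.count < this.limit) {
      state.count += 1;
      return true;
    }
    return false;
  }
}

// Demonstration with a fake clock: 10 req/sec per user.
let fakeTime = 0;
const limiter = new FixedWindowRateLimiter(10, 1000, () => fakeTime);

const results: boolean[] = [];
for (let i = 0; i < 11; i++) {
  results.push(limiter.allow("user-1"));
}
console.log(results.filter((allowed) => allowed).length); // requests allowed in the first window

fakeTime = 1000; // one second later a new window opens
console.log(limiter.allow("user-1"));
```

With several service instances the counters would need to live in a shared store such as Redis, and a fixed window permits short bursts at window edges, so a sliding window or token bucket may suit stricter limits.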

Step 5: Validate Mitigations

After implementing mitigations, verify they work:

Validation Methods:

  1. Penetration Testing:
    - Hire external security firm
    - Test mitigations under attack
  
  2. Security Testing:
    - SAST/DAST scans
    - Dependency checking
  
  3. Architecture Review:
    - Peer review by security team
    - Validate design matches implementation
  
  4. Threat Modeling Review:
    - Quarterly review of threats
    - Update as architecture evolves

Real Example: Microservices Threat Model

Let me show you the actual threat model we created for our e-commerce platform.

System Overview

12 microservices, event-driven architecture:

Services:
  - api-gateway (public)
  - auth-service
  - user-service
  - product-service
  - cart-service
  - order-service
  - payment-service
  - inventory-service
  - shipping-service
  - notification-service
  - analytics-service
  - admin-service

Data Stores:
  - PostgreSQL (users, products, orders)
  - Redis (sessions, cache)
  - Elasticsearch (search, analytics)
  - S3 (product images)

External:
  - Stripe (payments)
  - SendGrid (email)
  - Twilio (SMS)

Threat Model Session

Participants: Product Manager, Lead Architect, 2 Developers, Security Engineer, SRE

Duration: 3 hours

Output: 47 identified threats, 12 critical, 18 high priority

Critical Threats Found

Threat 1: Elasticsearch Public Exposure

Component: Analytics Service → Elasticsearch
Category: Information Disclosure
Description: Elasticsearch cluster accessible from internet without authentication
Impact: CRITICAL (2.4M customer records exposed)
Likelihood: HIGH (default config, easy to exploit)
Risk: CRITICAL

Mitigation:

Immediate:
  - Move Elasticsearch to private subnet
  - Enable authentication (X-Pack Security)
  - Restrict access to application network only
  - Enable audit logging

Long-term:
  - Implement network segmentation
  - Add VPN for admin access
  - Regular penetration testing
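This class of mistake can also be caught mechanically if data stores are described in a reviewable inventory. The sketch below assumes such an inventory exists; the `DataStoreConfig` fields and the example values are invented for illustration and do not describe any real deployment format.

```typescript
// Hypothetical inventory entry for a data store.
interface DataStoreConfig {
  name: string;
  publiclyReachable: boolean;     // can it be reached from the internet?
  authenticationEnabled: boolean; // does it require credentials?
}

interface Finding {
  store: string;
  problem: string;
}

// Flag stores that are internet-reachable or accept unauthenticated access.
function findExposedStores(stores: DataStoreConfig[]): Finding[] {
  const findings: Finding[] = [];
  for (const store of stores) {
    if (store.publiclyReachable) {
      findings.push({ store: store.name, problem: "reachable from the internet" });
    }
    if (!store.authenticationEnabled) {
      findings.push({ store: store.name, problem: "authentication disabled" });
    }
  }
  return findings;
}

// Example inventory (values invented for illustration).
const inventory: DataStoreConfig[] = [
  { name: "elasticsearch", publiclyReachable: true, authenticationEnabled: false },
  { name: "postgresql", publiclyReachable: false, authenticationEnabled: true },
];

for (const finding of findExposedStores(inventory)) {
  console.log(`${finding.store}: ${finding.problem}`);
}
```

A check like this does not replace the modeling session. It only guards against a known threat quietly coming back after it has been mitigated.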

Threat 2: Service-to-Service Authentication Missing

Component: Order Service → Payment Service
Category: Spoofing, Elevation of Privilege
Description: Services communicate without authentication, any service can call payment
Impact: CRITICAL (fraudulent payments)
Likelihood: MEDIUM (requires compromised service)
Risk: HIGH

Mitigation:

Implementation:
  - Mutual TLS between services
  - Service mesh (Istio) for automatic mTLS
  - JWT validation for service identity
  - Network policies (Kubernetes)

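Code:
  // payment-service/middleware/verify-service-identity.ts
  import type { Request, Response, NextFunction } from 'express';
  import type { TLSSocket } from 'tls';
  import { logger } from '../logger'; // assumed path to the service's logger

  // Services allowed to call the payment service
  const allowedServices = ['order-service', 'admin-service'];

  type ServiceRequest = Request & { serviceIdentity?: string };

  export function verifyServiceIdentity(req: ServiceRequest, res: Response, next: NextFunction) {
    const socket = req.socket as TLSSocket;

    // The HTTPS server must be created with requestCert: true.
    // socket.authorized is true only when the client certificate chains
    // to a CA the server trusts; an unverified CN proves nothing.
    if (!socket.authorized) {
      return res.status(401).json({ error: 'Valid client certificate required' });
    }

    // Verify certificate is from an allowed service
    const serviceName = socket.getPeerCertificate().subject?.CN;

    if (!serviceName || !allowedServices.includes(serviceName)) {
      logger.warn('Unauthorized service attempted access', { serviceName });
      return res.status(403).json({ error: 'Unauthorized service' });
    }

    req.serviceIdentity = serviceName;
    next();
  }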

Threat 3: Payment Amount Tampering

Component: Cart Service → Order Service → Payment Service
Category: Tampering
Description: Attacker modifies cart total between services
Impact: HIGH (financial loss)
Likelihood: MEDIUM (requires intercepting service calls)
Risk: HIGH

Mitigation:

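// order-service/services/create-order.ts
import crypto from 'crypto';

// Assumed cart item shape; amounts are integer cents to avoid float rounding
interface CartItem {
  productId: string;
  price: number;     // unit price in cents
  quantity: number;
}

interface OrderRequest {
  userId: string;
  items: CartItem[];
  total: number;     // cents
  signature?: string;
}

export async function createOrder(orderReq: OrderRequest) {
  const signingKey = process.env.ORDER_SIGNING_KEY;
  if (!signingKey) {
    throw new Error('ORDER_SIGNING_KEY is not configured');
  }

  // 1. Recalculate total server-side (never trust client).
  //    The item prices here still come from the request; in production,
  //    look each price up in the product catalog instead.
  const calculatedTotal = orderReq.items.reduce(
    (sum, item) => sum + (item.price * item.quantity),
    0
  );

  // 2. Verify total matches (exact comparison is safe for integer cents)
  if (calculatedTotal !== orderReq.total) {
    throw new Error('Order total mismatch - possible tampering');
  }

  // 3. Sign the order for downstream services
  const orderData = {
    userId: orderReq.userId,
    items: orderReq.items,
    total: calculatedTotal,
    timestamp: Date.now()
  };

  const signature = crypto
    .createHmac('sha256', signingKey)
    .update(JSON.stringify(orderData))
    .digest('hex');

  // 4. Pass signed order to payment service
  return {
    ...orderData,
    signature
  };
}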

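// payment-service/services/process-payment.ts
import crypto from 'crypto';
// `stripe` and `logger` are assumed to be initialized elsewhere in the service

// Signed payload from the order service plus the Stripe token for the charge
interface SignedOrder {
  userId: string;
  items: unknown[];
  total: number;       // integer cents
  timestamp: number;
  signature: string;   // hex-encoded HMAC-SHA256
  paymentToken: string;
}

export async function processPayment(order: SignedOrder) {
  const signingKey = process.env.ORDER_SIGNING_KEY;
  if (!signingKey) {
    throw new Error('ORDER_SIGNING_KEY is not configured');
  }

  // 1. Verify signature from order service
  const expectedSignature = crypto
    .createHmac('sha256', signingKey)
    .update(JSON.stringify({
      userId: order.userId,
      items: order.items,
      total: order.total,
      timestamp: order.timestamp
    }))
    .digest();
  const providedSignature = Buffer.from(order.signature, 'hex');

  // Compare in constant time; timingSafeEqual throws if lengths differ
  const signatureValid =
    providedSignature.length === expectedSignature.length &&
    crypto.timingSafeEqual(providedSignature, expectedSignature);

  if (!signatureValid) {
    // Log identifiers only; the full order contains the payment token
    logger.error('Order signature invalid - possible tampering', { userId: order.userId });
    throw new Error('Invalid order signature');
  }

  // 2. Check timestamp (limits the replay window to 5 minutes)
  const age = Date.now() - order.timestamp;
  if (age > 5 * 60 * 1000) {
    throw new Error('Order expired');
  }

  // 3. Process payment with Stripe (amount is in cents)
  return await stripe.charges.create({
    amount: order.total,
    currency: 'usd',
    source: order.paymentToken,
    metadata: {
      userId: order.userId,
      timestamp: order.timestamp
    }
  });
}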
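The timestamp check narrows the replay window to five minutes, but a captured order can still be replayed inside that window. A nonce closes the gap. The sketch below assumes the order service adds a unique `nonce` (for example a UUID) to the signed payload; the `ReplayGuard` class is hypothetical and keeps its state in process memory.

```typescript
class ReplayGuard {
  // nonce -> time (ms) after which the nonce can be forgotten
  private readonly seen = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns true the first time a nonce is seen within the TTL, false on a repeat.
  checkAndRecord(nonce: string): boolean {
    const currentTime = this.now();

    // Forget expired nonces so the map does not grow without bound.
    this.seen.forEach((expiresAt, key) => {
      if (expiresAt <= currentTime) {
        this.seen.delete(key);
      }
    });

    if (this.seen.has(nonce)) {
      return false;
    }
    this.seen.set(nonce, currentTime + this.ttlMs);
    return true;
  }
}

// Keep nonces for as long as the timestamp check accepts an order (5 minutes).
const replayGuard = new ReplayGuard(5 * 60 * 1000);

console.log(replayGuard.checkAndRecord("order-nonce-1")); // first use is accepted
console.log(replayGuard.checkAndRecord("order-nonce-1")); // replay is rejected
```

The TTL only needs to match the timestamp window, because older orders are already rejected as expired. With more than one payment-service instance, the seen-nonce set would have to live in a shared store such as Redis.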

Threat Model Document

We document all threats in a structured format:

# E-Commerce Platform Threat Model

**Date**: 2023-10-15  
**Version**: 2.1  
**Participants**: [names]  
**Next Review**: 2024-01-15

## 1. System Overview

[Architecture diagram]

## 2. Assets

- Customer PII (2.4M records)
- Payment information (tokenized)
- Order history
- Product catalog
- User credentials

## 3. Trust Boundaries

- Internet ↔ Load Balancer (TLS)
- DMZ ↔ Application Network (mTLS)
- Application ↔ Data Network (VPN)
- Internal ↔ External APIs (TLS + API keys)

## 4. Identified Threats

### CRITICAL (12 threats)

#### TM-001: Elasticsearch Public Exposure
- **Category**: Information Disclosure
- **Impact**: CRITICAL
- **Likelihood**: HIGH
- **Risk**: CRITICAL
- **Status**: MITIGATED
- **Mitigation**: Network segmentation, authentication enabled
- **Owner**: Platform Team
- **Due Date**: 2023-10-20

[... 11 more critical threats]

### HIGH (18 threats)

[... details]

## 5. Mitigations

[Implementation details]

## 6. Residual Risks

[Accepted risks with justification]
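Keeping the same fields in a typed register makes follow-up easy to automate, for example listing threats that are past their due date. This is a minimal sketch: the status values other than MITIGATED and the second record's owner and due date are assumptions made for illustration.

```typescript
type Severity = "LOW" | "MEDIUM" | "HIGH" | "CRITICAL";
type LikelihoodLevel = "LOW" | "MEDIUM" | "HIGH";
// Only MITIGATED appears in the document above; the others are assumed.
type Status = "OPEN" | "IN_PROGRESS" | "MITIGATED" | "ACCEPTED";

interface ThreatRecord {
  id: string;
  title: string;
  category: string;
  impact: Severity;
  likelihood: LikelihoodLevel;
  risk: Severity;
  status: Status;
  owner: string;
  dueDate: string; // ISO date, YYYY-MM-DD
}

const register: ThreatRecord[] = [
  {
    id: "TM-001",
    title: "Elasticsearch Public Exposure",
    category: "Information Disclosure",
    impact: "CRITICAL",
    likelihood: "HIGH",
    risk: "CRITICAL",
    status: "MITIGATED",
    owner: "Platform Team",
    dueDate: "2023-10-20",
  },
  {
    // Hypothetical record: owner and due date are invented.
    id: "TM-002",
    title: "Service-to-Service Authentication Missing",
    category: "Spoofing",
    impact: "CRITICAL",
    likelihood: "MEDIUM",
    risk: "HIGH",
    status: "OPEN",
    owner: "Payments Team",
    dueDate: "2023-10-27",
  },
];

// Threats that are neither mitigated nor accepted and whose due date has passed.
// ISO dates in YYYY-MM-DD form sort correctly as plain strings.
function overdueThreats(records: ThreatRecord[], today: string): ThreatRecord[] {
  return records.filter(
    (record) =>
      record.status !== "MITIGATED" &&
      record.status !== "ACCEPTED" &&
      record.dueDate < today
  );
}

for (const threat of overdueThreats(register, "2023-11-01")) {
  console.log(`${threat.id} (${threat.owner}) is overdue: ${threat.title}`);
}
```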

Threat Modeling for Common Patterns

Pattern 1: API Gateway

Threats:
  
  Spoofing:
    - Threat: Attacker bypasses gateway, calls services directly
    - Mitigation: Network policies, service mesh, mutual TLS
  
  Tampering:
    - Threat: Request modified in transit
    - Mitigation: TLS encryption, request signing
  
  Denial of Service:
    - Threat: Gateway overwhelmed with requests
    - Mitigation: Rate limiting, WAF, DDoS protection
  
  Elevation of Privilege:
    - Threat: Bypass authorization checks
    - Mitigation: Centralized authorization, JWT validation

Pattern 2: Message Queue

Threats:
  
  Information Disclosure:
    - Threat: Messages intercepted in queue
    - Mitigation: Message encryption, TLS to queue
  
  Tampering:
    - Threat: Messages modified in queue
    - Mitigation: Message signing, immutable queues
  
  Repudiation:
    - Threat: Can't prove message origin
    - Mitigation: Digital signatures, audit trail
  
  Denial of Service:
    - Threat: Queue flooded with messages
    - Mitigation: Queue depth limits, dead letter queues

Pattern 3: Database

Threats:
  
  Information Disclosure:
    - Threat: Unauthorized database access
    - Mitigation: Network isolation, encryption at rest, access controls
  
  Tampering:
    - Threat: Data corruption or modification
    - Mitigation: RBAC, audit logging, backups
  
  Elevation of Privilege:
    - Threat: SQL injection, privilege escalation
    - Mitigation: Parameterized queries, least privilege, read replicas
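Parameterized queries are the mitigation here that most often gets skipped under deadline pressure, so a short contrast is worth showing. The sketch builds query objects in the `{ text, values }` shape that node-postgres accepts in `client.query`; the table and column names are hypothetical, and no database connection is made.

```typescript
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Do NOT do this: user input is concatenated into the SQL text.
function unsafeFindUserByEmail(email: string): QueryConfig {
  return {
    text: `SELECT id, email FROM users WHERE email = '${email}'`,
    values: [],
  };
}

// Safe: the SQL text is constant and the input travels separately as a parameter.
function findUserByEmail(email: string): QueryConfig {
  return {
    text: "SELECT id, email FROM users WHERE email = $1",
    values: [email],
  };
}

const attackerInput = "x' OR '1'='1";

// The attacker's quote characters become part of the SQL statement.
console.log(unsafeFindUserByEmail(attackerInput).text);

// The statement is unchanged; the input is only ever treated as a value.
console.log(findUserByEmail(attackerInput).text);
```

Because the statement text never contains user input, the database parses it the same way for every request, whatever the value contains.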

Integrating Threat Modeling into SDLC

When to Threat Model

Threat Modeling Checklist

## Threat Modeling Checklist

### Preparation
- [ ] Identify participants (product, arch, dev, security, SRE)
- [ ] Schedule 2-3 hour session
- [ ] Prepare architecture diagrams
- [ ] List all assets and data flows

### Modeling Session
- [ ] Create/update data flow diagram
- [ ] Mark trust boundaries
- [ ] Apply STRIDE to each component
- [ ] Apply STRIDE to each data flow
- [ ] Assess risk (likelihood × impact)
- [ ] Prioritize threats

### Post-Session
- [ ] Document all threats
- [ ] Create mitigation tasks
- [ ] Assign owners and due dates
- [ ] Schedule follow-up review
- [ ] Share with broader team

### Validation
- [ ] Review implementation matches design
- [ ] Penetration test critical flows
- [ ] Security scan for known vulnerabilities
- [ ] Quarterly re-assessment

Real Results

After implementing threat modeling:

Before Threat Modeling (2019-2020):
  Security Incidents: 8 per year
  Average Breach Cost: $1.2M
  Architecture Flaws Found: 0 (we didn't look)
  Security in Design Phase: 0%

After Threat Modeling (2021-2023):
  Security Incidents: 0 architecture-level breaches
  Average Breach Cost: $0 (prevented)
  Architecture Flaws Found: 127 (before coding)
  Security in Design Phase: 100%
  
Cost to Fix:
  - Design phase: 1 hour (threat modeling session)
  - Development: 2 days (rework during coding)
  - Production: 2-6 months (incident response, remediation)
  
Savings: ~$3.6M over 3 years in prevented breaches

The Elasticsearch breach cost us $2.4M. Threat modeling would have caught it in a 30-minute session.

Key Takeaways

✅ Threat modeling finds design flaws that code scanning misses
✅ STRIDE framework provides systematic threat identification
✅ Fix in design is 100x cheaper than fixing in production
✅ 3-hour modeling session prevents months of incident response
✅ Include diverse participants - each perspective finds unique threats
✅ Make it routine - quarterly reviews catch architectural drift

What's Next

You now have the tools to identify threats in your architecture. The next phase of our DevSecOps journey covers application security testing—starting with SAST (Static Application Security Testing) to catch vulnerabilities in your code.

Next Article: Static Application Security Testing (SAST) →

Part of the DevSecOps 101 Series

