# Part 2: Data Structures, Type Hints and Pydantic

## Introduction

Building `ansible-inspec` meant modelling quite a few domain objects: job templates, execution results, profile metadata, credential bundles, and API request/response bodies. Python's native data structures get you far, but after a certain point you want a type checker (and runtime validation) to catch shape mismatches early. That's where Pydantic v2 earns its keep alongside Python 3.12's improved type annotation syntax.

This part covers the core data structures I reach for and how I model domain data for a real project.

***

## Python's Built-in Data Structures

### Lists

```python
# Ordered, mutable, allows duplicates
hosts: list[str] = ["web-01", "web-02", "db-01"]

hosts.append("cache-01")          # add to end
hosts.insert(0, "lb-01")          # add at index
hosts.remove("db-01")             # remove by value
popped = hosts.pop()              # remove and return last item

# List comprehension — I use this constantly in ansible-inspec
active_hosts = [h for h in hosts if h.startswith("web")]

# Slicing
first_two = hosts[:2]
reversed_hosts = hosts[::-1]
```

### Dictionaries

The most-used structure in any Python automation code:

```python
# Ordered (3.7+), mutable, key-value store
job_template: dict[str, str | int | bool] = {
    "name": "linux-baseline",
    "profile": "dev-sec/linux-baseline",
    "timeout": 300,
    "supermarket": True,
}

# Safe access
timeout = job_template.get("timeout", 60)   # default 60 if key missing

# Merge two dicts (3.9+ syntax)
defaults = {"timeout": 60, "reporter": "cli"}
overrides = {"timeout": 300}
merged = defaults | overrides                # {"timeout": 300, "reporter": "cli"}

# dict comprehension
host_map = {h: f"192.168.1.{i+10}" for i, h in enumerate(["web-01", "web-02"])}
# {"web-01": "192.168.1.10", "web-02": "192.168.1.11"}
```

### Sets

Useful for deduplication — e.g. collecting unique failed controls across runs:

```python
run_a_failures: set[str] = {"sshd-01", "sshd-03", "pkg-audit"}
run_b_failures: set[str] = {"sshd-01", "pkg-audit", "fs-permissions"}

# Failures in both runs (intersection)
persistent = run_a_failures & run_b_failures   # {"sshd-01", "pkg-audit"}

# All unique failures (union)
all_failures = run_a_failures | run_b_failures

# Failures only in run_b (difference)
new_failures = run_b_failures - run_a_failures  # {"fs-permissions"}
```

### Tuples

Immutable sequences — I use them for fixed-shape data like `(host, port)` pairs:

```python
# Named tuple — gives fields a name without a full class
from typing import NamedTuple

class HostConnection(NamedTuple):
    host: str
    port: int
    user: str

conn = HostConnection(host="web-01", port=22, user="deploy")
print(conn.host)      # web-01
print(conn[1])        # 22  — still tuple-indexable

# Unpacking
host, port, user = conn
```

***

## Type Hints in Python 3.12

Python is dynamically typed but type hints + `mypy`/`pyright` give you static checking. I run `mypy --strict` on `ansible-inspec`.

### Basic annotations

```python
# Variables
name: str = "ansible-inspec"
version: tuple[int, int, int] = (0, 2, 12)
debug: bool = False

# Functions — always annotate return types
def build_connection_uri(host: str, port: int = 22, user: str = "root") -> str:
    return f"ssh://{user}@{host}:{port}"

# `None` return type
def log_result(message: str) -> None:
    print(f"[INFO] {message}")
```

### Union types

```python
# 3.10+ — use `|` instead of Union[X, Y]
def parse_port(value: str | int) -> int:
    if isinstance(value, str):
        return int(value)
    return value

# Optional is just X | None
def find_template(name: str) -> dict[str, str] | None:
    templates = load_templates()
    return templates.get(name)
```

### Generics with built-in types (3.9+)

No more `from typing import List, Dict, Tuple, Set` — use lowercase directly:

```python
def group_by_status(
    results: list[dict[str, str]]
) -> dict[str, list[dict[str, str]]]:
    groups: dict[str, list[dict[str, str]]] = {"passed": [], "failed": [], "skipped": []}
    for r in results:
        status = r.get("status", "skipped")
        groups[status].append(r)
    return groups
```

### `TypedDict` — typed dicts without a full class

Useful for dicts that come from JSON/YAML config files:

```python
from typing import TypedDict, NotRequired

class JobTemplateDict(TypedDict):
    name: str
    profile: str
    timeout: NotRequired[int]          # optional key
    supermarket: NotRequired[bool]

def validate_template(t: JobTemplateDict) -> bool:
    return bool(t.get("name") and t.get("profile"))
```

### `type` alias (3.12)

```python
# 3.12 first-class type aliases
type Hostname = str
type JobID = str
type ResultMap = dict[JobID, list[dict[str, str | bool]]]

def collect_results(job_ids: list[JobID]) -> ResultMap:
    return {jid: [] for jid in job_ids}
```

### `@overload` — when a function returns different types based on input

```python
from typing import overload

@overload
def load_config(path: str) -> dict[str, str]: ...
@overload
def load_config(path: None) -> None: ...

def load_config(path: str | None) -> dict[str, str] | None:
    if path is None:
        return None
    import tomllib
    with open(path, "rb") as f:
        return tomllib.load(f)       # tomllib is stdlib in 3.11+
```

***

## Pydantic v2

Pydantic is the validation and serialisation library that powers FastAPI. `ansible-inspec`'s API models are all Pydantic v2 models. The upgrade from v1 to v2 is significant: v2 rewrote the validation core in Rust (`pydantic-core`) and renamed most of the model API (`.dict()` is now `.model_dump()`, `.parse_obj()` is now `.model_validate()`).

### Basic model

```python
from pydantic import BaseModel, Field, field_validator
from datetime import datetime

class JobTemplate(BaseModel):
    name: str
    profile: str
    timeout: int = Field(default=300, ge=10, le=3600)  # 10s–1h
    supermarket: bool = False
    tags: list[str] = Field(default_factory=list)

# Instantiation validates automatically
template = JobTemplate(name="linux-baseline", profile="dev-sec/linux-baseline")
print(template.timeout)   # 300
print(template.model_dump())
# {'name': 'linux-baseline', 'profile': 'dev-sec/linux-baseline',
#  'timeout': 300, 'supermarket': False, 'tags': []}
```
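What "validates automatically" means in practice: out-of-range input raises `pydantic.ValidationError`, which carries structured details about each failing field. A minimal sketch (the model is repeated so the snippet runs standalone):

```python
from pydantic import BaseModel, Field, ValidationError

class JobTemplate(BaseModel):
    name: str
    profile: str
    timeout: int = Field(default=300, ge=10, le=3600)

try:
    JobTemplate(name="linux-baseline", profile="dev-sec/linux-baseline", timeout=5)
except ValidationError as exc:
    errors = exc.errors()
    # Each entry names the failing field and the constraint it broke
    print(errors[0]["loc"], errors[0]["type"])   # ('timeout',) greater_than_equal
```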

### Field validators

```python
from pydantic import BaseModel, field_validator
import re

class HostConfig(BaseModel):
    hostname: str
    port: int = 22
    user: str = "root"

    @field_validator("hostname")
    @classmethod
    def validate_hostname(cls, v: str) -> str:
        # Allow hostnames and IPs
        if not re.match(r"^[\w.\-]+$", v):
            raise ValueError(f"Invalid hostname: {v}")
        return v.lower()

    @field_validator("port")
    @classmethod
    def validate_port(cls, v: int) -> int:
        if not (1 <= v <= 65535):
            raise ValueError("Port must be 1–65535")
        return v
```
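Validators fire on every instantiation, so good input gets normalised and bad input is rejected at the boundary. A condensed sketch (validators trimmed so the snippet stands alone):

```python
from pydantic import BaseModel, ValidationError, field_validator

class HostConfig(BaseModel):
    hostname: str
    port: int = 22

    @field_validator("hostname")
    @classmethod
    def validate_hostname(cls, v: str) -> str:
        return v.lower()   # normalise case

    @field_validator("port")
    @classmethod
    def validate_port(cls, v: int) -> int:
        if not (1 <= v <= 65535):
            raise ValueError("Port must be 1-65535")
        return v

ok = HostConfig(hostname="WEB-01")
print(ok.hostname)   # web-01, normalised by the validator

try:
    HostConfig(hostname="web-01", port=70000)
except ValidationError as exc:
    bad_loc = exc.errors()[0]["loc"]
    print(bad_loc)   # ('port',)
```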

### Nested models

The real power shows when you compose models:

```python
from pydantic import BaseModel, Field
from datetime import datetime
from enum import Enum

class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"

class ControlResult(BaseModel):
    control_id: str
    title: str
    status: str  # "passed" | "failed" | "skipped"
    message: str | None = None

class JobResult(BaseModel):
    job_id: str
    template_name: str
    status: JobStatus = JobStatus.PENDING
    started_at: datetime | None = None
    finished_at: datetime | None = None
    host_results: dict[str, list[ControlResult]] = Field(default_factory=dict)

    @property
    def duration_seconds(self) -> float | None:
        if self.started_at and self.finished_at:
            return (self.finished_at - self.started_at).total_seconds()
        return None

    def passed_count(self) -> int:
        return sum(
            1
            for controls in self.host_results.values()
            for ctrl in controls
            if ctrl.status == "passed"
        )
```
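One payoff of composition is that `model_validate` builds the whole tree: nested dicts in a raw payload become model instances recursively. A trimmed sketch (smaller models than above so it runs standalone):

```python
from pydantic import BaseModel, Field

class ControlResult(BaseModel):
    control_id: str
    status: str

class JobResult(BaseModel):
    job_id: str
    host_results: dict[str, list[ControlResult]] = Field(default_factory=dict)

# A raw payload, e.g. parsed from an API response body
raw = {
    "job_id": "job-42",
    "host_results": {
        "web-01": [
            {"control_id": "sshd-01", "status": "passed"},
            {"control_id": "sshd-03", "status": "failed"},
        ]
    },
}

result = JobResult.model_validate(raw)
# The inner dicts were validated into ControlResult instances
print(type(result.host_results["web-01"][0]).__name__)   # ControlResult
print(result.host_results["web-01"][1].status)           # failed
```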

### JSON serialisation / deserialisation

```python
import json
from pydantic import BaseModel

class JobTemplate(BaseModel):
    name: str
    profile: str
    timeout: int = 300

# Serialize to JSON string
t = JobTemplate(name="cis-docker", profile="cis/cis-docker-benchmark")
json_str = t.model_dump_json()
print(json_str)
# {"name":"cis-docker","profile":"cis/cis-docker-benchmark","timeout":300}

# Deserialize from dict or JSON string
raw = {"name": "ssh-baseline", "profile": "dev-sec/ssh-baseline", "timeout": 120}
t2 = JobTemplate.model_validate(raw)

# From JSON file
with open("template.json") as f:
    t3 = JobTemplate.model_validate_json(f.read())
```

### `model_config` — Pydantic v2 configuration

```python
from pydantic import BaseModel, ConfigDict

class StrictJobTemplate(BaseModel):
    model_config = ConfigDict(
        strict=True,          # no coercion: "300" won't become 300
        frozen=True,          # immutable after creation
        extra="forbid",       # reject unknown fields
        populate_by_name=True, # accept field names even when aliases are set
    )

    name: str
    timeout: int = 300
```
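Each flag changes what the model rejects. A quick sketch of the failure modes, assuming Pydantic v2 (where mutating a `frozen` model also raises `ValidationError`):

```python
from pydantic import BaseModel, ConfigDict, ValidationError

class StrictJobTemplate(BaseModel):
    model_config = ConfigDict(strict=True, frozen=True, extra="forbid")

    name: str
    timeout: int = 300

coercion_rejected = extra_rejected = frozen_rejected = False

# strict=True: the string "300" is not coerced to int
try:
    StrictJobTemplate(name="x", timeout="300")
except ValidationError:
    coercion_rejected = True

# extra="forbid": unknown fields are an error, not silently ignored
try:
    StrictJobTemplate(name="x", reporter="cli")
except ValidationError:
    extra_rejected = True

# frozen=True: assignment after creation raises
t = StrictJobTemplate(name="x")
try:
    t.timeout = 60
except ValidationError:
    frozen_rejected = True

print(coercion_rejected, extra_rejected, frozen_rejected)   # True True True
```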

### `pydantic-settings` — environment-based config (used in `ansible-inspec`)

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class DatabaseSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="DATABASE__", env_file=".env")

    url: str = "postgresql://ansible:ansible@localhost:5432/ansible_inspec"
    pool_size: int = 10
    max_overflow: int = 20

class AuthSettings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="AUTH__", env_file=".env")

    enabled: bool = False
    jwt_secret: str = "change-me-in-production"
    token_expiry_days: int = 7

class AppSettings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env")

    debug: bool = False
    host: str = "0.0.0.0"
    port: int = 8080
    database: DatabaseSettings = DatabaseSettings()
    auth: AuthSettings = AuthSettings()

# Single import used everywhere
settings = AppSettings()
print(settings.port)           # 8080 or value from PORT env var
print(settings.auth.enabled)   # False or value from AUTH__ENABLED env var
```

This pattern keeps configuration centralised and validated. Setting `DATABASE__URL=...` in your shell or `.env` file is automatically picked up.

***

## Practical Patterns

### Parsing YAML inventories

```python
import yaml
from pathlib import Path
from pydantic import BaseModel

class InventoryHost(BaseModel):
    ansible_host: str
    ansible_port: int = 22
    ansible_user: str = "root"

class Inventory(BaseModel):
    hosts: dict[str, InventoryHost]

def load_inventory(path: str | Path) -> Inventory:
    with open(path) as f:
        raw: dict = yaml.safe_load(f)

    # Flatten the Ansible inventory format
    all_hosts = raw.get("all", {}).get("hosts", {})
    return Inventory(hosts={
        name: InventoryHost(**(data or {}))   # host vars may be None in YAML
        for name, data in all_hosts.items()
    })
```

### Serializing results to JSON

```python
import json
from pathlib import Path
from datetime import datetime

def save_result(result: JobResult, output_dir: Path) -> Path:
    output_dir.mkdir(parents=True, exist_ok=True)
    timestamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    filename = f"{timestamp}-{result.template_name}.json"
    path = output_dir / filename

    with open(path, "w") as f:
        # model_dump_json handles datetime serialization
        f.write(result.model_dump_json(indent=2))

    return path
```

***

## Summary

| Concept              | Key point                                      |
| -------------------- | ---------------------------------------------- |
| Lists                | Ordered, mutable; use comprehensions liberally |
| Dicts                | Ordered (3.7+); `\|` merge operator (3.9+)     |
| Sets                 | Deduplication, fast membership tests           |
| `TypedDict`          | Typed dict shapes without a full class         |
| `type` alias         | 3.12 first-class alias keyword                 |
| Pydantic `BaseModel` | Validation + serialisation in one              |
| `Field()`            | Constraints, defaults, aliases                 |
| `pydantic-settings`  | Env-var driven config with validation          |

***

## What's Next

[Part 3](https://blog.htunnthuthu.com/getting-started/programming/python-101/python-101-part-3) digs into OOP, `@dataclass`, `Protocol`, and abstract base classes — the patterns that make `ansible-inspec`'s adapters and plugin architecture composable.
