# Part 1: Introduction to SOAP and XML Message Structure

## Why I Had to Learn SOAP

I spent the first few years of my career building REST APIs and assumed SOAP was something I could safely ignore. That assumption ended the day I needed to integrate a Python backend with a Danish government tax reporting service. The only API they offered was SOAP. No REST alternative, no JSON, no choice.

I had two options: complain or learn. I chose to learn, and it turned out to be genuinely useful knowledge. Enterprise banking platforms, insurance claim systems, national health record APIs — a large fraction of the world's critical data infrastructure speaks SOAP. Knowing how it works makes you a more capable engineer.

This first part covers what SOAP actually is, how its XML messages are structured, and how to set up a Python environment to start working with it.

## What is SOAP?

**SOAP** (Simple Object Access Protocol) is a messaging protocol for exchanging structured information in distributed systems. It uses XML as its message format and is transport-independent — though in practice, HTTP is by far the most common transport.

SOAP originated at Microsoft in 1998. Version 1.1 was published as a W3C Note in 2000, and version 1.2 became a W3C Recommendation in 2003. At its peak in the mid-2000s, it was the dominant approach for building web services. REST later displaced it for most public-facing APIs, but SOAP never went away from enterprise environments where its formal contract model and built-in security standards are valued features, not bureaucratic overhead.

### What SOAP Gives You

* **Strict contracts**: A WSDL file formally defines every operation, message format, and data type. Both sides know exactly what to expect.
* **Language agnostic**: Any language that can produce and parse XML can use SOAP — Python, Java, C#, PHP, and many more.
* **Transport agnostic**: HTTP is standard, but SOAP can run over JMS, SMTP, or TCP.
* **WS-Security**: A message-level security standard for signing, encrypting, and authenticating SOAP messages independent of transport.
* **Built-in error protocol**: SOAP Faults are a standardised way to communicate errors across service boundaries.

## SOAP Message Structure

A SOAP message is an XML document. It always has the same outer shell and follows a strict structure.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">

    <soap:Header>
        <!-- Optional: metadata, auth tokens, routing info -->
    </soap:Header>

    <soap:Body>
        <!-- Required: the actual request or response content -->
    </soap:Body>

</soap:Envelope>
```

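There are four parts to understand. The first three are elements of the envelope; the fourth, Fault, is what the Body carries when a call fails: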

### 1. Envelope

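The `<soap:Envelope>` is the root element. It wraps the entire message and declares the XML namespaces used inside it, including the envelope namespace that identifies the SOAP version. The prefix (`soap:` here) is arbitrary; only the namespace URI matters.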

* **SOAP 1.1**: `xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"`
* **SOAP 1.2**: `xmlns:soap="http://www.w3.org/2003/05/soap-envelope"`

The namespace is how you distinguish SOAP 1.1 from SOAP 1.2 messages.
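As a minimal sketch of that check, the snippet below reads the namespace of the root element and maps it to a version. It uses the standard library's `xml.etree.ElementTree` rather than lxml only to stay dependency-free; the function name `detect_soap_version` is a hypothetical helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Envelope namespaces as listed above.
SOAP_NAMESPACES = {
    "http://schemas.xmlsoap.org/soap/envelope/": "1.1",
    "http://www.w3.org/2003/05/soap-envelope": "1.2",
}


def detect_soap_version(message: bytes) -> str:
    """Return "1.1" or "1.2" based on the namespace of the root Envelope."""
    root = ET.fromstring(message)
    # ElementTree reports qualified tags as "{namespace}localname".
    if not root.tag.startswith("{"):
        raise ValueError("root element has no namespace")
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "Envelope" or namespace not in SOAP_NAMESPACES:
        raise ValueError(f"not a SOAP envelope: {root.tag}")
    return SOAP_NAMESPACES[namespace]


# The prefix is "env" here instead of "soap"; only the URI decides the version.
sample = b"""<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
    <env:Body/>
</env:Envelope>"""

print(detect_soap_version(sample))  # 1.2
```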

### 2. Header

The `<soap:Header>` element is optional. When present, it carries metadata that is not part of the core business message — things like:

* Authentication credentials (WS-Security UsernameToken)
* Message routing directives
* Timestamps for replay prevention
* Transaction identifiers
* Custom application headers

```xml
<soap:Header>
    <wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd">
        <wsse:UsernameToken>
            <wsse:Username>api_user</wsse:Username>
            <wsse:Password Type="...#PasswordText">secret</wsse:Password>
        </wsse:UsernameToken>
    </wsse:Security>
</soap:Header>
```

### 3. Body

The `<soap:Body>` is required and contains the actual payload — the operation request or response.

A request body for an operation that looks up a product might look like:

```xml
<soap:Body>
    <tns:GetProductRequest xmlns:tns="http://example.com/inventory">
        <tns:ProductId>SKU-1042</tns:ProductId>
    </tns:GetProductRequest>
</soap:Body>
```

And the corresponding response:

```xml
<soap:Body>
    <tns:GetProductResponse xmlns:tns="http://example.com/inventory">
        <tns:Product>
            <tns:ProductId>SKU-1042</tns:ProductId>
            <tns:Name>Widget Pro</tns:Name>
            <tns:Price>29.99</tns:Price>
            <tns:InStock>true</tns:InStock>
        </tns:Product>
    </tns:GetProductResponse>
</soap:Body>
```
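To make the response concrete, here is a rough sketch of turning that body into a plain Python dictionary. It assumes the fragment is wrapped in a SOAP 1.1 envelope and that the element names match the example exactly; `parse_product` is a hypothetical helper, and the standard library parser stands in for lxml to keep the snippet dependency-free.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "tns": "http://example.com/inventory",
}

response_xml = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
    <tns:GetProductResponse xmlns:tns="http://example.com/inventory">
        <tns:Product>
            <tns:ProductId>SKU-1042</tns:ProductId>
            <tns:Name>Widget Pro</tns:Name>
            <tns:Price>29.99</tns:Price>
            <tns:InStock>true</tns:InStock>
        </tns:Product>
    </tns:GetProductResponse>
</soap:Body>
</soap:Envelope>"""


def parse_product(xml_text: str) -> dict:
    """Extract the Product element from a GetProductResponse envelope."""
    root = ET.fromstring(xml_text)
    product = root.find("soap:Body/tns:GetProductResponse/tns:Product", NS)
    if product is None:
        raise ValueError("no Product element in SOAP body")
    return {
        "product_id": product.findtext("tns:ProductId", namespaces=NS),
        "name": product.findtext("tns:Name", namespaces=NS),
        # Decimal avoids float rounding on monetary values.
        "price": Decimal(product.findtext("tns:Price", namespaces=NS)),
        # xsd:boolean allows both "true" and "1".
        "in_stock": product.findtext("tns:InStock", namespaces=NS) in ("true", "1"),
    }


print(parse_product(response_xml))
```

Everything in the XML arrives as text, so the type conversions (`Decimal`, boolean) are the caller's job here. A WSDL-aware client such as zeep does this from the schema instead.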

### 4. Fault

When something goes wrong, the SOAP body contains a `<soap:Fault>` element instead of a normal response:

```xml
<soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>Product not found: SKU-1042</faultstring>
        <detail>
            <tns:ProductNotFoundError xmlns:tns="http://example.com/inventory">
                <tns:ProductId>SKU-1042</tns:ProductId>
                <tns:ErrorCode>PRODUCT_NOT_FOUND</tns:ErrorCode>
            </tns:ProductNotFoundError>
        </detail>
    </soap:Fault>
</soap:Body>
```

SOAP Faults and their differences between SOAP 1.1 and 1.2 are covered in depth in [Part 6](https://blog.htunnthuthu.com/getting-started/programming/soap-101/part-6-error-handling-soap-faults).
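Until then, a minimal sketch of how client code might detect a SOAP 1.1 fault and turn it into a Python exception. The `SoapFault` class and `raise_for_fault` function are hypothetical names for illustration; the sketch handles only the SOAP 1.1 layout shown above, where `faultcode`, `faultstring`, and `detail` carry no namespace.

```python
import xml.etree.ElementTree as ET

SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"


class SoapFault(Exception):
    """Hypothetical exception type carrying the SOAP 1.1 fault fields."""

    def __init__(self, code, message, detail=None):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message
        self.detail = detail  # the <detail> element, or None


def raise_for_fault(xml_text: str) -> ET.Element:
    """Return the parsed envelope, or raise SoapFault if the Body holds a Fault."""
    root = ET.fromstring(xml_text)
    fault = root.find(f"{{{SOAP11_NS}}}Body/{{{SOAP11_NS}}}Fault")
    if fault is None:
        return root
    # SOAP 1.1 fault children are unqualified, so no namespace is needed here.
    raise SoapFault(
        code=fault.findtext("faultcode"),
        message=fault.findtext("faultstring"),
        detail=fault.find("detail"),
    )


fault_xml = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
    <soap:Fault>
        <faultcode>soap:Client</faultcode>
        <faultstring>Product not found: SKU-1042</faultstring>
    </soap:Fault>
</soap:Body>
</soap:Envelope>"""

try:
    raise_for_fault(fault_xml)
except SoapFault as exc:
    print(exc)  # soap:Client: Product not found: SKU-1042
```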

## SOAP 1.1 vs SOAP 1.2

Both versions are in active use. It is common to encounter both in a single integration project. Understanding the differences prevents hard-to-diagnose bugs.

| Aspect            | SOAP 1.1                                           | SOAP 1.2                                   |
| ----------------- | -------------------------------------------------- | ------------------------------------------ |
| Namespace         | `http://schemas.xmlsoap.org/soap/envelope/`        | `http://www.w3.org/2003/05/soap-envelope`  |
| HTTP Content-Type | `text/xml`                                         | `application/soap+xml`                     |
| Fault structure   | `faultcode`, `faultstring`, `faultactor`, `detail` | `Code`, `Reason`, `Node`, `Role`, `Detail` |
| WS-I profile      | Basic Profile 1.0, 1.1, and 1.2                    | Basic Profile 2.0                          |
| HTTP binding      | Section 6 of the SOAP 1.1 Note                     | Part 2 (Adjuncts) of the specification     |
| W3C status        | Note                                               | Recommendation                             |

In my experience, most legacy enterprise systems use SOAP 1.1, and SOAP 1.2 is more common in systems designed after 2007. Targeting WS-I compliance does not by itself imply SOAP 1.2: Basic Profile 1.x builds on SOAP 1.1, and only Basic Profile 2.0 moves to SOAP 1.2.
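The difference that bites most often is at the HTTP layer. As a small sketch, the hypothetical helper below picks the headers for each version: SOAP 1.1 sends `text/xml` plus a quoted `SOAPAction` header, while SOAP 1.2 sends `application/soap+xml` and carries the action as a parameter of the `Content-Type` itself (both mechanisms come up again later in this part). The function name and the action URI in the usage lines are illustrative only.

```python
def soap_http_headers(version: str, action: str) -> dict:
    """Return the HTTP headers a SOAP request of the given version needs."""
    if version == "1.1":
        return {
            "Content-Type": "text/xml; charset=utf-8",
            # The SOAPAction value is a quoted URI.
            "SOAPAction": f'"{action}"',
        }
    if version == "1.2":
        return {
            # No SOAPAction header; the action rides on the media type.
            "Content-Type": (
                f'application/soap+xml; charset=utf-8; action="{action}"'
            ),
        }
    raise ValueError(f"unsupported SOAP version: {version!r}")


print(soap_http_headers("1.1", "http://example.com/inventory/GetProduct"))
print(soap_http_headers("1.2", "http://example.com/inventory/GetProduct"))
```

Sending a 1.1 envelope with 1.2 headers (or the reverse) typically produces an HTTP 415 or a version-mismatch fault rather than a helpful message, which is why it is worth deriving both from one switch.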

## Communication Styles: RPC vs Document

SOAP supports two communication styles that affect how the body is structured.

### RPC Style

RPC (Remote Procedure Call) style treats the web service like a function call. The body element is named after the operation being called, and contains the operation parameters as child elements.

```xml
<soap:Body>
    <tns:GetProduct xmlns:tns="http://example.com/inventory">  <!-- operation name -->
        <productId>SKU-1042</productId>  <!-- parameter -->
    </tns:GetProduct>
</soap:Body>
```

**Useful when**: You are modelling procedural operations or migrating from traditional RPC systems.

### Document Style

Document style (also called message-oriented) sends an arbitrary XML document in the body. The structure is defined entirely by the XSD schema in the WSDL. The body element is not necessarily named after the operation.

```xml
<soap:Body>
    <tns:ProductLookupRequest xmlns:tns="http://example.com/inventory">
        <tns:Criteria>
            <tns:ProductId>SKU-1042</tns:ProductId>
            <tns:IncludeVariants>true</tns:IncludeVariants>
        </tns:Criteria>
    </tns:ProductLookupRequest>
</soap:Body>
```

**Useful when**: You are modelling complex business documents (invoices, orders, claims), which is common in B2B integration.

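In practice, **Document/Literal** is the most widely used combination. The second half of the name is the *use*: literal means the body content is validated directly against the XSD schema, as opposed to the older SOAP encoding rules, which the WS-I Basic Profile disallows. When you encounter a SOAP service in the wild, assume Document/Literal unless you have reason to think otherwise.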
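To show what producing a document-style message involves, here is a sketch that builds the `ProductLookupRequest` from the example above with the standard library's `xml.etree.ElementTree`. The function name `build_lookup_request` is hypothetical, and the element names simply mirror the example; in a real integration they would come from the service's XSD.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
TNS = "http://example.com/inventory"

# Register prefixes so the output uses soap:/tns: instead of ns0:/ns1:.
ET.register_namespace("soap", SOAP_NS)
ET.register_namespace("tns", TNS)


def build_lookup_request(product_id: str, include_variants: bool) -> bytes:
    """Build a SOAP 1.1 envelope holding a document-style lookup request."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{TNS}}}ProductLookupRequest")
    criteria = ET.SubElement(request, f"{{{TNS}}}Criteria")
    ET.SubElement(criteria, f"{{{TNS}}}ProductId").text = product_id
    # xsd:boolean is lowercase "true"/"false", not Python's "True"/"False".
    ET.SubElement(criteria, f"{{{TNS}}}IncludeVariants").text = (
        "true" if include_variants else "false"
    )
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


print(build_lookup_request("SKU-1042", True).decode("utf-8"))
```

Building the tree instead of formatting a string means special characters in values (`&`, `<`) are escaped automatically.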

## SOAP vs REST vs gRPC

Understanding where SOAP fits compared to the alternatives helps you use the right tool.

| Dimension            | SOAP                          | REST                             | gRPC                      |
| -------------------- | ----------------------------- | -------------------------------- | ------------------------- |
| Message format       | XML                           | JSON (typically)                 | Protocol Buffers (binary) |
| Contract             | WSDL (mandatory)              | OpenAPI (optional)               | .proto files (mandatory)  |
| Transport            | HTTP, JMS, SMTP, TCP          | HTTP only                        | HTTP/2 only               |
| Security standard    | WS-Security (message level)   | OAuth 2.0 / API keys (transport) | mTLS, token-based         |
| Streaming            | No                            | No (SSE workaround)              | Native bidirectional      |
| Performance          | Slower (XML parsing overhead) | Moderate                         | Fastest                   |
| Browser support      | Poor                          | Excellent                        | Needs proxy               |
| Legacy compatibility | Excellent                     | Good                             | None                      |
| Enterprise adoption  | Dominant in legacy            | Dominant in new                  | Growing                   |

### When to choose SOAP

From personal experience, SOAP is the right choice when:

* You are **integrating with an existing system** that only offers a SOAP endpoint (banking, government, ERP and CRM platforms such as SAP and Salesforce, especially older versions)
* The integration requires **message-level security** that is independent of transport (signing and encrypting the XML body itself, not just using TLS)
* You need **formal, enforceable contracts** between organisations: the WSDL acts as a precise, machine-readable specification that both sides can validate against
* You are working in **regulated industries** where SOAP-based WS-\* specifications are mandated (healthcare HL7, financial ISO 20022 transport, insurance)
* Your message is a **structured business document** (invoice, purchase order, claim) rather than a simple CRUD operation

### When not to choose SOAP

* Building a new public-facing API — REST with OpenAPI is simpler and better understood
* Mobile or JavaScript frontends — JSON parsing is far easier
* High-performance microservice communication — gRPC's binary format is faster
* Teams without XML/WSDL knowledge and no time to learn

## Setting Up the Python Environment

I use Python for all examples in this series. The two central libraries are:

* **zeep** — a modern, actively maintained SOAP client. I use this to consume external SOAP services.
* **spyne** — a Python framework for building SOAP services. I use this to build and host SOAP endpoints.

```bash
# Create a project directory
mkdir soap-101-project
cd soap-101-project

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

# Install all libraries used in this series
pip install spyne zeep lxml flask pytest pytest-mock requests
```

Verify the installation:

```python
# verify.py
from importlib.metadata import version

# Read versions from the installed package metadata. This works for every
# package, including Flask 3.x, where flask.__version__ is deprecated.
for package in ("spyne", "zeep", "lxml", "flask"):
    print(f"{package:<7}: {version(package)}")
```

```bash
python verify.py
# Example output (your versions will differ):
# spyne  : 2.14.0
# zeep   : 4.2.1
# lxml   : 5.1.0
# flask  : 3.0.3
```

## Sending Your First SOAP Request

Before building anything, I find it useful to manually construct a raw SOAP message using `requests`. This removes all abstraction and shows exactly what travels over the wire.

I will use a publicly available SOAP service: `http://www.dneonline.com/calculator.asmx` — a simple arithmetic calculator offered as a demo service.

```python
# raw_soap_request.py
import requests

# The WSDL for this service:
# http://www.dneonline.com/calculator.asmx?WSDL

url = "http://www.dneonline.com/calculator.asmx"

# A SOAP 1.1 request to Add two numbers
soap_body = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
    <soap:Body>
        <Add xmlns="http://tempuri.org/">
            <intA>15</intA>
            <intB>27</intB>
        </Add>
    </soap:Body>
</soap:Envelope>"""

headers = {
    "Content-Type": "text/xml; charset=utf-8",
    "SOAPAction": '"http://tempuri.org/Add"',
}

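# Encode explicitly so the bytes match the declared UTF-8 charset, and set a
# timeout so a slow or unreachable demo server cannot hang the script.
response = requests.post(
    url, data=soap_body.encode("utf-8"), headers=headers, timeout=10
)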

print(f"Status: {response.status_code}")
print(f"Response:\n{response.text}")
```

If the demo service is reachable, the response looks like this:

```xml
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <AddResponse xmlns="http://tempuri.org/">
      <AddResult>42</AddResult>
    </AddResponse>
  </soap:Body>
</soap:Envelope>
```

This is the complete SOAP request/response cycle. The same thing happens when you use zeep — it just handles building the XML and parsing the response for you.

Notice the `SOAPAction` header in the request. This is a SOAP 1.1 requirement — the server uses it to route the request to the correct operation handler. SOAP 1.2 uses a different mechanism (the `action` parameter in the `Content-Type` header) and does not require `SOAPAction`.

## Parsing a SOAP Response with lxml

When using raw `requests` instead of zeep, you need to parse the XML response yourself. Here is how to do that with lxml:

```python
# parse_soap_response.py
from lxml import etree

soap_response = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
    xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <soap:Body>
    <AddResponse xmlns="http://tempuri.org/">
      <AddResult>42</AddResult>
    </AddResponse>
  </soap:Body>
</soap:Envelope>"""

# lxml rejects a str that carries an XML encoding declaration, so parse bytes.
root = etree.fromstring(soap_response.encode("utf-8"))

# Define namespace map for XPath
namespaces = {
    "soap": "http://schemas.xmlsoap.org/soap/envelope/",
    "tns": "http://tempuri.org/",
}

# Extract the result using XPath
result_elements = root.xpath(
    "//tns:AddResult",
    namespaces=namespaces,
)

if result_elements:
    print(f"Result: {result_elements[0].text}")  # Result: 42
```
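One detail in that example is easy to miss: the response never uses a `tns:` prefix. `AddResponse` declares a *default* namespace, so `AddResult` is namespace-qualified even though it looks bare, and an unqualified lookup silently returns nothing. The sketch below demonstrates this with the standard library's `xml.etree.ElementTree` (chosen only to keep it dependency-free; the same behaviour applies to lxml).

```python
import xml.etree.ElementTree as ET

soap_response = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <AddResponse xmlns="http://tempuri.org/">
      <AddResult>42</AddResult>
    </AddResponse>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(soap_response)

# Unqualified lookup finds nothing: AddResult inherits the default
# namespace http://tempuri.org/ from its parent AddResponse.
unqualified = root.find(".//AddResult")

# Qualified lookup, mapping a prefix of our own choosing to the URI.
qualified = root.find(".//tns:AddResult", {"tns": "http://tempuri.org/"})

# Wildcard namespace (Python 3.8+), handy for quick debugging.
wildcard = root.find(".//{*}AddResult")

print(unqualified)          # None
print(int(qualified.text))  # 42
print(int(wildcard.text))   # 42
```

The prefix you put in the namespace map does not have to match anything in the document; only the URI has to match.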

In Part 4, zeep handles all of this automatically. But understanding the raw XML structure means you can debug problems that zeep abstracts away.

## What's Next

You now understand:

* What SOAP is and why it still matters
* The four parts of a SOAP message: Envelope, Header, Body, Fault
* SOAP 1.1 vs 1.2 differences
* RPC vs Document style
* How a raw SOAP request works in Python

In [Part 2](https://blog.htunnthuthu.com/getting-started/programming/soap-101/part-2-wsdl-service-contracts), we look at WSDL — the XML document that formally describes what a SOAP service does, what operations it exposes, and what data types it accepts and returns.
