- Introduction & Problem Frame
- Mental Model of Divergence
- Key Actors & Data Artifacts
- Taxonomy of Disagreements
- Lifecycle Timeline Example
- Root Causes (Detailed)
- Married Segments & O&D Interplay
- Price vs Availability Mismatch
- Detection & Instrumentation
- Reconciliation Strategies
- Engineering Patterns
- Observability & KPIs
- Triage Runbook
- Preventive Design Moves
- Modernization (NDC & Offers)
- Implementation Checklist
- Mini Glossary
- References & Reading
- Disclaimer
1. Introduction & Problem Frame
You search for a trip in a GDS and see 4 seats in R class; the airline host (or NDC offer API) returns 0. Sales complains. Revenue Management says they closed it hours ago. Distribution says “cache delay.” Everyone is technically right, just in different temporal slices. Understanding why these windows appear lets you reduce friction, false promises, and lost revenue.
Goal: Minimize harmful divergence windows between widely distributed caches (GDS / meta / OTA / edge nodes) and authoritative inventory decisions, while preserving performance and cost efficiency.
2. Mental Model of Divergence
Think of each availability answer as (Inventory State Version, Control State Version, Transformation Layer). Disagreement occurs when one or more of these diverge across systems beyond acceptable SLA.
- Inventory State: Physical seats minus committed minus protected.
- Control State: Network RM decisions (bid prices, protection levels, O&D closures, married segment logic).
- Transformation: Channel-specific filters (brand mapping, corporate policy, point-of-sale restrictions, married segment enforcement, schedule/timeband conversions, RBD merges).
If any component lags or transforms differently, you get a mismatch.
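As a rough illustration of this model, the sketch below compares the version triple behind a host answer with the one behind a cached answer and names the components that diverged. The class, the field names, and the "map-v3" identifier are assumptions made for the example, not part of any real feed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerVersions:
    """Versions behind one availability answer (field names are illustrative)."""
    inventory_version: int
    control_version: int
    transformation_version: str  # hypothetical id of the channel filter/mapping ruleset


def diverged_components(host: AnswerVersions, cache: AnswerVersions) -> list[str]:
    """Return the names of the components on which the cache disagrees with the host."""
    diverged = []
    if cache.inventory_version != host.inventory_version:
        diverged.append("inventory")
    if cache.control_version != host.control_version:
        diverged.append("control")
    if cache.transformation_version != host.transformation_version:
        diverged.append("transformation")
    return diverged


host = AnswerVersions(982344, 4512, "map-v3")
cache = AnswerVersions(982300, 4510, "map-v3")
print(diverged_components(host, cache))
```

An empty list means both answers were built from the same state; anything else points to the layer that lagged.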
3. Key Actors & Data Artifacts
| Actor | Role | Primary Latency Source | Artifact Versioning |
|---|---|---|---|
| Airline Host CRS / Inventory | Authoritative decrement & married segment enforcement | Processing queue + replication | Sequence number / log offset |
| RM Engine | Produces control decisions (bid prices/protections) | Batch optimization cycles | ControlVersionId timestamped |
| GDS Cache Layer | High-FAN search responses | Feed ingestion interval & invalidation SLA | Derived snapshot time |
| Offer / NDC API | Real-time composition & pricing | Round-trip latency to inventory microservices | RequestId, InventoryVersion, ControlVersion |
| OTA / Metasearch Edge | Further caching / dedupe | Client TTL & stale-if-error policies | Etag / hashed payload |
4. Taxonomy of Disagreements
| Type | Description | Impact | Typical Root |
|---|---|---|---|
| False Open | Cache shows seat open; host closed | Cart failure / customer frustration | Stale closure update |
| False Closed | Cache shows closed; host open | Lost sale / dilution prevention lost | Missed open propagation |
| Class Count Drift | Numeric difference in seat count (e.g. 4 vs 2) | Agent confusion / distrust | Partial decrement feed loss |
| Married Segment Break | Cache sells connection; host rejects partial | Repricing friction | Incomplete marriage context |
| Brand/Product Mismatch | Base RBD open; branded fare not offered | Revenue leakage / upsell loss | Product filter divergence |
| Price Inconsistency | Availability open but priced fare different | Requote / churn | Filed fare vs dynamic adjustment timing |
5. Lifecycle Timeline Example
T+00:00 RM closes R class on AB123 10JUL (ControlVersion 4512)
T+00:05 Host updates availability → R=0 (InventoryVersion 982344)
T+00:07 Push message to GDS channel queue (seq 188877) still in transit
T+00:11 GDS still serves cached snapshot (R=4) to agencies
T+00:12 Booking attempt from GDS: host rejects at commit (diff)
T+00:13 Invalidation arrives; GDS marks R closed
Divergence window length: ~8 minutes (beyond SLA of 5) → incident triggered
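The window arithmetic can be reproduced with a few lines. This is a minimal sketch that assumes the timeline labels read as T+HH:MM and measures from the host update (T+00:05) to cache adoption (T+00:13); the function names are illustrative.

```python
from datetime import timedelta

SLA = timedelta(minutes=5)  # the "SLA of 5" from the timeline


def parse_offset(label: str) -> timedelta:
    """Parse a 'T+HH:MM' label (assumed format) into an offset from T."""
    hours, minutes = label.removeprefix("T+").split(":")
    return timedelta(hours=int(hours), minutes=int(minutes))


def divergence_window(host_updated: str, cache_adopted: str) -> timedelta:
    """Time between the authoritative update and the cache adopting it."""
    return parse_offset(cache_adopted) - parse_offset(host_updated)


window = divergence_window("T+00:05", "T+00:13")
print(window, "SLA breached" if window > SLA else "within SLA")
```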
6. Root Causes (Detailed)
- Propagation Latency: Delay between host update and cache invalidation. Causes: network queuing, batching, rate limiting. Mitigation: priority channels for closures, adaptive push (close events higher QoS than opens).
- Lossy Event Streams: Missed message in feed leading to stale state until periodic full refresh. Mitigation: sequence numbers + gap detector; fallback snapshot request when gap.
- Non-Atomic Multi-Leg Updates: For O&D control, multiple legs updated sequentially. A cache sees partial state. Mitigation: version fencing (apply only when full bundle present) or two-phase marker.
- Married Segment Reconstruction Errors: Some caches approximate marriage by pattern rules; host uses authoritative marriage groups. Mitigation: distribute explicit marriage group tokens.
- Time-Zone / DOW Misalignment: Schedule change at boundary (UTC vs local). Cache indexes wrong flight instance. Mitigation: canonical UTC + local offset pair key; validation job.
- Fare Class Remapping / Code Shares: Marketing vs operating RBD differences not normalized. Mitigation: deterministic mapping table with version id; embed mappingVersion in payloads.
- Dynamic (Continuous) Pricing Layer: Host reopened with guard price; cache still anchored to filed ladder. Mitigation: treat price and availability as coupled artifacts with joint version.
- Optimistic Oversell Windows: Host increments oversell allowances; cache not yet aware. Mitigation: proactive open event before UI exposure or delayed open until propagation confirmed.
- Edge CDN Stale-if-Error: Transient upstream failure leads edge to serve stale open data. Mitigation: differentiate harmful stale (open states) vs harmless (closed) in caching policy.
- Partial Rollback After Error: Host attempted update, some legs succeeded, RM rolled back logically; cache kept transient state. Mitigation: idempotent update bundles with rollback marker.
- Compression / Serialization Truncation: Large update batch truncated, silently losing trailing classes. Mitigation: checksum per batch & ack handshake.
- Clock Skew: SLA windows miscalculated because of unsynchronized clocks. Mitigation: NTP enforced; embed origin timestamp & monotonic receipt delta.
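To make one of these mitigations concrete, here is a minimal sketch of the "checksum per batch & ack handshake" idea from the truncation item. The batch layout, the JSON serialization, and the choice of SHA-256 are assumptions for illustration; real feeds will use their own wire format.

```python
import hashlib
import json


def encode_batch(seq: int, updates: list[dict]) -> dict:
    """Serialize a batch and attach a checksum computed over the exact bytes sent."""
    payload = json.dumps(updates, sort_keys=True).encode("utf-8")
    return {"seq": seq, "payload": payload, "sha256": hashlib.sha256(payload).hexdigest()}


def receive_batch(batch: dict) -> dict:
    """Ack only if the received bytes match the sender's checksum; a NACK should trigger a resend."""
    intact = hashlib.sha256(batch["payload"]).hexdigest() == batch["sha256"]
    return {"seq": batch["seq"], "ack": intact}


batch = encode_batch(188877, [
    {"leg": "AB123|2024-07-10|A-B", "rbd": "R", "seats": 0},
    {"leg": "AB123|2024-07-10|A-B", "rbd": "Y", "seats": 9},
])
truncated = {**batch, "payload": batch["payload"][:-20]}  # simulate lost trailing bytes
print(receive_batch(batch), receive_batch(truncated))
```

The point of the handshake is that a truncated batch is rejected loudly rather than applied with trailing classes missing.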
7. Married Segments & O&D Interplay
Availability for a connecting itinerary (A–B–C) might be open only as a bundle. A cache lacking the marriage token may incorrectly infer each leg open individually. When a traveler later tries to drop B–C, host denies or reprices. Provide explicit marriageGroupId plus scope=JOINT attribute to caches. Include integrity hash of constituent leg keys so tampering or partial ingestion is detectable.
marriageGroup {
id: "MG-AB123-BC456-20240710-01",
legs:["AB123|2024-07-10|A-B","BC456|2024-07-10|B-C"],
scope:"JOINT",
version:7,
createdAt:"2024-07-01T11:05:12Z"
}
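The example object above does not show the integrity hash mentioned in the text. One possible way to add it is sketched below; the legsHash field name and the hashing scheme (SHA-256 over the leg keys in itinerary order) are assumptions, not an established format.

```python
import hashlib


def legs_hash(legs: list[str]) -> str:
    """SHA-256 over the leg keys in itinerary order, newline-separated."""
    return hashlib.sha256("\n".join(legs).encode("utf-8")).hexdigest()


def is_intact(group: dict) -> bool:
    """True when the legs received still match the hash distributed with the group."""
    return group.get("legsHash") == legs_hash(group["legs"])


legs = ["AB123|2024-07-10|A-B", "BC456|2024-07-10|B-C"]
group = {
    "id": "MG-AB123-BC456-20240710-01",
    "legs": legs,
    "scope": "JOINT",
    "version": 7,
    "legsHash": legs_hash(legs),  # hypothetical extra field
}
partial = {**group, "legs": legs[:1]}  # second leg lost during ingestion
print(is_intact(group), is_intact(partial))
```

A cache that fails this check should treat the group as unknown and defer to the host rather than infer per-leg availability.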
8. Price vs Availability Mismatch
Even when seat count matches, pricing divergence fuels perception of availability mismatch.
- Filed vs Dynamic: Cache priced on filed fare ladder; host offered dynamic anchor.
- Tax/Fee Drift: Changed tax table version not yet applied in cache’s aggregated total.
- Brand Bundle Decomposition: Cache shows base class; host only sells brand with ancillaries included.
Mitigation: Pair each availability answer with pricingContextVersion; at commit, require equivalence or initiate a repricing handshake (a pre-confirmation pop-up shown to the user or agent).
9. Detection & Instrumentation
Log every search and commit with immutable provenance.
AvailabilityProvenance {
requestId,
inventoryVersion,
controlVersion,
mappingVersion,
marriageVersion,
pricedAt,
servedFrom: HOST | GDS_CACHE | EDGE
}
Create aggregated mismatch events:
MismatchEvent {
discrepancyType: "FALSE_OPEN",
flightLegKey,
class:"R",
cacheVersion:"inv=982300 ctl=4510",
hostVersion:"inv=982344 ctl=4512",
latencySec: 420
}
Trigger anomaly if rate of mismatches > baseline (e.g. False Open > 0.5% of queries per hour for a leg bucket).
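A minimal sketch of that trigger, using the 0.5% example threshold; the function names and the sample counts are illustrative only.

```python
FALSE_OPEN_THRESHOLD = 0.005  # 0.5% of queries per hour for a leg bucket, per the example above


def false_open_rate(false_opens: int, total_queries: int) -> float:
    """Share of availability queries in the window that were False Opens."""
    return false_opens / total_queries if total_queries else 0.0


def is_anomalous(false_opens: int, total_queries: int,
                 threshold: float = FALSE_OPEN_THRESHOLD) -> bool:
    return false_open_rate(false_opens, total_queries) > threshold


# Hypothetical hourly counts for two leg buckets.
print(is_anomalous(12, 2000))  # 0.6% of queries
print(is_anomalous(8, 2000))   # 0.4% of queries
```

In practice the baseline would likely be learned per leg bucket rather than fixed, but the comparison has the same shape.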
10. Reconciliation Strategies
| Strategy | Mechanism | Pros | Trade-offs |
|---|---|---|---|
| Push + Ack | Host pushes diff; cache acks sequence | Low staleness | Back-pressure if ack slow |
| Pull on Gap | Cache detects sequence gap, pulls snapshot | Resilient to loss | Extra host load on bursts |
| Time-Sliced Snapshots | Periodic full baseline plus deltas | Recovery simplicity | Latency between snapshots |
| Hybrid Token Validation | Search returns token; commit validates | Prevents stale commit | User friction if reprice |
| Selective Real-Time | High-risk flights bypass cache | Accuracy where needed | Inconsistent latency profile |
11. Engineering Patterns
11.1 Version Fencing
if (response.controlVersion != host.currentControlVersion) {
re-evaluate availability before commit
}
11.2 Sequence Gap Detector
expectedSeq = lastSeq + 1
if (incomingSeq > expectedSeq) {
emit GAP event
pull snapshot
}
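The pseudocode leaves out what happens to lastSeq after recovery and how late or duplicate messages are treated. A fuller sketch follows, under the assumption that the snapshot pull returns the sequence number the snapshot is current to; the class and callback names are hypothetical.

```python
from typing import Callable


class GapDetector:
    def __init__(self, last_seq: int, pull_snapshot: Callable[[], int]):
        self.last_seq = last_seq
        self.pull_snapshot = pull_snapshot  # returns the seq the snapshot is current to
        self.gaps: list[tuple[int, int]] = []

    def on_event(self, incoming_seq: int) -> str:
        expected = self.last_seq + 1
        if incoming_seq < expected:
            return "DUPLICATE"  # already covered by an earlier event or snapshot
        if incoming_seq > expected:
            self.gaps.append((expected, incoming_seq - 1))  # emit GAP event
            # Assumption: the snapshot reflects host state at least up to incoming_seq.
            self.last_seq = max(self.pull_snapshot(), incoming_seq)
            return "GAP"
        self.last_seq = incoming_seq
        return "APPLIED"


detector = GapDetector(last_seq=188876, pull_snapshot=lambda: 188880)
for seq in (188877, 188880, 188879):
    print(seq, detector.on_event(seq))
```

Recording the missing range, not just the fact of a gap, makes the later sequence audit in the runbook much easier.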
11.3 Conflict Classifier
function classify(host, cache):
if cache.open && host.closed: return FALSE_OPEN
else if cache.closed && host.open: return FALSE_CLOSED
else if cache.count != host.count: return COUNT_DRIFT
else: return CONSISTENT
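A runnable version of the classifier might look like the following. It assumes that "open" means a seat count above zero and "closed" means zero, which the pseudocode does not spell out.

```python
from enum import Enum


class Discrepancy(Enum):
    FALSE_OPEN = "FALSE_OPEN"
    FALSE_CLOSED = "FALSE_CLOSED"
    COUNT_DRIFT = "COUNT_DRIFT"
    CONSISTENT = "CONSISTENT"


def classify(host_seats: int, cache_seats: int) -> Discrepancy:
    """Classify a host/cache pair for one class; open is assumed to mean seats > 0."""
    if cache_seats > 0 and host_seats == 0:
        return Discrepancy.FALSE_OPEN
    if cache_seats == 0 and host_seats > 0:
        return Discrepancy.FALSE_CLOSED
    if cache_seats != host_seats:
        return Discrepancy.COUNT_DRIFT
    return Discrepancy.CONSISTENT


print(classify(host_seats=0, cache_seats=4))
print(classify(host_seats=2, cache_seats=4))
```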
11.4 Read Repair
On commit failure due to stale availability:
1. Fetch authoritative state
2. Update local cache segment
3. Return corrected offer (with new token)
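The three steps can be sketched as a single function. The cache is modeled as a plain dict, and the token format and the fake_host stand-in are assumptions for the example.

```python
from typing import Callable


def read_repair(cache: dict, key: str, fetch_authoritative: Callable[[str], dict]) -> dict:
    """Run after a commit failed because the cached availability was stale."""
    state = fetch_authoritative(key)       # 1. fetch authoritative state
    cache[key] = state                     # 2. update local cache segment
    return {"status": "REOFFER", **state}  # 3. corrected offer, carrying the new token


def fake_host(key: str) -> dict:
    """Stand-in for the host lookup; a real call would hit the inventory service."""
    return {"seats": 0, "token": "inv=982344;ctl=4512"}


cache = {"AB123|2024-07-10|R": {"seats": 4, "token": "inv=982300;ctl=4510"}}
offer = read_repair(cache, "AB123|2024-07-10|R", fake_host)
print(offer)
```

Because the repair is driven by a real booking attempt, it fixes exactly the entries that customers are hitting, without a full refresh.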
11.5 Event Prioritization
PriorityQueue:
1: Closures & decreases
2: Marriage changes
3: Opens
4: Price-only updates
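One way to realize this ordering is a heap keyed on (priority, arrival order), so that events of equal priority stay first-in, first-out. The event kind names below are illustrative labels for the four tiers.

```python
import heapq
import itertools

PRIORITY = {"CLOSE": 1, "DECREASE": 1, "MARRIAGE_CHANGE": 2, "OPEN": 3, "PRICE_ONLY": 4}


class EventQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, dict]] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order within a tier

    def push(self, kind: str, payload: dict) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._arrival), kind, payload))

    def pop(self) -> tuple[str, dict]:
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self) -> int:
        return len(self._heap)


queue = EventQueue()
for kind in ("PRICE_ONLY", "OPEN", "CLOSE", "MARRIAGE_CHANGE", "DECREASE"):
    queue.push(kind, {"leg": "AB123|2024-07-10|A-B"})
order = [queue.pop()[0] for _ in range(len(queue))]
print(order)
```

The arrival counter also guarantees the heap never has to compare payload dicts, which would raise a TypeError.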
11.6 Staleness Budgeting
classPolicy['R'] = 120 // maxStalenessSec
classPolicy['Y'] = 300 // maxStalenessSec
if (now - invTimestamp > classPolicy[class]) -> force refresh
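A small runnable version of the budget check, using epoch seconds. The fallback for classes without an explicit policy is an assumption (here, the strictest budget applies).

```python
CLASS_POLICY_SEC = {"R": 120, "Y": 300}  # max staleness per class, from the example above
DEFAULT_MAX_STALENESS_SEC = 120          # assumption: unlisted classes get the strictest budget


def needs_refresh(rbd: str, inv_timestamp: float, now: float) -> bool:
    """True when the cached entry for this class is older than its staleness budget."""
    budget = CLASS_POLICY_SEC.get(rbd, DEFAULT_MAX_STALENESS_SEC)
    return now - inv_timestamp > budget


now = 1_720_000_000.0  # arbitrary epoch seconds for the example
print(needs_refresh("R", inv_timestamp=now - 200, now=now))
print(needs_refresh("Y", inv_timestamp=now - 200, now=now))
```

The same 200-second-old entry is stale for R but still acceptable for Y, which is the point of budgeting per class.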
12. Observability & KPIs
- Divergence Window p95: Time from host update to cache adoption.
- False Open Rate: (False Open responses) / (Total availability responses).
- Commit Failure Attribution: % failures caused by stale availability vs payment vs price change.
- Gap Frequency: Gaps per 10k event sequences.
- Propagation Latency Histogram: Tracked separately for closures and opens (closures require a tighter SLA).
- Edge Stale Serve Count: Edge served objects beyond staleness budget.
- Marriage Integrity Error Rate: Broken marriage validations per 1k multi-leg bookings.
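As an illustration of the first KPI, the sketch below computes a p95 divergence window with the nearest-rank method. The sample values are invented for the example, and other percentile definitions (interpolated ones, for instance) will give slightly different numbers.

```python
def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceil(pct / 100 * n)
    return ordered[rank - 1]


# Hypothetical closure propagation times in seconds (host update to cache adoption).
closure_windows_sec = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35,
                       38, 40, 44, 48, 52, 58, 65, 90, 240, 480]

p50 = percentile_nearest_rank(closure_windows_sec, 50)
p95 = percentile_nearest_rank(closure_windows_sec, 95)
print(p50, p95)
```

The gap between the median and the p95 in data like this is why the KPI is defined on the tail: a handful of slow propagations cause most of the harmful false opens.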
13. Triage Runbook
- Identify Flight Scope: Gather leg key, RBD, time window from incident sample.
- Pull Provenance Logs: Compare host vs cache controlVersion & inventoryVersion.
- Sequence Audit: Check gap detector dashboard for missing sequence IDs.
- Diff Control Artifacts: Did controlVersion increment without associated push? If yes, feed failure.
- Check Prioritization Queue Lag: Queue depth metrics > threshold?
- Edge TTL Audit: Stale-if-error triggered? Inspect error logs around timestamp.
- Apply Quick Mitigation: Force refresh bundle; temporarily route flight to real-time path.
- Root Cause Classification: Tag incident record with one of taxonomy categories.
- Post-Incident Improvement: Adjust SLA guardrails / add missing alert if gap unalerted.
14. Preventive Design Moves
- Carry the provenance object in every availability response; make it visible (hidden param) to ops consoles.
- Separate channels for closures vs low-impact updates; closures bypass batch aggregator.
- Bidirectional health checks: caches periodically assert latest sequence they hold; host flags lagging nodes.
- Dual-publish period when changing mappingVersion; reject mixed mapping combos at commit.
- SLA tiering: high-demand departure buckets (< 7 days) force sub-60s closure propagation; long-haul far-future relaxed.
- Rolling canary for feed parser updates with shadow validation before live adoption.
- Edge hint header: X-Inv-Stale-After: 120, guiding CDN to treat data as expired, not extended.
15. Modernization (NDC & Offers)
As distribution shifts to Offer & Order based models:
- Tokenized Offers: Each offer includes an availabilityToken encoding inventoryVersion & controlVersion hash; commit validates token.
- Granular Feature Availability: Instead of RBD counts, decision service returns product feature eligibility; caches must store feature-level TTLs.
- Unified Diff Channels: Single event bus carries both inventory and ancillary capacity changes to keep offer caches coherent.
- Interline Coordination: Downstream partner still EDIFACT? Wrap authoritative segments with digital signature to deter manipulation across formats.
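For the Tokenized Offers item, one possible shape for the token is a keyed hash over the two versions, recomputed at commit against the host's current state. The use of HMAC-SHA256, the message layout, and the demo key are assumptions; the text only says the token encodes a hash of inventoryVersion and controlVersion.

```python
import hashlib
import hmac

DEMO_KEY = b"demo-secret"  # hypothetical; a real key would come from a secrets store


def issue_token(inventory_version: int, control_version: int, key: bytes = DEMO_KEY) -> str:
    """Bind an offer to the inventory and control versions it was built from."""
    message = f"inv={inventory_version};ctl={control_version}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def validate_at_commit(token: str, current_inventory_version: int,
                       current_control_version: int, key: bytes = DEMO_KEY) -> bool:
    """Accept the commit only if the offer was built from the host's current versions."""
    expected = issue_token(current_inventory_version, current_control_version, key)
    return hmac.compare_digest(token, expected)


token = issue_token(982300, 4510)                  # offer built from a stale snapshot
print(validate_at_commit(token, 982300, 4510))     # host unchanged since the offer
print(validate_at_commit(token, 982344, 4512))     # host moved on: reject and reprice
```

Strict equality rejects any offer built before the latest inventory change, which may be too aggressive for busy flights; a production design would probably validate only the classes the offer depends on.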
16. Implementation Checklist
- Define canonical LegKey + O&D key; document.
- Attach inventoryVersion & controlVersion to all outward availability responses.
- Implement sequence gap detection + automatic snapshot pull.
- Prioritize closure events; monitor closure propagation p95.
- Married segment explicit distribution (groupId, version, hash).
- MappingVersion for marketing ↔ operating RBD conversions.
- Availability token validation at commit (reject stale).
- Observability dashboards: divergence window, false open/closed rates.
- Runbook in repo with taxonomy and decision tree.
- Chaos test: drop 1% of event messages → expect gap recovery, zero silent drift.
- Edge caching policy differentiates harmful vs benign staleness.
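The chaos test in the checklist can be prototyped as a small simulation before wiring it to a real feed. This sketch drops roughly 1% of messages, recovers through snapshot pulls on detected gaps, and uses an end-of-stream sequence check (in the spirit of the bidirectional health checks) to catch trailing drops. All names and the data model are assumptions.

```python
import random


def run_chaos(num_events: int = 10_000, drop_rate: float = 0.01, seed: int = 7) -> dict:
    rng = random.Random(seed)
    host: dict[str, int] = {}
    cache: dict[str, int] = {}
    last_seq = 0
    dropped = recoveries = 0
    for seq in range(1, num_events + 1):
        key = f"leg-{rng.randrange(50)}"
        host[key] = rng.randrange(10)   # host applies every change
        if rng.random() < drop_rate:
            dropped += 1
            continue                    # message lost in transit
        if seq > last_seq + 1:          # gap detected: pull a full snapshot
            cache = dict(host)
            recoveries += 1
        else:
            cache[key] = host[key]      # in-order delta
        last_seq = seq
    if last_seq != num_events:          # health check catches drops at the tail
        cache = dict(host)
        recoveries += 1
    drift = sum(1 for k in host if cache.get(k) != host[k])
    return {"dropped": dropped, "recoveries": recoveries, "drift": drift}


result = run_chaos()
print(result)
```

The pass condition is drift equal to zero with at least one recovery; removing the gap check from the loop should make the same run fail, which is a useful sanity test of the test itself.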
17. Mini Glossary
- False Open: Cache shows class open while authoritative host is closed.
- ControlVersion: Identifier of current RM control set (bid prices / protections).
- InventoryVersion: Monotonic sequence of leg seat state changes.
- Marriage Group: Binding list of segments required to be handled jointly.
- Sequence Gap: Missing event number indicating potential state loss.
- Divergence Window: Time span between authoritative update and global cache convergence.
- Read Repair: On demand correction of stale cache during client interaction.
18. References & Reading
- Public airline RM conference talks (AGIFORS, PODS) on network control propagation.
- Open industry papers describing O&D vs leg control implications for distribution.
- Architectural blogs on cache invalidation, eventual consistency, and sequence-based replication.
- NDC implementation guides on offer/availability tokenization.
19. Disclaimer
Proprietary internal feed formats and vendor-specific algorithms are intentionally excluded.