Why Caches Disagree: GDS vs Airline Host Availability

April 11, 2024 • ~22 min read

Inventory Distribution Reliability Debugging

1. Introduction & Problem Frame

You search a trip in a GDS and see 4 seats in R class; the airline host (or NDC offer API) returns 0. Sales complains. Revenue Management says they closed it hours ago. Distribution says “cache delay.” Everyone is technically right—just in different temporal slices. Understanding why these windows appear lets you reduce friction, false promises, and lost revenue.

Goal: Minimize harmful divergence windows between widely distributed caches (GDS / meta / OTA / edge nodes) and authoritative inventory decisions, while preserving performance and cost efficiency.

2. Mental Model of Divergence

Think of each availability answer as a tuple: (Inventory State Version, Control State Version, Transformation Layer). Two systems disagree when one or more of these components diverge between them for longer than the acceptable SLA.

If any component lags or transforms differently, you get a mismatch.

3. Key Actors & Data Artifacts

| Actor | Role | Primary Latency Source | Artifact Versioning |
| --- | --- | --- | --- |
| Airline Host CRS / Inventory | Authoritative decrement & married segment enforcement | Processing queue + replication | Sequence number / log offset |
| RM Engine | Produces control decisions (bid prices/protections) | Batch optimization cycles | ControlVersionId, timestamped |
| GDS Cache Layer | High fan-out search responses | Feed ingestion interval & invalidation SLA | Derived snapshot time |
| Offer / NDC API | Real-time composition & pricing | Round-trip latency to inventory microservices | RequestId, InventoryVersion, ControlVersion |
| OTA / Metasearch Edge | Further caching / dedupe | Client TTL & stale-if-error policies | ETag / hashed payload |

4. Taxonomy of Disagreements

| Type | Description | Impact | Typical Root |
| --- | --- | --- | --- |
| False Open | Cache shows seat open; host closed | Cart failure / customer frustration | Stale closure update |
| False Closed | Cache shows closed; host open | Lost sale / dilution prevention lost | Missed open propagation |
| Class Count Drift | Numeric difference in seat count (e.g. 4 vs 2) | Agent confusion / distrust | Partial decrement feed loss |
| Married Segment Break | Cache sells connection; host rejects partial | Repricing friction | Incomplete marriage context |
| Brand/Product Mismatch | Base RBD open; branded fare not offered | Revenue leakage / upsell loss | Product filter divergence |
| Price Inconsistency | Availability open but priced fare different | Requote / churn | Filed fare vs dynamic adjustment timing |

5. Lifecycle Timeline Example

T+00:00  RM closes R class on AB123 10JUL (ControlVersion 4512)
T+00:05  Host updates availability → R=0 (InventoryVersion 982344)
T+00:07  Push message to GDS channel queue (seq 188877) still in transit
T+00:11  GDS still serves cached snapshot (R=4) to agencies
T+00:12  Booking attempt from GDS: host rejects at commit (cached R=4, host R=0)
T+00:13  Invalidation arrives; GDS marks R closed
Divergence window length: ~8 minutes, from the host update at T+00:05 to the invalidation at T+00:13 (beyond SLA of 5) → incident triggered
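
The arithmetic behind that last line can be sketched in a few lines of Python. This is only an illustration: the offsets are copied from the timeline, and the function name `divergence_window` is made up for this example.

```python
from datetime import timedelta

# Offsets from the RM decision (T+00:00), copied from the timeline above.
HOST_UPDATED = timedelta(minutes=5)        # host sets R=0
CACHE_INVALIDATED = timedelta(minutes=13)  # GDS marks R closed
CLOSURE_SLA = timedelta(minutes=5)         # SLA used in the example

def divergence_window(host_updated: timedelta, cache_adopted: timedelta) -> timedelta:
    """Time during which the cache could still serve the pre-update state."""
    return cache_adopted - host_updated

window = divergence_window(HOST_UPDATED, CACHE_INVALIDATED)
sla_breached = window > CLOSURE_SLA
print(f"window={window.total_seconds() / 60:.0f} min, breached={sla_breached}")
# prints: window=8 min, breached=True
```

Measuring from the host update rather than from the RM decision is a choice; if your SLA is defined from the RM decision, the same function applies with a different start offset.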

6. Root Causes (Detailed)

  1. Propagation Latency: Delay between host update and cache invalidation. Causes: network queuing, batching, rate limiting. Mitigation: priority channels for closures, adaptive push (close events higher QoS than opens).
  2. Lossy Event Streams: A missed message in the feed leaves state stale until the next periodic full refresh. Mitigation: sequence numbers + gap detector; fall back to a snapshot request when a gap is detected.
  3. Non-Atomic Multi-Leg Updates: For O&D control, multiple legs updated sequentially. A cache sees partial state. Mitigation: version fencing (apply only when full bundle present) or two-phase marker.
  4. Married Segment Reconstruction Errors: Some caches approximate marriage by pattern rules; host uses authoritative marriage groups. Mitigation: distribute explicit marriage group tokens.
  5. Time-Zone / DOW Misalignment: Schedule change at boundary (UTC vs local). Cache indexes wrong flight instance. Mitigation: canonical UTC + local offset pair key; validation job.
  6. Fare Class Remapping / Code Shares: Marketing vs operating RBD differences not normalized. Mitigation: deterministic mapping table with version id; embed mappingVersion in payloads.
  7. Dynamic (Continuous) Pricing Layer: Host reopened with guard price; cache still anchored to filed ladder. Mitigation: treat price and availability as coupled artifacts with joint version.
  8. Optimistic Oversell Windows: Host increments oversell allowances; cache not yet aware. Mitigation: proactive open event before UI exposure or delayed open until propagation confirmed.
  9. Edge CDN Stale-if-Error: Transient upstream failure leads edge to serve stale open data. Mitigation: differentiate harmful stale (open states) vs harmless (closed) in caching policy.
  10. Partial Rollback After Error: Host attempted update, some legs succeeded, RM rolled back logically; cache kept transient state. Mitigation: idempotent update bundles with rollback marker.
  11. Compression / Serialization Truncation: Large update batch truncated, silently losing trailing classes. Mitigation: checksum per batch & ack handshake.
  12. Clock Skew: SLA windows miscalculated because of unsynchronized clocks. Mitigation: NTP enforced; embed origin timestamp & monotonic receipt delta.
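
As one illustration of the mitigations above, the per-batch checksum from item 11 might look like the following Python sketch. The envelope fields (`count`, `sha256`, `updates`) and both function names are assumptions made for this example, not part of any real feed format.

```python
import hashlib
import json

def _digest(updates: list[dict]) -> str:
    # Canonical JSON so sender and receiver hash identical bytes.
    body = json.dumps(updates, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

def seal_batch(updates: list[dict]) -> dict:
    """Sender side: wrap the updates with a count and a SHA-256 digest."""
    return {"count": len(updates), "sha256": _digest(updates), "updates": updates}

def verify_batch(batch: dict) -> bool:
    """Receiver side: acknowledge only if count and digest both match."""
    updates = batch["updates"]
    return len(updates) == batch["count"] and _digest(updates) == batch["sha256"]

batch = seal_batch([
    {"leg": "AB123|2024-07-10|A-B", "rbd": "R", "seats": 0},
    {"leg": "AB123|2024-07-10|A-B", "rbd": "Y", "seats": 9},
])
# Simulate truncation that silently drops the trailing class.
truncated = {**batch, "updates": batch["updates"][:1]}
print(verify_batch(batch), verify_batch(truncated))  # prints: True False
```

A receiver that fails verification would withhold its ack, which lets the sender retransmit instead of leaving the cache with a partial batch.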

7. Married Segments & O&D Interplay

Availability for a connecting itinerary (A–B–C) might be open only as a bundle. A cache lacking the marriage token may incorrectly infer each leg open individually. When a traveler later tries to drop B–C, host denies or reprices. Provide explicit marriageGroupId plus scope=JOINT attribute to caches. Include integrity hash of constituent leg keys so tampering or partial ingestion is detectable.

marriageGroup {
  id: "MG-AB123-BC456-20240710-01",
  legs:["AB123|2024-07-10|A-B","BC456|2024-07-10|B-C"],
  scope:"JOINT",
  version:7,
  createdAt:"2024-07-01T11:05:12Z"
}
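
A minimal sketch of the integrity hash, assuming it is a SHA-256 over the ordered leg keys (the article does not specify the algorithm, and the function names here are hypothetical):

```python
import hashlib

def marriage_integrity_hash(leg_keys: list[str]) -> str:
    """SHA-256 over the leg keys in itinerary order, newline-separated."""
    canonical = "\n".join(leg_keys)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def is_intact(received_legs: list[str], expected_hash: str) -> bool:
    """True when the received legs reproduce the published hash."""
    return marriage_integrity_hash(received_legs) == expected_hash

legs = ["AB123|2024-07-10|A-B", "BC456|2024-07-10|B-C"]
published_hash = marriage_integrity_hash(legs)

print(is_intact(legs, published_hash))      # prints: True
print(is_intact(legs[:1], published_hash))  # prints: False (partial ingestion)
```

A plain hash catches partial ingestion and accidental corruption. Detecting deliberate tampering would need a keyed construction such as an HMAC, since anyone can recompute an unkeyed hash.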

8. Price vs Availability Mismatch

Even when seat count matches, pricing divergence fuels perception of availability mismatch.

  • Filed vs Dynamic: Cache priced on filed fare ladder; host offered dynamic anchor.
  • Tax/Fee Drift: Changed tax table version not yet applied in cache’s aggregated total.
  • Brand Bundle Decomposition: Cache shows base class; host only sells brand with ancillaries included.

Mitigation: Pair each availability answer with a pricingContextVersion; at commit, require the versions to match or start a repricing handshake (a pre-confirmation prompt to the user or agent).

9. Detection & Instrumentation

Log every search and commit with immutable provenance.

AvailabilityProvenance {
  requestId,
  inventoryVersion,
  controlVersion,
  mappingVersion,
  marriageVersion,
  pricedAt,
  servedFrom: HOST | GDS_CACHE | EDGE
}

Create aggregated mismatch events:

MismatchEvent {
  discrepancyType: "FALSE_OPEN",
  flightLegKey,
  class:"R",
  cacheVersion:"inv=982300 ctl=4510",
  hostVersion:"inv=982344 ctl=4512",
  latencySec: 420
}

Trigger an anomaly alert when the mismatch rate exceeds its baseline (e.g. False Open > 0.5% of queries per hour for a leg bucket).
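
A rough sketch of that check, assuming hourly counters per leg bucket are already available. The bucket key format, the sample counts, and the fixed threshold are all illustrative.

```python
from collections import Counter

FALSE_OPEN_THRESHOLD = 0.005  # 0.5% of queries, from the example above

def false_open_alerts(queries: Counter, false_opens: Counter,
                      threshold: float = FALSE_OPEN_THRESHOLD) -> list[tuple[str, float]]:
    """Return (bucket, rate) for each bucket whose hourly rate exceeds the threshold."""
    alerts = []
    for bucket, total in queries.items():
        if total == 0:
            continue  # no traffic, nothing to rate
        rate = false_opens.get(bucket, 0) / total
        if rate > threshold:
            alerts.append((bucket, rate))
    return alerts

# Hypothetical counts for one hour.
queries = Counter({"AB123|2024-07-10": 2000, "BC456|2024-07-10": 1000})
false_opens = Counter({"AB123|2024-07-10": 14, "BC456|2024-07-10": 3})

alerts = false_open_alerts(queries, false_opens)
print([bucket for bucket, _ in alerts])  # prints: ['AB123|2024-07-10']
```

In practice the threshold would probably come from a learned per-bucket baseline rather than a constant.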

10. Reconciliation Strategies

| Strategy | Mechanism | Pros | Trade-offs |
| --- | --- | --- | --- |
| Push + Ack | Host pushes diff; cache acks sequence | Low staleness | Back-pressure if ack slow |
| Pull on Gap | Cache detects sequence gap, pulls snapshot | Resilient to loss | Extra host load on bursts |
| Time-Sliced Snapshots | Periodic full baseline plus deltas | Recovery simplicity | Latency between snapshots |
| Hybrid Token Validation | Search returns token; commit validates | Prevents stale commit | User friction if reprice |
| Selective Real-Time | High-risk flights bypass cache | Accuracy where needed | Inconsistent latency profile |

11. Engineering Patterns

11.1 Version Fencing

if (response.controlVersion != host.currentControlVersion) {
   re-evaluate availability before commit
}
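
The same idea as a runnable Python sketch. `AvailabilityAnswer`, `fence`, and the `reevaluate` callback are hypothetical names, and the sample versions are borrowed from the timeline in section 5.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class AvailabilityAnswer:
    rbd: str
    seats: int
    control_version: int

def fence(answer: AvailabilityAnswer,
          current_control_version: int,
          reevaluate: Callable[[str], AvailabilityAnswer]) -> AvailabilityAnswer:
    """Return the answer unchanged if its control version is current, else re-evaluate."""
    if answer.control_version == current_control_version:
        return answer
    return reevaluate(answer.rbd)

# Stand-in for a call to the authoritative host.
def host_lookup(rbd: str) -> AvailabilityAnswer:
    return AvailabilityAnswer(rbd=rbd, seats=0, control_version=4512)

cached = AvailabilityAnswer(rbd="R", seats=4, control_version=4510)
fresh = fence(cached, current_control_version=4512, reevaluate=host_lookup)
print(fresh.seats)  # prints: 0
```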

11.2 Sequence Gap Detector

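expectedSeq = lastSeq + 1
if (incomingSeq > expectedSeq) {
   emit GAP event
   pull snapshot          // resets lastSeq to the snapshot's sequence
} else if (incomingSeq == expectedSeq) {
   apply event
   lastSeq = incomingSeq
}                          // incomingSeq < expectedSeq: duplicate, ignore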
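
A slightly fuller Python sketch of the detector, covering in-order events and duplicates as well as gaps. It assumes a hypothetical `pull_snapshot` callback that returns the sequence number the snapshot is current through, and that this number is at least the incoming one.

```python
from typing import Callable

class SequenceGapDetector:
    def __init__(self, last_seq: int, pull_snapshot: Callable[[], int]):
        self.last_seq = last_seq
        self.pull_snapshot = pull_snapshot
        self.gaps: list[tuple[int, int]] = []  # inclusive ranges of missed sequences

    def on_event(self, seq: int) -> str:
        expected = self.last_seq + 1
        if seq < expected:
            return "DUPLICATE"  # already applied or covered by a snapshot
        if seq > expected:
            self.gaps.append((expected, seq - 1))
            self.last_seq = self.pull_snapshot()  # resync from authoritative state
            return "GAP"
        self.last_seq = seq
        return "APPLIED"

detector = SequenceGapDetector(last_seq=188876, pull_snapshot=lambda: 188880)
results = [detector.on_event(s) for s in (188877, 188880, 188879)]
print(results)        # prints: ['APPLIED', 'GAP', 'DUPLICATE']
print(detector.gaps)  # prints: [(188878, 188879)]
```

Recording the missed range makes the gap visible on a dashboard even when the snapshot pull repairs state immediately.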

11.3 Conflict Classifier

function classify(host, cache):
  if cache.open && host.closed return FALSE_OPEN
  else if cache.closed && host.open return FALSE_CLOSED
  else if cache.count != host.count return COUNT_DRIFT
  else return CONSISTENT
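
In Python this could look as follows, under the simplifying assumption that a class is "open" whenever its seat count is above zero:

```python
from enum import Enum

class Discrepancy(Enum):
    FALSE_OPEN = "FALSE_OPEN"
    FALSE_CLOSED = "FALSE_CLOSED"
    COUNT_DRIFT = "COUNT_DRIFT"
    CONSISTENT = "CONSISTENT"

def classify(host_seats: int, cache_seats: int) -> Discrepancy:
    """Compare the host and cache seat counts for one leg and class."""
    host_open = host_seats > 0
    cache_open = cache_seats > 0
    if cache_open and not host_open:
        return Discrepancy.FALSE_OPEN
    if host_open and not cache_open:
        return Discrepancy.FALSE_CLOSED
    if cache_seats != host_seats:
        return Discrepancy.COUNT_DRIFT  # both open, counts differ
    return Discrepancy.CONSISTENT

print(classify(host_seats=0, cache_seats=4).value)  # prints: FALSE_OPEN
print(classify(host_seats=2, cache_seats=4).value)  # prints: COUNT_DRIFT
```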

11.4 Read Repair

On commit failure due to stale availability:
  1. Fetch authoritative state
  2. Update local cache segment
  3. Return corrected offer (with new token)
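
The three steps map onto a short Python sketch. The cache is modelled as a plain dict, `fetch_authoritative` stands in for a host call, and the random token is a placeholder for whatever token scheme is actually in use.

```python
import secrets
from typing import Callable

def read_repair(cache: dict, key: str,
                fetch_authoritative: Callable[[str], dict]) -> dict:
    """Repair one cache entry after a stale-availability commit failure."""
    state = fetch_authoritative(key)  # 1. fetch authoritative state
    cache[key] = state                # 2. update the local cache segment
    # 3. return the corrected offer with a fresh (placeholder) token
    return {"key": key, **state, "token": secrets.token_hex(8)}

cache = {"AB123|2024-07-10|R": {"seats": 4, "inventoryVersion": 982300}}
host = {"AB123|2024-07-10|R": {"seats": 0, "inventoryVersion": 982344}}

offer = read_repair(cache, "AB123|2024-07-10|R", lambda k: host[k])
print(offer["seats"], cache["AB123|2024-07-10|R"]["inventoryVersion"])
# prints: 0 982344
```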

11.5 Event Prioritization

PriorityQueue:
  1: Closures & decreases
  2: Marriage changes
  3: Opens
  4: Price-only updates
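
One way to sketch this ordering is with Python's `heapq`. The event kind names are assumptions; the counter keeps arrival order within a priority level.

```python
import heapq
import itertools

# Lower number = dispatched first, matching the list above.
PRIORITY = {"CLOSE": 1, "DECREASE": 1, "MARRIAGE": 2, "OPEN": 3, "PRICE": 4}

class EventQueue:
    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str, dict]] = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority

    def push(self, kind: str, payload: dict) -> None:
        heapq.heappush(self._heap, (PRIORITY[kind], next(self._counter), kind, payload))

    def pop(self) -> tuple[str, dict]:
        _, _, kind, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self) -> int:
        return len(self._heap)

q = EventQueue()
for kind in ("PRICE", "OPEN", "CLOSE", "MARRIAGE", "DECREASE"):
    q.push(kind, {"leg": "AB123|2024-07-10|A-B"})

order = [q.pop()[0] for _ in range(len(q))]
print(order)  # prints: ['CLOSE', 'DECREASE', 'MARRIAGE', 'OPEN', 'PRICE']
```

A strict priority queue can starve low-priority updates during a burst of closures, so a production version would likely add aging or a per-priority rate budget.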

11.6 Staleness Budgeting

maxStalenessSec['R'] = 120
maxStalenessSec['Y'] = 300
if (now - invTimestamp > maxStalenessSec[class]) -> force refresh
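
A runnable version of the same policy, with timestamps as epoch seconds. The fallback budget for classes without an explicit entry is an assumption added here, set to the strictest value.

```python
MAX_STALENESS_SEC = {"R": 120, "Y": 300}
DEFAULT_MAX_STALENESS_SEC = 120  # assumption: unknown classes get the strictest budget

def needs_refresh(rbd: str, inv_timestamp: float, now: float) -> bool:
    """True when the cached inventory is older than the budget for this class."""
    budget = MAX_STALENESS_SEC.get(rbd, DEFAULT_MAX_STALENESS_SEC)
    return now - inv_timestamp > budget

# A snapshot that is 121 seconds old: stale for R, still fine for Y.
print(needs_refresh("R", inv_timestamp=1000.0, now=1121.0))  # prints: True
print(needs_refresh("Y", inv_timestamp=1000.0, now=1121.0))  # prints: False
```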

12. Observability & KPIs

  • Divergence Window p95: Time from host update to cache adoption.
  • False Open Rate: (False Open responses) / (Total availability responses).
  • Commit Failure Attribution: % failures caused by stale availability vs payment vs price change.
  • Gap Frequency: Gaps per 10k event sequences.
  • Propagation Latency Histogram: For closures vs opens (separate—closures require tighter SLA).
  • Edge Stale Serve Count: Edge served objects beyond staleness budget.
  • Marriage Integrity Error Rate: Broken marriage validations per 1k multi-leg bookings.
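
Two of these KPIs are simple enough to sketch directly. The percentile below uses the nearest-rank method, which is one of several common definitions, and the sample windows are invented for illustration.

```python
import math

def percentile_nearest_rank(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def false_open_rate(false_open_responses: int, total_responses: int) -> float:
    return false_open_responses / total_responses if total_responses else 0.0

# Hypothetical divergence windows in seconds (host update to cache adoption).
windows_sec = [30, 45, 50, 60, 62, 70, 75, 80, 90, 95,
               100, 110, 120, 130, 150, 180, 200, 240, 300, 480]

p95 = percentile_nearest_rank(windows_sec, 95)
print(p95)                         # prints: 300
print(false_open_rate(14, 2000))   # prints: 0.007
```

Computing this separately for closures and opens, as the histogram bullet suggests, only requires keeping two sample lists.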

13. Triage Runbook

  1. Identify Flight Scope: Gather leg key, RBD, time window from incident sample.
  2. Pull Provenance Logs: Compare host vs cache controlVersion & inventoryVersion.
  3. Sequence Audit: Check gap detector dashboard for missing sequence IDs.
  4. Diff Control Artifacts: Did controlVersion increment without associated push? If yes, feed failure.
  5. Check Prioritization Queue Lag: Queue depth metrics > threshold?
  6. Edge TTL Audit: Stale-if-error triggered? Inspect error logs around timestamp.
  7. Apply Quick Mitigation: Force refresh bundle; temporarily route flight to real-time path.
  8. Root Cause Classification: Tag incident record with one of taxonomy categories.
  9. Post-Incident Improvement: Adjust SLA guardrails / add missing alert if gap unalerted.

14. Preventive Design Moves

  • Carry the provenance object in every availability response; expose it to ops consoles (for example as a hidden parameter).
  • Separate channels for closures vs low-impact updates; closures bypass batch aggregator.
  • Bidirectional health checks: caches periodically assert latest sequence they hold; host flags lagging nodes.
  • Dual-publish period when changing mappingVersion; reject mixed mapping combos at commit.
  • SLA tiering: high-demand departure buckets (< 7 days to departure) force sub-60s closure propagation; long-haul far-future departures can run on relaxed targets.
  • Rolling canary for feed parser updates with shadow validation before live adoption.
  • Edge hint header, e.g. X-Inv-Stale-After: 120, telling the CDN to treat the object as expired after the stated interval rather than extending its lifetime.

15. Modernization (NDC & Offers)

As distribution shifts to Offer & Order based models:

  • Tokenized Offers: Each offer includes availabilityToken encoding inventoryVersion & controlVersion hash; commit validates token.
  • Granular Feature Availability: Instead of RBD counts, decision service returns product feature eligibility; caches must store feature-level TTLs.
  • Unified Diff Channels: Single event bus carries both inventory and ancillary capacity changes to keep offer caches coherent.
  • Interline Coordination: Downstream partner still EDIFACT? Wrap authoritative segments with digital signature to deter manipulation across formats.
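
As a rough sketch of the tokenized-offer idea, the token below carries both versions plus an HMAC so the commit path can verify that it was issued by the host and still matches current state. The token layout, the function names, and the hard-coded secret are illustrative assumptions, not an NDC-defined format.

```python
import hashlib
import hmac

SECRET = b"demo-only-secret"  # hypothetical; a real key would come from a secret store

def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode("utf-8"), hashlib.sha256).hexdigest()

def issue_token(inventory_version: int, control_version: int) -> str:
    """Encode both versions and sign them."""
    payload = f"{inventory_version}.{control_version}"
    return f"{payload}.{_sign(payload)}"

def validate_token(token: str, current_inventory: int, current_control: int) -> bool:
    """Accept only an untampered token whose versions match current host state."""
    parts = token.split(".")
    if len(parts) != 3:
        return False
    inv, ctl, sig = parts
    if not hmac.compare_digest(sig, _sign(f"{inv}.{ctl}")):
        return False
    return (inv, ctl) == (str(current_inventory), str(current_control))

token = issue_token(982344, 4512)
print(validate_token(token, 982344, 4512))  # prints: True
print(validate_token(token, 982345, 4512))  # prints: False (inventory moved on)
```

Requiring an exact inventoryVersion match is deliberately strict and would trigger a revalidation on any seat change for the leg; a looser policy could re-check only the requested class before rejecting.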

16. Implementation Checklist

  • Define canonical LegKey + O&D key; document.
  • Attach inventoryVersion & controlVersion to all outward availability responses.
  • Implement sequence gap detection + automatic snapshot pull.
  • Prioritize closure events; monitor closure propagation p95.
  • Distribute married segments explicitly (groupId, version, hash).
  • Version the marketing ↔ operating RBD mapping (mappingVersion).
  • Availability token validation at commit (reject stale).
  • Observability dashboards: divergence window, false open/closed rates.
  • Runbook in repo with taxonomy and decision tree.
  • Chaos test: drop 1% of event messages → expect gap recovery, zero silent drift.
  • Edge caching policy differentiates harmful vs benign staleness.

17. Mini Glossary

  • False Open: Cache shows class open while authoritative host is closed.
  • ControlVersion: Identifier of current RM control set (bid prices / protections).
  • InventoryVersion: Monotonic sequence of leg seat state changes.
  • Marriage Group: Binding list of segments required to be handled jointly.
  • Sequence Gap: Missing event number indicating potential state loss.
  • Divergence Window: Time span between authoritative update and global cache convergence.
  • Read Repair: On-demand correction of a stale cache entry during client interaction.

18. References & Reading

  • Public airline RM conference talks (AGIFORS, PODS) on network control propagation.
  • Open industry papers describing O&D vs leg control implications for distribution.
  • Architectural blogs on cache invalidation, eventual consistency, and sequence-based replication.
  • NDC implementation guides on offer/availability tokenization.

Proprietary internal feed formats and vendor-specific algorithms intentionally excluded.

19. Disclaimer

This article summarizes generally known engineering and distribution concepts. It omits proprietary RM vendor math, confidential feed specifications, and commercial strategies. Validate implementation decisions against contractual obligations, regulatory constraints, and internal governance policies before deployment.
