Why did SLA slip last week?

Updated May 9, 2026 · 2 min read

Start from SLA compliance, drill into carrier performance and failure patterns, then propose what to investigate next.

This is the deeper-drilldown counterpart to Daily ops standup — called when a metric has moved, and the user wants a root-cause narrative rather than a snapshot.

When to use

  • An ops lead notices SLA dropped on a daily report.
  • A customer success manager asks, "Why did our on-time rate fall last week?"
  • An executive asks "what happened?" before a meeting.

Tool sequence

report_sla_compliance              (the headline — by how much, in which stage?)
       │
       ▼
report_carrier_performance         (which carriers contributed?)
       │
       ▼
report_failed_delivery_analysis    (which failure types? where?)
       │
       ▼
narrative root-cause story

Example agent prompt

"SLA looks like it slipped last week — what happened?"

"Why did our on-time delivery rate drop?"

Walkthrough

Step 1 — confirm and quantify the slip

report_sla_compliance(
  tenantId="…",
  from="<2 weeks ago>",
  to="<1 week ago>"
)

That first call covers the baseline week. A second call covers the most recent week, the one that appears to have slipped:

report_sla_compliance(
  tenantId="…",
  from="<1 week ago>",
  to="<today>"
)
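
The date placeholders in the two calls above can be derived from a single reference date. The sketch below is one way to do it in Python. The function name is hypothetical, and the ISO date format and the boundary handling (whether "to" is inclusive) are assumptions, since this page does not specify what the tool expects.

```python
from datetime import date, timedelta


def week_over_week_windows(today: date) -> dict[str, tuple[str, str]]:
    """Return (from, to) date strings for the baseline week and the most recent week.

    Assumption: the reporting tools accept ISO-8601 dates (YYYY-MM-DD).
    """
    one_week_ago = today - timedelta(days=7)
    two_weeks_ago = today - timedelta(days=14)
    return {
        # Baseline: <2 weeks ago> to <1 week ago>
        "baseline": (two_weeks_ago.isoformat(), one_week_ago.isoformat()),
        # Current: <1 week ago> to <today>
        "current": (one_week_ago.isoformat(), today.isoformat()),
    }


windows = week_over_week_windows(date(2026, 5, 9))
print(windows["baseline"])  # ('2026-04-25', '2026-05-02')
print(windows["current"])   # ('2026-05-02', '2026-05-09')
```

Both windows share the boundary date, so check how the tool treats the end of a range before relying on this to avoid double counting.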

The agent computes the delta and identifies which stage slipped (processing, collection, delivery, customer promise). That narrows the rest of the investigation.
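
A minimal sketch of that delta computation follows. It assumes each response can be reduced to a mapping from stage name to on-time percentage; the real response shape is not documented on this page, the function names are hypothetical, and the numbers are made-up illustration values (only the 91 to 84 delivery figures echo the example narrative in step 4).

```python
def stage_deltas(baseline: dict[str, float], current: dict[str, float]) -> dict[str, float]:
    """Percentage-point change per stage (negative means the stage got worse)."""
    return {
        stage: round(current[stage] - baseline[stage], 2)
        for stage in baseline
        if stage in current
    }


def worst_stage(deltas: dict[str, float]) -> str:
    """Stage with the largest drop."""
    return min(deltas, key=deltas.get)


# Illustrative values only, not real data.
baseline = {"processing": 97.0, "collection": 95.0, "delivery": 91.0, "customer_promise": 90.0}
current = {"processing": 96.5, "collection": 94.0, "delivery": 84.0, "customer_promise": 85.0}

deltas = stage_deltas(baseline, current)
print(deltas)               # {'processing': -0.5, 'collection': -1.0, 'delivery': -7.0, 'customer_promise': -5.0}
print(worst_stage(deltas))  # delivery
```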

Step 2 — carrier-level breakdown

report_carrier_performance(
  tenantId="…",
  from="<1 week ago>",
  to="<today>"
)

The agent looks for carriers whose success rate fell more than the overall average. Often the slip is concentrated in one or two carriers; surfacing that is the most actionable insight.
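
The comparison can be sketched as below. It assumes the agent also has carrier numbers for the baseline week (for example from a second report_carrier_performance call over that window) and that each response can be reduced to successful and total shipment counts per carrier. That shape, the function names, and "CarrierB" are assumptions for illustration; the Aramex rates mirror the example narrative in step 4.

```python
def rate(successes: int, total: int) -> float:
    """Success rate as a percentage; 0.0 when there are no shipments."""
    return 100.0 * successes / total if total else 0.0


def carriers_worse_than_overall(
    baseline: dict[str, tuple[int, int]],
    current: dict[str, tuple[int, int]],
) -> list[tuple[str, float]]:
    """Carriers whose success rate fell by more than the overall (volume-weighted) rate."""

    def overall(week: dict[str, tuple[int, int]]) -> float:
        return rate(sum(s for s, _ in week.values()), sum(t for _, t in week.values()))

    overall_change = overall(current) - overall(baseline)
    flagged = []
    for carrier in baseline.keys() & current.keys():
        change = rate(*current[carrier]) - rate(*baseline[carrier])
        if change < overall_change:
            flagged.append((carrier, round(change, 1)))
    # Largest drop first.
    return sorted(flagged, key=lambda item: item[1])


# carrier -> (successful shipments, total shipments); illustrative values only.
baseline_week = {"Aramex": (890, 1000), "CarrierB": (940, 1000)}
current_week = {"Aramex": (780, 1000), "CarrierB": (930, 1000)}

flagged = carriers_worse_than_overall(baseline_week, current_week)
print(flagged)  # [('Aramex', -11.0)]
```

Working from counts rather than pre-computed percentages keeps the overall rate volume-weighted, so a small carrier with a large swing does not distort the benchmark.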

Step 3 — failure breakdown

report_failed_delivery_analysis(
  tenantId="…",
  from="<1 week ago>",
  to="<today>",
  group_by="carrier",
  failed_delivery_group_by="reason"
)

This call pulls the failure shape by carrier and by reason. Pair it with the carrier-performance data from step 2 to triangulate the cause.
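
One way to do that pairing, sketched under assumptions: the failure report is treated as a flat list of rows with carrier, reason, and count fields (a hypothetical shape, not the documented response), and the carriers flagged in step 2 are passed in as a set. The counts are illustrative only.

```python
from collections import defaultdict


def top_reasons_by_carrier(
    rows: list[dict], carriers: set[str], limit: int = 3
) -> dict[str, list[tuple[str, int]]]:
    """For each flagged carrier, return its most frequent failure reasons."""
    counts: dict[str, dict[str, int]] = defaultdict(dict)
    for row in rows:
        if row["carrier"] in carriers:
            by_reason = counts[row["carrier"]]
            by_reason[row["reason"]] = by_reason.get(row["reason"], 0) + row["count"]
    return {
        carrier: sorted(reasons.items(), key=lambda kv: kv[1], reverse=True)[:limit]
        for carrier, reasons in counts.items()
    }


# Illustrative rows only; field names are assumptions.
rows = [
    {"carrier": "Aramex", "reason": "customer not available", "count": 140},
    {"carrier": "Aramex", "reason": "wrong address", "count": 50},
    {"carrier": "CarrierB", "reason": "wrong address", "count": 30},
]

top = top_reasons_by_carrier(rows, {"Aramex"})
print(top)  # {'Aramex': [('customer not available', 140), ('wrong address', 50)]}
```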

Step 4 — narrate the root cause

The agent collapses the three responses into a story:

"On-time delivery slipped from 91% to 84% week-over-week — mostly in the delivery stage. Aramex drove most of the gap; their success rate fell from 89% to 78%, and within Aramex failures, 'customer not available' jumped 40% (concentrated in KSA). Suggest checking with Aramex KSA on capacity / first-attempt rates, and reviewing whether our customer-notification SMS is being received in that region."

The shape: lead with the headline delta, attribute it to a slice (carrier / region / reason), end with the recommended investigation.
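
That three-part shape can be expressed as a small template. This is only a sketch: in practice the agent writes the narrative itself, and the function name, parameters, and shipment count here are hypothetical. It includes the sample size alongside the percentages and keeps the closing sentence phrased as a suggestion.

```python
def narrate_slip(
    before: float,
    after: float,
    stage: str,
    shipments: int,
    carrier: str,
    carrier_before: float,
    carrier_after: float,
    top_reason: str,
    next_step: str,
) -> str:
    """Headline delta, then attribution to a slice, then a recommended investigation."""
    headline = (
        f"On-time delivery slipped from {before:.0f}% to {after:.0f}% week-over-week "
        f"({shipments} shipments in the window), mostly in the {stage} stage."
    )
    attribution = (
        f"{carrier} drove most of the gap: their success rate went from "
        f"{carrier_before:.0f}% to {carrier_after:.0f}%, and the most common "
        f"failure reason was '{top_reason}'."
    )
    # Hedged on purpose: failure-reason data is a hypothesis, not a verdict.
    recommendation = f"Suggest checking {next_step}."
    return " ".join([headline, attribution, recommendation])


story = narrate_slip(
    before=91.0,
    after=84.0,
    stage="delivery",
    shipments=2000,  # illustrative sample size
    carrier="Aramex",
    carrier_before=89.0,
    carrier_after=78.0,
    top_reason="customer not available",
    next_step="first-attempt rates with Aramex KSA",
)
print(story)
```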

Variations

  • Per-merchant — scope every call with merchant=<id> for brand-specific drilldowns.
  • Per-region — scope with country_access for region-specific drilldowns. Useful when SLA is regional (e.g. KSA fell, UAE steady).
  • Custom window — substitute "month over month" or "quarter over quarter" for the same shape.
  • Pre-emptive variant — same three calls, run on a schedule (Slack bot every Monday). Only narrate when the delta exceeds a threshold; stay quiet otherwise.
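
For the pre-emptive variant, the "only narrate when the delta exceeds a threshold" rule can be sketched as a simple gate. The function name and the 2-point default are assumptions chosen for illustration; pick a threshold that matches normal week-to-week noise for the tenant.

```python
def should_narrate(baseline_rate: float, current_rate: float, threshold_points: float = 2.0) -> bool:
    """True when the on-time rate dropped by at least threshold_points percentage points."""
    return (baseline_rate - current_rate) >= threshold_points


print(should_narrate(91.0, 84.0))  # True: a 7-point drop is worth a narrative
print(should_narrate(91.0, 90.5))  # False: stay quiet
```

As written, the gate only fires on drops. If improvements should also be reported, compare the absolute difference instead.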

Pitfalls

  • Two reports' definitions of "success" can differ. report_sla_compliance measures against promised dates; report_carrier_performance includes carrier-side success signals. The agent should narrate which lens it's using and not cross-reference numbers naively.
  • Sample size. A 7-day window for a small merchant might have too few shipments to draw conclusions. The agent should mention sample size when narrating, not just percentages.
  • Don't confuse correlation with causation. Failure-reason data is a hypothesis, not a verdict. The agent's narrative should say "suggest checking…", not "the cause is…".

Related

  • Daily ops standup: the lighter cousin, which surfaces problems for this pattern to drill into.
  • Diagnose a stuck shipment: per-shipment drilldown when this pattern points to specific shipments worth investigating.