Retry and idempotency

Updated May 29, 2026 · 4 min read

Webhook delivery isn't fire-and-forget. When your endpoint fails to acknowledge (non-2xx response, timeout, network error), Carriyo retries with exponential backoff. This is essential for reliability, but it means your handler will see the same event more than once in normal operation. Designing for idempotency isn't optional.

Retry behavior

Carriyo waits up to 30 seconds for a 2xx response on each delivery attempt. If the response is non-2xx, the connection times out, or the connection fails, the delivery is treated as failed and retried.

Standard retries

By default, Carriyo retries a failed delivery 3 times with short delays:

Retry  Delay (since previous attempt)  Cumulative delay (since original event)
1      1 minute                        1 minute
2      3 minutes                       4 minutes
3      5 minutes                       9 minutes

After the third retry, no further automatic delivery attempts are made under the standard schedule.
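The cumulative column is the running sum of the per-attempt delays. A minimal sketch of that arithmetic, using the delays from the standard table and from the table under "Extended retries" (the constant and function names are illustrative, not part of any Carriyo SDK):

```javascript
// Per-attempt delays copied from the retry tables in this section.
const STANDARD_DELAYS_MIN = [1, 3, 5];                           // minutes
const EXTENDED_DELAYS_MIN = [1, 3, 5, 8, 13].map((h) => h * 60); // hours -> minutes

// Returns the cumulative delay since the original event for each retry.
function cumulativeDelays(delays) {
  let total = 0;
  return delays.map((delay) => (total += delay));
}

console.log(cumulativeDelays(STANDARD_DELAYS_MIN));                      // [ 1, 4, 9 ]
console.log(cumulativeDelays(EXTENDED_DELAYS_MIN).map((m) => m / 60));   // [ 1, 4, 9, 17, 30 ]
```

The last value of each list is the longest time a duplicate can trail the original event, which is useful when sizing a deduplication window.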

Extended retries

A tenant can enable extended retries on top of the standard ones. When enabled, Carriyo continues retrying for up to 30 hours after the original event:

Retry  Delay (since previous attempt)  Cumulative delay (since original event)
1      1 hour                          1 hour
2      3 hours                         4 hours
3      5 hours                         9 hours
4      8 hours                         17 hours
5      13 hours                        30 hours

Extended retries can lead to out-of-order events. A newer event for the same shipment may be processed while an older failed one is still being retried hours later. Only enable extended retries if your handler can cope with events arriving out of sequence.
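One way to cope with out-of-sequence events is to compare a timestamp on the incoming event with the one you last applied, and ignore anything older. The sketch below assumes you can derive such a timestamp from the payload; the `updatedAt` parameter and the in-memory `lastApplied` store are hypothetical placeholders for your own data model.

```javascript
// In-memory stand-in for your own shipment store (hypothetical).
const lastApplied = new Map(); // reference -> { status, updatedAt }

// Applies the event only if it is newer than what is already stored.
// `updatedAt` is an assumed timestamp; map it from whatever your payload provides.
function applyIfNewer(reference, status, updatedAt) {
  const current = lastApplied.get(reference);
  if (current && current.updatedAt >= updatedAt) {
    return false; // stale or duplicate event: keep the newer state
  }
  lastApplied.set(reference, { status, updatedAt });
  return true;
}

// A newer event lands first, then an older one arrives via a late retry.
applyIfNewer("REF-1", "delivered", 200);        // applied
applyIfNewer("REF-1", "out_for_delivery", 100); // ignored, state stays "delivered"
```

Because an event with an equal timestamp is also skipped, the same guard absorbs plain duplicates.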

Manual retrigger

If both the standard and extended retries are exhausted, no further automatic attempts are made. The event sits in Settings → Integration Monitor and can be manually retriggered once the underlying issue is fixed:

  • Replay one. Re-trigger delivery for a single event.
  • Replay many. Bulk-replay events that failed during a known outage window.

The same is available via API (POST /webhook-events/retrigger).

Other behaviors worth knowing

  • The same event id is reused across retries. A retry isn't a new event; it's another attempt at delivering the same one.
  • 2xx is final. Anything that returned a 2xx is considered delivered, even if your handler later realized it shouldn't have processed it.

Why idempotency matters

A retry storm during a transient outage means your endpoint sees the same shipment.status_updated event repeatedly. Without idempotency, that means:

  • Your customer gets the "your order has shipped" email three times.
  • Your finance system fires three refunds for one return.
  • Your OMS bumps a counter three times.

The fix is to design every webhook handler to be idempotent: applying the same event twice produces the same outcome as applying it once.
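That definition can be checked mechanically: apply an event once, apply it twice, and compare the results. The sketch below does this for two hypothetical handlers written as pure functions over a plain state object; the handler and field names are illustrative only.

```javascript
// Applies `handler` to a fresh copy of `initialState` the given number of times.
function applyTimes(handler, initialState, event, times) {
  let state = JSON.parse(JSON.stringify(initialState));
  for (let i = 0; i < times; i++) {
    state = handler(state, event);
  }
  return state;
}

// True when delivering the event twice leaves the same state as delivering it once.
function isIdempotent(handler, initialState, event) {
  const once = applyTimes(handler, initialState, event, 1);
  const twice = applyTimes(handler, initialState, event, 2);
  return JSON.stringify(once) === JSON.stringify(twice);
}

// Hypothetical handlers: one accumulates, one overwrites.
const bumpCounter = (state) => ({ ...state, emailsSent: state.emailsSent + 1 });
const setStatus = (state, event) => ({ ...state, status: event.status });

const initial = { status: "booked", emailsSent: 0 };
const event = { status: "shipped" };
console.log(isIdempotent(bumpCounter, initial, event)); // false
console.log(isIdempotent(setStatus, initial, event));   // true
```

A check like this fits well in a unit test suite for webhook handlers.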

Idempotency strategies

The unique event id is sent in the event-id HTTP header on every delivery. It's stable across retries of the same event, which makes it the natural idempotency key.

Key on event id

Track processed event ids in a small table; check before processing, record after. Simple and reliable.

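async function handle(req) {
  // The event id is the same on every retry of the same event.
  const eventId = req.headers["event-id"];
  if (await processedEvents.exists(eventId)) {
    return ok();  // already processed, no-op
  }
  await applyTheEvent(req.body);
  // Record only after the event was applied, so a failed attempt is retried.
  await processedEvents.record(eventId);
  return ok();
}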

The processed-events table can have a TTL of a few days. That is long enough for retries (up to ~30 hours with extended retries enabled) to land within the window, but short enough to keep the table from growing unbounded.
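A minimal in-memory sketch of such a table with a TTL follows. It is a variation on the check-then-record pattern: check and record are combined into a single `claim` step, and the claim is released if applying the event fails. The `ProcessedEvents` class and `handleOnce` helper are hypothetical; in production the same idea would typically sit on a database table or cache with expiry.

```javascript
// In-memory processed-events store with a TTL (a stand-in for a database table).
class ProcessedEvents {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.expiry = new Map(); // eventId -> expiry timestamp (ms)
  }

  // Records the id and returns true if it was not seen within the TTL;
  // returns false for a duplicate. Check and record happen in one step.
  claim(eventId) {
    const t = this.now();
    const expiresAt = this.expiry.get(eventId);
    if (expiresAt !== undefined && expiresAt > t) return false;
    this.expiry.set(eventId, t + this.ttlMs);
    return true;
  }

  // Forgets an id, e.g. when processing failed and a retry should be accepted.
  release(eventId) {
    this.expiry.delete(eventId);
  }

  // Drops expired ids so the store does not grow without bound.
  purge() {
    const t = this.now();
    for (const [id, expiresAt] of this.expiry) {
      if (expiresAt <= t) this.expiry.delete(id);
    }
  }
}

const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;
const processed = new ProcessedEvents(THREE_DAYS_MS);

// Runs `apply` at most once per event id within the TTL.
function handleOnce(eventId, apply) {
  if (!processed.claim(eventId)) return "duplicate";
  try {
    apply();
    return "applied";
  } catch (err) {
    processed.release(eventId); // let the next retry process the event
    throw err;
  }
}
```

Three days comfortably covers the roughly 30-hour extended retry window while keeping the store small.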

Key on entity state

When you don't want to maintain a processed-events table, making the outcome itself idempotent works for many cases. Setting a shipment's status to delivered twice produces the same final state as setting it once. Skip the update when the current state already matches what the webhook reports.

async function handle(req) {
  // Shipment webhooks post the Shipment object directly.
  const shipment = req.body;
  const current = await shipments.getByRef(shipment.partner_shipment_reference);
  if (!current) {
    // Unknown reference: acknowledge and investigate, so Carriyo doesn't keep retrying.
    return ok();
  }
  if (current.status === shipment.status) {
    return ok();  // already at this state, no-op
  }
  await shipments.update(/* ... */);
  return ok();
}

The body shape depends on the entity type. Shipment and return-request webhooks post the entity object directly. Order webhooks post {oldImage, newImage, trigger, operation}. See Webhooks → what's in the payload.
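If one handler serves several entity types, a small helper can normalize the two body shapes before the state comparison. The sketch below assumes that `newImage` holds the entity after the change (inferred from the field names, not confirmed here) and detects the order envelope by the presence of that key; routing by endpoint or entity type would be a more robust choice if you have it. The `status` field in the usage lines is illustrative.

```javascript
// Returns the current entity from a webhook body.
// Order webhooks wrap it in { oldImage, newImage, trigger, operation };
// shipment and return-request webhooks post the entity directly.
function currentEntity(body) {
  const isOrderEnvelope =
    body !== null && typeof body === "object" && "newImage" in body;
  return isOrderEnvelope ? body.newImage : body;
}

// Hypothetical payloads for illustration.
const orderBody = {
  oldImage: { status: "pending" },
  newImage: { status: "confirmed" },
  trigger: "example",
  operation: "UPDATE",
};
const shipmentBody = { status: "delivered" };

console.log(currentEntity(orderBody).status);    // confirmed
console.log(currentEntity(shipmentBody).status); // delivered
```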

Use unique constraints

For side-effects that must happen once (sending an email, creating a refund), use a unique key in your downstream system to prevent duplicates.

async function handle(req) {
  const eventId = req.headers["event-id"];
  const shipment = req.body;
  await emails.sendUnique({
    idempotency_key: `delivered:${eventId}`,
    template: "delivered",
    to: shipment.dropoff.contact_email,
  });
  return ok();
}

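Some email providers accept an idempotency key on send, and so do many payment processors on refunds. Check your provider's documentation; where no such key is available, enforce uniqueness yourself, for example with a unique constraint on the key in your own database.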
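When the downstream system has no idempotency key of its own, the same guarantee can be approximated locally. The sketch below uses an in-memory set as a stand-in for a table with a unique constraint on the key; `SideEffectLog` and `sendDeliveredEmail` are hypothetical names, and `send` is whatever function actually dispatches the email.

```javascript
// In-memory stand-in for a table with a UNIQUE constraint on the key.
class SideEffectLog {
  constructor() {
    this.keys = new Set();
  }

  // Returns true only for the first insert of a key, similar to an INSERT
  // that fails with a unique-violation on duplicates.
  insertOnce(key) {
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

// Sends the "delivered" email at most once per event id.
function sendDeliveredEmail(log, eventId, to, send) {
  if (!log.insertOnce(`delivered:${eventId}`)) {
    return false; // already sent for this event
  }
  send(to);
  return true;
}

const log = new SideEffectLog();
const sent = [];
sendDeliveredEmail(log, "evt-42", "customer@example.com", (to) => sent.push(to));
sendDeliveredEmail(log, "evt-42", "customer@example.com", (to) => sent.push(to));
console.log(sent.length); // 1
```

Note that this sketch records the key before sending, so a crash between the two steps would skip the email; recording after sending trades that for a possible duplicate. Which failure mode is acceptable depends on the side effect.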

Don't return 5xx for "I don't know what to do"

A common mistake: an event arrives for a partner_shipment_reference your system doesn't recognize (test data, off-by-one, sync gap), and the handler returns 500. Carriyo retries. The same unknown reference comes in again. Same 500. Repeat.

Return 200 in that case (and log the unknown reference for investigation). 5xx is for "my system is broken, please retry", not "this reference is meaningless to me, please stop".
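One way to keep that distinction explicit is to map processing outcomes to status codes in a single place. A minimal sketch, assuming your own code throws a dedicated error type for unrecognized references (`UnknownReferenceError` and `statusFor` are hypothetical names):

```javascript
// Thrown by your own lookup code when a reference is not recognized.
class UnknownReferenceError extends Error {}

// Maps a processing outcome to the HTTP status returned to Carriyo.
function statusFor(error) {
  if (!error) return 200; // processed successfully
  if (error instanceof UnknownReferenceError) {
    // Retrying will not make the reference known: acknowledge and log it.
    console.warn(`Unknown reference: ${error.message}`);
    return 200;
  }
  return 500; // anything else is treated as transient, so a retry is wanted
}

console.log(statusFor(null));                              // 200
console.log(statusFor(new UnknownReferenceError("REF-9"))); // 200
console.log(statusFor(new Error("database unavailable")));  // 500
```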

How it fits with other modules