External Event Consumption in Salesforce — Retries & Dead-Letter Concepts

Share

When Salesforce consumes external events (for example, Platform Events published by another system), your integration logic must be resilient. Failures will happen — networks time out, APIs throttle, or payloads contain invalid data.

A robust event consumer does two key things:

  1. It retries transient errors safely without looping forever.

  2. It dead-letters unrecoverable events so nothing gets silently dropped.

The goal is to never lose visibility, avoid duplication, and keep side effects idempotent.


? 1. Classify Failures

Before you write retry logic, classify failures as either retryable or terminal.

Retryable failures include:

  • Network timeouts

  • 5xx (server) errors

  • Lock contention (UNABLE_TO_LOCK_ROW)

Action: Retry with exponential backoff and a maximum attempt cap.

Terminal failures include:

  • Validation or business rule errors

  • Missing reference data

  • Schema or data format mismatches

? Action: Dead-letter the event with full context for later investigation.


⚙️ 2. Prioritize Idempotency

External event buses operate on an “at-least-once delivery” model. That means the same event might arrive multiple times.

To prevent duplicate processing, enforce idempotency using a unique key from the event (e.g., DedupKey__c) and a unique record store (like Idempotency__c.Key__c UNIQUE).

Your logic should:

  • Claim then act — attempt to insert the dedup key first.

  • If the insert succeeds, process the event.

  • If it fails with a duplicate error, safely skip it.

This pattern ensures safe replays and prevents side effects like duplicate invoices or API calls.


? 3. Design a Retry Policy

Retries should be bounded and observable, not infinite loops.
Keep a per-event attempt counter (for example, JobRun__c.Attempt__c or a payload field).

Use:

  • Queueable Apex for asynchronous work.

  • Schedulable Apex for delayed retries (e.g., 1 min, 5 min, 15 min).

Stop after a fixed number of attempts, then send the event to your Dead-Letter Queue (DLQ) for review.


? 4. Build a Dead-Letter Queue (DLQ)

A Dead-Letter Queue is a dedicated custom object (e.g., Dead_Letter__c) where failed events are stored permanently until reviewed.

Include the following fields:

  • DedupKey

  • Payload

  • Error

  • Attempts

  • CorrelationId

  • ReplayId or Timestamp

  • Suggested Fix

Admins can view these records in a list view or dashboard, analyze root causes, and reprocess fixed messages from a Flow or custom button.

This pattern ensures no event disappears — every failure becomes traceable.


? 5. Observability

Reliability means nothing without visibility.
To monitor behavior:

  • Log AsyncApexJob IDs and attempt counts.

  • Capture the last error message and time.

  • Optionally publish a status Platform Event like “EventFailed” or “EventRetried” for alerting and monitoring.

These details allow both developers and admins to understand system health in real time.


? Real-World Example: Code Overview

Scenario

An external billing system publishes Invoice_Event__e (Created/Updated).
Salesforce consumes the event via a Platform Event Trigger.
The trigger enforces idempotency, calls an external API, applies bounded retries with backoff, and moves terminal failures to a Dead-Letter Queue.


Custom Objects Used

Idempotency__c

  • Key__c (Text, External ID, Unique)

  • ConsumedAt__c (Datetime)

Dead_Letter__c

  • DedupKey__c (Text)

  • EventType__c (Text)

  • Payload__c (Long Text)

  • Attempts__c (Number)

  • LastError__c (Long Text)

  • CorrelationId__c (Text)

  • FirstSeen__c, LastSeen__c (Datetime)


A) Entry — Platform Event Trigger (Claim Then Enqueue)

// Trigger on Invoice_Event__e (after insert)
trigger InvoiceEventTrigger on Invoice_Event__e (after insert) {
    List<InvoiceWork.Request> work = new List<InvoiceWork.Request>();

    for (Invoice_Event__e e : Trigger.New) {
        String dedup = e.DedupKey__c;        // from producer; deterministic
        if (String.isBlank(dedup)) continue;

        // Idempotency claim — duplicate means we've already processed
        if (!Idempotency.claim(dedup)) continue;

        work.add(new InvoiceWork.Request(
            dedupKey   = dedup,
            eventType  = e.EventType__c,
            payloadJson= e.Payload__c,
            correlationId = e.CorrelationId__c,
            attempt = 0
        ));
    }
    if (!work.isEmpty()) System.enqueueJob(new InvoiceWork(work));
}

B) Idempotency Helper (Unique Claim)

public with sharing class Idempotency {
    public static Boolean claim(String key) {
        try {
            insert new Idempotency__c(Key__c = key, ConsumedAt__c = System.now());
            return true;
        } catch (DmlException ex) {
            // DUPLICATE_VALUE → already processed (safe to skip)
            for (Database.Error e : ex.getDmlFields(0).getErrors())
                if (e.getStatusCode() == StatusCode.DUPLICATE_VALUE) return false;
            throw ex;
        }
    }
}

C) Retry Policy Utility (Delay Schedule + Classification)

public with sharing class RetryPolicy {
    public static final Integer MAX_ATTEMPTS = 4; // 0,1,2,3,4 = 5 tries total

    public static Integer nextDelayMinutes(Integer attempt) {
        // 0→1m, 1→5m, 2→15m, 3→60m, 4→(no next)
        Integer[] table = new Integer[]{1,5,15,60};
        return attempt < table.size() ? table[attempt] : null;
    }

    public static Boolean isRetryable(Exception ex) {
        String m = ex.getMessage();
        return m != null && (
            m.contains('UNABLE_TO_LOCK_ROW') ||
            m.contains('Read timed out') ||
            m.contains('500 ') || m.contains('502 ') || m.contains('503 ') || m.contains('504 ')
        );
    }
}

D) Worker (Queueable + Schedulable)

Processes events, retries transient ones with backoff, and DLQs terminal errors.

public with sharing class InvoiceWork implements Queueable, Database.AllowsCallouts, Schedulable {
    // Simple DTO for batched requests
    public class Request {
        public String dedupKey;
        public String eventType;
        public String payloadJson;
        public String correlationId;
        public Integer attempt;
        public Request(String dedupKey, String eventType, String payloadJson, String correlationId, Integer attempt) {
            this.dedupKey = dedupKey; this.eventType = eventType; this.payloadJson = payloadJson;
            this.correlationId = correlationId; this.attempt = attempt;
        }
    }
    private List<Request> batch;

    public InvoiceWork(List<Request> batch) { this.batch = batch; }

    public void execute(QueueableContext qc) {
        for (Request r : batch) {
            try {
                processOne(r); // may call out
            } catch (Exception ex) {
                handleFailure(r, ex);
            }
        }
    }

    // Schedulable “delayer” for backoff retries
    public void execute(SchedulableContext sc) {
        // Deserialize the payload from the Cron job's name if you choose, or store in a custom object.
        // For brevity, not implemented here; prefer passing state via a small record.
    }

    private void processOne(Request r) {
        // Example callout to external billing system
        HttpRequest req = new HttpRequest();
        req.setEndpoint('callout:BillingBridge/invoice'); // Named Credential
        req.setMethod('POST');
        req.setHeader('Content-Type', 'application/json');
        req.setHeader('X-Correlation-Id', r.correlationId);
        req.setBody(r.payloadJson);

        HttpResponse res = new Http().send(req);
        Integer code = res.getStatusCode();

        if (code >= 200 && code < 300) {
            // Success — nothing else to do (idempotency already claimed)
            return;
        }
        // Non-2xx: decide retryability
        if (code >= 500 || code == 429) {
            // Retryable
            throw new CalloutException('Retryable HTTP ' + code + ': ' + res.getBody());
        } else {
            // Terminal (e.g., 400/404/422)
            throw new AuraHandledException('Terminal HTTP ' + code + ': ' + res.getBody());
        }
    }

    private void handleFailure(Request r, Exception ex) {
        Boolean retryable = RetryPolicy.isRetryable(ex);
        Integer nextAttempt = r.attempt + 1;

        if (retryable && nextAttempt <= RetryPolicy.MAX_ATTEMPTS) {
            Integer delay = RetryPolicy.nextDelayMinutes(r.attempt);
            if (delay == null) { // no further delay slot
                deadLetter(r, ex);
                return;
            }
            // schedule a delayed retry via Schedulable wrapper
            String cron = cronInMinutes(delay);
            System.schedule('Retry '+r.dedupKey+' #'+nextAttempt, cron,
                new RetryLauncher(r, nextAttempt));
            return;
        }
        // Terminal or attempts exhausted → DLQ
        deadLetter(r, ex);
    }

    private void deadLetter(Request r, Exception ex) {
        insert new Dead_Letter__c(
            DedupKey__c = r.dedupKey,
            EventType__c = r.eventType,
            Payload__c = r.payloadJson,
            Attempts__c = r.attempt,
            LastError__c = ex.getMessage(),
            CorrelationId__c = r.correlationId,
            LastSeen__c = System.now(),
            FirstSeen__c = (r.attempt == 0 ? System.now() : null) // set on first write in real code
        );
    }

    private static String cronInMinutes(Integer minutes) {
        Datetime t = Datetime.now().addMinutes(minutes);
        // sec min hour day month ? year
        return String.format('0 {0} {1} {2} {3} ? {4}',
            new List<String>{
                String.valueOf(t.minute()),
                String.valueOf(t.hour()),
                String.valueOf(t.day()),
                String.valueOf(t.month()),
                String.valueOf(t.year())
            });
    }

    // Tiny Schedulable to re-enqueue a single request after backoff
    global class RetryLauncher implements Schedulable {
        Request r; Integer attempt;
        public RetryLauncher(Request r, Integer attempt) { this.r = r; this.attempt = attempt; }
        global void execute(SchedulableContext sc) {
            r.attempt = attempt;
            System.enqueueJob(new InvoiceWork(new List<Request>{ r }));
        }
    }
}

? Why This Works

  • Idempotency ensures safe replays — duplicate messages won’t double-process.

  • Bounded retries handle temporary errors without overwhelming dependencies.

  • Dead-lettering captures hard failures for visibility and manual intervention.

  • Queueable + Schedulable gives flexibility for async and delayed processing.

This architecture keeps your external integrations reliable, observable, and maintainable

External event consumption


? Final Thoughts

Reliable external event consumption in Salesforce comes down to three principles:

  1. Idempotent consumers — process each event exactly once.

  2. Smart retry policies — backoff and limit attempts.

  3. Dead-letter visibility — store and review failures safely.

Treat every failure as either retryable (retry with backoff) or terminal (send to DLQ).
By persisting detailed diagnostic context, operations teams can confidently fix issues and re-enqueue events — ensuring no data loss and no infinite loops.

This pattern keeps your event-driven pipelines resilient, predictable, and production-grade.

  • October 21, 2025