Handling Large Data Volumes in Salesforce: Smart Batch Design, QueryMore, and Governor Limit Strategies

Share

When your Salesforce org contains millions of records, how you read and process that data matters just as much as what you do with it. Salesforce’s Batch Apex is designed to process massive datasets within governor limits—but to use it effectively, you need well-tuned scope sizes, efficient “queryMore”-style pagination, and code that gracefully adapts to those limits.

This guide walks through proven, field-tested strategies for scaling Batch Apex to handle large data volumes without hitting governor walls.


1) Batch Building Blocks — What Scales and What Doesn’t

Batchable + QueryLocator

Use this when one SOQL query can describe your entire dataset.

  • Streams up to 50 million records server-side — Salesforce automatically handles the internal queryMore.

  • Ideal for simple, index-friendly queries.

  • Keeps processing lightweight and highly efficient.

Batchable + Iterable

Use this when your data comes from multiple sources, complex filters, or even external APIs.

  • You implement your own pagination loop (similar to queryMore), usually with keyset pagination (WHERE Id > :lastId ORDER BY Id LIMIT N).

  • There’s no automatic 50M-row streaming; you control how the data flows.

  • Great for flexible or custom data retrieval patterns.


2) Scope Size Tuning — Finding the Sweet Spot

Salesforce’s default batch scope is 200 records, which typically balances performance with stability. However, adjusting it can make a big difference.

Lower your scope (50–100) if:

  • You’re doing callouts or heavy computations.

  • You encounter heap size or CPU limit issues.

  • You update related records (like parent objects) and see UNABLE_TO_LOCK_ROW errors.

Increase your scope (300–500) only if:

  • Your rows are lightweight and logic is minimal.

  • You’ve measured CPU, SOQL, and DML usage and still have room to spare.

? Tip: Each batch execute call runs independently, with its own set of governor limits. Design your logic so every chunk can succeed on its own.


3) Understanding “queryMore” in Apex

With QueryLocator

When you use a Database.QueryLocator, Salesforce automatically performs queryMore behind the scenes. You don’t need extra code — it streams the records seamlessly.

DIY “queryMore”

If you’re using an Iterable or running outside a batch context, mimic queryMore using keyset pagination.

  • Avoid using OFFSET (it’s slow and limited).

  • Store your progress using a “cursor” (like lastSeenId or LastModifiedDate) so you can resume where you left off.


4) Designing for Governor Limits

Governor limits are what keep Salesforce multitenant and stable. Designing with them in mind separates reliable jobs from those that randomly fail.

Key principles:

  • Selective queries: Always filter by indexed fields (Id, OwnerId, lookup, external ID, audit fields). Avoid wide LIKE filters.

  • SOQL reuse: Query once, cache in a Map, and never put queries or DML inside inner loops.

  • Efficient updates: Only update changed fields, and aim for a single DML statement per object type per execute.

  • Lock handling: Group data by parent, sort records consistently, and retry smaller chunks only if you hit a lock.

  • Memory efficiency: Use counters instead of storing lists when using Stateful. If you need a record later, re-query it by Id.

  • Partial success: Use Database.update(list, false) and capture individual errors instead of letting one record fail the entire batch.

  • Observability: Always log your AsyncApexJob.Id, totals, and failure counts. Make sure admins can easily track what happened.


Real-World Example (With Code)

Scenario:
You need to rebuild a denormalized metric on Contact for inactive customers.

  • Stream millions of records efficiently using QueryLocator.

  • Keep each batch scope lightweight and within governor limits.

  • For complex filters, offer an Iterable version with keyset-style pagination.


A) Batch Using QueryLocator (Salesforce Handles Internal queryMore)

Scenario
Rebuild a denormalized metric on Contact for inactive customers:

  1. Stream up to tens of millions with QueryLocator.

  2. Keep each scope light and governor-aware.

  3. For orgs with exotic filters, provide an Iterable variant with keyset “queryMore.”

A) Batch with QueryLocator (Salesforce handles internal queryMore)

// ContactRebuildBatch.cls
global with sharing class ContactRebuildBatch
implements Database.Batchable<SObject>, Database.Stateful {

    // optional counters for a summary email
    public Integer ok = 0, bad = 0;

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Selective filter: use indexed fields to stay fast and selective
        return Database.getQueryLocator([
            SELECT Id, AccountId, LastActivityDate, Inactive_Flag__c, Metric__c
            FROM Contact
            WHERE IsDeleted = false
              AND LastModifiedDate = LAST_N_YEARS:5
        ]);
    }

    global void execute(Database.BatchableContext bc, List<Contact> scope) {
        // Governor-aware: hydrate parents once
        Set<Id> acctIds = new Set<Id>();
        for (Contact c : scope) if (c.AccountId != null) acctIds.add(c.AccountId);

        Map<Id, Account> parents = acctIds.isEmpty() ? new Map<Id, Account>() :
            new Map<Id, Account>([
                SELECT Id, Status__c, Tier__c
                FROM Account WHERE Id IN :acctIds
            ]);

        // Compute minimal changes
        List<Contact> updates = new List<Contact>();
        for (Contact c : scope) {
            Account a = parents.get(c.AccountId);
            Boolean inactive = (a == null || a.Status__c == 'Inactive');
            Integer newMetric = computeMetric(c, a);

            Boolean changed = (c.Inactive_Flag__c != inactive) || (c.Metric__c != newMetric);
            if (changed) {
                updates.add(new Contact(
                    Id = c.Id,
                    Inactive_Flag__c = inactive,
                    Metric__c = newMetric
                ));
            }
        }

        if (!updates.isEmpty()) {
            Database.SaveResult[] sr = Database.update(updates, /* allOrNone */ false);
            for (Database.SaveResult r : sr) {
                if (r.isSuccess()) ok++; else bad++;
            }
        }

        // Soft guardrails & telemetry when approaching limits
        if (Limits.getQueries() > 80 || Limits.getDmlRows() > 9000) {
            System.debug(LoggingLevel.WARN, 'Near limits: Q=' + Limits.getQueries() + ' DMLRows=' + Limits.getDmlRows());
        }
    }

    private Integer computeMetric(Contact c, Account a) {
        // Keep cheap: arithmetic/booleans only
        Integer base = (c.LastActivityDate == null) ? 0 : 1;
        if (a != null && a.Tier__c == 'Platinum') base += 5;
        return base;
    }

    global void finish(Database.BatchableContext bc) {
        System.debug('Contacts updated. Success=' + ok + ' Failed=' + bad);
        // Optionally send a summary email or publish a Platform Event
    }
}

Usage (anonymous apex):
Database.executeBatch(new ContactRebuildBatch(), 200);
Tweak scope (e.g., 100) if you see CPU/heap pressure or lock contention.


B) Iterable Batch with Manual “queryMore” (Keyset Pagination)

Use this when one SOQL can’t capture your entire dataset (for example, if you’re combining multiple filters) or when you need a reliable resume mechanism.

// KeysetIterable.cls
public with sharing class KeysetIterable implements Iterable<Contact>, Iterator<Contact> {
    private Id lastId;
    private Integer pageSize;
    private List<Contact> buffer = new List<Contact>();

    public KeysetIterable(Integer pageSize) {
        this.pageSize = Math.max(1, Math.min(pageSize, 2000));
    }

    public Iterator<Contact> iterator() { return this; }

    public Boolean hasNext() {
        if (!buffer.isEmpty()) return true;
        // "queryMore" style: keyset pagination
        String soql = 'SELECT Id, AccountId, LastActivityDate, Metric__c FROM Contact ' +
                      (lastId == null ? '' : 'WHERE Id > :lastId ') +
                      'ORDER BY Id LIMIT :pageSize';
        buffer = Database.query(soql);
        return !buffer.isEmpty();
    }

    public Contact next() {
        Contact c = buffer.remove(0);
        lastId = c.Id;
        return c;
    }
}
// Iterable-based Batch
global with sharing class ContactRebuildIterableBatch
implements Database.Batchable<Contact> {

    private Integer pageSize;
    public ContactRebuildIterableBatch(Integer pageSize) { this.pageSize = pageSize; }

    global Iterable<Contact> start(Database.BatchableContext bc) {
        return new KeysetIterable(pageSize);
    }

    global void execute(Database.BatchableContext bc, List<Contact> scope) {
        // exactly the same governor-aware processing as Example A
        // ...
    }

    global void finish(Database.BatchableContext bc) { /* summary */ }
}

Notes:

  • Keyset pagination on an indexed field (like Id) behaves much like manual queryMore.

  • Store lastId in a custom object (e.g., JobRun__c) if you want to resume after a failure.

  • Avoid using OFFSET for large datasets—it’s slower and capped.

C) Optional: Lock-Aware Updates with Chunked Retries

public with sharing class LockSafe {
    public static void updateSafe(List<SObject> rows) {
        if (rows.isEmpty()) return;
        try {
            update rows;
        } catch (DmlException ex) {
            if (ex.getMessage().contains('UNABLE_TO_LOCK_ROW') && rows.size() > 1) {
                Integer mid = rows.size() / 2;
                updateSafe(rows.subList(0, mid));
                updateSafe(rows.subList(mid, rows.size()));
            } else throw ex;
        }
    }
}

Chaining strategies 1


Final Thoughts

When you’re working with massive Salesforce datasets, the key is to design your batch jobs like small, self-contained transactions.

  • Use QueryLocator whenever possible — it streams automatically and handles queryMore.

  • Switch to Iterable + Keyset when you need custom paging or resumability.

  • Tune your scope based on data shape and real measurements.

  • Keep your logic bulkified, selective, and lock-aware.

Follow these principles, and your jobs will scale to millions of records smoothly—without tripping over Salesforce’s governor limits.

  • October 21, 2025