getQueryLocator vs. Iterator in Apex

When you write a Batch class in Salesforce, the start method asks you a fundamental question: “How should I get the data?”

You have two options: Database.getQueryLocator or Iterable<sObject>.

While 90% of developers default to getQueryLocator, understanding when and why to use an Iterator distinguishes a senior developer from a junior one. Let’s break down the differences with real-world use cases and technical examples.

1. getQueryLocator

Imagine you need to fill a swimming pool. You connect a hose/pipe directly to the main water supply (the Database). The water flows continuously and efficiently. You don’t need to carry the water yourself; the pressure from the main supply does the work.

  • In Apex: You pass a SOQL query string directly to Salesforce. The platform handles the heavy lifting, streaming up to 50 million records automatically.

2. Iterator 

Now imagine you need to fill that pool, but the water isn’t coming from a single tap. Some water is from a well, some is from bottled water, and some is filtered rain. You have to manually collect it in buckets, check the quality, maybe mix it, and then pour it into the pool.

  • In Apex: You write custom logic to gather data. This data might come from complex calculations, external APIs, or a mix of multiple objects. You hand-feed the batch job to the final list.

Now that we understand the real purpose of getQueryLocator and Iterator, let’s dig into real examples.

Understanding Database.getQueryLocator

getQueryLocator is the simplest way to return a massive SOQL result set in a Batch Apex job.

The single most important reason to use getQueryLocator is the Governor Limit bypass.

Standard SOQL Limit: 50,000 records. If you query 50,001 records into a List, Apex throws a “Too many query rows” limit exception.

QueryLocator Limit: 50,000,000 records. Because it streams data, Salesforce allows you to touch up to 50 million records in a single batch job.
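To make the contrast concrete, here is a small sketch (using the System_Log__c object from the scenario below). The first statement fails once the table passes 50,000 rows; the second does not, because the locator streams rows in chunks:

```apex
// Inline query into a List: throws System.LimitException
// ("Too many query rows: 50001") once the result set exceeds 50,000 records.
List<System_Log__c> logs = [SELECT Id FROM System_Log__c];

// QueryLocator returned from a batch start method: the platform streams
// the rows in chunks to execute(), up to 50 million records in total.
Database.QueryLocator locator =
    Database.getQueryLocator([SELECT Id FROM System_Log__c]);
```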

You can implement getQueryLocator in two ways. Both work, but they have subtle differences.

1. The Inline SOQL: This is the modern, preferred approach if your query is static.

global Database.QueryLocator start(Database.BatchableContext BC) {
    // Salesforce validates this query at compile time, when you save the file.
    // If a referenced field doesn't exist, you can't save the code.
    return Database.getQueryLocator([SELECT Id, Name FROM Account]);
}

2. The Dynamic String: Use this if you need to change the query based on variables (e.g., passing a date filter into the batch class constructor).

global Database.QueryLocator start(Database.BatchableContext BC) {
    String query = 'SELECT Id, Name FROM Account WHERE CreatedDate = :dateVariable';
    // Salesforce does NOT check this until the code actually runs.
    return Database.getQueryLocator(query);
}
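
For the bind variable :dateVariable to resolve, it must be in scope when the query runs. A minimal sketch of the full pattern, passing the filter in through the constructor (the class name AccountDateBatch is illustrative):

```apex
global class AccountDateBatch implements Database.Batchable<sObject> {

    private DateTime dateVariable;

    global AccountDateBatch(DateTime filterDate) {
        this.dateVariable = filterDate;
    }

    global Database.QueryLocator start(Database.BatchableContext BC) {
        // The bind :dateVariable is resolved from this instance's scope
        // at runtime, when getQueryLocator is called.
        String query = 'SELECT Id, Name FROM Account WHERE CreatedDate >= :dateVariable';
        return Database.getQueryLocator(query);
    }

    global void execute(Database.BatchableContext BC, List<Account> scope) {
        System.debug('Processing ' + scope.size() + ' accounts');
    }

    global void finish(Database.BatchableContext BC) {}
}
```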

Scenario: We will build a Batch class designed to delete old “Log” records (or any custom object) that are older than a specific number of days. This is a perfect use case because Log tables often grow into the millions, making standard queries impossible.

You have a custom object System_Log__c. You have 2 million records, and you want to delete everything older than 90 days to save storage space.

global class BatchLogCleaner implements Database.Batchable<sObject>, Database.Stateful {

    global Integer totalRecordsDeleted = 0;

    global Database.QueryLocator start(Database.BatchableContext bc) {
        // Select logs created more than 90 days ago
        return Database.getQueryLocator(
            [SELECT Id FROM System_Log__c WHERE CreatedDate < LAST_N_DAYS:90]
        );
    }

    global void execute(Database.BatchableContext bc, List<System_Log__c> scope) {
        delete scope; // Deleting records in a batch
        totalRecordsDeleted += scope.size();
    }

    global void finish(Database.BatchableContext bc) {
        System.debug('Batch Job Complete. Total Logs Deleted: ' + totalRecordsDeleted);
    }
}

How to Run It

You would execute this from the Developer Console > Anonymous Window or a scheduled job.

BatchLogCleaner bc = new BatchLogCleaner();
// Execute with a batch size of 2000 (the maximum scope size, efficient for simple deletes)
Database.executeBatch(bc, 2000);
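
To run the cleanup on a schedule instead, a small Schedulable wrapper can kick the batch off; the class name and cron expression below are illustrative (this one fires every day at 1:00 AM):

```apex
global class BatchLogCleanerScheduler implements Schedulable {
    global void execute(SchedulableContext sc) {
        Database.executeBatch(new BatchLogCleaner(), 2000);
    }
}

// Anonymous Apex: cron format is Seconds Minutes Hours Day_of_month Month Day_of_week
System.schedule('Nightly Log Cleanup', '0 0 1 * * ?',
                new BatchLogCleanerScheduler());
```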

Limitations of Database.getQueryLocator:

Even though it is powerful, getQueryLocator has rules:

  1. No Aggregate Queries: You typically cannot use GROUP BY or aggregate functions (like SUM, COUNT) in a QueryLocator. It is designed for retrieving raw records (sObjects), not summarized data.
  2. Subquery Fetch Limits: While the main query supports 50 million records, subqueries (e.g., SELECT Id, (SELECT Id FROM Contacts) FROM Account) can sometimes hit fetch limits if the child relationships are too deep or massive.
  3. ORDER BY Caveats: If you use ORDER BY in your query, Salesforce attempts to honor it, but for massive datasets (millions of rows), sorting can degrade performance or be overridden by internal chunking optimizations.
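
As a workaround for limitation 1, an Iterable-based start method can return aggregate results, since Database.Batchable is generic over any type. A sketch under that assumption (object and alias names are illustrative); note this does NOT get the 50-million-record streaming, as the whole aggregate result must fit in heap:

```apex
global class OppSummaryBatch implements Database.Batchable<AggregateResult> {

    global Iterable<AggregateResult> start(Database.BatchableContext BC) {
        // Aggregate queries are allowed here; a List implements Iterable.
        return [SELECT AccountId accId, SUM(Amount) total
                FROM Opportunity
                GROUP BY AccountId];
    }

    global void execute(Database.BatchableContext BC, List<AggregateResult> scope) {
        for (AggregateResult ar : scope) {
            System.debug(ar.get('accId') + ' => ' + ar.get('total'));
        }
    }

    global void finish(Database.BatchableContext BC) {}
}
```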

Understanding Iterator

While getQueryLocator is the automatic, high-speed highway for Salesforce data, the Iterator is the off-road vehicle. It lets you go where standard SOQL cannot: complex lists, external data, or data that requires heavy calculation before the job even starts.

Scenario: You are running a “Weekly Sales Leaderboard” batch. You need to rank Sales Reps based on a complex formula that involves:

  1. Closed Opportunities.
  2. Customer Satisfaction Scores (stored in a different object).
  3. Number of phone calls made (stored in Tasks).

A single SOQL query cannot join all these tables efficiently or perform the complex math required to “Rank” them.

The Solution: We perform the heavy math in the start method, build a list of custom wrapper objects, and then pass that list to the batch for processing.

global class WeeklyScoreBatch implements Database.Batchable<WeeklyScoreBatch.RepScore> {

    // 1. Define a Wrapper Class to hold our complex data
    // This is NOT a Salesforce object; it exists only in memory.
    global class RepScore {
        public Id userId;
        public String userName;
        public Decimal totalScore;
        
        public RepScore(Id uid, String name, Decimal score) {
            this.userId = uid;
            this.userName = name;
            this.totalScore = score;
        }
    }

    // 2. START: The Iterator
    // Notice the return type is Iterable<RepScore>, not QueryLocator
    global Iterable<RepScore> start(Database.BatchableContext BC) {
        
        List<RepScore> scorecard = new List<RepScore>();
        
        // --- Complex Logic Starts Here ---
        // Imagine we have complex logic that fetches Users, 
        // loops through their Opps and Tasks, and calculates a score.
        // (Simplified for brevity)
        
        List<User> users = [SELECT Id, Name FROM User WHERE IsActive = TRUE LIMIT 1000];
        
        for(User u : users) {
            // Perform math that SOQL can't do
            Decimal mathScore = (Math.random() * 100); 
            
            // Add to our custom list
            scorecard.add(new RepScore(u.Id, u.Name, mathScore));
        }
        
        // Return the simple List. Lists implement Iterable automatically!
        return scorecard;
    }

    // 3. EXECUTE: Processing the Wrappers
    // Notice the scope is List<RepScore>, not List<sObject>
    global void execute(Database.BatchableContext BC, List<RepScore> scope) {
        
        for(RepScore rep : scope) {
            // Now we process our custom object
            System.debug('Processing Score for: ' + rep.userName + ' Score: ' + rep.totalScore);
            
            // Example: Create a record based on this wrapper
            // Performance_Log__c log = new Performance_Log__c();
            // log.User__c = rep.userId;
            // log.Score__c = rep.totalScore;
            // insert log;
        }
    }

    global void finish(Database.BatchableContext BC) {
        System.debug('Leaderboard calculation complete.');
    }
}
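
The iterable batch is launched the same way as any other; the only difference is that the scope size now counts RepScore wrappers per execute() call rather than database records (50 here is an arbitrary choice):

```apex
WeeklyScoreBatch job = new WeeklyScoreBatch();
// Each execute() call receives up to 50 RepScore wrappers.
Database.executeBatch(job, 50);
```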

Limitations of Iterator in Salesforce Batch Apex

Iterator gives you flexibility, but it’s not perfect. Here’s what you need to be aware of before using it in production.

  1. Database.getQueryLocator can handle up to 50 million records; an Iterator cannot. An Iterator loads its data into Apex memory (a List, Set, or custom structure), so the moment your collection grows too large, you hit heap size errors.
  2. A QueryLocator fetches chunks of data as needed; an Iterator, in the common List-based pattern, does not. You are responsible for loading and preparing the data upfront in the start method or constructor.
  3. If a batch fails mid-execution, QueryLocator batches resume from the next chunk. Iterator batches do not automatically recover.
  4. Between the start, execute, and finish methods, Salesforce serializes (saves) your batch instance and deserializes (reloads) it for each invocation; changes to instance variables are only retained across those calls if you implement Database.Stateful.

If your Iterator class holds references to things that cannot be serialized, such as HttpResponse objects, open JSONParser streams, or Savepoint variables, the batch job will crash with a SerializationException.
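
One nuance on point 2 above: a hand-written Iterator can defer work until hasNext/next are called rather than pre-building everything, which keeps heap usage low. A minimal lazy sketch (illustrative class name, yielding Integers for simplicity):

```apex
global class LazyNumberIterable implements Iterable<Integer>, Iterator<Integer> {

    private Integer current = 0;
    private Integer maxCount;

    global LazyNumberIterable(Integer maxCount) {
        this.maxCount = maxCount;
    }

    // A common Apex pattern: the class is its own iterator.
    global Iterator<Integer> iterator() {
        return this;
    }

    global Boolean hasNext() {
        return current < maxCount;
    }

    global Integer next() {
        // Each value is produced on demand -- nothing is pre-loaded,
        // so memory use stays flat however many items are yielded.
        current += 1;
        return current;
    }
}
```

A batch whose start method returns `new LazyNumberIterable(1000000)` never holds the full million values in memory at once.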

Satyam Parasa

Satyam Parasa is a Salesforce and Mobile application developer. Passionate about learning new technologies, he is the founder of Flutterant.com, where he shares his knowledge and insights.


One comment

  1. The distinction between “inline” (or “static”) SOQL and “dynamic” SOQL is incorrect. Static SOQL allows use of bindings (such as the example “:dateVariable”). In fact, static SOQL bindings are more sophisticated, allowing the binding value to be determined by evaluating arbitrary code, inline, within the binding.

    The true distinction between static and dynamic SOQL is whether you need to actually change the “select list” (the fields to return), the WHERE clause conditions and/or the ORDER BY choice. Use static SOQL if you do not need to dynamically change these parameters, based on the state of the Batchable implementation, and dynamic SOQL if you do.

    For example, I could have my batch accept the name of a field set used to define the select list. In this case, I have to use dynamic SOQL. Alternatively, the batch might query some custom metadata to determine the WHERE clause conditions to apply, so again I must use dynamic SOQL. Finally, the batch accepts an “after this creation date/time” variable in its constructor. In this case, I can just use static SOQL.

    On top of that, the example batch implementations are declared global. This is not required (and hasn’t been for at least about 10 years) unless you plan to allow these classes to be visible across a namespace boundary.

    Another thing to note; the example log deletion scenario code uses the erroneous but valid combination of:

    … implements Database.Batchable<sObject>

    and:

    … execute(Database.BatchableContext bc, List<System_Log__c> scope) …

    Note that the generic type in the declaration is “sObject” (better as “SObject”) but the generic type on “execute” is “System_Log__c”. To accurately reflect true generics, both *should* be the same:

    … implements Database.Batchable<System_Log__c>
    … execute(Database.BatchableContext bc, List<System_Log__c> scope) …

    A further point – there are some mistakes in the example code (such as comments mentioning a “Status__c” field that isn’t in the static SOQL, Database.Stateful being misspelled and calling an Apex class a “wrapper class” even though it is just a data transfer object, not wrapping an SObject record at all).

    The concept of the example where “We perform the heavy math in the start method, build a list of Custom Wrapper Objects, and then pass that list to the Batch to send out the emails” is somewhat erroneous and is immediately open to hitting governor limits. (You even implicitly acknowledge this issue since you use LIMIT to keep the processing to at most 1000 users, arbitrarily selected, instead of dealing with all users.)

    It would be better to build a query locator based batch that iterates the Sales Reps and build state in the batch that is finally used in the finish method to generate and email out the finalized “leaderboard”. That way you avoid hitting heap, SOQL query row or CPU limits if that math really is that heavy.

    In the “limitations” section you have some points I would contest.

    I believe you can write the Iterator class to perform dynamic population of the data set, determining what to return each time hasNext or next is invoked, rather than “loading and preparing all data upfront inside the constructor”, but honestly I’ve never found a real life scenario where I’ve needed to use Iterator instead of Query Locator.

    Additionally, Batch apex is not stateless. If you implement a batch with non-static class attributes, the values for these attributes are maintained through the entire lifecycle of the batch. If you want to allow changes to these attributes’ values to be retained between calls to start, execute and/or finish then you do have to implement Database.Stateful, but all this really does is ensure the Batch instance gets re-serialized into Salesforce’s internal storage after each call to one of these methods (as you do correctly say).
