The tale of careless update DML in Apex

A pretty common pattern you’ll see in Apex is to query a collection of records, with the fields needed to perform some processing, iterate over these records performing the processing, update some fields in the records as a result, and finally perform a DML with those records.

This pattern is potentially dangerous and something that should be avoided because of a fatal flaw in how Apex-based update DML functions: all the populated fields in the records get written back to the database, regardless of whether they have been updated by the code or not.

The broken query/modify/update pattern

This is the pattern where the results of a query are directly modified and used in the update DML.

An example, 1, of this pattern might be:

Account[] accounts = [
  SELECT Id, Field1, Field2, Field3, Field4
    FROM Account
    WHERE ...
];


for (Account account : accounts) {
   account.Field2 += account.Field1 +
     account.Field3 * UserInfo.getUserId().hashCode() /
     account.Field4;
   account.Field5 = 'Something';
}


update accounts;

See how the changes are made directly in the queried records, and those modified record instances are used in the update DML. Looks OK, right? (I used some data that doesn’t come from the Account record to make it clear I couldn’t just use a formula field here.)

Now let’s consider another processing thread that might be processing the same records, but for a different purpose, with example 2:

Account[] accounts = [
 SELECT Id, Field1, Field2, Field6
   FROM Account
     WHERE ...
];


for (Account account : accounts) {
   account.Field7 = account.Field1 +
     account.Field2 - UserInfo.getUserId().hashCode();
}


update accounts;

Again, this looks OK (though Field6 was queried but not used).

But what happens if I have different transactions calling these pieces of code at more-or-less the same time?

Here’s a representation of a possible sequence of events in this case, where I assume the two transactions, A and B, both want to update Account X, where transaction A is running example 1 and transaction B is running example 2. The events are illustrated in time order:

  1. A: “Account[] accounts = […]” gets Account X with Field1, Field2, Field3 and Field4 populated.
  2. B: “Account[] accounts = […]” gets Account X with Field1, Field2 and Field6 populated.
  3. A: “account.Field2 += …” and “account.Field5 = …” update the Account X in-memory record to change Field2 and to add Field5.
  4. B: “account.Field7 = …” updates the Account X in-memory record to add Field7.
  5. A: “update accounts” updates Account X in the database, setting Field1, Field3 and Field4 to their original values from step 1 and updating Field2 and Field5.
  6. B: “update accounts” updates Account X in the database, setting Field1, Field2 and Field6 to their original values from step 2 and updating Field7.

At the end of this interleaved sequence, the change to Field2 made by transaction A has been lost.

This is all down to the fact that the two transactions can run concurrently against the same record and Apex’s update DML isn’t clever enough to only store fields that the code has explicitly modified since the record was queried.

How to fix this broken pattern

There are a couple of options that spring to mind:

  1. Use FOR UPDATE queries.
  2. Ensure only relevant fields are included in the DML.

The FOR UPDATE pattern

The simplest solution, in terms of code change, is to ensure that all the processes query records with the FOR UPDATE keywords in the SOQL. The disadvantage of this approach is that processes that would end up updating different fields on the same record get effectively turned into serial, instead of parallel, activities. This slows the processing down on the org and can even lead to Query Exceptions being thrown if it takes more than 10 seconds to gain the lock on the required records when querying them.

Account[] accounts = [
       SELECT Id, Field1, Field2, Field3, Field4
       FROM Account
       WHERE ...
       FOR UPDATE
];

The query/add/update pattern

This approach updates the code to make sure that the DML does not include extraneous fields. This takes slightly more coding effort and increases the heap usage, but generally speaking allows for maximum throughput while avoiding loss of concurrent changes in other fields.

Considering the earlier examples, the code for example 1 can be changed to apply this revised approach, thus:

Account[] accounts = [
       SELECT Id, Field1, Field2, Field3, Field4
       FROM Account
       WHERE ...
];


Account[] updates = new Account[0];


for (Account account : accounts) {
   updates.add(new Account(
           Id = account.Id,
           Field2 = account.Field2 +
                   account.Field1 +
                   account.Field3 * UserInfo.getUserId().hashCode() /
                   account.Field4,
           Field5 = 'Something'));
}


update updates;

The updates contain clean, in-memory records with the required IDs and just the fields that must be updated.

Example 2 is similarly revised, thus:

Account[] accounts = [
 SELECT Id, Field1, Field2
   FROM Account
     WHERE ...
];


Account[] updates = new Account[0];


for (Account account : accounts) {
   updates.add(new Account(
           Id = account.Id,
           Field7 = account.Field1 +
                   account.Field2 - UserInfo.getUserId().hashCode()));
}


update updates;

Again, the updates contain clean, in-memory records (and no longer unnecessarily queries Field6).

If these two pieces of code were to be executed in parallel again, and happened to interleave as shown in the previous section, this time transaction A’s changes to Field2 would no longer be lost.

Conclusion

Adopting the query/add/update approach ensures that any Apex code supports concurrent use while minimizing the duration for which records are locked in the database. Note that there can still be good reasons to use FOR UPDATE, though over-use of these types of query can certainly reduce throughput and result in a worse user experience in some scenarios.

NB: I do not suggest writing code using the structure illustrated in this article; there are good reasons for separation of concerns and modularity. However, mixing the querying, business logic and DML as shown here helps show the problem in a concise manner.

Share your love
Phil W
Phil W

Phil is a Product Architect for Bullhorn, a Salesforce Summit tier ISV Partner. He has been developing and architecting Salesforce AppExchange products since 2017, with a long career before that in product development and professional services in the defence, healthcare, telecommunications and workforce management spaces.

Articles: 2

Leave a Reply

Your email address will not be published. Required fields are marked *