Bulk Processing vs Single Item Processing

Avoid bulk processing whenever possible; focus on single item processing.

 

Popularity and Problems of Bulk Processing

In today's world, many software applications are focused on large amounts of data and the problems around processing it. A large amount of data means many rows or objects in a database or another kind of storage, where many of them need to go through the same kind of process. When developers solve such assignments, they tend to think that the obvious answer is bulk processing.

I must say, there is nothing wrong with bulk processing in specific circumstances. It's not a silver bullet, though. In many concrete cases (which also seem to be the most common ones), you should avoid bulk processing and favor single item processing instead.

Let's assume that we are writing a fairly simple application, where a user provides some kind of input, and then the program makes a single kind of modification to many rows in the database. If that's all the program does, we should not be worried at all and can use bulk processing. Indeed, if we look at the code or try to debug it, we expect that many rows will be looped through and modified by the program code.

Unfortunately, most of the applications we write professionally are not that simple, and bulk processing can work against our productivity in the long run.

To make this point clearer, let's assume that the program I mentioned above needs to be upgraded so that it makes several kinds of changes to the same set of rows, for the same kind of input. Specifically, the database holds a list of employees, their bonuses, and their addresses. The user enters a bonus percentage (multiplier), which is the same for all the employees and is based on the company's performance. The steps that the program should perform include:

  1. For each employee record, calculate bonus amount based on the company's performance multiplier.
  2. Then, for each employee record, find the hiring date, and subtract the prorated amount from the bonus amount.
  3. Once these numbers are found, send a request to the HR department to issue a check for that amount to each employee's address.

 

It's an unfortunate reality that requirements are often formed exactly as written above. Don't get me wrong - it's absolutely natural to speak about expectations in this manner - just by listing all the things that need to happen and describing targets for each kind of operation. It's only unfortunate from the technical standpoint, because developers often take it literally and implement it as written - as bulk processing - without applying any modeling or design thinking to the problem.

So, if we naively design it as it sounds, we will have 3 consecutive loops, one for each item in the above list of steps; each loop goes through all the records (either employees, their bonuses, or their addresses) and performs the action described above for each found row, within that loop.

Below is the sample pseudo code for it. This is a very naive version for the sake of example. In real-life code, the different loops are spread across several method bodies, all modifying the same collection in a series of procedures, by passing it around everywhere (do you recall seeing something like that? I also have an article talking about this from another perspective):

//bulk processing - warning: not recommended!

public void ProcessAllEmployees(int bonusMultiplier)
{
	var allEmployees = GetAllEmployees();
	
	ApplyMultiplier(allEmployees); //loops through all employees inside.

	SubtractProratedBonusDueToHiringDate(allEmployees); //loops through all employees inside.

	SendHRCheckRequest(allEmployees); //loops through all employees inside.
}
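To see what those "loop inside" comments amount to, here is a minimal runnable sketch of the bulk approach - written in Java as a stand-in for the C#-style pseudo code above. The Employee fields and the proratedShare shorthand for the hiring-date proration are illustrative assumptions, not a real data model:

```java
import java.util.List;

public class BulkProcessing {
    // Minimal mutable holder standing in for a database row.
    static class Employee {
        final String name;
        double bonus;
        final double proratedShare; // fraction of the year worked, derived from the hiring date

        Employee(String name, double bonus, double proratedShare) {
            this.name = name;
            this.bonus = bonus;
            this.proratedShare = proratedShare;
        }
    }

    static void applyMultiplier(List<Employee> all, double multiplier) {
        for (Employee e : all) e.bonus *= multiplier;      // loop #1
    }

    static void subtractProratedBonus(List<Employee> all) {
        for (Employee e : all) e.bonus *= e.proratedShare; // loop #2
    }

    static void sendHrCheckRequests(List<Employee> all) {
        for (Employee e : all)                             // loop #3
            System.out.println("Issue check of " + e.bonus + " to " + e.name);
    }

    public static void main(String[] args) {
        List<Employee> all = List.of(
                new Employee("Alice", 1000.0, 1.0),  // worked the full year
                new Employee("Bob", 1000.0, 0.5));   // hired mid-year

        // The whole collection travels through three separate procedures,
        // each running its own loop over the same rows.
        applyMultiplier(all, 1.2);
        subtractProratedBonus(all);
        sendHrCheckRequests(all);
    }
}
```

Notice that each procedure receives the entire collection and mutates it; the loops themselves are the only place where the process is expressed.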

 

This is bulk processing, and here are the problems it brings:

  - The intent of the overall process is scattered across separate loops; no single place in the code expresses what happens to one employee from start to finish.
  - The whole collection is passed from procedure to procedure, each of which mutates it, which makes the flow hard to follow and debug.
  - A failure in the middle leaves some employees half-processed (e.g., bonus multiplied but no check requested), which is hard to detect and recover from.
  - Every new per-employee requirement means yet another loop over the same collection.

Workflow and Benefits of Single Item Processing

The requirements described above can be rewritten in a slightly different manner, which makes the relevance of single item processing apparent.

Take a single employee and run the following actions for it:

  1. Given the multiplier, calculate bonus amount for the employee.
  2. Given the employee's hiring date and bonus, calculate prorated bonus amount for the employee.
  3. Given the employee's most recent address and prorated bonus amount, issue a request to the HR system to send a check.

Repeat the steps above for each employee in the company.

 

Although uncommon, if the requirements were written like I just showed above, even the same naive developers would probably think about writing the code without consecutive loops. Maybe just one loop which initiates the process for a single employee, but not three different loops in a row. This approach is not bulk processing anymore, since the flow is no longer expressed as loops flowing into each other. Instead, we have a continuous chain of actions for a single employee. This makes our intentions apparent, since we just perform the 3 steps in a row.

The pseudo code is below. Ideally, most of the methods would be called on the employee instance itself. The code below looks like procedural, data-driven code just for simplicity's sake:

//single item processing - recommended!

public void ProcessAllEmployees(int bonusMultiplier)
{
	var allEmployees = GetAllEmployees();
	
	foreach (var employee in allEmployees)
	{
		ApplyMultiplier(employee);
		SubtractProratedBonusDueToHiringDate(employee);
		SendHRCheckRequest(employee);
	}
}
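Ideally the per-employee steps live on the employee instance itself; a minimal runnable sketch of that version could look like the following (in Java as a stand-in for the C#-style pseudo code; the member names and the proratedShare shorthand for the hiring-date math are illustrative assumptions):

```java
import java.util.List;

// Illustrative sketch: each step of the process is a method on Employee itself.
class Employee {
    final String name;
    double bonus;               // base bonus before adjustments
    final double proratedShare; // fraction of the year worked, derived from the hiring date

    Employee(String name, double baseBonus, double proratedShare) {
        this.name = name;
        this.bonus = baseBonus;
        this.proratedShare = proratedShare;
    }

    void applyMultiplier(double multiplier) {
        bonus *= multiplier;
    }

    void prorateByHiringDate() {
        bonus *= proratedShare; // removes the non-worked portion of the year
    }

    String buildHrCheckRequest() {
        return "Issue check of " + bonus + " to " + name;
    }
}

public class SingleItemProcessing {
    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("Alice", 1000.0, 1.0),  // worked the full year
                new Employee("Bob", 1000.0, 0.5));   // hired mid-year

        double multiplier = 1.2; // performance multiplier entered by the user
        for (Employee e : employees) {
            // One continuous chain of actions per employee - no consecutive loops.
            e.applyMultiplier(multiplier);
            e.prorateByHiringDate();
            System.out.println(e.buildHrCheckRequest());
        }
    }
}
```

The single loop only drives the process; the process itself reads as three steps on one employee.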

Here are the benefits that we gain, compared to the problems described with bulk processing before:

  - The complete process for a single employee is expressed in one place, which makes the intent obvious.
  - The collection is no longer passed around and mutated by a series of procedures; each step receives just one employee.
  - Each employee is either fully processed or not processed at all, so a failure in the middle does not leave the whole batch in a mixed state.
  - Adding a new per-employee step means adding one line to the chain, not another loop over the collection; the steps also move naturally onto the employee instance itself.

When to Prefer Bulk Processing

The example described above - although the most common kind among requirements - is somewhat specific: processing of a single employee record is not tied to other employee records. If I had been asked to calculate the average age of all the employees, we wouldn't be able to solve it without bulk processing. That's because calculating the average requires looping through all the records (or all the ages, specifically) within a single loop, summing them up, and then dividing the sum by the total number of rows.
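For instance, a minimal sketch of the average-age calculation (in Java, with made-up sample ages) shows why a single bulk loop is unavoidable here:

```java
import java.util.List;

public class AverageAge {
    public static void main(String[] args) {
        // Sample ages standing in for the employee records.
        List<Integer> ages = List.of(30, 40, 50);

        // The result depends on every record at once, so a single
        // loop over all rows (bulk processing) is required.
        int sum = 0;
        for (int age : ages) {
            sum += age;
        }
        double average = (double) sum / ages.size();
        System.out.println(average); // prints 40.0
    }
}
```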

 

It's very important to understand that most of the time bulk processing can be avoided, even though the requirements may sound otherwise. Given the strong benefits of the single item processing approach, I highly recommend that you keep an eye out for opportunities to use it.

Want even more opportunities for improvement? Read other articles around software architecture or take one of my training courses for your entire team.

 

 

About Author and Content

The author of the above content is Tengiz Tutisani - owner and technical leader at tutisani.com.

If you agree with the provided thoughts and want to make it part of your team's culture, we can help.
We provide in-person, immersive technical training courses around Software Architecture, Domain-Driven Design, and Extreme Programming topics. We also develop software solutions.

Let's Talk!