Concurrency and Parallelism
Introduction
Many large tasks can be broken into smaller, manageable pieces to improve efficiency. In computing, concurrency and parallelism allow multiple tasks to be divided so that their pieces can progress independently, making better use of available resources.
For example, consider preparing a breakfast consisting of scrambled eggs, muffins, and sliced fruit. Some steps must happen sequentially:
- Cheese must be shredded before being added to the eggs.
- Muffin batter must be prepared before it is baked.
However, many tasks can overlap in progress:
- While waiting for the muffins to bake, eggs can be cooked.
- While waiting for the eggs to cook, fruit can be sliced.
For the sake of the following discussions, a chef in the kitchen is comparable to a CPU core, and a cooking task is comparable to a process or thread.
Concurrency
Concurrency is the ability of a system to manage multiple tasks at the same time. These tasks do not necessarily execute simultaneously; instead, they are interleaved by the kernel so that no single task monopolizes the processor.
Consider a single chef working in the kitchen to prepare breakfast. The chef may leave the muffins to bake and switch to cooking eggs, but the chef is never physically working on both tasks at once. Similarly, a computer with a single core interleaves multiple processes by rapidly switching execution between them. This is concurrency: multiple tasks making progress without necessarily running simultaneously.
Furthermore, the chef never spends a long, uninterrupted stretch of time on a single task in the kitchen. If the chef were to stay blissfully ignorant of the other items while watching the muffins bake, then the eggs might burn and the fruit would never be sliced. Instead, the chef frequently “checks in” on each task, spending a few moments with each before moving on to the next.
Physically, switching contexts and moving between the stove with the eggs and the cutting board with the fruit “wastes” some time during which the chef is working on neither task. In a really busy kitchen, this movement can be slowed further by extra bodies in the aisles, increasing the time it takes to get from one end of the kitchen to the other. Nevertheless, the varied nature of kitchen tasks, like the varied kinds of computation on a computer, necessitates this kind of rapid context switching to keep many different tasks moving forward with limited resources.
Concurrency on Computers
A computer system is described as “executing concurrently” when a single CPU core is assigned multiple tasks. In this case, the core is only ever working on a single task at a time, although multiple tasks are overlapping in progress.
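This single-core interleaving can be sketched with Python's asyncio, which runs tasks cooperatively on one thread. The task names below are hypothetical, borrowed from the breakfast analogy; this is a minimal illustration, not how a kernel actually schedules processes:

```python
import asyncio

log = []

async def task(name, steps):
    for i in range(steps):
        log.append(f"{name} step {i}")
        await asyncio.sleep(0)  # yield control, like the chef "checking in" elsewhere

async def main():
    # Both tasks share one thread; the event loop interleaves their steps.
    await asyncio.gather(task("eggs", 3), task("fruit", 3))

asyncio.run(main())
print(log)  # the two tasks alternate: eggs, fruit, eggs, fruit, ...
```

Neither task ever runs at the same instant as the other, yet both make steady progress; that is concurrency without parallelism.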
Concurrent Eligibility
In the breakfast example, the muffins, eggs, and fruit can each be handled concurrently because they are independent of each other.
Mathematically, operations that follow the associative property (e.g., addition and multiplication) lend themselves well to concurrency:
\[1 + (2 + 3) = (1 + 2) + 3\]

\[1 \times (2 \times 3) = (1 \times 2) \times 3\]

These operations can be structured so that different segments of a computation make progress without requiring strict ordering. However, not all computations can be reordered, and those that cannot are difficult to break into concurrent parts. Division, for example, is not associative:

\[1 \div (2 \div 3) \ne (1 \div 2) \div 3\]

Therefore, to make a task compute faster, the designer must first find creative ways to break it into smaller chunks while preserving the correct answer.
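The associativity property can be checked directly. The following Python sketch splits a sum into independent chunks, combines the partial results, and shows that the same splitting trick fails for division:

```python
numbers = list(range(1, 101))

# Sequential sum, computed in strict left-to-right order.
total = sum(numbers)

# "Chunked" sum: each chunk could be computed independently, then combined.
# Associativity of addition guarantees the same answer.
chunks = [numbers[i:i + 25] for i in range(0, len(numbers), 25)]
chunked_total = sum(sum(c) for c in chunks)

assert total == chunked_total == 5050

# Division is not associative, so regrouping changes the answer:
left_to_right = (1 / 2) / 3   # = 1/6
right_to_left = 1 / (2 / 3)   # = 3/2
assert left_to_right != right_to_left
```

The chunked version is exactly the structure a concurrent summation would use: each chunk is an independent unit of work, and the final combination step is cheap.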
Example
A classic computing example of concurrency is the make build system. A Makefile specifies dependencies between targets, forming a partial order. If certain targets are independent, make can schedule them to be built concurrently. However, if a system has only one CPU core, these independent tasks will not truly run at the same time; rather, the system will switch between them rapidly.
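The partial order that make derives from a Makefile can be sketched as a small dependency graph. The targets below are hypothetical, and real make implementations are far more sophisticated; this only illustrates how independent targets become eligible to run concurrently:

```python
# Hypothetical build graph: "app" depends on two object files,
# which themselves have no dependencies and are therefore independent.
deps = {
    "app": ["main.o", "util.o"],
    "main.o": [],
    "util.o": [],
}

def schedule(deps):
    """Group targets into 'waves'; targets in the same wave are
    mutually independent and could be built concurrently."""
    done, waves = set(), []
    while len(done) < len(deps):
        wave = [t for t in deps
                if t not in done and all(d in done for d in deps[t])]
        waves.append(sorted(wave))
        done.update(wave)
    return waves

print(schedule(deps))  # [['main.o', 'util.o'], ['app']]
```

The first wave contains the two object files, which make could build at the same time; "app" must wait for both, so it lands in a later wave.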
Parallelism
Parallelism is a specific form of concurrency in which multiple tasks execute at the same time, requiring multiple CPU cores. While concurrency often requires structuring tasks to allow independent progress, parallelism is about physically executing multiple tasks simultaneously.
Returning to the breakfast example, concurrency allows the tasks to be interleaved, but true parallelism occurs only when multiple people are cooking at the same time: a second chef enters the kitchen and begins slicing the fruit while the first chef remains hard at work with the eggs. The work is not just structured efficiently; it is actually happening simultaneously. If labor shortages or limited kitchen capacity prevented a second chef from becoming involved in the meal preparation, then the overall process would not actually be performed in parallel.
Importantly, parallelism requires concurrency, but not all concurrency leads to parallel execution. A concurrent algorithm can execute in parallel if the hardware allows it, but on a single-core processor, it will still rely on context switching rather than true simultaneous execution.
Parallelism on Computers
In computing, parallelism occurs when multiple cores or processors execute independent tasks at the same time.
A system is described as “executing in parallel” when multiple cores are simultaneously computing different tasks.
Even when a computer has multiple cores, it may still be “executing concurrently” when there are more active processes than cores available. In this case, each of the cores individually handles multiple processes concurrently.
Modern consumer laptops typically have 4 to 8 cores. Often, some of these cores are optimized for power-efficient computations to preserve battery life, while others are optimized for high-performance computations.
On the BYU supercomputer, each compute node has between 20 and 128 cores, depending on the cluster. The exact details of each cluster can be viewed on the Research Computing website. In total, the BYU Supercomputer network currently has ~30,000 CPU cores available for computation!
Example
For example, in array summation, a concurrent approach might involve breaking the array into sections and scheduling computations to interleave on a single-core processor. A parallel approach, in contrast, would assign different sections to multiple CPU cores, where each core performs computations simultaneously before combining the results.
For more details, refer to section 2.1 of HPC Tutorials volume 1.
Real Example
The following Gantt chart illustrates the breakfast preparation process, highlighting how different chefs manage their tasks with varying degrees of concurrency and parallelism. By analyzing their workflows, we can better understand key concepts in task scheduling, efficiency, and system optimization.
In the illustrated breakfast preparation process, the combined efforts of Chef A, Chef B, and Chef C form a parallel system where each chef operates independently to complete their assigned tasks. This system maximizes efficiency by allowing multiple food preparation processes to occur simultaneously rather than sequentially. Importantly, none of the chefs are required to wait on one another to complete their tasks; while this can prevent bottlenecks and ensure smooth workflow execution, it also leads to inefficiencies of its own.
Chef Task Approaches and Their Computational Analogies
Chef A: Task Grouping with Limited Context Switching
Chef A’s workflow demonstrates an approach where a person completes as much of one task as possible before switching to another. The only exception to this occurs when a baking process begins, at which point Chef A shifts focus to another task while waiting. This strategy is highly efficient because it minimizes context switching, which in computing terms can be likened to reducing the overhead of switching between processes or threads. Chef A works on multiple tasks concurrently but organizes them to ensure minimal interruptions.
Chef B: Fully Sequential Processing
In contrast to Chef A, Chef B follows a strictly sequential approach, completing one task entirely before moving on to the next. However, this approach is viable in this scenario because Chef B’s tasks do not involve long cooking or baking processes that require waiting. This method can be directly compared to a single-threaded process in computing, where a program runs linearly without switching between tasks. While this approach is simple and avoids the overhead of task switching, it does not take advantage of concurrent execution and thus may not be the most efficient method in a high-demand system.
Chef C: High-Frequency Task Switching
Chef C adopts a different approach, frequently switching between tasks to make progress on multiple fronts. This mirrors how modern computers execute tasks by rapidly shifting between different processes at the microsecond level. While this method introduces more frequent context switching, it ensures that no single task is left idle for too long, effectively utilizing available time. This approach is often seen in multitasking operating systems where the CPU executes multiple processes by time-slicing.
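Chef C's strategy resembles round-robin time-slicing, which can be sketched as follows. The tasks and work units are made up for illustration; real operating-system schedulers use far richer policies:

```python
from collections import deque

def round_robin(tasks, slice_size=1):
    """tasks: dict of name -> units of work remaining.
    Returns the order in which work slices are executed."""
    queue = deque(tasks.items())
    order = []
    while queue:
        name, remaining = queue.popleft()
        work = min(slice_size, remaining)
        order.append((name, work))
        if remaining - work > 0:
            queue.append((name, remaining - work))  # re-queue unfinished task
    return order

print(round_robin({"eggs": 2, "toast": 1, "fruit": 2}))
# [('eggs', 1), ('toast', 1), ('fruit', 1), ('eggs', 1), ('fruit', 1)]
```

No task waits long for attention, but every re-queue corresponds to a context switch, which is exactly the overhead Chef A's grouping strategy tries to avoid.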
Challenges and Inefficiencies in the System
While the system overall demonstrates effective parallelism, a few inefficiencies emerge upon closer inspection:
- Delayed Breakfast Completion Due to Task Bottlenecks
- Although much of the breakfast preparation occurs in parallel, the entire meal is delayed by six minutes due to Chef A’s bacon cooking task. This represents a real-world challenge in concurrent programming: the difficulty of evenly distributing workload among multiple processing units. If another chef had taken on the bacon preparation earlier, the entire breakfast could have been completed sooner.
- Underutilization of Chef B’s Time
- Chef B completes all assigned tasks 13 minutes before the other chefs finish, leading to wasted potential. In computing, this corresponds to a processor or thread completing its task early but remaining idle while other components continue working. Ideally, work should be distributed so that all tasks conclude simultaneously for optimal efficiency.
- The Importance of Task Coordination in Computing and Cooking
- In a real breakfast preparation scenario, the chefs would likely coordinate to ensure all dishes are finished at the same time so that food does not cool before serving. Similarly, in computing environments, special scheduling strategies are used to ensure that tasks requiring synchronization complete together, rather than one process waiting unnecessarily for another.
- Idle Time During Long-Running Operations
- At various points, individual chefs experience idle time while waiting for certain tasks to finish, such as baking or boiling. This mirrors how CPUs sometimes have idle cycles when waiting for I/O operations, such as disk reads or network responses. Effective computing architectures attempt to minimize these idle cycles by scheduling additional tasks during wait periods.
Computational Comparisons: Background Work and System Calls
Many of the lighter cooking tasks, such as baking, boiling, and brewing, do not require constant attention from the chef. These are similar to system calls in computing, where a process requests an operation (such as reading from a disk or fetching data from a database) and then continues performing other tasks while waiting for a response. Different system calls have varying durations, much like how boiling water takes longer than heating a pan. Understanding these dependencies is crucial for designing efficient concurrent systems.
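This overlap of a long-running wait with other useful work can be sketched with a background thread. The "bake" duration below is arbitrary, and time.sleep merely stands in for a blocking system call such as a disk read:

```python
import threading
import time

results = {}

def slow_io(name, seconds):
    time.sleep(seconds)          # stands in for a blocking system call
    results[name] = "done"

start = time.perf_counter()
t = threading.Thread(target=slow_io, args=("bake", 0.2))
t.start()                        # begin the long wait in the background
results["slice fruit"] = "done"  # keep working while the "oven" runs
t.join()                         # wait for the background task to finish
elapsed = time.perf_counter() - start

print(results)  # both tasks finish in roughly the bake time alone
```

Both tasks complete in about the duration of the longer one, rather than the sum of the two, because the main thread never sits idle during the wait.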
Optimizing the Breakfast Preparation System
Given these observations, several optimizations could be introduced to reduce the overall breakfast preparation time:
- Reassigning Tasks for Better Load Balancing
- Chef B could take on additional work, such as assisting with the bacon preparation or toasting, to prevent early task completion and unnecessary idle time.
- Chef C could handle certain finalization tasks while other food items are still cooking to ensure a smooth serving process.
- Prioritizing Bottlenecked Tasks Earlier
- If the bacon task were started earlier, the overall breakfast completion time would decrease. In computing, this is similar to prioritizing tasks on the critical path of execution to avoid unnecessary delays.
- Coordinating Final Task Completion for Simultaneous Serving
- By adjusting the timing of each task, all food items could be ready at the same moment. This is akin to scheduling algorithms ensuring that interdependent processes complete in an optimal sequence.
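One simple heuristic for the load balancing described above, assigning each task to the least-loaded worker in order of decreasing duration, can be sketched as follows. The task times are invented for illustration:

```python
import heapq

def balance(tasks, n_workers):
    """tasks: dict of name -> minutes. Greedily assigns each task
    (longest first) to the currently least-loaded worker."""
    heap = [(0, i, []) for i in range(n_workers)]  # (load, worker id, tasks)
    heapq.heapify(heap)
    for name, minutes in sorted(tasks.items(), key=lambda kv: -kv[1]):
        load, i, assigned = heapq.heappop(heap)   # least-loaded worker
        assigned.append(name)
        heapq.heappush(heap, (load + minutes, i, assigned))
    return sorted(heap)

tasks = {"bacon": 10, "eggs": 5, "muffins": 8, "fruit": 3, "toast": 2}
for load, worker, assigned in balance(tasks, 2):
    print(f"chef {worker}: {load} min -> {assigned}")
```

With two chefs this yields finishing times of 15 and 13 minutes against an ideal of 14, illustrating that greedy balancing narrows, but does not always eliminate, the gap between workers.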
Questions for Consideration
Now that we’ve analyzed how different approaches to task management impact the efficiency of breakfast preparation, it’s time to step back and consider the broader implications. The following questions encourage you to think critically about scheduling, workload distribution, and optimization—both in the kitchen and in computing systems.
- Balancing Workload – Chef B finishes all tasks significantly earlier than the others. How could the work be redistributed to ensure that all chefs finish at the same time? What principles from computing could help optimize this scheduling?
- Minimizing Idle Time – There are moments when a chef is waiting for an item to finish cooking before continuing. In computing, this is comparable to CPU idle time. What strategies could be used to reduce these idle periods and keep all chefs (or CPU cores) actively engaged?
- Trade-offs in Task Switching – Chef A minimizes context switching, while Chef C frequently switches between tasks. Which approach is better suited for different types of workloads in computing, and why? How do factors like task complexity, dependencies, and processing power affect this decision?
- Real-World Scheduling – In large-scale computing systems, load balancers and schedulers determine how tasks are distributed. If you were designing a task scheduler for this breakfast preparation, how would you assign work to maximize efficiency?
- Concurrency Pitfalls – The system experiences an unnecessary bottleneck when breakfast is delayed by the bacon cooking last. What are real-world examples of similar bottlenecks in computing, and how can they be mitigated?
- Parallelism vs. Concurrency – Chef B works sequentially, Chef A works concurrently, and Chef C demonstrates an interleaved approach. What types of real-world computing tasks best align with each of these strategies?
- Scalability – If a fourth chef joined the kitchen, what changes could be made to further improve efficiency? How does this concept apply to adding more processors or threads in computing?
Conclusion
This breakfast preparation example serves as an insightful analogy for concurrency and parallelism in computing. The different approaches taken by each chef—Chef A’s task grouping, Chef B’s sequential execution, and Chef C’s rapid switching—reflect real-world computing strategies. By studying the inefficiencies in this system, we gain valuable insights into how concurrent systems can be optimized to improve efficiency, reduce idle time, and ensure a balanced workload distribution. Just as chefs must carefully coordinate their efforts to serve a perfectly timed meal, computer systems must employ intelligent scheduling strategies to maximize performance and minimize wasted resources.