std::jthread is C++20’s flagship mechanism for threading. Unlike OpenMP, whose pragmas a compiler can simply ignore, adding std::jthread to a program makes it explicitly parallel, and the program must be compiled as such.

[!NOTE] You have previously benefited from having multiple threads because OpenMP transformed your code at compile time to spawn threads and coordinate the aggregation of results.

In this section, we’ll step one level deeper in complexity and implement threading manually.

This requires manually coordinating thread execution, but it exposes more control in situations where:

  1. OpenMP is not available or
  2. more fine-tuned approaches are necessary

Compiling with C++ Threads

g++ -pthread ... will compile with C++ threads; the required flag for any other compiler is usually easy to find in its man page or online. A CMakeLists.txt requires two things: finding the threading package and linking it to a target:

# At the top, under the `project` call:
find_package(Threads REQUIRED)
# With other compilation calls
add_executable(blah blah.cpp)
target_link_libraries(blah PRIVATE Threads::Threads)

Using C++ Threads

std::jthreads are simple in principle: they spawn, run a function while the main program continues executing, then join automatically. For example, the following program prints Arabic numerals in the main thread and Roman numerals in a spawned thread:

#include <thread>
#include <chrono>
#include <iostream>

int main() {
    // Roman numerals print in a spawned thread
    auto roman_numeral_thread = std::jthread([]{ // using lambdas to spawn threads is idiomatic
        for (auto &numeral: {"I", "II", "III", "IV", "V"}) {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            std::cout << numeral << std::endl;
        }
    });
    // Arabic numerals print in the main thread
    for (int i=1; i<=5; i++) {
        std::this_thread::sleep_for(std::chrono::milliseconds(45));
        std::cout << i << " is written as ";
    }
    return 0; // roman_numeral_thread is automatically joined
}

/* Compile with `g++ -std=c++20 -pthread -o numerals numerals.cpp`
 * Output looks like:
 * 1 is written as I
 * 2 is written as II
 * 3 is written as III
 * 4 is written as IV
 * 5 is written as V
 */

This is simple enough when no coordination is required, or when rough clock-based coordination is sufficient as above, but getting threads to synchronize or communicate correctly is challenging. Even getting two threads to interleave the printing of “PING” and “PONG” below requires three variables dedicated solely to coordination:

[!NOTE] This demonstrates with a real example the complexity of coordinating thread execution.
The specifics of the tools used (mutex and condition_variable) are explained below.

#include <iostream>
#include <thread>
#include <mutex>
#include <condition_variable>

int main() {
    // Simulation parameters
    const int n_volleys = 4;
    bool pinging = true; // start with PING
    std::mutex mtx;
    std::condition_variable cv;
    // PONG in worker thread
    auto pong_thread = std::jthread([&]{
        for (int i=0; i<n_volleys; i++) {
            std::unique_lock lock(mtx);
            cv.wait(lock, [&]{ return !pinging; });
            std::cout << "PONG!" << std::endl;
            pinging = !pinging;
            cv.notify_one();
        }
    });
    // PING in main thread
    for (int i=0; i<n_volleys; i++) {
        std::unique_lock lock(mtx);
        cv.wait(lock, [&]{ return pinging; });
        std::cout << "PING, ";
        pinging = !pinging;
        cv.notify_one();
    }
    return 0;
}

/* Output:
 * PING, PONG!
 * PING, PONG!
 * PING, PONG!
 * PING, PONG!
 */

Thread Coordination

C++ Atomics

Atomic operations appear to happen instantaneously from the perspective of other threads: multiple threads can write to an atomic without causing data races. The tradeoff is a severe performance penalty if you use them heavily; the following example does the same thing as the reduction example code but uses roughly 20x the CPU time:

#include <iostream>
#include <atomic>
#include <cstddef>

int main() {
    std::atomic<size_t> counter = 0;
    // Requires OpenMP support, e.g. compile with g++ -fopenmp -std=c++20
    #pragma omp parallel for
    for (size_t i = 0; i < 20000000; ++i) {
       counter += 1;
    }
    std::cout << counter << std::endl;
    return 0;
}
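
For comparison, here is a minimal sketch (the thread count and work-splitting are illustrative) of the same counter incremented from explicitly spawned std::jthreads instead of an OpenMP pragma:

#include <atomic>
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    const size_t n_increments = 20000000;
    const size_t n_threads = 4;
    std::atomic<size_t> counter = 0;

    std::vector<std::jthread> workers;
    for (size_t t = 0; t < n_threads; ++t) {
        workers.emplace_back([&]{
            // Each thread does its share of the increments; the atomic makes
            // concurrent += safe, but heavy contention still makes it slow
            for (size_t i = 0; i < n_increments / n_threads; ++i) {
                counter += 1;
            }
        });
    }
    workers.clear(); // destroying each jthread joins it
    std::cout << counter << std::endl; // 20000000
    return 0;
}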

C++ Mutexes and Semaphores

std::mutex and std::counting_semaphore provide mutexes and semaphores in C++. It’s best to use std::unique_lock and std::lock_guard rather than locking and unlocking mutexes manually, since they release the lock automatically when they go out of scope.
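
For example, here is a minimal sketch (the names and thread counts are illustrative) in which a std::lock_guard protects a shared counter and a std::counting_semaphore caps how many threads run a slow section at once:

#include <chrono>
#include <iostream>
#include <mutex>
#include <semaphore>
#include <thread>
#include <vector>

int main() {
    int counter = 0;                      // shared state, protected by the mutex
    std::mutex mtx;
    std::counting_semaphore<2> slots(2);  // at most 2 threads in the slow section at once

    std::vector<std::jthread> workers;
    for (int t = 0; t < 4; ++t) {
        workers.emplace_back([&]{
            slots.acquire();              // blocks while 2 other threads hold slots
            std::this_thread::sleep_for(std::chrono::milliseconds(10)); // simulate slow work
            slots.release();
            std::lock_guard lock(mtx);    // locked here, unlocked automatically at end of scope
            counter += 1;
        });
    }
    workers.clear();                      // destroying each jthread joins it
    std::cout << counter << std::endl;    // prints 4
    return 0;
}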

C++ Condition Variables

Condition variables encapsulate the idea of waiting until some condition is true. They have three main methods:

  • wait
  • notify_one
  • notify_all

The example below shows a simple thread-safe queue built from push and pop functions. Safety is ensured by the condition_variable and mutex; pop blocks until an element is available. Take a bit of time to make sure you understand how this works and why it’s safe:

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mtx;
std::condition_variable cv;

template <typename T>
T pop(std::queue<T> &queue) {
   std::unique_lock lock(mtx);
   while (queue.empty()) {
      cv.wait(lock); // releases the lock while waiting, re-acquires it when notified
   }
   T val = std::move(queue.front()); // std::queue::pop() returns void, so read the front first
   queue.pop();
   return val;
}

template <typename T>
void push(std::queue<T> &queue, T val) {
   std::unique_lock lock(mtx);
   queue.push(std::move(val));
   cv.notify_one(); // wake one thread waiting in pop()
}

C++ Barriers

Use std::barrier for barriers in C++20: it makes a group of threads wait until all of them have reached the same point before any of them continues.
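
As a minimal sketch (the thread count and phase labels are illustrative):

#include <barrier>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

int main() {
    const int n_threads = 3;
    // Every thread must call arrive_and_wait() before any of them moves past the barrier
    std::barrier sync_point(n_threads);

    std::vector<std::jthread> workers;
    for (int id = 0; id < n_threads; ++id) {
        workers.emplace_back([&, id]{
            std::cout << "thread " + std::to_string(id) + " finished phase 1\n";
            sync_point.arrive_and_wait(); // blocks until all n_threads threads arrive
            std::cout << "thread " + std::to_string(id) + " started phase 2\n";
        });
    }
    return 0; // jthreads join automatically; every phase 1 line prints before any phase 2 line
}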