C++ Coroutines – Let’s Play with Them!

Amir Kirsh
Reading time: 13 minutes
March 23, 2022

C++20 added a feature that a lot of us were waiting for – coroutines. (In another post we talked about other features that came out with C++20, and in previous posts we also discussed related topics: modernizing your C++ code and the evolution of C++.)

In this post we are going to play a bit with C++ Coroutines. 

Let’s start with just a piece of code. 

template<typename T> 
unique_generator<T> range(T fromInclusive, T toExclusive) { 
    for (T v = fromInclusive; v < toExclusive; ++v) { 
        co_yield v; 
    } 
} 


int main() { 
    for (auto val : range(1, 10)) { 
        std::cout << val << '\n'; 
    } 
} 

You can find the code above in Compiler Explorer, here: https://coro.godbolt.org/z/zK3E9TEce

Let’s explain what we have above. 

A coroutine is a special function that can suspend its execution and resume later at the exact point where execution was suspended. When suspending, the function may return (yield) a value. When a coroutine ends its execution, it may also return a value.

The state of a coroutine (its parameters, local variables and the current suspension point) is kept in an allocated object, typically on the heap rather than on the stack, which we call the coroutine “frame”. The frame is created when the coroutine is called, so the state survives each suspension. The call to a coroutine returns some kind of “handle” object, and the yielded and returned values are obtained through that handle.

In the main above we use “range” as a coroutine function. What makes “range” a coroutine function is having a “co_yield”, “co_return” or “co_await” within it.

In the above function we use “co_yield”, which returns a value while keeping the “frame” of the function, so we can get back to it for the next iteration and the function would preserve its state for us. 

Note that it is not the same as using a static variable to preserve state, as we can call the coroutine from different threads, or recursively, and each call would preserve its own “frame” independently. To achieve that, the state of the function has to be allocated into a “frame” which is managed through the return value of the coroutine. 

The return “handle” of the coroutine is set by the return type. This “handle” holds an inner promise_type (note that this is not related to std::promise). The promise_type must have the function get_return_object(). For other requirements for promise_type, see cppreference on coroutines promise_type.  

The machinery of handling the promise_type and the lifetime of the coroutine “frame” is a real burden. To avoid that you can use an existing implementation and focus on the implementation of the coroutine itself. The cppreference usage example for std::coroutine_handle presents such an implementation for a generator class. We used a similar generator from another library in the example above. This library brings us the unique_generator type, which behaves as an iterator (i.e., we can iterate over the values being yielded from our coroutine, using the return value type unique_generator) in a similar way to the Generator presented in the above cppreference link. 

The unique_generator has a lot on its plate: it has to handle the coroutine frame allocation and deallocation. If you want to peek at the fine details of handling coroutine frames, take a look at this bug fix for unique_generator.

A coroutine ends execution when it reaches a co_return, or when it reaches the end of the function. In our case, the range function finishes when we reach the toExclusive value in the loop. 

Some limitations for coroutines, as of C++20: 

Coroutines: 

  • cannot use return, only co_return 
  • cannot use C-style variadic arguments (as in printf)
  • cannot be constexpr 
  • cannot be a constructor or a destructor 
  • cannot be the main function  
  • cannot use auto or a concept (a constrained placeholder) as the return type (the programmer needs to specify the return type so that the compiler knows what handle type to use, e.g. generator<int>; this obviously can’t be inferred from the function body’s contents)

    The hazard of passing parameters to coroutines by reference 

    Let’s look at an example taken from Arthur O’Dwyer’s blog: 

    unique_generator<char> explode(const std::string& s) { 
        for (char ch : s) { 
            co_yield ch; 
        } 
    }  
    
    int main() { 
        for (char ch : explode("hello world")) { 
            std::cout << ch << '\n'; 
        } 
    } 

    The above code creates a temporary string in the call to the coroutine function “explode”. However, this temporary string is dead before the actual first use of the coroutine, as the lifetime of temporaries is not extended as part of the coroutine frame creation. 

    As you can see in the code above, the bug is revealed when we run with address sanitizer (-fsanitize=address) and is not detected without that flag. That means this is one of those bugs that can work in your environment and crash in production. 

    Note that the problem would not be solved even if we try to copy the temporary string to another string that would outlive the coroutine lifetime: 

    unique_generator<char> explode(const std::string& s) { 
        auto ps = std::make_unique<std::string>(s); 
        for (char ch : *ps) { 
            co_yield ch; 
        } 
    } 

    The above code still has undefined behavior, as the first call to the coroutine just creates it without executing even the first line of the body. Then, on the first execution the temporary is already dead and we try to create a heap-allocated string (by calling make_unique) from a dead temporary. Note again that the bug in this example is revealed when we run with address sanitizer (-fsanitize=address) and is not detected in this case without it. 

    To better understand the separation between creating the coroutine and actually calling it, we can separate the lines in main into two: 

    auto coro = explode("hello world"); // (1) coroutine being created 
    for (char ch : coro) {  // (2) coroutine being called 
        std::cout << ch << '\n'; 
    } 

    The first line, marked with (1) is still okay, but the second line marked as (2) executes coro, the coroutine, at a point where the temporary string created from “hello world” is already dead. The creation of the unique_ptr from the temporary string is done on the first call of line (2), which is too late, as the temporary string is already dead by then. 

    We can change the code to make it valid by sending a string that is not a temporary: 

    int main() { 
        std::string s = "hello world"; 
        // may_explode is a coroutine getting const string& 
        for (char ch : may_explode(s)) { // ok doesn't explode now 
            std::cout << ch << '\n'; 
        } 
    } 

    But the above has changed only the call and not the function itself, so the function can still be called with a temporary, and we are still exposed to undefined behavior.

    We could change the function to take ownership of the string, for example through a unique_ptr, which is moved into the coroutine frame and therefore lives as long as the coroutine does:

    unique_generator<char> doesnt_explode(std::unique_ptr<std::string> ps) { 
        for (char ch : *ps) { 
            co_yield ch; 
        } 
    }  
    
    int main() { 
        for (char ch : doesnt_explode(std::make_unique<std::string>("good"))) { 
            std::cout << ch << '\n'; 
        } 
    } 

    However, one may argue that the above API is not too friendly. 

    We could also pass the string by-value, an option that we will discuss later on.

    Code that sometimes works, depending on the parameter 

    As seen above, a coroutine that takes a const lvalue reference can work if we actually pass an lvalue that outlives the coroutine, or can explode if we, for example, pass an rvalue. This is also the case with the following code, expecting std::string_view:

    unique_generator<char> extract(std::string_view s) { 
        for (char ch : s) { 
            co_yield ch; 
        } 
    }  
    
    int main() { 
        // this works ok 
        for (char ch : extract("hello world")) { 
            std::cout << ch << '\n'; 
        } 
     
        // this doesn't 
        using namespace std::string_literals; 
        for (char ch : extract("hello world"s)) { 
            std::cout << ch << '\n'; 
        } 
    } 

    Again, the undefined behavior is revealed with address sanitizer (-fsanitize=address) and is not revealed in this code example without it. 

    No, don’t pass params by value! 

    Some sources (such as SonarSource, for instance) have advised that when it comes to coroutines it is better to get parameters by value, to be safer and avoid the dangling reference scenarios presented above.  

    I beg to differ. 

    First, getting by value doesn’t always help, as we can see in the string_view example above. (One may argue that views are a kind of reference-semantic type, analogous to `const T&`, so passing a string_view by value isn’t really passing “by value”. That’s true. Yet technically speaking, the argument of “pass-by-value should save you from troubles” doesn’t always hold as is.)

    Second, the problem is less with the parameter that we expect and more with sending a temporary, which was a known issue way before coroutines arrived. 

    And third, passing by value can be very inefficient, especially with coroutines.

    Let’s make our coroutine a bit more generic so it can extract items from any container and for “safety reasons” (that we question) we will get the container by value: 

    template<typename T> 
    unique_generator<const typename T::value_type&> extract(T s) { 
        for (const auto& val : s) { 
            co_yield val; 
        } 
    } 

    Note that since coroutines are not allowed to use auto for their return type, at least in C++20, we need to express the return type explicitly. 

    In our main we will compare a coroutine loop and a simple loop using objects of type MyString, as internal values of the container, so we can add printouts in its constructors and destructor: 

    int main() { 
        std::array arr{MyString("Hello"), MyString("World"), MyString("!!!") }; 
        std::cout << "========================\n"; 
        std::cout << "coroutine loop:\n"; 
        std::cout << "------------------------\n"; 
        for (const auto& val : extract(arr)) { 
            std::cout << val << '\n'; 
        } 
        std::cout << "========================\n"; 
        std::cout << "simple loop:\n"; 
        std::cout << "------------------------\n"; 
        for (const auto& val : arr) { 
            std::cout << val << '\n'; 
        } 
    } 

    The effect of our coroutine getting the container by value can be clearly seen in the printout: 

    ======================== 
    coroutine loop: 
    ------------------------ 
    MyString copy ctor: Hello (0x7ffefe1f5790) 
    MyString copy ctor: World (0x7ffefe1f57b0) 
    MyString copy ctor: !!! (0x7ffefe1f57d0) 
    MyString copy ctor: Hello (0x610000000070) 
    MyString copy ctor: World (0x610000000090) 
    MyString copy ctor: !!! (0x6100000000b0) 
    ~MyString: !!! (0x7ffefe1f57d0) 
    ~MyString: World (0x7ffefe1f57b0) 
    ~MyString: Hello (0x7ffefe1f5790) 
    Hello (0x610000000070) 
    World (0x610000000090) 
    !!! (0x6100000000b0) 
    ~MyString: !!! (0x6100000000b0) 
    ~MyString: World (0x610000000090) 
    ~MyString: Hello (0x610000000070) 
    ======================== 
    simple loop: 
    ------------------------ 
    Hello (0x7ffefe1f5710) 
    World (0x7ffefe1f5730) 
    !!! (0x7ffefe1f5750) 

    In this example we could actually get the container by reference, as we send an actual lvalue reference that outlives the lifetime of the coroutine. This is the change (note the ref on T): 

    template<typename T> 
    unique_generator<const typename T::value_type&> extract(const T& s) { 
        for (const auto& val : s) { 
            co_yield val; 
        } 
    } 

    And the output would now become much nicer for the coroutine: 

    ======================== 
    coroutine loop: 
    ------------------------ 
    Hello (0x7fff7b224350) 
    World (0x7fff7b224370) 
    !!! (0x7fff7b224390) 
    ======================== 
    simple loop: 
    ------------------------ 
    Hello (0x7fff7b224350) 
    World (0x7fff7b224370) 
    !!! (0x7fff7b224390) 

    However, the current code still allows getting a temporary that would result in undefined behavior: 

    for (const auto& val : extract(std::array{MyString("Hi"), MyString("!!")})) { 
        std::cout << val << '\n'; 
    } 

    Looking at the output, it is clear that we have undefined behavior, as we print the strings after they were destructed:

    ======================== 
    coroutine loop: 
    ------------------------ 
    MyString ctor from char*: Hello (0x7ffe650e0fc0) 
    MyString ctor from char*: World (0x7ffe650e0fe0) 
    MyString ctor from char*: !!! (0x7ffe650e1000) 
    ~MyString: !!! (0x7ffe650e1000) 
    ~MyString: World (0x7ffe650e0fe0) 
    ~MyString: Hello (0x7ffe650e0fc0) 
    Hello (0x7ffe650e0fc0) 
    World (0x7ffe650e0fe0) 
    !!! (0x7ffe650e1000) 

    And again, the code crashed with -fsanitize=address, and didn’t without the address sanitizer. In this case it was acting as a hidden bug waiting for production. 

    My proposed solution to achieve the efficiency of getting by reference while preventing dangling reference bugs is simple and not new to coroutines. Implement the const lvalue reference overload and delete the rvalue reference overload:

    void extract(const std::string&& s) = delete; 
    
    unique_generator<char> extract(const std::string& s) { 
        for (char ch : s) { 
            co_yield ch; 
        } 
    } 
    
    int main() { 
        std::string s = "hello world"; 
        for (char ch : extract(s)) { 
            std::cout << ch << '\n'; 
        } 
    
        // doesn't compile! Good!! 
        // for (char ch : extract("temp")) { 
        //     std::cout << ch << '\n'; 
        // } 
    } 

    Note that the above idea of deleting the rvalue version resolves the undefined behavior in this case, but is not bulletproof and is considered by some as a bad practice (see Abseil Tip of the Week #149: Object Lifetimes vs. = delete for an interesting discussion on the subject). Though controversial and not bulletproof, I still find this solution useful.

    Binary Tree inorder traversal with coroutines 

    This example is inspired by Adi Shavit’s talk at CppCon 2019 on coroutines.  

    Suppose that we want to traverse over a binary tree inorder like this: 

    BinaryTree<int> t1{5, 3, 14, 2, -3, 100, 56, 82, 72, 45}; 
    for (auto val : t1.inorder()) { 
        std::cout << val << '\n'; 
    } 

    Can we implement a member coroutine function in class BinaryTree? Well, the answer is: yes, we can! 

    Here it is: 

    template<typename T> 
    class BinaryTree { 
        struct TreeNode { 
            T value; 
            TreeNode* left = nullptr; 
            TreeNode* right = nullptr; 
            // [...] 
            unique_generator<T> inorder() { 
                if(left) { 
                    for(auto v: left->inorder()) { 
                        co_yield v; 
                    } 
                } 
                co_yield value; 
                if(right) { 
                    for(auto v: right->inorder()) { 
                        co_yield v; 
                    } 
                } 
            } 
        }; 
        TreeNode* head = nullptr; 
        // [...] 
    public: 
        auto inorder() { 
            return head->inorder(); 
        } 
        // [...] 
    }; 

    The above would fail for an empty BinaryTree, like this one: 

    BinaryTree<int> t2{}; 
    for (auto val : t2.inorder()) { // crashes here, head is null 
        std::cout << val << '\n'; 
    } 

    There are several nice and simple ways to solve the empty tree traversal, keeping the coroutine approach. You can find one here. 

    To Summarize 

    We played with simple coroutines, specifically with generator coroutines. The main idea of coroutines is to have a function that preserves its state while releasing control back to the caller. Coroutines in C++ are a complex beast. Someone has to manage the coroutine frame and its lifetime, but we used an external library that does this for us. Coroutines are sensitive to dangling references to temporary objects, one would say even more than simple functions: it may seem that we use the temporary object while it is still alive, but this is not the case for references copied into the coroutine frame, which are used only when the coroutine actually runs. If you hear advice to pass objects by value to coroutines, don’t be tempted to do that when it is costly (this is the same advice as for ordinary function calls: pass-by-value can be safer than const reference, but can be expensive for large non-trivial types). We discussed the hazard of references to temporaries and ways to avoid it.
