C++20 — Practical Coroutines
Writing custom coroutines is not a trivial task in C++20. In this article, I will guide you through three increasingly complex examples of coroutines.
While this article explains each example in detail, I will not go into the basics of coroutines. For that, please read my C++20 Coroutines article. All the examples come from my CES (Coroutines, Epoll and Sockets) library.
The synchronous coroutine
The simplest coroutine is one that behaves like a normal function. Implementing such a coroutine might seem pointless. Remember, however, that we cannot use coroutine keywords like co_await outside of coroutines, and main() itself is not allowed to be a coroutine. So, for convenience, we can write a special coroutine type for an asynchronous main.
The main difference from a completely synchronous coroutine is that we return std::suspend_always in final_suspend() (line 8). Without this, the coroutine frame, including the promise, would be destroyed as soon as the coroutine finishes running. However, we use the promise to store the result: we set it in return_value (line 10), which is called on co_return, and then read it in the conversion operator to int (line 23).
We also need to write a destructor, since the coroutine frame must now be destroyed explicitly (line 20).
We can then use this coroutine as a replacement for main, like this:
Inside our async_main, we can now co_await other coroutines.
The chainable coroutine
One of the compelling features of coroutines is symmetric transfer. With synchronous code, we rarely need to worry about running out of stack space. In asynchronous code, however, it is common to execute only a small chunk of work and then hand off control to another part of the program. If every handoff is a nested function call, the stack keeps growing, so avoiding stack exhaustion requires meticulous design. With coroutines, we can avoid this problem by relying on the code generated by the compiler: await_suspend() can return the handle of the next coroutine to resume, and the compiler can transfer control to it as a tail call instead of a nested call.
We will look at chainable_task below; first, let's go through what is happening here. We call co_await on the result of async_op(). This requires the result type, chainable_task, to provide the awaitable interface: await_ready(), await_suspend() and await_resume(), with await_resume() providing the result value.
From the perspective of the caller, this behaves exactly like a synchronous call. The caller is suspended after the call and only resumed at the end to read the result value. However, in the background, we instead chain the coroutines without nesting them.
So let's look at what is happening in the background:
Let's focus on the critical parts. First, the awaitable interface on lines 43–48 is relatively straightforward:
- we want the caller to suspend, so we return false in await_ready()
- inside await_suspend(), we remember the caller's handle (the caller will be demo() in the previous example) and return our own handle, which resumes this coroutine (async_op() in the previous example)
- inside await_resume(), we return the stored result
Second, the promise type (lines 16–19) is where we define the main behaviour of the task:
- we suspend in initial_suspend(), which returns control (together with the chainable_task instance) to the caller, who then applies co_await to it through the awaitable interface discussed above
- in final_suspend(), we want to hand control back to the caller, so we return a special awaitable object that returns (line 5) the handle we stored inside await_suspend() on line 45
- lastly, the return_value() method on line 19 stores the result that is then read in await_resume() on line 48
All this comes together to chain the execution instead of nesting it. This is a lot to take in, so I recommend grabbing the library code and adding debug prints to constructors and destructors if you are still struggling. It will allow you to observe changes in behaviour as you change your code.
Detached tasks
So far, we have only discussed linear execution models. By that, I mean that the code we write still behaves like completely synchronous code: when we write co_await some_coro(), the following line is only executed once some_coro() finishes running.
Imagine code like this:
If server1 never receives a connection, this code will prevent server2 from accepting connections, even if plenty of them are waiting. Of course, we could run each server on a separate thread, but with coroutines we can avoid this and still run everything on a single thread.
Conceptually, we want to keep coroutines blocked on external events (connection arrived, data to read, timer expired, etc.) in a suspended state and only resume them when we know for sure that they have work to do.
To achieve this, we need to introduce a global component: a coroutine scheduler. All we need from the scheduler is to be a generator-style coroutine that keeps looping and yields control to the coroutines that can run:
Emitters are external sources of events (connection arrived, socket ready for writing, condition evaluated to true, timeout expired, etc.). The scheduler then maps the received event to the corresponding coroutine blocked by it and resumes it:
In the CES library, the handle of the coroutine blocked by an event is part of the emitted event, so we simply hand off to that coroutine using a special awaitable type.
We now have a way to resume suspended coroutines. However, we still need the other side, which is a way to suspend a coroutine and register it in the scheduler:
This awaitable type stores the continuation and information about the event that is blocking the coroutine. In await_ready, we also handle the early-return case, when a coroutine co_awaits a condition that is already true. The notify_emitters call on line 14 is the flip side of the notify_departure call from the previous example: notifying emitters registers the event to be watched, while departure unregisters it to mitigate phantom events.
The last part of the puzzle is a way for calling coroutines to spawn multiple detached coroutines. On top of that, we also need a way to add event emitters to the scheduler.
With this interface, we can enqueue multiple coroutines and then resume the scheduler with a call to run().
Links and technical notes
All the code examples are either taken directly from, or simplified from, the CES: Coroutines, Epoll and Sockets library. The library is tested and working with the trunk version of GCC 12.
Thank you for reading
Thank you for reading this article. Did you enjoy it?
I also publish videos on YouTube. Do you have questions? Hit me up on Twitter or LinkedIn.