Hacking Coroutines into C (wiomoc.de)

160 points by jmillikin 7 months ago | 18 comments

adinisom7 months ago
My favorite trick in C is a lightweight take on Protothreads, implemented in place without dependencies. It looks something like this for a hypothetical blinky coroutine:
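```
  #include <stdbool.h>
  #include <stddef.h>
  #include <stdint.h>

  /* Assumed to be provided elsewhere by the platform */
  uint64_t get_ticks(void);
  void turn_on_LED(void);
  void turn_off_LED(void);

  typedef struct blinky_state {
    size_t pc;        /* resume point; 0 means "start from the top" */
    uint64_t timer;
    /* ... variables that need to live across YIELDs ... */
  } blinky_state_t;
  
  blinky_state_t blinky_state;
  
  /* Record the resume point, return to the caller, continue here on the next call */
  #define YIELD() s->pc = __LINE__; return; case __LINE__:;
  void blinky(void) {
    blinky_state_t *s = &blinky_state;
    uint64_t now = get_ticks();   /* code before the switch runs on every call */
    
    switch(s->pc) {
      case 0:   /* first call; without this label the switch would skip the whole body */
      while(true) {
        turn_on_LED();
        s->timer = now;
        while( now - s->timer < 1000 ) { YIELD(); }
        
        turn_off_LED();
        s->timer = now;
        while( now - s->timer < 1000 ) { YIELD(); }
      }
    }
  }
  #undef YIELD
```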
Can, of course, abstract the delay code into its own coroutine.
Your company is probably using hardware containing code I've written like this.
What's especially nice, and what I miss in other languages with async/await, is the ability to mix declarative and procedural code. Code you write before the switch(s->pc) statement gets run on every call to the function. You can put code there that you want to be declarative, like updating "now" in the code above, or if I have streaming code it's a great place to copy data.
- dkjaudyeqooe7 months ago
  A cleaner, faster way to implement this sort of thing is to use the "labels as values" extension if you're using GCC or Clang [1]. It avoids the switch statement and its associated comparisons. It's particularly useful if you're yielding inside nested loops (which IMHO is one of the most useful applications of coroutines) or inside switch statements.
  [1] https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
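For reference, a minimal sketch of how the blinky example might look with labels as values. This is not taken from any library: the `CAT` helper, the `resume_` label prefix, and the host-side stubs (`fake_ticks`, `led_on`) are made up for illustration, and it only compiles with GCC or Clang.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for the hardware hooks. */
static uint64_t fake_ticks;               /* advanced by the caller */
static bool led_on;
static uint64_t get_ticks(void) { return fake_ticks; }
static void turn_on_LED(void)  { led_on = true; }
static void turn_off_LED(void) { led_on = false; }

typedef struct {
  void *pc;         /* address of the label to resume at; NULL means "start" */
  uint64_t timer;
} blinky_state_t;

static blinky_state_t blinky_state;

#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)
/* Store the address of a label unique to this line, return, resume there later. */
#define YIELD() do { s->pc = &&CAT(resume_, __LINE__); return; \
                     CAT(resume_, __LINE__):; } while (0)

void blinky(void) {
  blinky_state_t *s = &blinky_state;
  uint64_t now = get_ticks();             /* still runs on every call */

  if (s->pc) goto *s->pc;                 /* computed goto (GCC/Clang extension) */

  while (true) {
    turn_on_LED();
    s->timer = now;
    while (now - s->timer < 1000) { YIELD(); }

    turn_off_LED();
    s->timer = now;
    while (now - s->timer < 1000) { YIELD(); }
  }
}
#undef YIELD
```

Because there is no enclosing switch, a YIELD() placed inside a switch statement of your own no longer clashes with the coroutine's case labels.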
- syncurrent7 months ago
  In `proto_activities` this blinking would look like this:
  pa_activity (Blinker, pa_ctx_tm(), uint32_t onMs, uint32_t offMs) {
    pa_repeat {
      turn_on_LED();
      pa_delay_ms (onMs);
      turn_off_LED();
      pa_delay_ms (offMs);
    }
  } pa_end
  Here the activity definition automatically creates the structure to hold the pc, timer and other variables which would outlast a single tick.
- fjfaase7 months ago
  I have used this approach myself, with an almost identical-looking #define for YIELD.
  If there is just one instance of a co-routine, which is often the case for embedded software, one could also make use of static variables inside the function. This also makes the code slightly faster.
  You need some extra logic if, for example, two co-routines need to access a shared peripheral such as I2C. Then you might also need to implement a queue. Last year, I worked a bit on a tiny cooperative polling OS, including a transpiler. I did not finish the project, because it was considered too advanced for the project I wanted to use it for. Instead, old-fashioned state machines documented with flow-charts were required. Because everyone can read those, is the argument. I feel that the implementation of state machines is error prone, because it is basically implementing goto statements where the state is like the label. In my experience, nasty bugs are easily introduced if you forget a break statement in the right place.
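A rough sketch of the single-instance variant mentioned above, with the resume point and timer kept in static locals instead of a state struct. The host-side stubs (`fake_ticks`, `led_on`) are hypothetical and only there so it can be run off-target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for the hardware hooks. */
static uint64_t fake_ticks;               /* advanced by the caller */
static bool led_on;
static uint64_t get_ticks(void) { return fake_ticks; }
static void turn_on_LED(void)  { led_on = true; }
static void turn_off_LED(void) { led_on = false; }

/* Record the resume point in the static `pc`, return, continue here next call. */
#define YIELD() pc = __LINE__; return; case __LINE__:;

void blinky(void) {
  static int pc = 0;                      /* persists across calls; 0 = start */
  static uint64_t timer;                  /* state that must survive a YIELD */
  uint64_t now = get_ticks();

  switch (pc) {
    case 0:
    while (true) {
      turn_on_LED();
      timer = now;
      while (now - timer < 1000) { YIELD(); }

      turn_off_LED();
      timer = now;
      while (now - timer < 1000) { YIELD(); }
    }
  }
}
#undef YIELD
```

The trade-off is that there can only ever be one instance of the coroutine, since the state lives inside the function.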
  - adinisom7 months ago
    Yes, 100%. State transitions are "goto" by another name. State machines have their place but tend to be write-only (hard to read and modify) so are ideally small and few. Worked at a place that drank Miro Samek's "Practical Statecharts in C/C++" kool-aid... caused lots of problems. So instead I use this pattern everywhere that I can linearize control flow. And if I need a state machine with this pattern I can just use goto.
    Agreed re: making the state a static variable inside the function. Great for simple coroutines. I made it a pointer in the example for two reasons:
    - Demonstrates access to the state variables with very little visual noise... "s->"
    - For sub-coroutines that can be called from multiple places such as "delay" you make the state variable the first argument. The caller's state contains the sub-coroutine's state and the caller passes it to the sub-coroutine. The top level coroutine's state ends up becoming "the stack" allocated at compile-time.
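A sketch of the sub-coroutine idea from the two points above, under some assumptions of my own: coroutines return `true` when finished, `AWAIT` is a made-up helper macro, and the host-side stubs (`fake_ticks`, `led_on`) are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side stand-ins for the hardware hooks. */
static uint64_t fake_ticks;               /* advanced by the caller */
static bool led_on;
static uint64_t get_ticks(void) { return fake_ticks; }
static void turn_on_LED(void)  { led_on = true; }
static void turn_off_LED(void) { led_on = false; }

/* Every coroutine here returns false while still running, true when done. */
#define YIELD() s->pc = __LINE__; return false; case __LINE__:;
/* Keep yielding until the sub-coroutine call reports completion. */
#define AWAIT(call) while (!(call)) { YIELD(); }

typedef struct { int pc; uint64_t start; } delay_state_t;

/* Sub-coroutine: its state is the first argument, so any caller can use it. */
static bool delay(delay_state_t *s, uint64_t now, uint64_t ticks) {
  switch (s->pc) {
    case 0:
      s->start = now;
      while (now - s->start < ticks) { YIELD(); }
  }
  s->pc = 0;                              /* reset so the next AWAIT starts fresh */
  return true;
}

typedef struct {
  int pc;
  delay_state_t delay;                    /* callee state embedded in the caller's */
} blinky_state_t;

static blinky_state_t blinky_state;       /* the whole "stack", sized at compile time */

bool blinky(void) {
  blinky_state_t *s = &blinky_state;
  uint64_t now = get_ticks();

  switch (s->pc) {
    case 0:
    while (true) {
      turn_on_LED();
      AWAIT(delay(&s->delay, now, 1000));
      turn_off_LED();
      AWAIT(delay(&s->delay, now, 1000));
    }
  }
  return false;                           /* only reached if pc is corrupted */
}
#undef AWAIT
#undef YIELD
```

Each AWAIT has to sit on its own source line, because the case labels are derived from `__LINE__`.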
    kazinator7 months ago
    Worked at a place that drank hierarchical state machines kool-aid. Yeah.
    https://en.wikipedia.org/wiki/UML_state_machine#Hierarchical...
- csmantle7 months ago
  Yeah. Protothreads (with PT_TIMER extensions) is one of the classic libraries, and I used it in my own early embedded days. Back then I was totally fascinated by how it turns ergonomic function-like macros into state machines.
astrobe_7 months ago
> [State machines] lacked a linear flow
That's because you need a state machine when your control flow is not linear. They are represented by graphs, remember? This is actually a case where using gotos might be clearer. Although not drastically better because the main problem is that written source code is linear by nature. A graph described by a dedicated DSL such as GraphViz has the same problem, although at least you can visualize the result.
But control flow is only one term of the equation, the other being concurrency. One typically has more than one state machine running; sometimes one uses state machines that are actually essentially linear because of that. Cooperative multitasking. I would question trying to solve these two problems, non-linearity and concurrency, with one tool. Sometimes when you try too hard to kill two birds with one stone you end up with one dead bird and a broken window.
One speaker at the conference announced earlier [1] also made the point that visualization helps a lot, and that reminded me of Pharo's inspection tools [2]. Seeing what's going on under the hood is more important than one usually thinks.
One issue with state machines is that they are hardly modular: adding a state or decomposing a state into multiple states is more work than one would like it to be. It is the inverse problem of visualization: what you draw is what you code. A good tool for that would let the user connect nodes with arrows and assign code to nodes and/or arrows; it would translate this into some textual intermediate language to play nice with Git, and a compiler would transform it to C code for integration in the build system.
[1] https://bettersoftwareconference.com/ [2] https://pharo.org/features
mikepurvis7 months ago
FreeRTOS can also be used with a cooperative scheduler: https://www.freertos.org/Why-FreeRTOS/Features-and-demos/RAM...
That said, if I was stuck rolling this myself, I think I’d prefer to try to do it with “real” codegen than macros. If nothing else it would give the ability to do things like blocks and correctness checks, and you’d get much more readable resulting source when it came to stepping through it with a debugger.
userbinator7 months ago
> Of course, the project didn’t allow us to use an RTOS.
That tends to just make the project eventually implement an approximation of one... as what appears to have happened here.
How I'd solve the given problem is by using the PWM peripheral (or timer interrupts if no PWM peripheral exists) and pin change interrupts, with the CPU halted nearly 100% of the time. I suspect that approach is even simpler than what's shown here.
- kevin_thibedeau7 months ago
  You should not use interrupts for button inputs. You will just end up hammering the processor with useless interrupts when the switch bounces. Human interfaces can be polled and still maintain responsiveness. If polling isn't fast enough for machine actuated IO or you need to stay in a low power state then interrupts could be considered but you really need a non-naive solution that disables the interrupt from within the handler for a specified timeout duration.
  - apple14177 months ago
    I've worked on several low-power projects. While yes, we needed an interrupt to wake the processor, they all still used polling for the actual button handling. At worst the interrupt just set a flag. It's actually kind of amazing how polling turns the main loop into a debounce filter, entirely for free.
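To illustrate the polling-as-debounce point, here is a minimal sketch of a counter-based filter fed from the main loop. The names (`debounce_t`, `debounce_poll`, `DEBOUNCE_SAMPLES`) and the threshold of 4 samples are assumptions for the example, not something from the thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: consecutive differing samples needed to accept a change. */
#define DEBOUNCE_SAMPLES 4

typedef struct {
  bool stable;      /* debounced level reported to the application */
  uint8_t count;    /* consecutive samples that disagreed with `stable` */
} debounce_t;

/* Call once per main-loop pass with the raw pin level.
   Returns true on the pass where the debounced level changes. */
static bool debounce_poll(debounce_t *d, bool raw) {
  if (raw == d->stable) {
    d->count = 0;                 /* a bounce back to the old level restarts the count */
    return false;
  }
  if (++d->count < DEBOUNCE_SAMPLES) {
    return false;                 /* not stable for long enough yet */
  }
  d->stable = raw;                /* accepted: the new level held for enough polls */
  d->count = 0;
  return true;
}
```

In a low-power design the wake-up interrupt would only set a flag, and the main loop would keep calling `debounce_poll` with the pin level until the input settles.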
  - userbinator7 months ago
    I haven't had problems with that in my designs; debouncing is done in hardware.
syncurrent7 months ago
A similar approach, but rooted in the idea of synchronous languages like Esterel or Blech:
https://github.com/frameworklabs/proto_activities
- vanderZwan7 months ago
  This is Céu[0] erasure! ... I'm joking, I'm joking, although I do think it deserves a mention.
  Seriously though, neat library! It took me a moment to realize that everything with a pa_ prefix is a macro, for the dumb reason of being used to only seeing those in ALL_CAPS. Not saying you can't use lower-case macros, but I think a short sentence mentioning it before the demo code, along with a "see the protothread under the hood page[1] for an explanation of how it generally works", would help a lot with demystifying the code for people unfamiliar with the concepts involved.
  [0] https://ceu-lang.github.io/
  [1] https://dunkels.com/adam/pt/expansion.html
Neywiny7 months ago
The intent here is nice. I have historically hated state machines for sequential execution. To me they make sense in FPGA/ASIC/circuits. In software, they just get so complicated. I've even seen state managers managing an abstracted state machine implementing a custom device to do what's ultimately very sequential work.
It's my same argument that there should be no maximum number of lines to a function. Sometimes, you just need to do a lot of work. I comment the code blocks, maybe with steps/parts, but there's no point in making a function that's only called in one place.
But anything is better than one person I met who somehow was programming without knowing how to define their own functions. Gross
- duped7 months ago
  > I comment the code blocks, maybe with steps/parts, but there's no point in making a function that's only called in one place.
  I encourage junior developers that get into this habit (getting worse now, with LLMs) to convert the comment into a function name and add the block as a function, thinking pretty carefully about its function signature. If you have a `typedef struct state` that gets passed around, great.
  The reason for splitting up this code is so that the person writing it doesn't fuck up, the input/output is expressed as types and validated before they push it. It's easy for me to review, because I can understand small chunks of code better than big chunks, and logically divides up the high level architecture from the actual implementation so I can avoid reviewing the latter if I find trouble with the former. It's also good as a workflow, where you can pair to write out the high level flow and then split off to work on implementation internally. And most importantly, it makes it possible to test the code.
  I have had this discussion with many grumbly developers that think of the above as a "skill issue." I don't really want to work with those people, because their code sucks.
  - Neywiny7 months ago
    I think for me, and it sounds like for this author, the context lost by that abstraction makes it harder to review. In my experience it's easier for me to understand a small block of code, but it's harder to understand how it impacts the system when it's out of context.
    For example:
    x++;
    A very easy piece of code to understand. But who wants x, and what values could they expect? Why do we ++ and under what conditions?
    Those effects, again just for me (your mileage may vary), tend to get much harder to understand.
- throwaway815237 months ago
  > there's no point in making a function that's only called in one place.
  There's nothing wrong with doing that if it helps make your code clearer. The compiler's optimizer will inline it when appropriate so there's no runtime overhead either.
  - munch1177 months ago
    Not only that, the compiler's optimizer might actually do a better job if you split up a big function. Because the smaller functions have less register pressure.
    Neywiny7 months ago
    I'm not sure I agree and I think you should try some stuff out on godbolt first. The compiler can see where variables are no longer in use, whereas unless you turn on link time optimization (which is known for being messy so nobody seems to), you'll likely get a lot of unnecessary push/pop between the function calls.
    throwaway815237 months ago
    Declare the functions static and the compiler won't export the symbols and it can do more inlining.
- TechDebtDevin7 months ago
  I actually write a lot of Go in state-machine-like patterns. My state types files would make you think I'm schizophrenic. I just finished up a project this week that was 10k lines of comments in 18k LOC. No one else has to read it though; they'd probably appreciate it if they did.
user____name7 months ago
I've recently read a bunch of articles explaining these weird macro soup setups for emulating coroutines in C. This one is probably the most advanced writeup in implementing fibers/coroutines I came across. The focus is on a multithreaded context, which seems to complicate things a lot. Honestly I feel like you need language level support for them in that case, they seem more trouble than they're worth otherwise, at least in plain C.
https://graphitemaster.github.io/fibers/
jonhohle7 months ago
I’ve used libaco for coroutines in C and liked it. In my case I used it to deal with the differences between eventing in libevent/libuv and feeding zlib for streaming decompression. It allowed the zlib loop to continue to look like a standard zlib loop.
throwaway815237 months ago
As the article acknowledges at the end, this is sort of like protothreads which has been around for ages. The article's CSS was so awful that I didn't read anything except the last paragraph, which seemed to tell me what I wanted to know.
- mananaysiempre7 months ago
  Right, this is more or less this blog author’s riff on (PuTTY author) Simon Tatham’s old page on coroutines using switch[1], which itself indicates that Tom Duff thought of this (which makes sense, it’s only a half-step away from Duff’s device) and described it as “revolting”. So in this sense the idea predates the standardization of C.
  [1] https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
- hecanjog7 months ago
  > The article's CSS was so awful
  Small text sizes? What is the problem for you?
  - shakna7 months ago
    Whilst I wouldn't call it "awful", the spacing between the lines isn't helping any.
  - throwaway815237 months ago
    The low contrast color scheme.
codr77 months ago
Looks overly complicated to me.
This is an alternative I wrote for my C book:
https://github.com/codr7/hacktical-c/tree/main/task
moconnor7 months ago
A colleague of mine did this much more elegantly by manually updating the stack and jmping. This was a couple of decades ago and afaik the code is still in use in supercomputing centres today.
Asooka7 months ago
I would have used CPC or adapted qemu's coroutines. Coroutines in C are a very well explored field with several mature solutions.
Nursie7 months ago
Cooperative multithreading via setjmp and longjmp has been around in C since the 80s at least.
I’m not sure this is so much hacking as an accepted technique from the old-old days which has somewhat fallen out of favour, especially as C is falling a little outside of the mainstream these days.
Perhaps it’s almost becoming lost knowledge :)
- ajb7 months ago
  This isn't using setjmp/longjmp
  It's using Simon Tatham's method based on Duff's device (https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html)
  - Nursie7 months ago
    Sure, I guess I just wanted to point out that regardless of method, people have been building these sorts of facilities in C for a very long time.
    It doesn’t lessen the achievement of course, but it amuses me in an “everything old is new again” kinda way.
    johnisgood7 months ago
    "everything old is new again" is so true. I see it across IT all the time, heh.
  - vanderZwan7 months ago
    One of the most surprising things I discovered a while back is that this technique not only technically works in JavaScript, it actually beats its own built-in generator syntax in performance terms even if used as an iterable object:
    https://jsbenchit.org/?src=1b165435c816c6d298e6b800b4742568
    https://jsbenchit.org/?src=dedb07499cfa289b94d686bde05901df
    Context: JS has an iteration protocol[0] that lets you create your own custom objects to be used with syntactic sugar for iteration. The sensible expectation is that the built-in syntax for generating such functions would produce the fastest code. It clearly doesn't.
    Having said that I do not recommend manually writing code this way because if this is something that you need to worry about while writing JavaScript, it's a sign that you're using the wrong tool for the job anyway.
    [0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
  - Nursie7 months ago
    Also wanted to add that while I think this is very clever, and I am a big fan of Mr Tatham's work, that by the time we're talking about the 'advanced' version at the end we're edging towards using a stack-based system in the form of a context object and at that point it feels like we're just a leap and a jump to stack-based coroutines and full-on cooperative multithreading.
    Also, by the time you're passing a coroutine context around anyway, you could refactor (say) the decompressor around a decompression context and the code would stay nice...
    It's definitely interesting though, and it's been a few years since I read that coroutines page.
webdevver7 months ago
the chiark green end guy has an article about this topic too
Agyemang7 months ago
Yes
joshlk7 months ago
Rust can be used in an embedded environment and also offers asynchronous execution built into the language