> Context switching is virtually free, comparable to a function call.
If you’re counting that low, then you need to count carefully.
A coroutine switch, however well implemented, inevitably breaks the branch predictor’s idea of your return stack, but the effect of mispredicted returns will be smeared over the target coroutine’s execution rather than concentrated at the point of the switch. (Similar issues exist with e.g. measuring the effect of blowing the cache on a CPU migration.) I’m actually not sure if Zig’s async design even uses hardware call/return pairs when a (monomorphized-as-)async function calls another one, or if every return just gets translated to an indirect jump. (This option affords what I think is a cleaner design for coroutines with compact frames, but it is much less friendly to the CPU.)
So a foolproof benchmark would require one to compare the total execution time of a (compute-bound) program that constantly switches between (say) two tasks to that of an equivalent program that not only does not switch but (given what little I know about Zig’s “colorless” async) does not run under an async executor(?) at all. Those tasks would also need to yield on a non-trivial call stack each time. Seems quite tricky all in all.
If you constantly switch between two tasks from the bottom of their call stack (as for stackless coroutines) and your stack switching code is inlined, then you can mostly avoid the mispaired call/ret penalty.
Also, if you control the compiler, an option is to compile all call/rets into and out of "io" code as explicit jumps. A ret implemented as pop+indirect jump will be less predictable than a paired ret, but it has a better chance of being predicted than an unpaired one.
My hope is that, if stackful coroutines become more mainstream, CPU microarchitectures will start using a meta-predictor to choose between the return-stack predictor and the indirect predictor.
> I’m actually not sure if Zig’s async design even uses hardware call/return pairs
Zig no longer has async in the language (and hasn't for quite some time). The OP implemented task switching in user-space.
Even so. You're talking about storing and loading at least ~16 8-byte registers, including the instruction pointer which is essentially a jump. Even to L1 that takes some time; more than a simple function call (jump + pushed return address).
Only the stack and instruction pointers are explicitly restored. The rest is handled by the compiler: instead of depending on the C calling convention, it can avoid having things in registers during the yield.
See this for more details on how stackful coroutines can be made much faster:
https://photonlibos.github.io/blog/stackful-coroutine-made-f...
> The rest is handled by the compiler: instead of depending on the C calling convention, it can avoid having things in registers during the yield.
Yep, the frame pointer as well if you're using it. This is exactly how it's implemented in user-space in Zig's WIP std.Io branch green-threading implementation: https://github.com/ziglang/zig/blob/ce704963037fed60a30fd9d4...
On ARM64, only fp, sp, and pc are explicitly restored; on x86_64, only rbp, rsp, and rip. For everything else, the compiler is just informed that the registers will be clobbered by the call, so it can optimize register allocation to avoid saving/restoring them on the stack where possible.
Is this just buttering the cost of switches by crippling the optimization options the compiler has?
If this was done the classical C way, you would always have to stack-save a number of registers, even if they are not really needed. The only difference here is that the compiler will do the save for you, in whatever way fits the context best. Sometimes it will stack-save, sometimes it will decide to use a different option. It's always strictly better than explicitly saving/restoring N registers unaware of the context. Keep in mind that in Zig, the compiler always knows the entire code base. It does not work on object/function boundaries. That leads to better optimizations.
It's amazing to me that you can do this in Zig code directly, as opposed to messing with the compiler.
See https://github.com/alibaba/PhotonLibOS/blob/2fb4e979a4913e68... for a GNU C++ example. It's a tiny bit more limited, because of how the compilation works, but the concept is the same.
To be fair, this can be done in GNU C as well. Like the Zig implementation, you'd still have to use inline assembly.
> If this was done the classical C way, you would always have to stack-save a number of registers
I see, so you're saying that GCC can be coaxed into gathering only the relevant registers to stack and unstack, rather than blindly doing all of them?
Yes, you write inline assembly that saves the frame pointer, stack pointer, and instruction pointer to the stack, and list every other register as a clobber. GCC will know which ones it's using at the call site (assuming the function gets inlined; this is more likely in Zig due to its single-unit-of-compilation model), and save those to the stack. If it doesn't get inlined, it'll be treated as any other C function and only save the ones that need to be preserved by the target ABI.
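A minimal GNU C sketch of just the clobber-list mechanism being described (x86-64 only; the function name is invented, and this is not the actual Zig or PhotonLibOS code):

    /* Stand-in for a switch point: an asm statement that claims to clobber
     * every general-purpose register except rsp and rbp.  At each site where
     * this is inlined, GCC spills only the registers that are actually live
     * there, rather than the fixed set a C-ABI context switch always stores.
     * A real switch would also save/restore sp and pc (and fp explicitly,
     * since the frame pointer can't normally be listed as a clobber). */
    static inline void switch_point(void) {
        __asm__ volatile("" /* real save/restore code would go here */
                         : /* no outputs */
                         : /* no inputs */
                         : "rax", "rbx", "rcx", "rdx", "rsi", "rdi",
                           "r8", "r9", "r10", "r11", "r12", "r13",
                           "r14", "r15", "memory", "cc");
    }

The register allocator then decides per call site what actually needs to be spilled, which is the "strictly better than blindly saving N registers" property mentioned above.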
I wonder how you see it. Stackful coroutines switch context on a syscall in the top stack frame; the deeper frames are regular optimized code, and syscall/sysret is already a big context switch. A read/epoll loop has exactly the same structure. The point of async programming isn't optimizing computation, but optimizing memory consumption; performance is determined by features and design (and Electron).
What do you mean by "buttering the cost of switches", can you elaborate? (I am trying to learn about this topic)
I think it is
> buttering the cost of switches [over the whole execution time]
The switches get cheaper but the rest of the code gets slower (because it has less flexibility in register allocation) so the cost of the switches is "buttered" (i.e. smeared) over the rest of the execution time.
But I don't think this argument holds water. The surrounding code can use whatever registers it wants. In the worst case it saves and restores all of them, which is what a standard context switch does anyway. In other words, this can be better and is never worse.
You are right that the statement was overblown; however, when I was testing with "trivial" load between yields (synchronized ping-pong between coroutines), I was getting numbers that I had trouble believing when comparing them to other solutions.
In my test of a similar setup in C++ (IIRC about 10 years ago!), I was able to do a context switch every other cycle. The bottleneck was literally the cycles per taken jump of the microarchitecture I was testing against. As in your case, it was a trivial test with two coroutines doing nothing except context switching, so the compiler had no need to save any registers at all, and I carefully defined the ABI to be able to keep stack and instruction pointers in registers even across switches.
Semi-unrelated, but async is coming soon to Zig. I'm sorta holding off getting deep into Zig until it lands. https://kristoff.it/blog/zig-new-async-io/
I am still mystified as to why callback-based async seems to have become the standard. What this and e.g. libtask[1] do seems so much cleaner to me.
The Rust folks adopted async with callbacks, and they were essentially starting from scratch so had no need to do it that way, and they are smarter than I (both individually and collectively) so I'm sure they have a reason; I just don't know what it is.
1: https://swtch.com/libtask/
Stackless coroutines can be implemented using high-level language constructs, entirely in your language. Because of this, they interact with legacy code and existing language features in predictable ways. With stackful coroutines, some security software and code-hardening or instrumentation libraries will break as well.
Also, async at low level is literally always callbacks (even processor interrupts are callbacks)
By mucking about with the stack, you break stuff like stack unwinding for exceptions and GC, debuggers, and you probably make a bunch of assumptions you shouldn't
If you start using the compiler backend in unexpected ways, you either expose bugs or find missing functionality, and you discover that the compiler writers made some assumptions about the code (rightly or not) that break when you start wildly overwriting parts of the stack.
Writing a compiler frontend is hard enough as it is, and becoming an LLVM expert is generally too much for most people.
But even if you manage to get it working, should you have your code break in either the compiler or any number of widely used external tooling, you literally can't fast track your fix, and thus you can't release your language (since it depends on a broken external dependency, fix pending whenever they feel like it).
I guess even if you are some sort of superhero who can do all this correctly, the LLVM people won't be happy merging some low-level codegen change that has the potential to break all compiled software of trillion-dollar corporations for the benefit of some small internet project.
One thing I would consider "unclean" about the zio approach (and e.g. libtask) is that you pass it an arbitrary expected stack size (or, as in the example, assume the default) and practically just kind of hope it's big enough not to blow up and small enough to be able to spawn as many tasks as you need. Meanwhile, how much stack actually ends up being needed by the function is a platform specific implementation detail and hard to know.
This is a gotcha of using stack allocation in general, but exacerbated in this case by the fact that you have an incentive to keep the stacks as small as possible when you want many concurrent tasks. So you either end up solving the puzzle of how big exactly the stack needs to be, you undershoot and overflow with possibly disastrous effects (especially if your stack happens to overflow into memory that doesn't cause an access violation) or you overshoot and waste memory. Better yet, you may have calculated and optimized your stack size for your platform and then the code ends up doing UB on a different platform with fewer registers, bigger `c_long`s or different alignment constraints.
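For what it's worth, one low-tech way to take some of the guesswork out of sizing is stack painting: fill the stack with a sentinel before the task runs, then scan for the high-water mark afterwards and add a margin. A minimal C sketch, assuming a downward-growing stack (the function names are made up, not part of zio):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define STACK_SENTINEL 0xA5u

    /* Fill the coroutine stack with a known pattern before first use. */
    static void stack_paint(void *stack, size_t size) {
        memset(stack, STACK_SENTINEL, size);
    }

    /* After the task has finished, count how many bytes at the low end are
     * still untouched; the rest is the peak stack usage on this platform. */
    static size_t stack_high_water_mark(const void *stack, size_t size) {
        const uint8_t *p = stack;
        size_t untouched = 0;
        while (untouched < size && p[untouched] == STACK_SENTINEL)
            untouched++;
        return size - untouched;
    }

It only measures the code paths you actually exercised, so it's a sanity check rather than an upper bound.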
If something like https://github.com/ziglang/zig/issues/157 actually gets implemented I will be happier about this approach.
Couldn’t you use the Go approach of starting with a tiny stack that is big enough for 90% of cases, then grow it as needed?
Go depends on the fact that it can track all pointers, and when it needs to resize stacks, it can update them.
Previous versions of Go used segmented stacks, which are theoretically possible, if Zig really wanted (would need compiler support), but they have nasty performance side-effects, see https://www.youtube.com/watch?v=-K11rY57K7k
The research Microsoft engineers did on stackful vs. stackless coroutines for the C++ standard, I think, swayed this as "the way" to implement it for something targeting the systems level: significantly less memory overhead (you only pay for what you use) and offloading the implementation details of the executor (lots of different design choices that can be made).
Yup, stackful fibers are an anti-pattern. Here's Gor Nishanov's review for the C++ ISO committee https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2018/p13... linked from https://devblogs.microsoft.com/oldnewthing/20191011-00/?p=10... . Notice how it sums things up:
> DO NOT USE FIBERS!
And this is the rebuttal: https://www.open-std.org/JTC1/SC22/WG21/docs/papers/2019/p08...
There are downsides to stackful coroutines (peak stack usage, for example), but I feel that p1364 was attacking a strawman: first of all, it is comparing a solution with built-in compiler support against a pure library implementation; second, it is not even comparing against the reference implementation of the competing proposal.
> DO NOT USE FIBERS!
For C++.
If your language has RAII or exceptions, it raises crazy questions: say thread A is hosting fiber 1, which throws an exception that propagates outside of the fiber invocation scope and destroys a bunch of objects; then we switch to fiber 2, which sees the world in an inconsistent state (outside resources have been cleaned up, inside ones are still alive).
This was literally impossible in pre-fiber code, so most existing code would probably not handle it well.
That's not different from threads running concurrent exceptions (in fact it is simpler in the single threaded example). RAII or exceptions are really not an issue for stackful coroutines.
Is stackful fibers the same as stackful coroutines?
Yes, same thing, different names.
The thread stack for something like libtask is ambiguously sized and often really large relative to like, formalized async state.
I think it started with an interrupt. And less abstraction often wins.
> callback-based async seems to have become the standard
At some level it's always callbacks. Then people build frameworks on top of these so programmers can pretend they're not dealing with callbacks.
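To make "at the bottom it's always callbacks" concrete, this is roughly the readiness loop that sits under every async runtime; a hedged C sketch (the `struct handler` shape is invented). A coroutine runtime wraps exactly this kind of loop, except that instead of invoking a callback it resumes whichever task was parked on that fd.

    #include <stdint.h>
    #include <sys/epoll.h>

    struct handler {
        void (*on_ready)(struct handler *self, uint32_t events);
    };

    /* The loop at the bottom: wait for readiness, then dispatch to whatever
     * continuation was registered for the file descriptor. */
    static void event_loop(int epfd) {
        struct epoll_event evs[64];
        for (;;) {
            int n = epoll_wait(epfd, evs, 64, -1 /* block indefinitely */);
            for (int i = 0; i < n; i++) {
                struct handler *h = evs[i].data.ptr; /* set via epoll_ctl() */
                h->on_ready(h, evs[i].events);
            }
        }
    }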
> In the previous C++ version, I used Qt, which might seem very strange for a server software, but I wanted a nice way of doing asynchronous I/O and Qt allowed me to do that. It was callback-based, but Qt has a lot of support for making callbacks usable. In the newer prototypes, I used Go, specifically for the ease of networking and concurrency. With Zig, I was stuck.
There are new Qt bindings for these. Go has https://github.com/mappu/miqt and Zig has https://github.com/rcalixte/libqt6zig. I wonder if the author knew about them. I don't know enough about either language to speak on the async parts.
For me, I want these for Rust, especially what Zig has because I use KDE. I know about https://github.com/KDAB/cxx-qt and it is the only maintained effort for Rust that is left standing after all these years. But I don't want QML. I definitely don't want C++ or CMake. I just want Rust and Cargo.
Stackful coroutines make sense when you have the RAM for it.
I've been using Zig for embedded (ARM Cortex-M4, 256KB RAM) mainly for memory safety with C interop. The explicitness around calling conventions catches ABI mismatches at compile-time instead of runtime crashes.
I actually prefer colored async (like Rust) over this approach. The "illusion of synchronous code" feels magical, but magic becomes a gotcha in larger codebases when you can't tell what's blocking and what isn't.
> when you can't tell what's blocking and what isn't.
Isn't that exactly why they're making IO explicit in functions? So you can trace it up the call chain.
Isn't this a bad time to be embracing Zig? It's currently going through an intrusive upheaval of its I/O model. My impression is that it was going to take a few years for things to shake out. Is that wrong?
> My impression is that it was going to take a few years for things to shake out. Is that wrong?
I had that very impression in early 2020 after some months of Zigging (and being burned by constant breaking changes), and left, deciding "I'll check it out again in a few years."
I had some intuition it might be one of these forever-refactoring eternal-tinker-and-rewrite fests and here I am 5 years later, still lurking for that 1.0 from the sidelines, while staying in Go or C depending on the nature of the thing at hand.
That's not to say it'll never get there, it's a vibrant project prioritizing making the best design decisions rather than mere Shipping Asap. For a C-replacement that's the right spirit, in principle. But whether there's inbuilt immunity to engineers falling prey to their forever-refine-and-resculpt I can't tell. I find it a great project to wait for leisurely (=
It kind of is a bad idea. Even the author’s library is not using the latest Zig IO features and is planning for big changes with 0.16. From the readme of the repo:
> Additionally, when Zig 0.16 is released with the std.Io interface, I will implement that as well, allowing you to use the entire standard library with this runtime.
Unrelated to this library, I plan to do lots of IO with Zig and will wait for 0.16. Your intuition may decide otherwise and that’s ok.
What's a few years? They go by in the blink of an eye. Zig is a perfectly usable language. People who want to use it will, those who don't won't.
following upstream is overrated since we have good package managers and version control.
it's completely feasible to stick to something that works for you, and only update/port/rewrite when it makes sense.
what matters is the overall cost.
Hmm, if one writes a library Zetalib for the language Frob v0.14 and then Frob v0.15 introduces breaking changes that everyone else is going to adapt to, then well, package managers and version control are going to help indeed: they will help in staying in a void, as no one will use Zetalib anymore because of the older Frob.
only for hobby projects
TigerBeetle, Bun, and Ghostty all beg to differ...
You or in general? Because, you know, this is like, your opinion, man.
My opinion???
How about you go to the Zig GitHub and check the progress of the language.
It's literally there, and it's still in beta and not fit for production, let alone having a mature ecosystem.
Yes, your opinion. I run it in production and everything I've built with it has been rock solid (aside from my own bugs). I haven't touched a few of my projects in a few years and they work fine, but if I wanted to update them to the latest version of Zig I'd have a bit of work ahead of me. That's it.
It really depends on what you are doing, but if it's something related to I/O and you embrace the buffered reader/writer interfaces introduced in Zig 0.15, I think not much is going to change. You might need changes on how you get those interfaces, but the core of your code is unchanged.
IMO, it's very wrong. The Zig language is not drastically changing; it's adding a new, *very* powerful API. Similar to how most everything in Zig passes an allocator as a function param, functions that want to do IO will soon accept an object that provides the desired abstraction, so that callers can define the ideal implementation.
In other words, the only reason not to use Zig is if you detest upgrading or improving your code. Code you write today will still work tomorrow. Code you write tomorrow will likely have a new Io interface, because you'll want to use that standard abstraction. But if you don't want to use it, all your existing code will still work.
Just like today, if you want to alloc but don't want to pass an `Allocator`, you can call std.heap.page_allocator.alloc from anywhere. But because that abstraction is so useful, and Zig supports it so ergonomically, everyone writes code that provides that improved API.
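For readers who don't write Zig, here is the shape of that pattern transliterated into plain C (a sketch only; `struct io` and its single method are invented here and are not the actual std.Io interface): the function receives its I/O capability as an explicit parameter, so the caller decides whether it's backed by blocking syscalls, a green-thread runtime, or a test fake.

    #include <stddef.h>
    #include <sys/types.h>

    /* Invented stand-in for "an object that provides the desired abstraction". */
    struct io {
        ssize_t (*write)(struct io *self, int fd, const void *buf, size_t len);
    };

    /* The callee never touches the platform directly; it only talks to `io`,
     * so callers pick the implementation, just as they pick an Allocator. */
    static int send_all(struct io *io, int fd, const char *p, size_t n) {
        while (n > 0) {
            ssize_t w = io->write(io, fd, p, n);
            if (w <= 0)
                return -1;
            p += (size_t)w;
            n -= (size_t)w;
        }
        return 0;
    }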
Side note: I was worried about upgrading all my code to interface with the new Reader/Writer API that's already mostly stable in 0.15.2. Even though I had to add a few lines in many existing projects to upgrade, I find myself optionally choosing to refactor a lot of functions because the new API results in code that is SO much better, both in readability and in performance. Do I have to refactor? No, the old API works flawlessly, but the new API is simply more ergonomic, more performant, and easier to read and reason about. I'm doing it because I want to, not because I have to.
Everyone knows a red diff is the best diff, and the new std.Io API exposes an easier way to do things. Still, like everything in Zig, it allows you to write the code that you want to write. But if you want to do it yourself, that's fully supported too!
> Code you write today will still work tomorrow.
Haha, no! Zig makes breaking changes in the stdlib in every release. I can guarantee you won't be able to update a non-trivial project between any of the latest 10 versions and beyond without changing your code, often substantially, and the next release is changing pretty much all code doing any kind of IO. I know because I keep track of that in a project and can see diffs between each of the latest versions. This allows me to modify other code much more easily.
But TBH, in 0.15 only `zig build` broke for me, IIRC. Then again, I may just not have used some of the things that changed.
This isn't quite accurate. If you look at the new IO branch[1] you'll see (for example) most of the std.fs functions are gone, and most of what's left is deprecated. The plan is for all file/network access, mutexes, etc to be accessible only through the Io interface. It'll be a big migration once 0.16 drops.
[1]: https://github.com/ziglang/zig/blob/init-std.Io/lib/std/fs.z...
> Do I have to refactor? No, the old API works flawlessly
The old API was deleted though? If you're saying it's possible to copy/paste the old stdlib into your project and maintain the old abstractions forward through the ongoing language changes, sure that's possible, but I don't think many people will want to fork std. I copy/pasted some stuff temporarily to make the 0.15 migration easier, but maintaining it forever would be swimming upstream for no reason.
> most of the std.fs functions are gone, and most of what's left is deprecated.
uhhh.... huh? you and I must be using very different definitions for the word most.
> The old API was deleted though?
To be completely fair, you're correct: the old deprecated writer that was available in 0.15 (https://ziglang.org/documentation/0.15.2/std/#std.Io.Depreca...) has been removed; the master branch doesn't provide it anymore.
edit: lmao, your profile about text is hilarious, I appreciate the laugh!
Even the basic stuff like `openFile` is deprecated. I don't know what else to tell you. Zig won't maintain two slightly different versions of the fs functions in parallel. Once something is deprecated, that means it's going away. https://github.com/ziglang/zig/blob/init-std.Io/lib/std/fs/D...
Oh, I guess that's a fair point. I didn't consider the change from `std.fs.openFile` to `std.Io.Dir.openFile` to be meaningful, but I guess that is problematic for some reason?
You're of course correct here; but I thought it was reasonable to omit changes that I would describe as namespace changes. Now, considering the audience, I regret doing so. (It now does require the Io object as well, so "namespace change" isn't quite accurate here.)
That is literally a breaking change, so your old code will by definition not work flawlessly. Maybe the migration overhead is low, but it's not zero, contrary to what your comment implies.
Zealotry in almost every paragraph.
Mostly out of curiosity: a read on a TCP connection could easily block for a month, so what does the I/O timeout interface look like? E.g., if you want to send an application-level heartbeat when a read has blocked for 30 seconds.
I don't have a good answer for that yet, mostly because TCP reads are expected to be done through std.Io.Reader which isn't aware of timeouts.
What I envision is something like `asyncio.timeout` in Python, where you start a timeout and let the code run as usual. If it's in I/O sleep when the timeout fires, it will get woken up and the operation gets canceled.
I see something like this:

    var timeout: zio.Timeout = .init;
    defer timeout.cancel(rt);
    timeout.set(rt, 10);
    const n = try reader.interface.readVec(&data);
Are you working using Zig master with the new Io interface passed around, by the way?
No, I'm targeting Zig 0.15. The new Io interface is not in master yet, it's still evolving. When it's merged to master and stable, I'll start implementing the vtable. But I'm just passing Runtime around, instead of Io. So you can easily migrate code from zio to std when it's released.
This is very true. Most examples of async io I've seen - regardless of the framework - gloss over timeouts and cancellation. It's really the hardest part. Reading and writing asynchronously from a socket, or whatever, is the straightforward part.
I really need to play with Zig. I got really into Rust a few months ago, and I was actually extremely impressed by Tokio, so if this library also gives me Go-style concurrency without having to rely on a garbage collector, then I am likely to enjoy it.
Go has tricks that you can't replicate elsewhere, things like infinitely growable stacks; that's only possible thanks to the garbage collector. But I did enjoy working on this; I'm continually impressed with Zig for how nice high-level-looking APIs are possible in such a low-level language.
Also, it is about time to let go of the GC-phobia.
https://www.withsecure.com/en/solutions/innovative-security-...
https://www.ptc.com/en/products/developer-tools/perc
Note the
> This video illustrates the use case of Perc within the Aegis Combat System, a digital command and control system capable of identifying and tracking incoming threats and providing the war fighter with a solution to address threats. Aegis, developed by Lockheed Martin, is critical to the operation of the DDG-51, and Lockheed Martin has selected Perc as the operating platform for Aegis to address real-time requirements and response times.
Not all GCs are born alike.
> Not all GCs are born alike.
True. However in the bounded-time GC space few projects share the same definitions of low-latency or real-time. So you have to find a language that meets all of your other desiderata and provides a GC that meets your timing requirements. Perc looks interesting, Metronome made similar promises about sub-ms latency. But I'd have to get over my JVM runtime phobia.
I consider one where human lives depend on it, for good or worse depending on the side, real-time enough.
GC is fine; what scares me is using j*va in Aegis...
The OutOfMemoryError will happen after the rocket hits the target.
Pre-1.0 Rust used to have infinitely growing stacks, but they abandoned it due to (among other things) performance reasons (IIRC the stacks were not collected with Rust's GC[1], but rather on return; the deepest function calls may happen in tight loops, and if you are allocating and freeing the stack in a tight loop, oops!)
1: Yes, pre-1.0 Rust had a garbage collector.
Rust still has garbage collection if you use Arc and Rc. Not a garbage collector, but it is a form of garbage collection.
You mean Go's segmented stacks? You can literally have them in C and C++ with GCC and glibc. It was implemented to support gccgo, but it works for other languages as well.
It is an ABI change though, so you need to recompile the whole stack (there might be the ability for segmented code to call non-segmented code, but I don't remember the extent of the support), and it is probably half-deprecated now. But it works, and it doesn't need GC.
No, Go abandoned segmented stacks a long time ago. They caused unpredictable performance, because you could hit an alloc/free cycle somewhere deep in the code. What they do now is, when they hit the stack guard, allocate a new stack (2x the size), copy the data, and update pointers. Shrinking happens during GC.
I think by now we can assume gccgo will eventually join gcj.
The Fortran, Modula-2, and ALGOL 68 frontends are getting much more development work than gccgo, which is stuck in pre-generics Go (version 1.18 from 2022), and no one is working on it other than minor bug fixes.
Do you know that there's a concurrent Scala library named ZIO (https://zio.dev)? :-)
The first time I heard about Zig was actually on Bun's website; it's been getting better and better lately.
What makes a NATS client implementation the right prototype from which to extract a generic async framework layer?
This looks interesting but I'm not familiar with NATS
If you succeed in creating a generic async primitive, it doesn't really matter what the original task was (as long as it's something that requires async), no? That's an implication of it being generic?
The layer was not extracted from the NATS client, the NATS client was just a source of frustration that prompted this creation.
There is an extremely popular library/framework for Scala named ZIO out there… Naming is hard.
Move Zig, for great justice.
One of the very first internet memes. The zig team should adopt it as the slogan.
https://en.wikipedia.org/wiki/All_your_base_are_belong_to_us
Zio already exists, https://zio.dev/
Honestly, have been excited about Zig for quite a while, dabbled a bit a while back and was waiting for it getting closer to 1.0 to actually do a deep dive... but that moment doesn't seem to come.
I don't mind, it's up to the maintainers on how they want to proceed. However, I would greatly appreciate if Zig news was a bit clearer on what's happening, timelines etc.
I think it takes relatively little time to do so, but optics would be so much better.
The article says it was created to write audio software but I'm unable to find any first sources for that. Pointers?
See the first example in Andrew's introduction: https://andrewkelley.me/post/intro-to-zig.html