> Many vector types include a capacity field, so that resizing on every push can be avoided. I do not include one, because simplicity is more important to me and realloc often does this already internally. In most scenarios, the performance is already good enough.
I think this is the wrong decision (for a generic array library).
Without a capacity field, each push operation potentially triggers a realloc, causing O(n) copying and possible memory fragmentation - especially problematic for large vectors or performance-critical code.
> realloc often does this already internally
Is Martin claiming that realloc is "often" maintaining a O(1) growable array for us?
That's what the analogous types in C++ or Rust, or indeed Java, Go, C# etc. provide.
No, I claim that the performance of realloc is good enough for most use cases, because it does not move the memory when there is already enough space left.
I then mention that for other use cases, you can maintain a capacity field only in the part of the code where you need this.
Whether this is the right design for everybody, I do not know, but so far it is what I prefer for myself.
I think it's a very interesting design choice. I haven't read the code, so maybe you already thought of this, but one idea that comes to mind is that instead of reallocing new_size, you realloc f(new_size) where f is some function that rounds up to discrete steps. This should ensure good asymptotics as realloc can then realize that the requested allocation size is identical to the current one and nothing needs to be done.
However, one possible issue is if someone pushes and pops repeatedly just at the boundary where f increases in value. To address that you would have to use more advanced techniques and, I think, "cheat" by inspecting internal structures of the allocator.
Edit: malloc_usable_size could be used for this purpose I think.
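A minimal sketch of that rounding idea, assuming power-of-two steps and an invented helper name (the article's actual code may differ):

    #include <stdlib.h>

    /* Round a requested byte count up to the next power-of-two step, so
       that successive pushes usually request the same size and realloc
       can return early. Overflow handling omitted for brevity. */
    static size_t round_capacity(size_t n)
    {
        size_t step = 16;
        while (step < n)
            step *= 2;
        return step;
    }

    /* ...then grow with realloc(ptr, round_capacity(new_size))
       instead of realloc(ptr, new_size). */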
Yes!
I try to do this here (this code is not tested and may not be up to date): https://github.com/uecker/noplate/blob/main/src/vec.h#L30
The issue with the boundary is what I meant by hysteresis in the article.
I think the claim that it's good enough for "most use cases" to have an O(n) growable array container needs some serious backing data.
If I have some time, I will do some benchmarking.
In all my current code where I tried it, it makes no noticeable difference, and I am not a fan of premature optimization. But then, I could always switch to the alternative API.
Popular allocators will indeed grow your allocation in-place without moving when possible. This is essentially the same as if you'd tracked it yourself in your vector and grown it once in a while, though it will work with bytes instead of number of items. See for instance the size classes in jemalloc at https://jemalloc.net/jemalloc.3.html. If you ask for 1 byte, you actually have 8 bytes, so realloc within the same size class will be cheap compared to actually moving.
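On glibc (and jemalloc, which ships the same function) you can observe this slack directly with malloc_usable_size, as suggested above - a glibc-specific diagnostic sketch:

    #include <malloc.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *p = malloc(1);
        /* prints the real size of the block, e.g. 24 on glibc,
           not the 1 byte that was requested */
        printf("usable: %zu\n", malloc_usable_size(p));
        free(p);
        return 0;
    }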
Exactly! Why would I want to add my own memory management logic on top of the memory management logic that already exists?
One valid reason might be that I can't rely on realloc not being poor, but then I would rather use my own special allocation function. Other valid reasons would be wanting very precise control or certain guarantees, but then I would prefer a different interface. In any case, I do not think that this logic belongs in my vector. But it is also possible that I change my mind on this...
As you said in the article, the reason is performance and that stands even if a realloc implementation is not poor.
It can't preallocate, so when adding many items it will make many expensive realloc calls that could have been avoided. You could have an interface that allows pushing or popping many items at once to make up for it, but this is less convenient.
You can currently forgo shrinking on pop, but you can't when mixing popping and pushing. You need to know the capacity for that; otherwise it may actually shrink on a subsequent push. This could incur many expensive reallocs if the vector is used like a stack.
Even the cheap realloc calls will cost more than checking an int in code that's likely small, inlined, and with its data in one cache line (number of items and capacity next to each other in one struct, which are also next to the first item). The realloc function is probably dynamically linked, more complex, and has to access additional data.
If you don't want to add more memory overhead to the vector, consider just using two ints - it won't be any larger, and that's enough unless a vector needs to hold many gigabytes. Otherwise, if you think that's not enough and feel like leaning into the complexity instead, use bit fields or bit twiddling to split one 64-bit int into, say, 56+8-bit ints. Let the smaller int track additional capacity rather than total capacity, and also use that to solve your hysteresis problem. Well, or just use two 64-bit ints, but what's the fun in that?
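A hedged sketch of that 56+8 split (field names invented; bit-field layout is implementation-defined):

    #include <stdint.h>

    struct vec_header {
        uint64_t length : 56;  /* number of items */
        uint64_t spare  :  8;  /* extra capacity beyond length, saturating at 255 */
    };

    /* push: if spare > 0, take a slot and decrement spare; otherwise
       realloc a bigger block and reset spare. pop: increment spare,
       saturating at 255 - the slack also damps push/pop hysteresis. */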
When I implemented this, I did some minimal benchmarking. Realloc was easily fast enough to cover all my use cases. I am not Facebook, which needs to optimize every last byte of its string type. For the fast case, there is the alternative API, which allows one to keep track of the capacity without the cost of storing it in the type.
But realloc also overallocates a little bit, and for large blocks it would remap the pages. The inlining argument makes sense though. (Edit: on my laptop, pushing and then popping a billion integers takes about 2 seconds with std::vector<int>, 10 seconds with vec(int) and a realloc on each push, and 4.3 seconds with a simple preallocation logic that does not require a capacity field. Having a call to "rand" in it already adds more overhead.)
Many C programmers need proper generic programming mechanisms (perhaps something like Zig's comptime) in C, but macros are the worst possible approach, and they don't want to switch to a different language like C++. As a result, they struggle with these issues. This is what I think the standardization committee should focus on, but instead, they introduced _Generic.
The biggest issue is the ABI for C - it's the lingua franca of language interoperability and can't really be changed - so whatever approach is taken needs to be fully compatible with the existing ABI. `_Generic` is certainly flawed but doesn't cause any breaking ABI changes.
That's also a major reason why you'd use C rather than C++. The C++ ABI is terrible for language interoperability. It's common for C++ libraries to wrap their API in C so that it can be used from other language's FFIs.
Aside from that another reason we prefer C to C++ is because we don't want vtables. I think there's room for a `C+` language, by which I mean C+templates and not C+classes - perhaps with an ABI which is a subset of the C++ ABI but superset of the C ABI.
> we don't want vtables
Then don't use virtual functions. Then there will be no vtables.
You might have known that already, but in general I'm surprised how many engineers think that all C++ classes have vtables. No, most in fact do not. C++ classes generally have the same memory layout as a C struct as long as you don't use virtual functions.
> I think there's room for a `C+` language, by which I mean C+templates and not C+classes - perhaps with an ABI which is a subset of the C++ ABI but superset of the C ABI.
Indeed, I have spoken to a lot of my colleagues about just that. If overloading is not allowed, perhaps there is still some hope for a backwards-compatible ABI?
I don't think we can get away with just using the C ABI - or even if we did, we would need a standardized name-mangling scheme, and then any language which consumes the ABI would need to be aware of this name-mangling scheme, so it would effectively be a new ABI.
We might be able to make this ABI compatible with C if no templates are used, which wouldn't cause breaking changes - but for other compilers to be able to use templates they would need to opt-in to the new scheme. For that we'd probably want to augment libffi to include completely new functions for dealing with templates. Eg, we'd have an ffi_template_type, and an ffi_prep_template for which we supply its type arguments - then an ffi_prep_templated_cif for calls which use templates, and so forth. It would basically be a new API - but probably still more practical than trying to support the C++ ABI.
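To make that concrete, a hypothetical sketch using the names suggested above - nothing like this exists in libffi today; only ffi_status, ffi_abi, ffi_type, and ffi_cif are real:

    #include <ffi.h>

    /* a template description plus an instantiation step */
    typedef struct ffi_template_type ffi_template_type;

    ffi_status ffi_prep_template(ffi_template_type *tmpl,
                                 unsigned n_type_args, ffi_type **type_args);

    /* like ffi_prep_cif, but for a call to a template instantiation */
    ffi_status ffi_prep_templated_cif(ffi_cif *cif, ffi_template_type *tmpl,
                                      ffi_abi abi, unsigned nargs,
                                      ffi_type *rtype, ffi_type **argtypes);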
Another issue is that if we compile some library with templates and expose them in the ABI, we need some way to instantiate the template with new types which were not present when the library was compiled. There's no trivial solution to this. We'd really need to JIT-compile the templates.
> ... we would need a standardized name-mangling scheme, ...
may you please elaborate on _why_ you think this is needed ?
If the templates are monomorphized, each instantiation of a templated function will have a different address. To acquire the address of any given instantiation we need a symbol in the object file.
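In other words (illustrative names, assuming a suffix-style mangling scheme):

    struct vec_int;
    struct vec_double;

    /* two monomorphized instantiations are two distinct functions,
       each needing its own symbol in the object file: */
    void vec_push_int(struct vec_int *v, int x);          /* symbol: vec_push_int    */
    void vec_push_double(struct vec_double *v, double x); /* symbol: vec_push_double */

    /* any consumer taking the address of an instantiation must be able
       to derive these names - which is exactly a name-mangling scheme */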
What isn't clear to me is why one would ever want monomorphization in the first place.
Not ever wanting monomorphization seems like a bit of a strong claim to me. Why do you take that position?
I was asking the question why one would ever want it.
Exactly!
How can you have templates without name mangling and overloads?
This is true. I agree with this statement. It's the sacred cow of C. However, the problem with generic programming and metaprogramming isn't going away, and many people continue to struggle with it. Introducing something like compile-time reflection might be a solution...
They showed something they think is neat. You start the topic with the assumption that they struggle; I am not sure how you get that from the original post, or whether you just wanted to state that claim anyway.
The most insulting thing about _Generic is the name. Really? _Generic? For a type-based switch with horrific syntax? What were they thinking...
That said, generic programming in C isn't that bad, just very annoying.
To me the best approach is to write the code for a concrete type (like Vec_int), make sure everything is working, and then do the following:
A macro Vec(T) sets up the struct. It can then be wrapped in a typedef like typedef Vec(int) Vec_i;
For each function, like vec_append(...), copy the body into a macro VEC_APPEND(...).
Then for each relevant type T: copy-paste all the function declarations, then do a manual find/replace to give them some suffix, and fill in the body with a call to the macro (to avoid any issues with expressions being executed multiple times in a macro body).
Is it annoying? Definitely. Is it unmanageable? Not really (see the sketch below). Some people don't even bother with this last bit and just use the macros to inline the code everywhere.
Some macros can delegate to void*-based helpers to minimize the bloating.
EDIT: I almost dread to suggest this but CMake's configure_file command works great to implement generic files...
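A minimal sketch of the workflow described above (struct layout and names are illustrative, not the article's code; error handling omitted):

    #include <stdlib.h>

    #define Vec(T) struct { T *data; size_t len; }

    typedef Vec(int) Vec_i;

    /* the shared body lives in a macro... */
    #define VEC_APPEND(v, x) \
        do { \
            (v)->data = realloc((v)->data, ((v)->len + 1) * sizeof *(v)->data); \
            (v)->data[(v)->len++] = (x); \
        } while (0)

    /* ...and each type gets a thin function, so arguments are evaluated
       exactly once at the call site */
    static void vec_append_i(Vec_i *v, int x) { VEC_APPEND(v, x); }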
There are less annoying ways to implement this in C. There are at least two different common approaches which avoid having macro code for the generic functions:
The first is to put this into an include file
    #define type_argument int
    #include <vector.h>

Then inside vector.h the code looks like regular C code, except where you insert the argument:

    foo_ ## type_argument ( ... )

The other is to write generic code using void pointers or container_of as regular functions, and only have one-line macros as type-safe wrappers around it. The optimizer will be able to specialize it, and it avoids a compile-time explosion of code during monomorphization.

I do not think that templates are less annoying in practice. My experience with templates is rather poor.
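A minimal sketch of the second approach, assuming a vector struct with data and len members (names invented; a real version would want better error handling):

    #include <stdlib.h>
    #include <string.h>

    /* one untyped helper does the real work; returns 0 on allocation failure */
    static int vec_push_raw(void **data, size_t *len, size_t esize, const void *elem)
    {
        void *p = realloc(*data, (*len + 1) * esize);
        if (!p)
            return 0;
        memcpy((char *)p + *len * esize, elem, esize);
        *data = p;
        (*len)++;
        return 1;
    }

    /* one-line type-safe wrapper: the compound literal forces x to
       convert to the element type before the untyped call */
    #define VEC_PUSH(T, v, x) \
        vec_push_raw((void **)&(v)->data, &(v)->len, sizeof(T), &(T){ (x) })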
An idea I had was to implement a FUSE filesystem for includes, so instead of the separate `#define type_argument` (and `#undef type_argument` that would need to follow the #include), we could stick the type argument in the included filename.
The written `vector.h(type_argument)` file could just be a regular C header or an m4 file which has `type_argument` in its template. When requesting `vector.h(int32_t)`, the FUSE filesystem would effectively give the output of calling `gcc -E` or `m4` on the template file as the content of the file being requested. E.g., `vector.h(type_argument)` would be an m4 file with `type_argument` standing in for the element type wherever it is needed.
Then running `m4 -D type_argument=int32_t` on `vector.h(type_argument)` gives the instantiated header as output. But the idea is to make it transparent so that existing tools just see the pre-processed file and don't need to call `m4` manually. We would need to mount each include directory that uses this approach using said filesystem. This shouldn't require changing a project's structure, as we could use the existing `include/` or `src/` directory as input when mounting, and just pick some new directory name such as `cfuse/include` or `cfuse/src`, and mount a new directory `cfuse` in the project's root directory. The change we'd need to make is in any Makefiles or other parts of the build, where instead of `gcc -Iinclude` we'd have `gcc -Icfuse/include`. Any non-templated headers in `include/` would just appear as live copies in `cfuse/include/`, so in theory this could work without causing anything to break.

That's the craziest idea on this topic I've seen so far! I'm not sure whether that's a good or a bad thing, but it sure is a thing!
Those techniques being less annoying is highly debatable ;). Working with void* is annoying, header includes look quite ugly with the ## concatenation everywhere or even a wrapper macro. It also gets much worse when you need to customize the suffix (because type_argument is char* or whatever).
Sometimes the best option is an external script to instantiate a template file.
It may be debatable, but I would say C++'s template syntax is not nicer. I do not think working with void pointers is annoying, but I also prefer the container_of approach. The ## certainly has the limitation that you need to name things first, but I do not think this is much of a downside.
BTW, here is some generic code in C using a variadic type. I think this is quite nice. https://godbolt.org/z/jxz6Y6f9x
Running a program for metaprogramming is always a possibility, and I would agree that it is sometimes the best solution.
I don't think [...] is that much different from [...]. Same for [...] vs [...].

The name has to be ugly: new names in C are always taken from the set of reserved identifiers, those starting with an underscore and a capital letter, or with two underscores. Since they didn't reserve any "normal" names, all new keywords will be stuff like `_Keyword` or `__keyword`, unless they break backwards compatibility. And they really hate breaking backwards compatibility, so that's quite unlikely.
The problem is not the _G, the problem is the "generic". It is a completely wrong name for what it does.
This is an old tradition in ISO C. unsigned actually means modulo and const actually means immutable.
Hey, I understand you and know this stuff well, having worked with it for many years as a C dev. To be honest, this isn't how things should generally be done. Macros were invented for very simple problems. Yes, we can abuse them as much as possible (for example, in C++ we discovered SFINAE, an ugly, unreadable technique that wasn't part of the language designer's intent but rather a joke that people started abusing), but is it worth it?
username checks out
I don't struggle. I switched from C++ to C and find this much nicer.
I'm currently at a crossroads: C++ or Zig. One is very popular with a large community, amazing projects, but has lots of ugly design decisions and myriad rules you must know (this is a big pain, it seems like even Stroustrup can't handle all of them). The other is very close to what I want from C, but it's not stable and not popular.
Why not C?
Its only real issue is that people will constantly tell you how bad it is and how their language of choice is so much better. But if you look at how things work out in practice, you can usually do things very nicely in C.
My choice in this situation is indeed C, but every once in a while I hit a problem that makes me yearn for better metaprogramming.
Perfect hashing that you’d ideally use two different approaches for depending on whether the platform has a cheap popcount (hi AArch32), but to avoid complicating the build you give up and emulate popcount instead. Hundreds of thousands of lines of asynchronous I/O code written in a manual continuation-passing style, with random, occasionally problematic blocking synchronization sprinkled all over because the programmer simply could not be bothered anymore to untangle this nested loop, and with a dynamic allocation for each async frame because that’s the path of least resistance. The intense awkwardness of the state-machine / regular-expression code generators, well-developed as they are. Hoping the compiler will merge the `int` and `long` code paths when their machine representations are identical, but not seeing it happen because functions must have unique addresses. Resorting to .init_array—and slowing down startup—because the linker is too rigid to compute this one known-constant value. And yes, polymorphic datastructures.
I don’t really see anybody do noticeably better than C; I think only Zig and Odin (perhaps also Hare and Virgil?) are even competing in the same category. But I can’t help feeling that things could be much better. Then I look at the graveyard of attempted extensions both special-purpose (CPC[1]) and general (Xoc[2]) and despair.
[1] https://github.com/kerneis/cpc
[2] https://pdos.csail.mit.edu/archive/xoc/
It would be interesting to understand better where language features are actually needed or helpful, and where the code should be organized differently. I also observe that the cure is often worse than the disease.
Many examples I see where people argue for metaprogramming features are not at all convincing to me. For example, there was recently a discussion about Zig comptime. https://news.ycombinator.com/item?id=44208060 This is the Zig example: https://godbolt.org/z/1dacacfzc Here is the C code: https://godbolt.org/z/Wxo4vaohb
Or there was a recent example where someone wanted to give an example for C++ coroutines and showed pre-order tree traversal (which I can't find at the moment), but the C code using vec(node) IMHO was better: https://godbolt.org/z/sjbT453dM compared to the C++ coroutine version: https://godbolt.org/z/fnGzszf3j (from https://news.ycombinator.com/item?id=43831628 here). Edited to add source.
Macros are the best possible approach, compared to C++ templates or _Generic.
From my experience, trying to make C type safe is counter productive.
It's perfectly possible to do generic vectors in C without twisting the language. This implementation isn't as safe as alternatives in other languages, but plays well on C's strengths.
https://github.com/codr7/hacktical-c/tree/main/vector
Why would you think this? My implementation is type and bounds safe and nice to use.
Because that level of (type) safety is a bad fit for C.
I assume you do not want to elaborate why you think it is a "bad fit". IMHO it is perfect.
I think the overwhelmingly better approach for C is codegen here. Better ergonomics, tab completion, less error prone, etc. As long as your codegen is solid!
Why? I do not find the ergonomics bad.
It is also not clear how you get tab completion with code generation. But you could also get tab completion here; somebody just has to add this to the tab completion logic.
By far the best implementation for type-safe containers in C I found is stb_ds, it provides both a vector and a hashmap (including a string one).
https://nothings.org/stb_ds
I do something similar, but I don't implement the logic in a macro; instead I have a base Vec struct.
I then use a macro to define a new type. Using the zero-sized field I can do typeof(*ty) to get some type safety back. All of the methods are implemented on the base Vec type and have a small wrapper which casts/asserts the type of the things you are trying to push.
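A guess at what that could look like - this is my reconstruction, not the commenter's actual code (zero-length arrays are a GNU extension, as is typeof before C23):

    #include <stddef.h>

    typedef struct {
        size_t len, cap;
        void *data;
    } Vec;

    /* typed alias: the zero-sized `ty` member only records the element
       type; typeof(*v.ty) recovers it inside the wrapper macros */
    #define VEC_OF(T) struct { Vec base; T ty[0]; }

    typedef VEC_OF(int) IntVec;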
The post is more of a quick-n-dirty (and rather trivial) proof of concept, as the code includes only sporadic checks for allocation errors and then adds a hand-wavy disclaimer to improve it as needed.
E.g. in production code this

    if (!vec_ptr) // memory out
        abort();
    for (int i = 0; i < 10; i++)
        vec_push(int, &vec_ptr, i);

should really be

    if (!vec_ptr) // memory out
        abort();
    for (int i = 0; i < 10; i++)
        if (!vec_push(int, &vec_ptr, i))
            abort();

but it doesn't really roll off the tongue.

If this is library code, then I tend to disagree. As a user of a library, I would rather be able to handle errors the way I want; I do not want the library to decide this for me, so just return an error value, like "VEC_ERR_NOMEM", or whatever.
If all you do is call abort anyway, you do not need an interface that makes you test for errors.
It's amazing how many people try to write generic containers for C, when there is already a perfect solution for that, called C++. It's impossible to write generic type-safe code in C, and this version resorts to using GCC extensions to the language (note the ({…}) expressions).
For those afraid of C++: you don't have to use all of it at once, and compilers have been great for the last few decades. You can easily port C code to C++ (often you don't have to do anything at all). Just try it out and reassess the objections you have.
Except that I find C++ far from perfect. In fact, I switched from C++ to C (a while ago) to avoid its issues and am much happier. I also find my vec(int) much nicer.
In fact, we are at the moment ripping out some template code in a C code base which has some C++ for CUDA in it, and this one file with C++ templates almost doubles the compilation time of the complete project (with ~700 source files). IMHO it is grotesque how bad it is.
My problem with C++, and maybe this is just me, is RAII.
Now, Resource Acquisition Is Initialization is correct, but the corollary is not generally true, which is to say, my variable going out of scope does not generally mean I want to de-acquire that resource.
So, sooner or later, everything gets wrapped in a reference-counting smart pointer. And reference counting always seemed to me to be a primitive, last-resort memory management strategy.
> my variable going out of scope does not generally mean I want to de-acquire that resource.
But it does! When an object goes out of scope, nobody can or shall use it anymore, so of course it should release its (remaining) resources. If you want to hold on to the object, you need to revisit its lifetime and ownership, but that's independent of RAII.
Your problem is not with RAII, but with reference counting, which you correctly identified should be the last resort, not the default; at least for the applications typically written in C++.
Why should reference counting be a last resort?
Because it's insanely slow, and doesn't support cyclic structures.
... in C++. Because, at least in C++, it is a bad, slow and inconvenient GC. If you want or need generalized GC there are significantly better languages for that.
Instead of reference counting, consider having two types: an "owner" type, which actually contains the resource and the destructor to de-acquire it, and "lender" types, which hold a reference to the resource (a pointer, or just a logical one - e.g., an fd can simply be copied into the lender but only closed by the owner) and don't de-acquire on destruction.
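A hedged C rendering of that owner/lender split for a file descriptor (in C++ the owner's cleanup would live in a destructor; all names here are invented):

    #include <unistd.h>

    /* owner: holds the fd and is the only one allowed to close it */
    typedef struct { int fd; } OwnedFd;

    /* lender: a borrowed copy of the fd; dropping it never closes */
    typedef struct { int fd; } BorrowedFd;

    static BorrowedFd fd_borrow(const OwnedFd *o) { return (BorrowedFd){ o->fd }; }

    /* the owner's "destructor", called exactly once */
    static void fd_release(OwnedFd *o)
    {
        if (o->fd >= 0)
            close(o->fd);
        o->fd = -1;
    }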
Same thing as what Rust does with `String` and `str`.
If you want to take back manual control, use the release() function.
And if I want a vec(int *)? These token pasting 'generic' macros never work for non-trivial types.
Correct, complex types must be typedef'd. At least, until c2y integrates _Record as per N3332: https://thephd.dev/_vendor/future_cxx/papers/C%20-%20_Record...
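For example, assuming the article's vec(T) builds type and function names by token pasting (which is why a bare `int *` can't work):

    typedef int *int_ptr;  /* give the non-trivial type a single-token name */

    vec(int_ptr) ptrs;     /* fine: pastes into names like vec_int_ptr */
    /* vec(int *) would try to paste "int *" into an identifier and fail */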
I am not terribly excited about this proposal. It is overly complex.
I agree, but the current specification is complex too: two identical "tagged" structs are compatible, two identical "untagged" structs are not. And before C23 it was even worse, depending on whether the two structs were defined in the same file or not.
We're applying a patch over a patch over a patch... no surprise the end result looks like a patchwork!
Sure, but _Record would add even more complexity. The tag rules I had changed in C23 were a step to remove complexity, so a step towards cleaning it up. I wasn't able to fix the untagged case, because WG14 had concerns, but I think these can be addressed, making another step. It is always much harder to undo complexity than to add it.
I made a pretty convenient C vector library a while back that lets you use the [] operator directly on the vector pointers: https://github.com/Mashpoe/c-vector
A neat project that was posted here a while back uses it: https://x.com/kotsoft/status/1792295331582869891
Here is my version of this concept, I tried to keep it as simple as possible: https://github.com/camel-cdr/cauldron/blob/main/cauldron/str...
An important property of C++ templates is that their instantiations are de-duplicated at link-time.
If you have two dependencies in C which both use the same generic container macros, and both instantiate vec(int) you'll fail-out with redefinition errors.
Not with C23.
Only if the macro only declares structs, and not any functions.
Sorry, I don't understand what macro you're referring to. Your post addressed "vec(int)", and suggested it will fail if repeated, which was true before C23 but not anymore. What functions are defined through vec(int)?
You're right.
The authoritative treatise on this topic is "C Interfaces and Implementations" (known as CII) by David R. Hanson.
His code is here: https://github.com/drh/cii
Towards safe containers in C.
Here's a generic vector used in real world code: https://gitlab.com/nbdkit/nbdkit/-/blob/master/common/utils/...
Why do you need to pass the type to vec_push? Can't T be replaced by typeof(v->data[0]) ?
It could. I like things being spelled out explicitly. Otherwise I would probably not use C.
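A sketch of the suggested variant, assuming the article's vec_push takes the type first, as in vec_push(int, &v, i), and that the struct member is called data, as in the question (typeof is a GNU extension before C23):

    /* infer the element type from the vector instead of spelling it out: */
    #define vec_push_auto(vp, x) \
        vec_push(typeof((*(vp))->data[0]), (vp), (x))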
I doubt I'd want to use a dynamic array in C without a custom allocator.
You don't need an explicit allocator: you can create an empty object and vec_push() does the magic of (re)allocating the memory when needed.
Instead, what is missing is an automatic deallocator, one that's automatically called when the variable goes out of scope. For example:
    {
        vec(int) v = {}; /* no extra room allocated */
        vec_push(v, 1);  /* space allocated */
        ... use v ...
    } /* here the space is deallocated, then v is released */

This example doesn't use the same definition of vec as TFA, but something more similar to 'span' by the same author. And you could easily add a version with a custom allocator if you need it.
Why not just copy the code yourself?
What is an array? It's 3 variables: base, length and capacity. So why not decide that an array is just that: 3 variables of the right size and type.
Then you can make one like this: just declare the three variables. You'll also need to initialise and destroy these array "objects", and you'll probably want to add an item to an array too - so you might use helpers like Array_Grow, ARRAY_ADD and Array_Free.

Array_Free is very simple, and Array_Grow is barely more complicated (however I wrote it off the cuff, so of course it could still be wrong). Both of these mainly exist just to keep stdlib.h out of the header. Array accesses and iteration and the like are just done in the traditional way. It even performs nicely with -O0.

You might want to hide the size naming policy behind a macro:

    #define ARRAY_LENGTH(S) (S##_length)

But it'd often be shorter just to type the name out. (On the other hand, if attempting to fully productize this, maybe it'd be nice to let the consumer customize the naming convention - and now you'd have the option. ARRAY_LENGTH would hide the details, you'd fix up ARRAY_ADD (etc.) to use it, and now you could have the size variable called arraySize (or whatever) and everything would fall into line.)

For a full implementation you'll probably also need a way of generating a static array. (I mainly found myself needing this for test code, which uses globals for convenience; most arrays I create normally are locals, or parts of structs.)
You'll also need a parameters list for use in a function declaration or definition, and a macro that expands to all 3 variables:

    #define ARRAY_PARAMS(T,S) T *S,size_t S##_length,size_t S##_capacity
    #define ARRAY_ARG(S) S,S##_length,S##_capacity
Then you might have a function that takes a pointer to an "array", and you call it by passing all three variables with ARRAY_ARG. (I found this cropped up often enough that I needed the macro, but it was less common than I thought.) There's more you can do, but the above is the long and the short of it.
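For illustration, a declaration and a call using these macros (the function and variable names are invented):

    #include <stddef.h>

    static int sum_ints(ARRAY_PARAMS(int, xs)) /* int *xs, size_t xs_length, size_t xs_capacity */
    {
        int s = 0;
        for (size_t i = 0; i < xs_length; i++)
            s += xs[i];
        (void)xs_capacity;
        return s;
    }

    void demo(void)
    {
        int *values = NULL;
        size_t values_length = 0, values_capacity = 0;
        /* ... grow and fill values ... */
        int s = sum_ints(ARRAY_ARG(values)); /* values, values_length, values_capacity */
        (void)s;
    }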
This might all look terrible - or perhaps it sort of looks OK, but you're just not sure that it would actually work - but I've used this in a prototype project and thought it worked out well. (I mainly do C++ nowadays, but I had a good stint of C before that. So even if I've got no taste, hopefully I've at least got a rough feel for what'll work out OK and what'll end up a disaster.)