This remains one of the best explanations on the topic: https://fabiensanglard.net/floating_point_visually_explained... Saw this when I just started using HN and such posts only inspired me to stick to it: https://news.ycombinator.com/item?id=29368529
I've never seen this topic so well explained, thank you for sharing!
One problem I was struggling with: What's the shortest, unambiguous decimal representation of a float?
For example, if you use single precision floats, then you need up to 9 digits of decimal precision to uniquely identify a float. So you would need to use a printf pattern like %.9g to print it. But then 0.1 would be output as 0.100000001, which is ugly. So a common approach is to round to 6 decimal digits: If you use %.6g, you are guaranteed that any decimal input up to 6 significant digits will be printed just like you stored it.
But you would no longer be round-trip safe when the number is the result of a calculation. This is important when you do exact comparisons between floats (eg. to check if data has changed).
So one idea I had was to try printing the float with 6 digits, then scanning it and seeing if it resulted in the same binary representation. If not, try using 7 digits, and so on, up to 9 digits. Then I would have the shortest decimal representation of a float.
This is my algorithm:
int out_length;
char buffer[32];
for (int prec = 6; prec <= 9; prec++) {
    out_length = sprintf(buffer, "%.*g", prec, floatValue);
    if (prec == 9) {
        break;  // 9 significant digits always round-trip for a float
    }
    float checked_number;
    sscanf(buffer, "%g", &checked_number);
    if (checked_number == floatValue) {
        break;
    }
}

I wonder if there is a more efficient way to determine that shortest representation rather than running printf/scanf in a loop?

Your problem is practically important: it can be considered the "canonical" string representation of a given floating-point number (once you add the closeness requirement). There are many efficient algorithms as a result; selected algorithms include Dragon4, Grisu3, Ryu and Dragonbox. See also Google's double-conversion library, which implements the first two of them.
std::numeric_limits<float>::max_digits10
https://en.cppreference.com/w/cpp/types/numeric_limits/max_d...
No, just no. And never use sscanf().
This is totally pointless when serialization to and from an unsigned integer that's binary-equivalent would be perfectly reversible and therefore wouldn't lose any information.

double f = 0.0/0.0; // might need some compiler flags to make this a soft error
double g;
char s[21]; // a 64-bit integer needs up to 20 digits plus the terminator
assert(sizeof(double) == sizeof(uint64_t));
snprintf(s, sizeof s, "%" PRIu64, *(uint64_t *)&f);
sscanf(s, "%" SCNu64, (uint64_t *)&g);

If you want something shorter, apply some sort of heuristic that doesn't sacrifice faithful reproduction of the original representation, e.g. idempotency.

Off-topic: for this kind of pointer casting, shouldn't you be using a union? I believe this is undefined behaviour, as written.
If the goal is just to reliably serialize and deserialize floating point numbers, use `%a` instead. It will print and read hexadecimal floating point literals.
This is cool, looks visually a lot like a CIDR range calculator [1] I built a few years ago to help me understand network ranges better.
These types of visualizations are super useful.
[1] - https://cidr.xyz
My favorite FP Fun Fact is that float comparisons can (almost) use integer comparisons. To determine if a > b, reinterpret a and b as signed ints and just compare those like any old ints. It (almost) works!
The implication is that the next biggest float is (almost) always what you get when you reinterpret its bits as an integer, and add one. For example, start with the zero float: all bits zero. Add one using integer arithmetic. In int-speak it's just one; in float-speak it's a tiny-mantissa denormal. But that's the next float; and `nextafter` is implemented using integer arithmetic.
Learning that floats are ordered according to integer comparisons makes it feel way more natural. But of course there's the usual asterisks: this fails with NaNs, infinities, and negative zero. We get a few nice things, but only a few.
This isn't accurate. It's true for positive numbers, and when comparing a positive to a negative, but false for comparisons between negative numbers. Standard floating point uses sign-magnitude representation, while signed integers these days use 2s-complement. On negative numbers, comparisons are reversed between these two encodings. Incrementing a float as if it were an integer will, in ordinary circumstances, get you the next one larger in magnitude, but with the same sign -- i.e., you go up for positives but down for negatives. Whereas with signed integers, you always go up except when there's an overflow into the sign bit.
A more correct version of the statement would be that comparison is the same as on sign-magnitude integers. Of course, this still has the caveats you already mentioned.
You're right, thank you for the correction.
One of the unambiguously nice things about Posits (unlike floats) is that they use a 2s-complement scheme, which makes it actually true for all values that they sort like integers.
I was going to say "well, you've still got NaR", but apparently that's been defined to sort less than all other posits? Huh, OK.
Thanks for HexFiend man! I use it several times a week.
This came up in my OMSCS Game AI course as an example of the dangers of using floats to represent game object locations. As you get farther from the origin (or from whatever other game element you are referencing), you lose precision, because the float needs more of the significand to store the larger value.
I love how that became part of the "mythology" of Minecraft as the "Far Lands", where travelling far from the world origin made terrain generation and physics break down, subtly at first, and less so as you kept going.
It's like the paranormal trope of an expedition encountering things being disconcertingly "off" at first, and then eventually the laws of nature start breaking down as well. All because of float precision.
If you have a large set of, let's say, floats in the range between 0 and 1 and you add them up, there is the straightforward way to do it, and there is a way to pair them all up, add the pairs, and repeat that process until you have the final result. You will see that this more elaborate scheme gets a far more accurate result than simply adding them up in sequence. This is huge, because there is a lot of processing done with floats where this inherent issue is entirely disregarded (and it has an impact on the final result).
Donald Knuth has all of that covered in one of his "The Art of Computer Programming" books, with estimations about the error introduced, some basic facts like a + (b + c) != (a + b) + c with floats and similar things.
And believe it or not, there have been real world issues coming out of that corner. I remember Patriot missile systems requiring a restart because they did time accounting with floats and one part of the software didn't handle the corrections for it properly, resulting in the missiles going more and more off-target the longer the system was running. So they had to restart them every 24 hours or so to keep that within certain limits until the issue got fixed (and the systems updated). There have been massive constructions breaking apart due to float issues (like material thicknesses calculated too thin), etc.
There are some really simple examples of that. Just try adding 1 to a half-precision float in a loop. The accumulator will stop increasing at a mere 2048: since 2049 is not representable, 2048 + 1 rounds back down to 2048.
Define boundary conditions -- how much precision do you need? Then you can compute the min/max distances. If the "world" needs to be larger, then prepare to divide it into sectors, and have separate global/local coordinates (e.g. No Man's Sky works this way).
Really though, games are theater tech, not science. Double precision will be more than enough for anything but the most exotic use case.
The most important thing is just to remember not to add very-big and very-small numbers together.
The award for "most fun integer" in 32 bit float is 16777217 (and 9007199254740992 for 64bit).
It's sometimes fun to have these kinds of edge cases up your sleeve when testing things.
You mean ±9,007,199,254,740,993.0 for 64-bit floats. :-)
For other curious readers, these are one beyond the largest integer values that can be represented accurately. In other words, the next representable value away from zero after ±16,777,216.0 is ±16,777,218.0 in 32 bits -- the value ±16,777,217.0 cannot be represented, and will be rounded somewhat hand-wavingly (usually towards zero).
Precision rounding is one of those things that people often overlook.
For 64-bit, 9007199254740991 is also known as `Number.MAX_SAFE_INTEGER` in JavaScript. Note that this value is not even; the next integer, ..0992, is still safe to represent on its own, but it can't be distinguished from the definitely unsafe ..0993, which rounds to ..0992.
Good job on showing the hexadecimal version. I recently had a challenging C bug where I was using printf("%a") to show the underlying representation, which uses hexadecimal, and it was a little confusing at first. This site would have been helpful.
Previously I was using this site to explore floating point representations:
https://www.h-schmidt.net/FloatConverter/IEEE754.html
This one has the extra feature of showing the conversion error, but it doesn't support double precision.
fantastic explanation and very nice to see both the decimal and binary representations together
https://raku.org defaults to Rationals (Rats) and provides FatRat for arbitrary precision
otherwise even relatively normal calculations (e.g. what's the mass of an electron in quectograms) fail
Far too superficial because it lacks explanation of non-normal conditions (denormals, zeroes, infinities, sNaNs, and qNaNs) and the complete mapping of values. This isn't properly educational.
Why tf is there a .exposed TLD?
> THE .EXPOSED TLD This TLD is attractive and useful to end-users as it better facilitates search, self-expression, information sharing and the provision of legitimate goods and services. Along with the other TLDs in the Donuts family, this TLD will provide Internet users with opportunities for online identities and expression that do not currently exist. In doing so, the TLD will introduce significant consumer choice and competition to the Internet namespace – the very purpose of ICANN’s new TLD program.
> .EXPOSED will be utilized by registrants seeking new avenues for expression on the Internet. There is a deep history of progressivity and societal advancement resulting from the online free expressions of criticism. Individuals and groups will register names in .EXPOSED when interested in editorializing, providing input, revealing new facts or views, interacting with other communities, and publishing commentary.
-- https://icannwiki.org/.exposed
For the same reason there's a .sucks TLD. There's a market for it.
So windows.sucks and linux.sucks are available and 2000 USD/year, emacs.sucks is 200 USD/year and vi.sucks is already registered (but no website unfortunately)!
On the other hand linux.rocks and windows.rocks are taken (no website), vi.rocks is 200 USD/year and emacs.rocks is just 14 USD/year.
microsoft.sucks redirects to microsoft.com, but microsoft.rocks is just taken :thinking:
Pretty sure there's a domain monitoring service or something along these lines that buys up domains like these to prevent usage.
On that note, I've been trying to see if GoDaddy will buy a domain and resell it for a higher price by searching for some plausibly nice domain names on their site. They haven't taken the "bait" yet.
MarkMonitor
not obvious you need to press enter to change the value
For a .exposed domain, it's not really shocking.
The real shocking fact about floating point is that they are even used at all.
It throws out the window the most basic property that operations on numbers should have, "associativity" and all that, for a gain in dynamic range which is not necessary most of the time.
The associativity we expect to hold is (a+b)+c == a+(b+c) and (ab)c == a(bc), and these don't hold for floats, even though most math formulas and compiler optimizations rely on them. It's a sad miracle that everything somehow still works out OK most of the time.
You lose determinism most of the time with respect to compiler optimizations, and platform reproducibility if the processor doesn't exactly respect IEEE-754 (or is it IEEE-854?).
The real problem comes when you want to use parallelism. With things like atomic operations and multiple processors doing things out of order, you lose determinism and reproducibility, or add a need for synchronisation or casting operations everywhere.
Even more problematic is that, because number operations are used so often, they are set in "stone" and implemented at the hardware level. And they use many more transistors, because they are more complex than integer arithmetic.
Real programmers don't use floating points, only sloppy lazy ones do.
Real programmers use fixed point representation and make sure the bounds don't overflow/underflow unexpectedly.
Let's ban all hardware floating-point implementations: just imagine future alien archaeologists having a laugh at us when they look at our chips and think "no wonder they were doomed, they couldn't even do a+b right: their foundations were built on sand".
Real programmers use an abacus and never take roots or calculate logarithms/powers.
Amazing confidence on display here.
try rotating & shading triangles in 3D with integer math … FPUs have earned their place
https://en.wikipedia.org/wiki/Fixed-point_arithmetic : allows you to have something which is integer math but works like floats. It's integer operations and bit shifts, so really fast.
The limitation is the minimal quantization level. But for a 3D engine, let's say your base increment is nanometers and your maximum dimension is 1000 km. Then you only have to represent numbers up to 10^15 (1000 km is 10^15 nm), so a 64-bit fixed-point number is good enough.
Do everything in 128-bit fixed-point numbers, and floats are no longer a problem for anything scientific.
for general computation, I think Rationals (https://raku.org) are a good choice - and Raku has big Int as standard also
nevertheless, us Weitek guys made 32-bit FPUs to do 3D graphics (pipelined, 1 instruction per clock; the P754, IBM, and DEC standards) to power SGI, Sun, etc.
this is still the best format to get graphics throughput per transistor (although the architectures have gotten a bit more parallel)
then 64-bit became popular for CAD (32-bit means the wallpaper in your aircraft carrier might sometimes be under the surface of your wall)
No 3D engine in the real world uses 64-bit coordinates. With 32-bit coordinates, you could not hope to represent things in nanometers (you'd be stuck in a cube roughly 4x4x4 meters). Realistically you might choose millimeters, but that would certainly start to produce visible artifacts.
For games and most simulation, the "soft failure" of gradual precision loss is much more desirable than the wildly wrong effects you would get from fixed-point overflow.
In modern systems float ops are often as fast as corresponding integer ops, so fixed point numbers are not necessarily faster now.
I hope nobody listens to lunatics like you. Fixed point sucks. Who on earth wants to analyze every single algorithm for its numerical stability? If you work with FPGAs, then converting a known to be working algorithm to fixed point is one of the most time consuming things you can do and it eats DSP slices like crazy. Every square or square root operation will cause a shiver to run down your spine because of the lack of dynamic range.
Edit:
For those who want more context:
https://vbn.aau.dk/ws/portalfiles/portal/494103077/WPMC22_Ko...
Here is a big fat warning:
> There is one important downside though, which relates to the fact that designing fixed-point algorithms is a significantly more complicated task as compared to similar floating-point based algorithms. This fact essentially has two major consequences:
> 1) The designer must have an extended mathematical knowledge about the numerical characteristics of the algorithms, and
> 2) The development time is in some cases longer than for equivalent floating-point systems.
Quantization is also precision loss.