
(in case anyone reading doesn't know: FMA = fused multiply-add, as in a*b + c, a single operation on three values that increases precision by incurring rounding error once instead of twice)

I'm not an expert on this, but for my own code I've been meaning to better understand the discussion here [1], which suggests that there ARE ways of getting FMAs, without the sloppiness of fast-math.

[1] https://stackoverflow.com/questions/15933100/how-to-use-fuse...



-ffp-contract=fast will enable FMA contraction, i.e. replacing a * b + c with fma(a,b,c). This is generally okay, but there are a few cases where it can cause problems: the canonical example is computing an expression of the form:

a * d - b * c

If a == b and c == d (and all are finite), then this should give 0 (which is true for strict IEEE 754 math), but if you replace it with an fma then you can get either a positive or negative value, depending on the order in which it was contracted. Issues like this pop up in complex multiplication or when applying the quadratic formula.


C99 [0] and C++11 [1] both have fma() functions that let you request it directly, without needing to loosen FP contraction settings and hope the compiler infers it.

[0] https://en.cppreference.com/w/c/numeric/math/fma

[1] https://en.cppreference.com/w/cpp/numeric/math/fma


The way Julia handles this is worth noting:

- `fma(a, b, c)` is exact but may be slow: it uses intrinsics if available and falls back to a slow software emulation when they're not

- `muladd(a, b, c)` uses the fastest possibly inexact implementation of `a*b + c` available, which is FMA intrinsics if available or just doing separate `*` and `+` operations if they're not

That gives the user control over what they need: precision or speed. If you're writing code that needs the extra precision, use the `fma` function; if you just want to compute `a*b + c` as fast as possible, with or without the extra precision, use `muladd`.


Note that this is only true in theory. In practice, there are still some bugs here that will hopefully be fixed by Julia 1.8.


> which suggests that there ARE ways of getting FMAs, without the sloppiness of fast-math.

There are ways, indeed, but they are pretty slow: they prioritize accuracy over performance. And they're still pretty tricky, too. The most practical alternative for a float FMA might be to use doubles, and for a double-precision FMA, to bump to a 128-bit representation.

Here’s a paper on what it takes to do FMA emulation: https://www.lri.fr/~melquion/doc/08-tc.pdf


I remember a teacher who said (when I was a student) something like "if you care about precision use double". Now that I'm teaching, I force students to only use single-precision "float"s in their code, with the message that FP precision is a finite resource, and you don't learn how to manage any resource by increasing its supply. I think my students hate me.


Knowing said teacher ;) I wonder if he’d still say the same thing now… It’s good practice to have to use single precision (or even half-precision!) now and then in order to be forced to deal with precision issues. Yes, use doubles if you really need them and aren’t trying to learn. But they’re often a lot more than 2x more expensive, and they might not be necessary at all. I’ve heard people who develop commercial rendering software for movies you’ve probably seen say out loud that you never need doubles, you just need to understand how to use floats.


Perhaps you were in the same lecture as me, when I asked the lead developer on Big Hero 6 why they didn't just use doubles to solve their precision woes, and he informed me that they literally couldn't afford to use doubles at that scale.


You know, that is actually ringing a bell, I think I might have indeed. Above I was thinking of someone else who works on a certain renderer made in New Zealand, but it's true that many studios use doubles either sparingly or not at all. That might be getting even more true as GPUs blend into production…


I worked on a certain hopping-lamp renderer for more than eleven years. I can confirm that probably 99+% of the floating-point math in it was in single precision.

And to this day, typing out the 'f' suffix on single precision literals is muscle memory for me after having had Steven Parker for my Ph.D. advisor.


Is everyone on this thread in Steve’s sphere?? I’m surprised (in a good way) to see so many familiar faces, and I guess a little surprised it’s in a thread about fast-math and not a thread about ray tracing. Okay on second thought it’s not very surprising.

Pixar has told me recently they still use doubles for some things in the CPU side of RenderMan, but I don’t know what for. There are some legitimate cases for it, and occasionally I dip my toes in hot water attempting to give advice to avoid doubles to people who know more than I do about how floats work and why they need doubles.


In the coding framework I created for students we have a "real" typedef that is either float or double; with C99's <tgmath.h> you just write "cos" once, and it turns into cosf or cos depending on how the typedef is set, which allows controlled experimentation on how FP precision affects performance. For submitted code, the grading scripts grep for "double" and turn on various extra warnings to ensure there are no implicit casts from double to float, in an effort to ensure that single precision is always being used (though I should probably scan the assembly).

Steve Parker was the first person to explain to me (while he was still a student, and I was a much younger one) the sometimes surprising cost of having image sizes be powers of two (because of cache conflicts). Small world.


Memory is a finite resource too, but would you force your students to run all their programs in 12K of memory, just because that was how much memory I had in the machine I learned to program on in 1972?


Why not? It’s a professor’s absolute prerogative what lessons they’re offering, and working in low memory is a great lesson to learn. Kids these days are lazy and spoiled with their gigabytes of ram and terabytes of disk. In my day… wait, never mind, I’m starting to sound old, eh?

The flip side question to you is, why should students get away with more than they need? Memory and cycles are wasting energy. We need engineers to understand how to be deeply efficient, not careless with resources. Memory is generally much more expensive than compute cycles in terms of energy use. Yes, please, teach the students how to program with less memory.

Low memory programming is a fantastic exercise for learning modern GPU programming, since you still need to conserve individual bytes when you’re trying to run ten thousand threads at the same time. Or if you’re just into Arduinos.

Other lessons that are great to learn, but take time to appreciate: how to avoid using any dynamic memory, how to avoid recursion, and how to avoid function pointers or any of today's tricky constructs (closures/futures/monads/y-combinators/etc.). I'm of course referring to how some people (like NASA) think about safety-critical code: https://en.wikipedia.org/wiki/The_Power_of_10:_Rules_for_Dev... But I will add that many of these rules have applied to console video game programming for a long time. They're easing up lately, but the concepts still apply, since coding for a console is effectively embedded programming.


One reason is that one of the main resources students should learn to be efficient with is their time. There are definitely places where low memory use is important, but 95% of the time, the first place you should go is to use all the tricks you have to make writing code faster. Knowing how to be careful with precision is great, but so is just using Double (or even BigFloat) to get something that will work robustly without having to analyze as carefully.


I agree students should learn how to be efficient with their time to learn the concepts they need to learn and pass the courses they choose to take in the school they’re choosing to attend, knowing the lessons are going to help them in their future careers. If a student thinks learning how floats work isn’t valuable, Computer Science might not be their thing.

It’s not the professor’s job to minimize the student’s effort, it’s the student’s job. The arrangement is the opposite of your implication. The professor’s job is to get lazy students to confront and learn these concepts and have them practice enough to understand the concepts. Having to analyze carefully is the whole point.

I also agree that the first place you should go is to use all the tricks you have to make writing code faster… at least in business. I'm not sure that applies in school. But either way, this is precisely why school should have students practice things like floating-point analysis and low-memory programming until they are part of the students' bag of tricks, until they can do high-quality engineering fluently.

BTW, just using doubles in the name of not having to analyze is not particularly great outside of school either. That does not fly where I work now (on GPU ray tracing), and would not have been acceptable when I worked in CG films or video games either. You might be underestimating how expensive doubles are. If you don’t know whether you need doubles, you probably don’t. If you have a problem that needs more than floats, and accuracy is that important, then you’ll need to justify why doubles are enough, so in practice you’ll have to analyze carefully anyway.

Maybe you’re just teasing me with the BigFloat suggestion, I can’t tell. Since they might be orders of magnitude slower than floats, they’re rarely justifiable as a robustness replacement, especially by someone who hasn’t analyzed carefully. That might be a firing offense at some jobs if done more than once. :P


(Setting aside the anachronistic snark.) You may have noticed other comments here attesting that managing 32 bits of FP precision endures as a relevant skill today.


Each time complaints are raised about single precision, you could deduct one bit from the allowance of bits per floating-point value for the next assignment.


That's not what the parent meant. The parent meant that there are ways of generating FMA instructions without using fast-math. Emulating an FMA instruction is almost always a bad idea (I should know; I've written FMA emulation before. It sucks).


Oh, my mistake, thanks. Yes you can use FMA instructions without the fast-math compiler flag for sure. Emulation being a bad idea is the impression I got; I’m glad to hear the confirmation from experience.



