How does that work? The binary format embeds variants of the same program?

pjmlp · on July 24, 2023

Yes, here is an example of how it works for GCC.

https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Function-Multi...
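To make the idea concrete, here is a rough conceptual model of multi-versioned dispatch, written in Python only for readability. It is not how GCC implements it: the real mechanism emits several machine-code clones of one function plus a resolver that runs at load time. The names `dot_default`, `dot_avx2`, `CLONES`, and `resolve` are hypothetical, and the "AVX2" variant is just a stand-in with the same behavior.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def dot_default(a: Iterable[float], b: Iterable[float]) -> float:
    """Baseline variant: runs on any CPU."""
    return sum(x * y for x, y in zip(a, b))


def dot_avx2(a: Iterable[float], b: Iterable[float]) -> float:
    """Stand-in for a clone compiled with AVX2 enabled (same result here)."""
    return sum(x * y for x, y in zip(a, b))


# Variants ordered from most to least demanding; the last entry needs nothing.
CLONES: List[Tuple[FrozenSet[str], Callable]] = [
    (frozenset({"avx2"}), dot_avx2),
    (frozenset(), dot_default),
]


def resolve(cpu_features: Iterable[str]) -> Callable:
    """Pick the first variant whose required features the CPU reports."""
    available = set(cpu_features)
    for required, impl in CLONES:
        if required <= available:
            return impl
    raise RuntimeError("no usable variant")  # unreachable with a default entry


# The choice is made once; later calls go straight to the selected variant.
dot = resolve({"sse2", "avx2"})
```

So yes, in this model the binary carries every variant, and only the selection happens at run time.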

brucethemoose2 · on July 24, 2023

On some Linux distros, the package manager downloads different binaries based on your CPU. Skylake would be x86-64-v3, Zen 4 would be x86-64-v4, for example.

And there are different schemes for multiple architectures in the same program, like hwcaps.
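As a sketch of the hwcaps idea: as I understand it, recent glibc versions look for optimized copies of a shared library in `glibc-hwcaps/x86-64-vN` subdirectories of each library directory before falling back to the plain directory. The function below only models that search order; `hwcaps_search_order` and the example path are hypothetical, and the real loader has more rules than this.

```python
import posixpath
from typing import List

# Subdirectory names for levels 2, 3 and 4 (level 1 is the plain directory).
LEVEL_DIRS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def hwcaps_search_order(lib_dir: str, cpu_level: int) -> List[str]:
    """Directories to try, best match first, for a CPU at the given level."""
    if not 1 <= cpu_level <= 4:
        raise ValueError("cpu_level must be between 1 and 4")
    usable = LEVEL_DIRS[: cpu_level - 1]  # levels the CPU can actually run
    dirs = [posixpath.join(lib_dir, "glibc-hwcaps", name)
            for name in reversed(usable)]
    dirs.append(lib_dir)  # baseline build is always the last resort
    return dirs


for d in hwcaps_search_order("/usr/lib64", 3):
    print(d)
```

A level-3 CPU would try the v3 directory, then v2, then the baseline, and never look at v4.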

kergonath · on July 24, 2023

Isn’t this going to get very unmanageable very soon? Intel seems to add extensions every other year or so.

brucethemoose2 · on July 24, 2023

The extensions can be kinda broken down into 4 levels. Basically ancient, old (SSE 4.2), reasonably new (AVX2, Haswell/Zen 1 and up), and baseline AVX512.

https://developers.redhat.com/blog/2021/01/05/building-red-h...
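For illustration, the four levels can be treated as cumulative sets of CPU flags. The sketch below assumes the flag spellings Linux prints in `/proc/cpuinfo` (for example `pni` for SSE3 and `abm` for LZCNT); the function name `x86_64_level` is made up, and the flag lists are my reading of the level definitions, so double-check them before relying on this.

```python
from typing import Iterable

# (level, flags that level adds on top of the previous one)
LEVEL_FLAGS = [
    (1, {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse2"}),
    (2, {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    (3, {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    (4, {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def x86_64_level(flags: Iterable[str]) -> int:
    """Return the highest level whose flags, and all lower levels' flags, are present."""
    available = set(flags)
    level = 0
    for number, required in LEVEL_FLAGS:
        if not required <= available:
            break  # levels are cumulative, so stop at the first gap
        level = number
    return level


# Usage on Linux (assumption: the first "flags" line describes the CPU):
# with open("/proc/cpuinfo") as f:
#     flags = next(line for line in f if line.startswith("flags")).split(":")[1].split()
# print(x86_64_level(flags))
```

Because the check stops at the first missing level, a CPU that has AVX2 but lacks something from the SSE 4.2 tier would still be reported as level 1.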

There is discussion of a fifth level. Someone in the Intel Clear Linux IRC said a fifth level wasn't "worth it" for Sapphire Rapids because most of the new AVX512 extensions were not autovectorized by compilers, but that a new level would be needed in the future. Perhaps they were thinking of APX, but couldn't disclose it.

jcranmer · on July 24, 2023

AVX10/APX does sound like a good baseline for v5.

Bulat_Ziganshin · on July 25, 2023

except that it doesn't support full AVX-512, making the whole idea of backward compatibility between these levels meaningless. "It's Intel!!!"

brucethemoose2 · on July 26, 2023

Well that's an even better justification, as an x86-64-v5 level would be needed for the newer CPUs.

We can throw away any hope of v4 being a standard baseline.

jiggawatts · on July 24, 2023

It’s easy to fully automate and storage is relatively cheap these days.

xxpor · on July 24, 2023

I'd think the issue would be more build infra: every new variant means you have to build the world again.

jiggawatts · on July 24, 2023

Again, compute is surprisingly cheap these days.

Work out what it would cost to compile - say - a terabyte of C code at typical cloud spot prices.

A large VM with 128 cores can compile the 100 MB Linux kernel source tree in about 30 seconds. So… 200 MB/minute or 12 GB/hour. This would take 80 hours for a terabyte.

A 120 core AMD server is about 50c per hour on Azure (Linux spot pricing).

So… about $40 to compile an entire distro. Not exactly breaking the bank.
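The back-of-envelope math, spelled out as a small script. All inputs are the rough figures above (100 MB in 30 seconds, 50c per hour, 1 TB taken as 1,000,000 MB); the function name `compile_estimate` is just for illustration, and the model assumes throughput scales linearly with source size.

```python
from typing import Tuple


def compile_estimate(source_mb: float, mb_per_minute: float,
                     usd_per_hour: float) -> Tuple[float, float]:
    """Return (hours, cost in USD) to compile source_mb at a fixed throughput."""
    hours = source_mb / mb_per_minute / 60
    return hours, hours * usd_per_hour


kernel_mb = 100        # approximate kernel source size used above
kernel_seconds = 30    # compile time on the large VM
rate = kernel_mb / (kernel_seconds / 60)   # 200 MB per minute

hours, cost = compile_estimate(1_000_000, rate, 0.50)
print(f"{hours:.1f} hours, ${cost:.2f}")   # 83.3 hours, $41.67
```

That lands at roughly 83 hours and $42 per variant, so "about $40" holds under these assumptions.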

xxpor · on July 25, 2023

you'd have to separate out compiling and linking at a bare minimum to get even a semi-accurate model. plus a lot of userspace is C++, which is much, much slower to compile.

kergonath · on July 24, 2023

Yes. Also, test it.

jiggawatts · on July 24, 2023

That can also be largely automated.

brucethemoose2 · on July 24, 2023

LTO does, on rare occasions, break things in hard-to-detect ways, but I have never heard of an x86 -march compilation bug.

slt2021 · on July 24, 2023

in the end it will be like any other modern hardware appliance:

the hardware is the same design for cost saving purposes, but different features are unlocked for $$$ by a software license key.

You want AVX-512? Pay up, unlock the feature in your CPU, and you can now use it. This could also enable a pay-as-you-go licensing scheme for CPUs, creating recurring revenue for Intel.

from the hardware perspective - the same silicon, but different features sold separately