The more interesting thing to me than SGX is that all of TSX-NI is deprecated? Not just HLE, but RTM too? Meaning there'll be no more hardware transactional memory at all?! Anyone able to shed any light on why they're doing this? Is it just security, or is it just not worth it even regardless of that? Is there a chance they'll reintroduce it in some form, perhaps in other processor series?
TSX has always been kind of useless. Intel got repeated feedback that they needed to document what deterministically causes spurious aborts (usually bad cache interactions within the same transaction causing lines to spill), so that applications could be written without the slow, non-hardware-assisted fallback path.
Oh well. Maybe some other company will build hardware transactions in a way that takes developer feedback into account.
3D XPoint DIMMs had an analogous problem. There was no way to pin a cache line for updates, so that it couldn't spill to persistent storage mid-page-update. Some cool workarounds came out of the research community, but, in the end, that technology is dead on arrival too.
Looks like it causes memory ordering issues [1] on the affected processors. You can turn the capability back on with a flag, but Intel warns this is "not for production use".
It is a fundamentally difficult problem to solve. On top of that, the costs grow exponentially with core count, and continued processor scaling only compounds them.
If you look back at so many of the intel architectural extensions it is so hard NOT to draw the conclusion that Intel has a POOR understanding of where problems should be solved. They constantly try to solve problems which should be solved in software with hardware. This is why they are now an architectural generation behind AMD and why they are most likely going to lose the CPU market to Apple imitators.
Unless AMD can maintain its success despite the talent bleed it will start to face, the days of x86 are numbered.
Why is Intel wrong for trying to solve software problems in hardware, while ARM/CHERI are celebrated for being ahead in trying to solve memory safety in hardware using pointer upper-bit tagging and now CHERI's provenance metadata (performing fine-grained bounds checks on every single pointer dereference, rather than just following instructions without performing extra work)?
I think it's less a criticism of trying to solve problems in hardware vs the _kinds_ of problems they're focusing on.
While CHERI is, from a pure theory standpoint, something that is perfectly avoidable with proper programs (i.e. memory unsafety _is_ efficiently avoidable in software), we ended up needing it because we made the wrong choice in software too long ago to turn back (nobody is rewriting the Linux kernel anytime soon). In this way, CHERI is a good optimization because it does something we cannot _practically_ solve in software. ARM PA (pointer authentication) plays a similar role, in that hardware CFI can be made irrelevant by a) not having memory safety issues or b) software CFI, but neither has really worked out in practice, and it's cheap and efficient in hardware, so it's a worthwhile tradeoff.
Stuff like Intel TSX and ARM TME sits at the other end. Transactional memory is _super_ cool, and it's been a common thread throughout architecture papers for the past twenty years. The thing is, we never had transactional memory in commodity hardware (and nobody buried their heads in the sand about not having it, the way we did with memory safety), so all our software eventually found decent workarounds. TSX/TME does do what it says; the issue is that it's not quite good enough compared to existing software techniques, so the actual added value (cache noise and the resulting spurious aborts included) made it a worse deal. Add in the cost of updating software and the likely strongly polynomial (?) hardware cost of transaction support as core count grows (this is why ARM's Exclusive Monitor performs SO badly on systems with 64+ cores, and why they added new atomics just to avoid the monitor), and it just doesn't work anymore.
Not a hardware architect, but my spitballing as a compiler writer:
Transactional memory is one of those things that constantly sounds like it's a good idea in theory, but it doesn't live up to those ideas in practice. One of the issues with hardware transactional memory is the challenge of spurious aborts or otherwise running up against hardware limits as to how big transactions can be. Another (as far as I'm aware) unsolved issue is defining a memory model that supports both transactional memory and the modern C/C++ memory model. I also don't think there's a lot of practical benefits--you can make it quite far with existing parallelism libraries that expose something vaguely task-based or fork-join, such as OpenMP or STL's parallel executors, and there's relatively little need for algorithms where you don't necessarily know if there's going to be contention or not.
Having used GHC Haskell's software transactional memory features to build a concurrent service with caching, I do think there are significant practical benefits. Writing threaded code without needing to do global reasoning about fine-grained locking is a godsend. Transactional memory solves the problem of writing composable code that works on shared mutable state. Task libraries do not solve this problem.
Of course, retrofitting such a feature into C/C++ in a satisfactory manner might not be possible. But the practical benefits are real.
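For readers who haven't used GHC's STM, the composability point is easy to make concrete. A minimal sketch (the account type, names, and amounts are made up for illustration; `Control.Concurrent.STM` ships with standard GHC installs):

```haskell
import Control.Concurrent.STM

-- One TVar per account balance.
type Account = TVar Int

-- withdraw and deposit are written independently; neither knows
-- anything about locks or about the other operation.
withdraw :: Account -> Int -> STM ()
withdraw acc n = modifyTVar' acc (subtract n)

deposit :: Account -> Int -> STM ()
deposit acc n = modifyTVar' acc (+ n)

-- Composition is just sequencing inside one atomically block: the
-- transfer commits as a whole or not at all, with no global
-- reasoning about lock ordering.
transfer :: Account -> Account -> Int -> IO ()
transfer from to n = atomically (withdraw from n >> deposit to n)

main :: IO ()
main = do
  a <- newTVarIO 100
  b <- newTVarIO 0
  transfer a b 30
  balances <- atomically ((,) <$> readTVar a <*> readTVar b)
  print balances  -- (70,30)
```

The point is that `transfer` composes two independently written operations into one atomic unit, which is exactly what lock-based critical sections cannot do without global lock-ordering discipline.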
There's more to life than C++. It was implemented in OpenJDK and seemed to get some good results there in micro-benchmarks at least. The nice thing is, it was a transparent upgrade. Synchronized blocks just became TSX transactions, unless they aborted too much, in which case they went back to being ordinary lock based critical sections.
The bigger problem was actually that in many cases where the optimization could be applied there was always a conflict, usually because of updating some sort of statistical counters. So a lot of attempts to optimize this way would de-opt. It could be fixed by changing the way stats were aggregated to be more tx-friendly but few developers ever did it.
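The counter problem generalizes beyond the JVM: any transactional system conflicts when every transaction bumps one hot counter. The usual tx-friendly fix is to stripe the counter so writers touch disjoint state and readers pay the aggregation cost (in Java land the analogous tool is `java.util.concurrent.atomic.LongAdder`). A sketch of the idea in GHC STM terms, with made-up names:

```haskell
import Control.Concurrent.STM
import Control.Monad (forM_, replicateM)

-- A striped counter: one TVar per shard, so transactions that
-- bump different shards never conflict with each other.
newtype StripedCounter = StripedCounter [TVar Int]

newStriped :: Int -> IO StripedCounter
newStriped n = StripedCounter <$> replicateM n (newTVarIO 0)

-- A writer picks a shard (here, from its own id), touching only
-- that TVar instead of one global hot counter.
incr :: StripedCounter -> Int -> STM ()
incr (StripedCounter shards) me =
  modifyTVar' (shards !! (me `mod` length shards)) (+ 1)

-- Reads pay the aggregation cost instead of the writers.
total :: StripedCounter -> STM Int
total (StripedCounter shards) = sum <$> mapM readTVar shards

main :: IO ()
main = do
  c <- newStriped 4
  forM_ [0 .. 99] (\i -> atomically (incr c i))
  print =<< atomically (total c)  -- 100
```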
"Anyone able to shed any light on why they're doing this?"
Same question here. As usual, no reasons are given (I must admit I'm getting sick of these corporations failing to explain their actions). Right, I'm old enough to remember when companies used to issue detailed revision sheets/documentation describing what the changes were and the reasons for them, just as a matter of course.
(Even after being an unwilling member of the users' mushroom club for some 20/30 years, I'm still having difficulty adjusting.)
TSX has never worked, and it seems a little over-ambitious in general. SGX has a few fatal flaws in its security model, and we have moved on to newer models there. The principle of tech debt applies to hardware as well as software, and I hope they come up with a new idea for transactional memory in future cores that actually works, and isn't so ambitious.