Machine code is generally much safer than C - e.g. it usually lacks undefined be...

IshKebab · 2025-04-12T13:08:48 1744463328

Not true on RISC-V. That's full of undefined behaviour.

But anyway, this is kind of off-topic. I think OutOfHere was imagining that this somehow skips the type checking and borrow checking steps, which of course it doesn't.

dzaima · 2025-04-12T15:37:39 1744472259

What's all that undefined behavior? Closest I can think of is executing unsupported instructions, but you have to mess up pretty hard for that to happen, and you're not gonna get predictable behavior here anyway (and sane hardware will trap of course; and executing random memory as instructions is effectively UB on any architecture).

(There's a good bit of unpredictable behavior, e.g. RVV tail-agnostic elements or the specific vsetvl result, but unpredictable behavior also covers any multithreading on any architecture, and even in Rust, among other languages.)

IshKebab · 2025-04-13T08:20:22 1744532422

Accessing non-existent CSRs is another big one, which also means you can't probe for features.

There's loads more though. Just search for "reserved" in the ISA manual.

Of course a Rust to C compiler is not going to hit any of these. I was just pointing them out.

dzaima · 2025-04-13T12:57:29 1744549049

Fair point on CSRs, though I'd count that as a subset of unsupported/not-yet-specified instructions; pretty sure all of the "reserved"s in the spec are effectively not-yet-defined instructions too, which'll have equivalents in any architecture with encoding space left for future extensions, not at all unique to RISC-V.

But yeah, no try-running-potentially-unsupported-things-to-discover-what-is-supported; essentially a necessary property for an open ISA as there's nothing preventing a vendor from adding random custom garbage in encoding space they don't use.

IshKebab · 2025-04-13T19:47:04 1744573624

Yeah I guess the difference is once an instruction/CSR has been defined in x86 or ARM the only two options are a) it doesn't exist, and b) it's that instruction.

In RISC-V it can be anything even after it has been defined.

Actually... I say that, but they do actually reserve spaces in the CSR and opcode maps specifically for custom extensions so in theory they could say it's only undefined behaviour in those spaces and then you would be able to probe in the standard spaces. Maybe.

I think they just don't want people probing though, even though IMO it's the most expedient solution most of the time. Otherwise you have to go via an OS syscall, through the firmware and ACPI tables, device tree or mconfigptr (when they eventually define that).

dzaima · 2025-04-13T22:48:49 1744584529

On getting supported extension status: there's a C API spec that could potentially become an OS-agnostic option: https://github.com/riscv-non-isa/riscv-c-api-doc/blob/main/s.... libc will already want to call whatever OS mechanism exists to determine which extensions it can use for memcpy etc., so taking the results from libc is "free".
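For comparison, here is a rough sketch of the "ask the OS instead of probing" route on Linux. This is not the C API from the linked spec; it uses the auxiliary vector via glibc's `getauxval`. It assumes the RISC-V Linux convention that bit `letter - 'A'` of `AT_HWCAP` marks a single-letter extension. The helper names (`hwcap_has_letter`, `read_hwcap`, `cpu_has_vector`) are made up for illustration.

```c
#include <stdbool.h>

#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_HWCAP */
#endif

/* Assumption: on RISC-V Linux, AT_HWCAP bit (letter - 'A') is set when the
 * single-letter extension (e.g. 'C', 'V') is available. */
bool hwcap_has_letter(unsigned long hwcap, char letter) {
    if (letter < 'A' || letter > 'Z') return false;
    return ((hwcap >> (letter - 'A')) & 1UL) != 0;
}

/* Ask the kernel rather than executing a possibly unsupported instruction. */
unsigned long read_hwcap(void) {
#if defined(__linux__)
    return getauxval(AT_HWCAP);
#else
    return 0;   /* no information available on this platform */
#endif
}

/* Hypothetical convenience wrapper; only meaningful on RISC-V. */
bool cpu_has_vector(void) {
    return hwcap_has_letter(read_hwcap(), 'V');
}
```

Multi-letter extensions (Zba, Zvbb, ...) do not fit in this bitmask, which is part of why a richer interface is wanted.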

cv5005 · 2025-04-12T10:08:47 1744452527

Not any different from C - a given C compiler + platform will behave completely deterministically and you can test the output and see what it does, regardless of UB or not.

lmm · 2025-04-12T10:54:43 1744455283

> a given C compiler + platform will behave completely deterministically and you can test the output and see what it does, regardless of UB or not.

Sure[1], but that doesn't mean it's safe to publish that C code - the next version of that same compiler on that same platform might do something very different. With machine code (especially x86, with its very friendly memory model) that's unlikely.

(There are cases like unused instructions becoming used in newer revisions of a processor - but you wouldn't be using those unused instructions in the first place. Whereas it's extremely common to have C code that looks like it's doing something useful, and is doing that useful thing when compiled with a particular compiler, but is nevertheless undefined behaviour that will do something different in a future version.)

[1] Build nondeterminism does exist, but it's not my main concern
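To make the "looks useful but is undefined behaviour" case concrete, here is a minimal sketch using signed overflow. The function names are invented for illustration. The first function is the kind of check that may appear to work on one compiler version and be optimized away on another; the other two express the same intent without ever overflowing.

```c
#include <limits.h>
#include <stdbool.h>

/* Looks reasonable, but relies on signed overflow, which is undefined
 * behaviour in C. A compiler is allowed to assume `a + 1 < a` never
 * holds and fold this function to `return false;`. */
bool increment_overflows_ub(int a) {
    return a + 1 < a;
}

/* Same intent, fully defined: compare against the limit before adding. */
bool increment_overflows(int a) {
    return a == INT_MAX;
}

/* General form for a + b, again without performing the overflowing add. */
bool add_overflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}
```

Testing `increment_overflows_ub` on one compiler tells you nothing about the next release, which is the point being made above.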

baq · 2025-04-12T11:01:15 1744455675

CPUs get microcode updates all the time, too. Nothing is safe from bitrot unless you’re dedicated to 100% reproducible builds and build on the exact same box you’re running on. (…I’m not, for the record - but the more, the merrier.)

lmm · 2025-04-12T11:19:47 1744456787

> CPUs get microcode updates all the time, too.

To fix bugs, sure. They don't generally get updates that contain new optimizations that radically break existing machine code, justifying this by saying that the existing code violated some spec.

carlmr · 2025-04-12T11:51:41 1744458701

> To fix bugs, sure.

Maybe your program worked due to the bug they fixed.

lmm · 2025-04-12T14:21:52 1744467712

Extremely unlikely. CPU bugs generally halt the CPU or fail to write the result or something like that. The Pentium FDIV bug where it would give a plausible but wrong result was a once in a lifetime thing.

baq · 2025-04-12T19:14:59 1744485299

Spectre and Meltdown exploits stopped working, too. Some of them on some CPUs, anyway.

lmm · 2025-04-13T07:15:01 1744528501

Sure. But those were obviously exploits from the start. You wouldn't write code like that accidentally.

ryao · 2025-04-13T01:58:01 1744509481

Do a web search for rdrand and systemd.

lmm · 2025-04-13T07:17:10 1744528630

> Do a web search for rdrand and systemd.

RDRAND always returning all-FF is exactly the kind of thing that's an obvious bug, not a plausible-but-wrong result.

ryao · 2025-04-14T02:21:47 1744597307

The other guy said "Maybe your program worked due to the bug they fixed." The RDRAND fix achieved exactly that.

uecker · 2025-04-12T11:04:56 1744455896

It is not terribly hard to generate C code that does not use undefined behavior.

lmm · 2025-04-12T11:17:59 1744456679

Maybe. But when carefully investigated, the overwhelming majority of C code does in fact use undefined behaviour, and there is no practical way to verify that any given code doesn't.

uecker · 2025-04-13T20:33:37 1744576417

It is easy to create code where this can be verified. It is difficult to verify for arbitrary code.
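As a rough illustration of "verifiable by construction": a code generator can route every risky operation through a small set of helpers whose definedness is easy to audit once. The sketch below is an assumption about how such a runtime might look, not taken from any particular compiler; the `rt_` names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Wrapping add: do the arithmetic in uint32_t (defined to wrap), then copy
 * the bits back. int32_t is required to be two's complement, so memcpy
 * yields the wrapped signed value without an out-of-range conversion. */
static inline int32_t rt_wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t r = (uint32_t)a + (uint32_t)b;
    int32_t out;
    memcpy(&out, &r, sizeof out);
    return out;
}

/* Checked division: reject the two inputs for which `/` is undefined
 * (division by zero and INT32_MIN / -1) before dividing. */
static inline bool rt_checked_div_i32(int32_t a, int32_t b, int32_t *out) {
    if (b == 0 || (a == INT32_MIN && b == -1)) return false;
    *out = a / b;
    return true;
}
```

If generated code only ever performs signed arithmetic through helpers like these, the overflow and division questions reduce to reviewing a few lines, which is much easier than auditing arbitrary hand-written C.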

uecker · 2025-04-12T10:10:48 1744452648