Why does assembly use cryptic instruction and variable names? imulq could be integer_multiply_64bit. Everyone has big monitors these days, so there's no screen space that needs saving. Is it just historical inertia and/or "it's not so bad. if you are smart like me, it's easy"? Or are there good reasons to keep things so terse?
Because they're called mnemonics for a reason ;) They stop being cryptic after a few days of use, but you'd be stuck forever with overly long descriptive instruction names that repeat over and over and over again.
It's not about saving space, but about reducing visual noise and simplifying "visual pattern matching".
PS: Other CPUs do have somewhat friendlier assembly dialects (Motorola 68k comes to mind), but they all have short 3..5 letter mnemonics.
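For a taste of the 68k flavor, a rough sketch of a small loop in Motorola syntax; the explicit `.l`/`.w` size suffixes and consistent source,destination order are what usually earn it the "friendlier" label, but the mnemonics themselves stay short:

```asm
; Sum longwords starting at (a0); loop count minus one is in d7.
        moveq   #0,d0           ; d0 = 0 (accumulator)
loop:   move.l  (a0)+,d1        ; load 32 bits, post-increment a0
        add.l   d1,d0           ; d0 += d1
        dbra    d7,loop         ; decrement d7, branch until it reaches -1
        rts
```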
But that's approximately as true for other programming languages. (EDIT: Good point made elsewhere in the discussion: assembly is significantly more verbose than nearly any other language, so it's desirable to keep the individual instructions short.)
It'd be interesting to see an editor mode that could jump back and forth between a mnemonic view and a more descriptive one, perhaps even with argument labels to get rid of src vs. dest confusion.
I think much of the "modern confusion" around assembly code comes from being mainly exposed to raw disassembled compiler output instead of "sane" assembly code written by humans.
Back when writing assembly code was more or less mainstream, "high-level" macro assemblers were used to wrap assembly snippets into fairly advanced macros, which could lead to assembly code on nearly the same abstraction level as C: you could define structs and named constants, write complex constant expressions, and so on.
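A rough sketch of what that style looks like, in NASM syntax (the names and layout here are invented for illustration): a named constant, a struct-style layout, and a macro that hides the raw instruction sequence behind something that reads almost like a function call.

```asm
%define MAX_POINTS 64            ; named constant instead of a magic number

struc point                      ; field offsets: point.x, point.y
    .x: resd 1
    .y: resd 1
endstruc

%macro add_points 3              ; dst, a, b -- expands to plain x86 adds
    mov eax, [%2 + point.x]
    add eax, [%3 + point.x]
    mov [%1 + point.x], eax
    mov eax, [%2 + point.y]
    add eax, [%3 + point.y]
    mov [%1 + point.y], eax
%endmacro

section .bss
p1: resb point_size              ; point_size is defined by endstruc
p2: resb point_size
p3: resb point_size

section .text
sum_points:
    add_points p3, p1, p2        ; reads almost like a function call
    ret
```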
There were also dedicated assembly IDEs like ASM-ONE on the Amiga or Turbo Assembler on the PC, which made assembly programming quite comfortable (I guess the same can be achieved today with relatively little effort by writing a VSCode plugin).
The good reason for this terseness is that assembly is inherently verbose. A single line of code in a high-level language can easily become a dozen in assembly. Your proposal would be taking an already verbose language and making it even more verbose.
“imulq” is not just less typing than “integer_multiply_64bit”, it is less reading too.
If you are annoyed by something innocent like i-mul-q, then check SSE and AVX.
Btw, Intel syntax doesn't have these b/w/l/q suffixes (it uses operand typing instead) and looks cleaner for simple instructions like mov, mul, add, and, etc. Personally, I don't see any reason for mnemonics to be wordier than that.
Upd: it is also easier to read when operands are aligned in the same column, because you can follow values moving between registers from line to line.
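A side-by-side example (a generic x86-64 snippet, nothing specific): the AT&T form carries the operand size in the mnemonic,

```asm
# AT&T syntax: size suffix on the mnemonic, % on registers, source first
movq  %rdi, %rax
imulq %rsi, %rax
addq  $8,   %rax
```

while the Intel/NASM form infers the 64-bit size from the register names, and aligning the operands in columns makes the data flow easy to follow:

```asm
; Intel/NASM syntax: size implied by rax/rdi/rsi, destination first
mov  rax, rdi
imul rax, rsi
add  rax, 8
```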
There is a fixed set of instructions, so anyone with a bit of experience will know what they mean. That's different from variable names, which can differ from program to program and therefore benefit from being descriptive.
There are a few CPUs, for which the official assembly language is based on algebraic expressions: instructions look like "R1 = R2 * 4", or "R4 = R1 AND R3". See for example the SHARC instruction set: https://fayllar.org/sharc-instruction-set.html
Algebraic assembly is obviously a brilliant idea, yet so few assemblers have followed suit. I guess the tradition of cryptic mnemonics is too strong...
I don't see how that would help much, as registers are a terrible way to name variables. I think you just want to avoid assembly as much as possible, in general.
Came here to comment and say the same thing. Assemblers already "know" what each opcode does (division, etc.), so writing it out in a readable algebraic form isn't some kind of high-level-language intrusion.
The cryptic instruction names in assembly drive me nuts, too.
`integer_multiply_64bit` is much easier and faster for my eye to parse than `imulq`, which is a blob I have to stop and tease apart. I would also be fine with `imul_64` or `int_mul_64`; over time I could probably get used to `imul_q`.
But `imulq` jammed all together is a nope. I still hate `strrchr` and `atof` and all those silly C standard library names that parse poorly, and I've been programming C for many years.
I figure you could create a 1:1 assembly transpiler which does two things:
* Maps a bunch of aliases like `integer_multiply_64bit` to their official names.
* Uses parentheses, argument order, and dereferencing operators according to conventions which are more in line with what people are accustomed to seeing in popular modern programming languages.
Register naming seems a bit tougher, though, and I haven't figured out an approach for deriving intuitive aliases. Suggestions?
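To make the idea concrete, here is a hypothetical "descriptive view" such a transpiler might emit. None of the long mnemonics or register aliases below exist in any real assembler; the register names (acc64 for rax, arg1_64 for rdi, and so on) are just one possible convention based loosely on ABI roles, which is exactly where it gets awkward:

```asm
; Hypothetical aliased view                          ; official AT&T form
integer_multiply_64bit   acc64, arg2_64              ; imulq  %rsi, %rax
load_effective_address   counter64, [arg1_64 + 8]    ; leaq   8(%rdi), %rcx
jump_if_less_or_equal    .Ldone                      ; jle    .Ldone
```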
> “it’s not so bad. if you are smart like me, it’s easy”?
There are going to be responses that read like this, well-intentioned or no, but we just have to press on regardless.
The question is: why the effort to make assembly even more verbose?
People who write assembly (they still exist, I guess) will prefer "imulq" after a short time because it's much faster to type. Remember that a line like "a=2+b*f(x+1)" corresponds to 5 to 10 machine instructions.
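For instance, assuming a, b, x are int globals and f is an ordinary function, that one line might come out roughly like this on x86-64 (AT&T syntax; real compiler output will differ in detail):

```asm
movl  x(%rip), %edi      # load x into the first argument register
addl  $1, %edi           # x + 1
call  f                  # f(x+1), result in %eax
imull b(%rip), %eax      # b * f(x+1)
addl  $2, %eax           # 2 + b*f(x+1)
movl  %eax, a(%rip)      # store into a
```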
People who have to read assembly don't really care because after a few minutes you know the mnemonics of the 20 most frequently used instructions (which constitute probably 99% of the code) anyway.
Computers that have to write or read a lot of assembly (some compilers do not directly generate binary machine code) are more efficient with a compact representation.
Okay, if we ignore the above cases, there are maybe five or six people in the world who might prefer "integer_signed_multiply_32bit" or "jump_relative_if_unsigned_less_or_equal".
> People who have to read assembly don't really care
I have to read assembly, and I do care. I'd much rather read something like imul.q than imulq. Assemblers could allow something like this by ignoring such periods or underscores, the same way that many modern programming languages allow you to write 1_000_000 as a synonym for 1000000.
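Something like this (hypothetical spelling; no assembler I know of accepts the first two forms today, and all three would have to map to the same instruction):

```asm
imul.q   %rsi, %rax      # hypothetical: dot ignored by the assembler
imul_q   %rsi, %rax      # hypothetical: underscore ignored
imulq    %rsi, %rax      # what you actually have to write today
```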
Though a lenient assembler frontend wouldn't necessarily help me, since the assembly code I most often read is dumped by a disassembler I have no control over.
Each letter takes space, and back in the day there weren't 8 TB hard drives and 128-bit CPUs flowing out of Amazon's distribution centers with free 2-day shipping. You had to save space every way you could: mnemonic commands, tabs over spaces, etc.