
My point is that you cannot make a design much faster in terms of clock frequency just by pipelining. Pipelining unrolls the state machine and overlaps different executions of it. But the bottleneck, which is addition, is there in all designs, and you need additional effort to break it.
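
To make that concrete, here is a toy Python sketch (all delay figures in FO4 units are made-up assumptions, not measurements): splitting the surrounding logic into more stages keeps shrinking the cycle time only until the un-split adder stage dominates.

    # Rough sketch of why pipelining alone stops helping once one stage --
    # here a 32-bit adder kept as a single block -- sets the critical path.
    REG_OVERHEAD = 3     # assumed flop setup + clk-to-q, in FO4
    ADDER_DELAY  = 14    # assumed single-cycle 32-bit adder, in FO4
    OTHER_LOGIC  = 60    # assumed decode/control/etc., in FO4

    def min_clock_period(n_stages):
        # Split only the "other" logic across stages; the adder stays whole.
        other_per_stage = OTHER_LOGIC / n_stages
        return max(other_per_stage, ADDER_DELAY) + REG_OVERHEAD

    for n in (1, 2, 4, 8, 16):
        print(n, "stages ->", min_clock_period(n), "FO4 per cycle")
    # Beyond ~4 stages the period flattens at ADDER_DELAY + REG_OVERHEAD;
    # to go faster you have to break the adder itself (carry-save, carry-select, ...).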

(Also, MIPS has [i]nterlocked [p]ipeline [s]tages - that's the "IPS" in MIPS; I implemented it, I know - an exception in execution has to inform the other stages about the failure.)

By 1997 Intel had already bought the Elbrus 2 design team, led by Pentkovski [1]. Pentkovski made Elbrus 2 a superscalar CPU with a stack-machine front end; that is, Elbrus 2 executed stack operations in a superscalar fashion. You can entertain yourself by figuring out how complex or simple that can be.

[1] https://en.wikipedia.org/wiki/Vladimir_Pentkovski

So at the time your professor complained about Intel's inferior architecture being faster, that inferior architecture's implementation already had a translation unit inside it to translate x86 opcodes into superscalar-ready uops.



I think the Wikipedia page [1] agrees with your main point.

I said pipelining allowed you to increase the clock rate, which wasn't the best way to put it.

The wiki page says, "instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps (the eponymous "pipeline") performed by different processor units with different parts of instructions processed in parallel."

And, "This arrangement lets the CPU complete an instruction on each clock cycle. It is common for even-numbered stages to operate on one edge of the square-wave clock, while odd-numbered stages operate on the other edge. This allows more CPU throughput than a multicycle computer at a given clock rate, but may increase latency due to the added overhead of the pipelining process itself."

[1] https://en.wikipedia.org/wiki/Instruction_pipelining
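
A toy cycle count (assuming a 5-stage pipeline with no stalls; the numbers are illustrative only) that shows the throughput-vs-latency trade-off the quote describes:

    # Compare a multicycle CPU with a 5-stage pipeline at the same clock rate.
    STAGES  = 5
    N_INSTR = 1000

    multicycle_cycles = N_INSTR * STAGES          # each instruction runs alone
    pipelined_cycles  = STAGES + (N_INSTR - 1)    # fill once, then 1 per cycle

    print("multicycle:", multicycle_cycles, "cycles")
    print("pipelined: ", pipelined_cycles, "cycles")
    # Per-instruction latency is still >= STAGES cycles (plus pipeline overhead),
    # but steady-state throughput approaches one instruction per cycle.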


Addition was not the bottleneck for the 386. It had an FO4 delay of 80+ per clock cycle; an adder is much faster.

Maybe you meant that it was one (just one, of many!) of the bottlenecks in an optimized implementation?
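
To put the FO4 comparison in rough numbers (the adder figures below are back-of-the-envelope assumptions; only the 80+ figure comes from the comment above):

    CYCLE_386_FO4   = 80    # stated above: 80+ FO4 per clock on the 386
    RIPPLE_PER_BIT  = 2     # assumed ~2 FO4 per ripple-carry bit
    LOOKAHEAD_32BIT = 9     # assumed single-digit FO4 for a decent 32-bit CLA

    ripple_32 = 32 * RIPPLE_PER_BIT
    print("ripple-carry 32-bit:   ", ripple_32, "FO4")        # ~64, still < 80
    print("carry-lookahead 32-bit:", LOOKAHEAD_32BIT, "FO4")
    # Either way the adder fits inside an 80+ FO4 cycle, so on the 386
    # something other than addition was setting the clock period.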


> Addition was not the bottleneck for the 386.

It is a bottleneck for MIPS, SPARC, and Alpha, but not for the 386. How so?


The 386 wastes so many FO4 gate delays on other things. I thought I made that extremely clear?


Can you elaborate on where the delays came from?



