
https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headline...

> Any headline that ends in a question mark can be answered by the word no.

As confirmed by the article, of course.



It wasn't readily clear to me that this is an obvious pipe dream.

I remain cautiously optimistic. There are often large performance gains left unclaimed for the purpose of "generality". My favourite example is Postgres vs TimescaleDB; by exploiting the structure of certain tables (in the case of TimescaleDB, time-order), we can get better performance -- but of course that only works with time-series data.
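
To make the "exploit the structure" point concrete, here's a toy C sketch (my own illustration, nothing to do with TimescaleDB's actual internals): if you know rows are time-ordered, a range query can start with a binary search instead of a full scan.

    #include <stdio.h>

    /* Time-ordered data lets a range query binary-search for its
       starting point instead of scanning every row. */
    static int lower_bound(const long *ts, int n, long t) {
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            if (ts[mid] < t) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    int main(void) {
        long ts[] = {100, 200, 300, 400, 500};  /* sorted timestamps */
        int first = lower_bound(ts, 5, 250);    /* O(log n), not O(n) */
        printf("first index with ts >= 250: %d\n", first);
        return 0;
    }

Unsorted data would force the O(n) scan; the speedup comes entirely from the structural guarantee, and it evaporates the moment the guarantee doesn't hold.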

Could it be that by focusing on parallel operations in a separate chiplet, these workloads can be made much faster? Maybe someone here has the background to tell me why I should be more pessimistic.


Glanced at it quickly, and they mention new parallelism primitives that can be used to write faster parallel code, as well as speeding up existing parallel code. So if anything is 100X faster, it's probably some very specific routine written to take advantage of the primitives.


> Our investors were especially excited about the innovativeness and uniqueness of Flow’s technology, its strong IP portfolio and the potential to enable a new era of superCPUs (CPU 2.0) for the AI revolution.

This part in particular makes me think they are building on hype: they showed a tech demo that did well for some specific case that happens to be a buzzword.

The following statements:

> A. Nonexistent cache coherence issues. Unlike in current CPU systems, in Flow’s architecture there are no cache coherence issues in the memory systems due to the memory organization excluding caches in the front of the intercommunication network.

And

> E. Low-level parallelism for dependent operations. In Flow-enabled CPUs, it is possible to execute dependent operations with the full utilization within a step (with the help of chaining of functional units), whereas in current CPUs the operations executed in parallel need to be independent due to parallel organization of the functional units.

Raise my eyebrows, more than a little bit. If, to take an example (assume for a second you do not have a fused instruction for this), you have E = D * C and C = B + A, how exactly can you compute E without first knowing C? You can't, not without computing E for every possible value of C (inefficient).
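
A minimal sketch of that dependency in C (the variable names are just for illustration):

    #include <stdio.h>

    int main(void) {
        double a = 1.0, b = 2.0, d = 3.0;
        double c = a + b;  /* step 1: produces c */
        double e = d * c;  /* step 2: cannot begin until c exists */
        printf("e = %f\n", e);
        return 0;
    }

No amount of hardware parallelism removes the fact that the multiply's input is the add's output.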

Introduce locking into the situation (which is where cache coherence issues come into play), and their "excluding caches in the front of the intercommunication network" makes no sense.

You need locks to make guarantees about memory safety. Saying that you have none is fine if you have nothing to lock, but then you're likely not doing general processing; you're likely doing something specific.
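
To make that concrete, a minimal pthreads sketch (my example, not anything from Flow's materials) of why shared mutable state needs a lock:

    #include <pthread.h>
    #include <stdio.h>

    static long counter = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        (void)arg;
        for (int i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&lock);    /* remove this and the increments race */
            counter++;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%ld\n", counter);  /* 2000000 only because of the lock */
        return 0;
    }

Any general-purpose architecture has to support this pattern somehow; "no cache coherence issues" only holds if the programming model forbids it.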

That specificity, I think, is a clue as to what they are really doing:

> D. Flexible threading/fibering scheme. Flow-computing technology allows an unbounded number of fibers at the model level, which can also be supported in hardware (within certain bandwidth constraints). In current-generation CPUs, the number of threads is - in theory - not bounded, but if the number of hardware threads is exceeded in the case of interdependencies, the results can be very bad. In addition, the operating systems typically limit the number of threads to a few thousand at most. The mapping of fibers to backend units is a programmable function allowing further performance improvements in Flow.

To me, this sounds like someone CPU-ified a GPU. That's the primary purpose of a GPU: efficiently running a shit-ton of threads. Except of course GPUs aren't great at general-purpose processing. But it fits the use case of AI algorithms well, which stand to gain a lot from general GPU improvements.


> Raise my eyebrows, more than a little bit. If, to take an example (assume for a second you do not have a fused instruction for this), you have E = D * C and C = B + A, how exactly can you compute E without first knowing C? You can't, not without computing E for every possible value of C (inefficient).

I think what they mean (according to their diagram) is that each result has to be written back to the register file before another unit can use it. So conventionally you would compute C = A + B, update the C register, and in the next step compute E = D * C.

What they seem to claim is that they bypass the computed C result directly from the add unit to the multiply unit, hence it is pipelined. This is a bit disingenuous, as any performant CPU worth its salt will have operand bypassing.
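
For anyone unfamiliar: forwarding means an execution unit's output feeds the next instruction's input directly, instead of taking the round trip through the register file. A toy cycle-level sketch in C (my own illustration, not Flow's pipeline):

    #include <stdio.h>

    int main(void) {
        int regfile_c = 0;  /* architectural register C */
        int a = 1, b = 2, d = 3;

        /* cycle 1: the adder executes and latches its output */
        int alu_out = a + b;

        /* without forwarding, a cycle is spent writing back ... */
        regfile_c = alu_out;

        /* ... with forwarding, the multiplier reads alu_out directly,
           overlapping with the write-back above */
        int e = d * alu_out;

        printf("e = %d (register C = %d)\n", e, regfile_c);
        return 0;
    }

This is textbook material: the classic five-stage RISC pipelines of the 1980s already had forwarding paths.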


> This is a bit disingenuous, as any performant CPU worth its salt will have operand bypassing.

Right, but that's an instruction-level optimization. Nobody disputes that particular instructions (a combined add/multiply, say) can be optimized and pipelined, but that's very different from claiming that arbitrary calculations no longer have dependent steps.
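
C99's fma() exposes exactly that kind of fused instruction, and it illustrates the distinction: the operation is fused, but its inputs still have to exist first.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        double d = 3.0, c = 2.0, x = 1.0;
        /* one fused multiply-add: d*c + x in a single operation,
           but d, c, and x must all be computed before it can run */
        printf("%f\n", fma(d, c, x));
        return 0;
    }

The fusion saves a rounding step and a cycle or two; the data dependency on the inputs doesn't go anywhere.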


The article doesn't say no; it says idk. In the body, the author asks readers whether they know the answer. This is the first question-mark headline I've ever seen that's actually meant as a question.

Also, the studies on that Wikipedia page disprove the law.


> But the company can’t quite show any of that today — because Flow hasn’t built a chip and doesn’t necessarily intend to build one, its co-founders tell The Verge.

The answer is clearly no: they didn't make CPUs 100x faster. Maybe they intend to, but that's not the same thing as having done it.


I guess, if you take the title literally. I understood it to mean: did they just figure out how to do it?


They claim that they think they know how to do it. That's two conditionals.


Well yeah the thing is extremely likely a scam.


The article also ends in a question mark. Journalist: “here are their white papers, what do you think?”

That’s… an interesting way to write an article. Maybe they will write an article about comments to bring it full circle.


Thanks for posting this. I wish more people were aware of this law.



