Hacker News

The connection between "AI" and "GPU" in everyone's mind is a testament to the PR chops of NVIDIA. You don't need a GPU to run ML/DL/neural networks, but NVIDIA has GPU tech, so they're selling GPUs. What you need is the massive ALU power and, to a lesser extent, the huge internal bandwidth of GPUs. There are huge chunks of GPU die area that are of no use when running NN-type code: the increasingly complex rasterizers, the texture units, the framebuffer/z-buffer compression stuff, and on the software side, the huge pile of junk in the drivers that lets you not only run games from a decade ago, but run them better than last year's GPU did. If you can afford to start from scratch, you can lose a lot of this baggage.
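To make the "what you actually need" point concrete: a neural network forward pass is almost entirely dense linear algebra. A toy 2-layer MLP in NumPy (hypothetical layer sizes, just for illustration) shows the work is matrix multiplies plus cheap elementwise ops; none of it touches rasterizers, texture filtering, or any other graphics hardware:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-layer MLP: the heavy lifting is GEMM (pure ALU work) plus
# streaming the weights from memory (bandwidth). Nothing graphics-specific.
x  = rng.standard_normal((64, 784))    # batch of 64 inputs
w1 = rng.standard_normal((784, 256))   # layer 1 weights
w2 = rng.standard_normal((256, 10))    # layer 2 weights

h = np.maximum(x @ w1, 0.0)            # GEMM + ReLU
y = h @ w2                             # GEMM

# FLOP count for the two matmuls: 2*M*K*N each
flops = 2 * 64 * 784 * 256 + 2 * 64 * 256 * 10
print(y.shape, flops)
```

Nearly all of those FLOPs sit in the two `@` calls, which is exactly the kind of work a sea of ALUs is good at.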


Indeed, and that's why there are a couple of startups working on new chips, and why Google has the TPU. Here's a nice technical talk from Graphcore's CTO about that: https://youtu.be/Gh-Tff7DdzU


I was at that talk and it looks very interesting; the team has delivered before.

They just raised more money too. I wonder how painless the developer experience will be using their drivers with the latest version of your chosen DL framework, and how price/perf will compare with DL-specific tensor processor/GPU hybrids like Volta.


What you need is a massive number of parallel cores. Currently, the cheapest and most efficient way to get them is GPUs. It's true that some graphics-specific parts of a GPU go unused by compute-only kernels (such as those for ML or other AI tools), but that's still lower overhead than a CPU in yet another box.
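"Compute-only kernel" here just means the same arithmetic applied independently across a large array, which is exactly the shape of work that maps one-thread-per-element onto thousands of GPU cores. A sketch of the canonical example, SAXPY, in NumPy (standing in for an actual GPU kernel):

```python
import numpy as np

# SAXPY (y = a*x + y): the canonical data-parallel kernel. Every output
# element is independent of the others, so a GPU can assign one thread
# per element; no graphics-specific hardware is involved at all.
def saxpy(a, x, y):
    return a * x + y

x = np.arange(4, dtype=np.float32)
y = np.ones(4, dtype=np.float32)
print(saxpy(2.0, x, y))   # -> [1. 3. 5. 7.]
```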

Who knows, perhaps Intel will develop more general-purpose massively parallel compute processors that integrate some of the knowledge and experience accrued in the field of graphics processors.


I think they'll learn from the Knights series of processors. Intel seems to keep shooting itself in the foot by backing powerful computational cores with terrible on-chip network architecture.


And yet Intel seems to want to make GPUs for machine learning now... so I guess Nvidia's PR worked on Intel, too?

But as I said in another comment, the truth is Intel doesn't seem to know what it's doing, which is why it's pushing in 5 or 6 different directions with many-core accelerators, FPGAs, custom ASICs, neuromorphic CPUs, quantum computers, graphcores, and so on.

By the time Intel figures out which one of these is "ideal" for machine learning, and behind which arrows to "put more wood," Nvidia will have an insurmountable advantage in the machine learning chip market, backed by an even stronger software ecosystem that Intel can't build because it doesn't yet know "which ML chips will win out".

If I had to describe Intel in a sentence these days, it would be "Intel doesn't have a vision." It's mostly re-iterating on its chips and rent-seeking: rebranding weak chips with strong chip brands, adding names like "Silver" and "Gold" to Xeons (and charging more for them, because come on - it says Gold on them!), and essentially bringing the DLC nickel-and-diming strategy from games to its chips and motherboards.

Meanwhile, it's wasting billions every year on failed R&D projects and acquisitions because it lacks that vision on what it really needs to do to be successful. Steve Jobs didn't need to build 5 different smartphones to see which one would "win out" in the market.


Non-incremental advances require a lot of wasted-path R&D. If any of Intel's projects creates a generational leap, it will pay off handsomely. When the way forward isn't clear, I like to use the concepts from path finding algorithms to drive strategy. Assuming you can afford multiple parallel efforts.

It's not clear if doing this in-house, or closely monitoring the state of the art and then buying a company that develops a winner, is superior.


Nvidia is probably "wasting" just as much money on R&D to figure out "which ML chips will win out" 5 years from now, so that they can build it first and sell it as a "GPU".


> many-core accelerators, FPGAs, custom ASICs, neuromorphic CPUs, quantum computers, graphcores

Most of those are completely different technologies that will almost certainly not share a niche.


> If you can afford to start from scratch, you can lose a lot of this baggage.

Effectively, this argument is much like saying that your personal workloads don't use AVX and demanding that Intel tape out a whole different die without it. You would very rightly be laughed out of town for even suggesting it.

Much like the economics of cryptomining cards that lack display outputs, this comes down to whether there is actually enough of a market to justify taping out a whole specialty product for this one niche, versus the economies of scale that come from mass production. After all, that is the logic behind using a GPU in the first place, instead of a custom ASIC for your task (like Google's Tensor Processing Unit). On the whole it is probably cheaper to just suck it up and accept that you're not going to use every last feature of the card on every single workload. It's simply too expensive to tape out a different product for every workload.

This only gets more complicated when you consider that many kinds of GPGPU computation actually do use things like the texture units, since they allow you to coalesce memory requests with 2D/3D locality rather than simple 1D locality. I would also not be surprised if delta compression were active in CUDA mode, since it is a very generic way to increase bandwidth.
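The 2D-locality idea can be illustrated in software with a Morton (Z-order) curve: interleaving the bits of the x and y coordinates makes nearby (x, y) pairs land at nearby linear addresses. This is a simplified sketch of how tiled/swizzled texture layouts work in general, not NVIDIA's actual memory layout:

```python
def morton_encode(x, y, bits=16):
    """Interleave the bits of x and y into one Z-order (Morton) index,
    so that 2D-adjacent coordinates map to nearby linear addresses."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return z

# The four texels of a 2x2 block land in four consecutive addresses:
print([morton_encode(x, y) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1)]])
# -> [0, 1, 2, 3]
```

With a plain row-major layout, the texels (0, 1) and (1, 1) would sit a whole row's stride away; Z-order keeps the 2x2 block in one cache line, which is why a texture fetch with 2D locality coalesces well.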

The GPGPU community absolutely does use the whole buffalo here; there is very little hardware that is purely display-specific. If you want hardware that is more "compute-oriented" than the consumer stuff, that's why there are GP100 and GV100 parts. If you want something even more compute-oriented than that, you're better off looking at essentially fixed-function hardware dedicated to your particular task, rather than a general-purpose GPU.

So, it doesn't really make any economic sense.


I'm curious - what's getting "increasingly complex" about rasterizers?


I was referring to the move towards tile-based operation, and towards more parallelism in the front-end (e.g. for a long time even the most powerful GPUs had a bottleneck of processing one triangle at a time at a certain point in the pipeline, and only recently increased it to... two).


I hope we can also get a more scalable form factor, so that we can keep stacking these new compute engines even if we run out of PCI slots or physical space inside the case.


Internal bandwidth is usually the limiting factor for neural networks, not the ALU.
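You can check that with a back-of-the-envelope roofline estimate: a kernel is bandwidth-bound when its arithmetic intensity (FLOPs per byte moved) falls below the machine balance (peak FLOPs / peak bandwidth). The numbers below are illustrative round figures for a hypothetical GPU, not any specific part:

```python
# Roofline back-of-the-envelope. Illustrative peak numbers only.
PEAK_FLOPS = 10e12      # 10 TFLOP/s (hypothetical GPU)
PEAK_BW    = 500e9      # 500 GB/s
machine_balance = PEAK_FLOPS / PEAK_BW   # 20 FLOPs per byte

def intensity(m, k, n, dtype_bytes=4):
    """Arithmetic intensity of an MxK @ KxN matmul: FLOPs per byte,
    assuming A, B are each read once and C written once."""
    flops = 2 * m * k * n
    bytes_moved = dtype_bytes * (m * k + k * n + m * n)
    return flops / bytes_moved

# Batch-1 inference (matrix-vector): far below machine balance -> bandwidth-bound.
print(intensity(1, 4096, 4096) < machine_balance)     # True
# Large-batch GEMM: far above machine balance -> ALU-bound.
print(intensity(512, 4096, 4096) > machine_balance)   # True
```

So whether a network is bandwidth- or ALU-limited depends on the workload: small-batch inference streams weights and is bandwidth-bound, while big training GEMMs can keep the ALUs busy.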



