Researchers: Are we on the cusp of an ‘AI winter’? (bbc.co.uk)
110 points by m-i-l on Jan 17, 2020 | 117 comments


I think there's an interesting disconnect right now between research and practice. Cutting-edge research does feel like it's reaching a plateau - across most AI fields even "major" breakthroughs are only gaining a couple percentage points and we're probably starting to hit the limits of what current approaches can achieve. When the state-of-the-art is 97% on a task, there's only so much room for improvement. Yoav Goldberg posted a tweet about Facebook's RoBERTa model that summed it up pretty well: "oh wow seems like this boring public hyperparameter search is going to take a while" [1]. There's a vague feeling of "What's next?" now that all the benchmarks are fairly well-solved but AI in general clearly doesn't feel solved.

However, state-of-the-art models aren't really used in production yet. I think the trend of "use AI/ML to solve X" has only started to pick up in the past 2 years, and it'll continue well into the 2020s. The process of taking research models and putting them into production is not standardized yet, and many models don't even really work in production - if your model takes a second to do an inference step that's fine for research but maybe not for a real product.

I think in the next decade, on the research side, benchmarks will be beaten less often, and instead there will be more focus on trying out radically new things, understanding weaknesses in current techniques, and finding new measurements that assess those weaknesses. On the industry side, there will still be lots of cool and exciting new achievements as already-known techniques are applied to old problems that haven't been addressed by AI yet.

As an aside, this was the first time in my life that I read the phrase "10s" referring to 2010-2019. Kind of an odd-feeling moment!

[1] https://twitter.com/yoavgo/status/1151977499259219968


> When the state-of-the-art is 97% on a task, there's only so much room for improvement.

When models commonly achieve 97% on a task, it means it's time to define a harder task, as it's long stopped providing any useful signal.


I think when performance on a real world task can be expressed as a single percentage, we've over-simplified the hell out of it and it's time to rethink the problem.


Or worse, overfitted - which means the solution will implode when faced with real data (RIP EH).


Andrej Karpathy showed at Tesla's autonomy day how Tesla had to retrain their DNNs so that they don't get confused by bicycles mounted on vehicles. If 97% means your models get confused by something you see on the road every day, I wouldn't be too pleased about the state of the art.


Autonomous vehicles are clearly several steps up the difficulty ladder from bread and butter tasks such as speech recognition. The progress over the last 10 years is such that some subfields have exceeded human parity while others are only just getting started.

Depending on which subfield interests you, progress may be slowing or accelerating. That's why another "AI winter" is a bogus and alarmist concept. Winter for whom?


I'm always bemused by the idea that AI is nothing but machine learning, and ML is nothing but predictive analytics. Equating research in AI with a "boring hyperparameter search" shows how narrow it's become; saying you've "gotten 97% on a problem" refers to, obviously, classification accuracy of a model on a set of labeled instances. "Use AI/ML to solve X" means finding a way to translate X into a prediction task over feature vectors.

There's an old saying "If all you have is a hammer, all your problems start to look like nails." We may see an AI winter come about simply because we run out of things to pound with our hammer.


If all you have is a function, everything starts to look like a mapping between two sets.


97% is also 3 failures out of every 100 attempts. In a lot of day-to-day experience I suspect humans still do much better than this.


It depends. DL models have legitimately achieved superhuman accuracy on many tasks. Part of this is because deep learning is incredibly effective for a certain class of problems. But part of this is because humans are remarkably bad at some problems. Humans tend to be surprisingly bad at context-stripped tasks like "identify what object this blurry image is", and "what sequence of syllables is this short audio file?". But we have countermeasures to correct for our inaccuracy. Most importantly, we understand and use context to sanity-check the hell out of our imperfect senses, and nobody has any idea how we're going to get AI to do that.


> Cutting-edge research does feel like it's reaching a plateau

It's really not. The second half of last year alone had MuZero and Megatron-LM, to name just a couple that most scream to me that we are actually progressing towards AGI.

You say ‘When the state-of-the-art is 97% on a task’, but solved tasks are the least interesting tasks.


Also, Google released the Reformer just 2 days ago, and they claim it can ingest orders of magnitude more data than the Transformer.

https://ai.googleblog.com/2020/01/reformer-efficient-transfo...

TPUv3 is estimated to be on a 12 or 16 nm process node, so the performance of the TPU's next versions could still double over the next few years (if needed).

At that pace of model research and hardware improvement, I would say we are still in the AI spring.


> solved tasks are the least interesting tasks.

Yeah, but they also tend to be pretty profitable.


Do we have any other indicators that it's actually progressing somewhere, besides screaming? Even such a triviality as "how do we recognize that we got there"? The research is still in very early phases, IMNSHO: impressive practical applications appear, but they're side effects of what appears as random flailing: "build it bigger, see if it helps. Build it sideways, see if it helps. Build it at full moon, see if it helps".

That suggests that the applications are the low-hanging fruit, with far more interesting results still to be discovered.


MuZero is in some sense the proto-holy grail, in that it brings learning and planning to unstructured tasks over purely internal models. While there is an obvious chasm between it and the end point, this is still something that has only recently become more than an abstract goal, at least to any effect.

Being able to perform planning over ‘simple’ domains like Atari games and Go (and not even in the same trained model!) might not seem very comparable to the real thing, but evolutionary history spent the bulk of its time building up the basics—most animals fail most cognitive tasks—so I don't think this is indicative of the progress being misguided, especially given networks-on-GPUs is literally a 10 year old field.

I think MuZero is a clear example of building by principles over random flailing. I get why there does also seem to be the latter, but it's certainly not the whole of it, and anyhow it worked for evolution ;P.


Sure, that looks promising. I'm not holding my breath for The Holy Grail: in an environment where it's been just-around-the-corner for as long as the field has existed, there's always another unexpected corner ;)


The difference between 97% and 99.99% in perception is huge for autonomous driving purposes: 300x less likely to cause an accident.


Remember, those performance numbers are for very constrained, some might say artificial, tasks - specifically classifying images into categories.

Your Tesla might be 99.99% accurate at recognizing red lights but will continue to drive straight into paper boxes in its immediate path.


Do you know of any good resources to learn more about this idea of the rate of improvement of perception per percentage point?


https://en.wikipedia.org/wiki/Odds_ratio

odds of a crash for a => 97% accuracy => 3 / 100

odds of a crash for b => 99.99% accuracy => 1 / 10000

improvement (odds ratio in this case) is then 300x = odds of crash for a / odds of crash for b
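
In code, the same back-of-the-envelope check (this assumes the percentages are per-decision accuracies, so the failure rate is simply 1 - accuracy):

    err_a = 1 - 0.97      # 3 in 100 (assumed per-decision failure rate)
    err_b = 1 - 0.9999    # 1 in 10000
    print(err_a / err_b)  # ~300: system b fails ~300x less often per decision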


That may be true at a given moment, but how about time? I usually don't care about the probability that I'll crash within the next millisecond, but I do care about the probability that I'll crash over a whole trip.


I have some hope that research like what Numenta is based upon will lead us closer to an AGI.


The current SoTA achieves this 97% at a high cost in the number of samples. We are living proof that it can be done better. I believe there will be a push toward achieving the same generalization with less data.


> While AGI isn't going to be created any time soon, machines have learned how to master complex tasks like:

> translating text into practically every language

Note: they said they have mastered these tasks.

Yeah... I'm not sure a lot of native speakers would agree. Here's a great example of using Google Translate to automatically translate a video game.

https://www.youtube.com/watch?v=_uNkubEHfQU

> Driving cars

I'm not so sure about that one either.

I think we're in a valley where AI can do a lot of things but is hitting limits in accuracy, and humans are still better at some of these things some of the time. That is, the AI isn't always better than humans, even at a specific, non-general task.

Now don't get me wrong, we've made a lot of progress, but I wonder if we can get these things to a place better than humans before the next economic recession. I think the biggest risk to AI is having the money dry up. Right now the hype is strong and the money is (nearly) free. If one of those changes, we could put this back on the shelf for another decade. If we go into a recession, labor will be cheap, so why bother automating with AI?

For example, we had self-driving freeway cars back in the '90s.[0][1] Here's one of the lessons learned:

> In 1987, some UK Universities expressed concern that the industrial focus on the project neglected important traffic safety issues such as pedestrian protection.

And who doesn't remember the brilliant Dr. Sbaitso, my childhood therapist. [2]

[0] https://en.wikipedia.org/wiki/Eureka_Prometheus_Project [1] https://www.youtube.com/watch?v=I39sxwYKlEE [2] https://en.wikipedia.org/wiki/Dr._Sbaitso


> Now don't get me wrong, we've made a lot of progress, but I wonder if we can get these things to a place better than humans before the next economic recession.

I don't think it's necessary to reach superhuman performance to achieve automation of great economic value. Some of the most famous AI achievements leverage a fairly modest intelligence improvement with massive amounts of classic automation. E.g., the AlphaGo policy/value networks, combined with MCTS.

It may also be possible for automation to reliably determine when it's encountering a situation it's going to fare badly in, and then hand off to human telepresence control. It wouldn't surprise me if the first self-driving systems worked that way.


> and then hand off to human telepresence control. It wouldn't surprise me if the first self-driving systems worked that way.

The issue is that humans fare badly in situations where they don't have to pay attention until they suddenly have a short time to react before disaster, and that's exactly the problem that "supervised" self-driving systems have been suffering from when they crash.

And if you have to pay constant attention to the road anyway, I'd rather be driving myself. What's going to win won't be self-driving tech, it will be driver assist, like advanced anti-lock brakes, or self-adjusting cruise control.


They do. FTR, this is what killed EH: "computing...computing...computing...oh well, let's alert the human - so they can watch the crash about to happen in the next second."


As amusing as the Final Fantasy translation project is, it simply is not the right comparison. It's a case of using the wrong tool for the job.

There are better systems for natural language translation that attempt to keep track of the subject, something that is very often implicit from context in Japanese. Additionally, that subject could be implicit based on the visuals of what is on screen. The enemy names are abbreviations due to artificial limits of the system...

All in all, it's like using a hammer to drive a screw. It might sort of work, but you shouldn't be surprised when it fails. That is hardly proof that hammers are bad, or that screws can never be successfully driven into a board.


Right, I get what you're saying about Japanese - I took a few years of Japanese in school, and it is a really interesting grammatical problem. (Which is why I love this example even more.)

If you read the released book about the project, one of the problems he brings up is that the AI will also pick words that start with the same letter as the first kana, seemingly at random, when it gets confused.

While I don't disagree that using AI might not be the best tool for Japanese translation, or that there might be better specific Japanese translators that use AI, I think the idea that AI has "mastered translation" is honestly going a bit too far, which was the point I was attempting to make.


As for the translations - even a seemingly innocent sentence like "the sand people ride single file to hide their numbers" is fraught with ambiguity. I wouldn't call "unable to tell file-the-tool from file-the-data-representation from file-the-line" mastery.

Useful for getting a rough meaning in a completely unfamiliar language, sure - but the hyperhype surrounding the current SotA is actually contributing to the descent into AI winter.

https://www.everything2.com/title/The+sand+people+ride+in+si...


What makes me somewhat bullish is that parallel compute keeps getting better, cheaper and more widely available.


I don't disagree that translation is far from mastered, but bear in mind that Google Translate isn't state of the art, mostly because of computational constraints, and 2016 GTranslate was even worse.


No true Scotsman? What is, then, the state of the art in 2020 machine translation?


I'm not actually sure, but DeepL[1] is probably the best online one, and Google's larger M4 models[2] are probably up there too.

[1] https://www.deepl.com/translator

[2] https://ai.googleblog.com/2019/10/exploring-massively-multil...


No. Maybe an "AI Fall", but I doubt there will ever be another true "AI Winter". The AI we have today is too good, and creates too much value... at this point, there is no longer any question as to whether or not there is value in continuing to research and invest in AI.

What will happen, almost without doubt, is that particular niches within the overall rubric of "AI" will go in and out of vogue, and investment in particular segments will fluctuate. For example, the steam will run out of the "deep learning revolution" at some point, as people realize that DL alone is not enough to make the leap to systems that employ common sense reasoning, have a grasp of intuitive physics, have an intuitive metaphysics, and have other such attributes that will be needed to come close to approximating human intelligence.

Disclaimer: credit for the observation about "intuitive physics" and "intuitive metaphysics" goes to Melanie Mitchell, via her recent AI Podcast interview with Lex Fridman.

One other observation... while we still don't know how far away AGI is (much less ASI), or even if it's possible, the important thing is that we don't need AGI to do many amazing and valuable things. I also doubt many people are actually all that disillusioned that we aren't yet living in The Matrix (or are we???).


We could still have the bottom fall out of the term "AI", since there's a big gap between the present reality - no matter how useful - and the aspirational nature of the phrase "Artificial Intelligence". Take any business that brands itself as an "AI startup", any quote from Mark Zuckerberg about solving Facebook's content problem with AI, etc., and replace "AI" with "statistical algorithms" and it just doesn't have nearly the same ring to it. That alone means we're due for some kind of big correction.


> the important thing is that we don't need AGI to do many amazing and valuable things

that's absolutely true but I think a lot of people still consider actual human-like intelligence, common sense and so on to be important features of what we call AI or AGI. I think it's very obvious when we look at the cultural impact of AI in fiction or even in discussion around the dangers of AI that this is what many people in and outside the field are thinking about.

I think it's true that the commercial success of current techniques will endure but then maybe we should start differentiating between a sort of 'automation science' and cognitive/intelligent systems. Because on the latter I really don't think we are seeing much or maybe even any progress.


Reminder: Most of the seminal accomplishments of this era's AI wave were actually developed in the 70s-90s. Yes, even GANs and RL. This industry has been riding on the NVIDIA welfare program for the past 10 yrs. How long until the hardware gets maxed out?

http://people.idsia.ch/~juergen/deep-learning-miraculous-yea...


We need yet another set of big breakthroughs because just adding more computing capacity is not going to carry the boom.

Maybe the correct way to measure advance in AI is Turing Awards.


Yeah, my opinion, neural networks alone aren't going to cut it. We need a better primitive.


> This industry has been riding on the NVIDIA welfare program for the past 10 yrs

Nvidia has been very profitable for the past 10 years: https://www.macrotrends.net/stocks/charts/NVDA/nvidia/net-in.... I would call it a synergy, but it does smell of intellectual welfare.


Probably until <5nm. Once we get there, we are going to see a massive influx of engineering into things like quantum computation.


I've been waiting for the hype and marketing to collapse for a few years.

Probably the fastest way to tarnish public perception of AI would be to keep pushing "AI-enhanced" products in front of the consumer as has been done. These things tend to demo well and have a nice cool factor for the first fifteen minutes or so, but after any kind of prolonged usage the limitations and rough edges come up quickly.


This is brand new technology. It's going to take a few years to reliably productionize - and most of the applied solutions will look nothing like the research. Many real world problems are going to combine multiple neural nets into systems with specific applications and there's a lot of detail to work out.

The hype may collapse in the short term, but that's only because many of the first movers are stereotypical tech startups who overpromise without truly understanding the problem or solution spaces and therefore underdeliver.

But, speaking from personal experience, some of the tech has already been proven - one example is massively accelerated modeling as an alternative to slow finite difference/finite element simulation with 99% accuracy, which will in the next 6-12 months totally change the approach to a wide range of modeling problems, and enable a totally new form of work where instead of setting up a model and waiting days or weeks, one may iterate effectively in real time. There are emerging solutions to knowledge management and "intelligent" data harvesting, where ML outputs are being manipulated in a rudimentary form of reasoning. Think specialized industries like petroleum, mechanical engineering, EM engineering - plenty of "layman" related features like recommendation engines are going to flop, but the cat is out of the bag for heavy industrial knowledge work. Just give it some time - we are on the cusp of a monumental leap in R&D across the spectrum of human endeavor. Very exciting times.


> one example is massively accelerated modeling as an alternative to slow finite difference/finite element simulation

I work in the field of numerical modelling (fluid simulation), and I haven't seen any convincing demos of anything but utterly trivial problems solved this way. Care to share some examples?

(There are some works that focus on fluid simulation for CGI. That's neat, but a) only interpolates between real simulation data, and b) doesn't achieve physical accuracy, just looks good.)


>Care to share some examples?

Unfortunately, the only example I know of is confidential, with a provisional patent filed. But what I can tell you is that we are already solving non-trivial problems some 3-6 orders of magnitude faster than FDM/FEM. Not quite Navier-Stokes tier, but still in the realm of PDEs.

Sorry I can't be more specific, such is the nature of cutting edge industry, as I'm sure you understand!


"Nullius in verba" - I'll believe it when I see it.

And not sure I see the point in patenting such a thing. If it's as good as you say, in less than six months a research group in some country where US patents aren't held in very high regard will have duplicated the work and either sell it as a service or publish it in the scientific literature. They might not even mention knowing about the patent at all! Things have been discovered by multiple people almost simultaneously so many times in the history of science that it's almost a natural law; it'd be impossible to prove foul play.


> in less than six months a research group in some country where US patents aren't held in very high regard will have duplicated the work and either sell it as a service or publish it in the scientific literature

US patents can be used to prevent the sale or import of violating foreign products into the US market. It's still the largest market and US patents are still the lynchpin of any patent portfolio.

Simultaneous discovery does occur and primacy is difficult to prove. Most patent offices already had first-to-file as the deciding rule; the USPTO adopted first-to-file a few years back to cut down on endless litigation.


I used to do electromagnetic modeling using finite element methods (though I'm now a product manager for AI software infra), and it would take me on the order of hours to days or weeks to model wave interaction with real-world objects.

A machine learning model trained to understand Maxwell's Equations can in principle be used to perform said simulations, resulting in probably an order of magnitude or more increase in simulation speed. Getting this to work well will reduce the time (and cost) it takes to design optical sensors, radar for autonomous vehicles, smartphone antennas, MRI machines, and more.

Having said that, it would require a lot of heavy lifting to pull this off and achieve near-physical accuracy for real-world physics problems.

A cursory search on Google for "arxiv deep learning electromagnetics" returns results of proofs of concept in this direction.


Where would the speedup come from? I don't understand.

If I understand your comment correctly, essentially you have a hand-crafted simulator for some physical process and then you train a neural net model to approximate the simulator. Why would the approximated simulator have "an order or more of magnitude increase in simulation speed"? Unless the approximation has massive losses in accuracy, of course.

Honestly asking and really interested to know what you mean.


It's all about precision heuristics, derived from joint probabilities of inputs and outputs. That, by and large, is how I am increasingly coming to understand the power of neural networks.

Imagine you are given a picture of a candle, overlaid with a grid, and asked to fill in, with colored pencils, colors for the air surrounding the candle representing relative temperature. Of course a human utilizes intuition to rapidly assign high temperature to the flame and decreasing temperature with increasing distance.

A "dumb" finite method would need, even for such a relatively simple problem (for a human), to perform calculations for a series of time steps in each grid until some steady state condition to arrive at a much more precise but still overall similar coloring of the grid cells. You can do the same task much more quickly because you have developed intuition of the physics, which is to say you have learned heuristics which capture the general trends of the problem (air is hot close to a flame and cold far away).

Neural nets take the best of both worlds - by effectively learning probability relationships between input and output pixels, they internalize heuristic approaches to produce outputs approaching finite method accuracies at a fraction of the computation. There's a lot of waste that can be optimized out of finite computation by hardcoding rules (heuristics), but doing so for real problems is impractical. Neural nets learn these rules through training - a far simpler task is organizing the data to teach the net the right trends; much like designing lessons for a child to teach a predictive ability.
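
To make that concrete, here is a minimal toy sketch of the surrogate idea (PyTorch; the problem setup, network size, and sample counts are all made up for illustration, not anyone's production system): pre-solve a sample of "flame positions" with a slow finite-difference solver, then train a small net to map position directly to the whole steady-state temperature field, so that inference becomes a single forward pass.

    import numpy as np
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    N = 32  # toy grid resolution

    def solve_heat(fx, fy, iters=1000):
        """Slow reference solver: Jacobi iterations toward steady state,
        with a hot spot ("flame") held at 1.0 and cold (0.0) boundaries."""
        T = np.zeros((N, N))
        hx, hy = int(fx * (N - 1)), int(fy * (N - 1))
        for _ in range(iters):
            T[1:-1, 1:-1] = 0.25 * (T[:-2, 1:-1] + T[2:, 1:-1]
                                    + T[1:-1, :-2] + T[1:-1, 2:])
            T[hx, hy] = 1.0
        return T

    # Pre-solve a sample of flame positions with the slow solver...
    rng = np.random.default_rng(0)
    pos = rng.uniform(0.2, 0.8, size=(100, 2))
    fields = np.stack([solve_heat(x, y) for x, y in pos])

    # ...then train a small net to map flame position -> whole temperature field.
    X = torch.tensor(pos, dtype=torch.float32)
    Y = torch.tensor(fields.reshape(len(pos), -1), dtype=torch.float32)
    net = nn.Sequential(nn.Linear(2, 128), nn.Tanh(), nn.Linear(128, N * N))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(2000):
        opt.zero_grad()
        F.mse_loss(net(X), Y).backward()
        opt.step()

    # Inference is now one forward pass instead of thousands of solver sweeps.
    T_fast = net(torch.tensor([[0.5, 0.5]])).detach().reshape(N, N)

Whether the learned field is accurate enough for engineering use is exactly the open question in this thread; the sketch only shows the shape of the approach.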


I'm skeptical of the claim that it's easier to train a neural net than to hand-code a set of heuristics _when the heuristics are already known_. For the time being, optimal results with neural nets need more data and more computing power ("more" because it's never enough) and are primarily useful when a hand-coded solution is not possible.

I also don't understand how it is possible for a neural net (or any approximator, really) to approximate a "precision heuristic" faster than a hand-coded heuristic and without a gross loss of well, precision in the order that would make the results unusable for engineering or scientific tasks. Could you elaborate?


I’m also skeptical, but after reading the explanation above, I am intrigued.

Say I have a cube with 100 x 100 x 100 mesh cells inside, and ports on opposing faces. Given enough time, I can literally run through every possible combination of PEC and air for every cell and solve the FD form of Maxwell's equations, then save the results. Now, a user can ask my solver for any of those cases, and I simply pull the presolved result and give the user the answer with orders of magnitude reduction in time.

Obviously, the presolving approach doesn’t scale. More materials, more mesh cells, eventually it is impractical to presolve every case. But the beauty of neural networks is that they can be very good at generalizing from a partial sample of the problem space. In effect, they can give results close enough to the presolve solution with drastically reduced numbers of computations.


>> But the beauty of neural networks is that they can be very good at generalizing from a partial sample of the problem space.

That is really not the case. Neural nets generalise very poorly, hence the need for ever larger amounts of data: to overcome their lack of generalisation by attempting to cover as many "cases" as possible.

Edit: when this subject comes up I cite the following article, by François Chollet, maintainer of Keras:

The limitations of deep learning

https://blog.keras.io/the-limitations-of-deep-learning.html

I quote from the article:

This stands in sharp contrast with what deep nets do, which I would call "local generalization": the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time. Consider, for instance, the problem of learning the appropriate launch parameters to get a rocket to land on the moon. If you were to use a deep net for this task, whether training using supervised learning or reinforcement learning, you would need to feed it with thousands or even millions of launch trials, i.e. you would need to expose it to a dense sampling of the input space, in order to learn a reliable mapping from input space to output space.


Well...I think that take is a little overly cynical, and I disagree particularly with this:

>the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time

In my experience that isn't really true, if you have an appropriately designed net, training data which appropriately samples the problem space, and the net is not overtrained (overfit).

You can think of training data as representing points in high dimensional space. Like any interpolation problem, if you sample the space with the right density, you can get accurate interpolation results - and neural nets have another huge advantage, in that they learn highly nonlinear interpolation in these high d spaces. So the net may be unlikely to generalize to points outside of the sampled space - although now that I think of it I'm not sure of how nets handle extrapolation - but when you're dealing with space with thousands of dimensions (like each pixel in an image) you can still derive a ton of utility from the interpolation which effectively replaces hardcoded rules about the problem you're solving.


I may be jumping the gun a little because I was thinking about this in the context of another thread, but a practical problem with machine learning in general is that, for a learned model to generalise well to unseen data, the training dataset (all the data that you have available, regardless of how you partition it to training, testing and validation) must be drawn from the same distribution as the "real world" data.

The actual problem is that this is very difficult, if not impossible, to know before training begins. Most of the time, the best that can be achieved is to train a model on whatever data you have and then painstakingly test it at length and at some cost, on the real-world inputs the trained model has to operate on.

Basically, it's very hard to know your sampling error.

Regarding interpolation and dense sampling etc., the larger the dimensionality of the problem the harder it gets to ensure your data is "dense", let alone that it covers an adequate region of the instance space. For example, the pixels in one image are a tiny, tiny subset of all pixels in all possible images - which is what you really want to represent. Come to that, the pixels in many hundred thousands of images are still a tiny, tiny subset of all pixels in all possible images. I find Chollet's criticism not cynical, but pragmatic and very useful. It's important to understand the limitations of whatever tool you're using.

>> although now that I think of it I'm not sure of how nets handle extrapolation

They don't. It's the gradient optimisation. It gets stuck in local minima, always has, always will. Maybe a new training method will come along at some point. Until then, don't expect extrapolation.


It doesn’t need to generalize, just do sophisticated interpolation.

Basing the results on a dense sampling of the input space is exactly what I was suggesting.


Apologies for the misunderstanding. You said "generalizing from a partial sample of the problem space" and I thought you meant generalisation to unseen data from few examples, which is generally what we would all like to get from machine learning models (but don't).

But, if a neural net can't _extrapolate_ to unseen instances, I don't see how it can solve problems like the one you describe with any useful precision, again unless it's trained with gigantic amounts of examples (which you say is not required). And how is this reducing computational costs with respect to hand-coded solvers?


To be clear - I have absolutely no experience in this domain. I'm just speculating.

In the example I gave, everyone agrees that if you had long enough and enough processing power, you could solve every possible configuration, and store the results. Then you could instantaneously "solve" any problem.

Unfortunately, the problem I describe is a toy problem (too simple to be useful), and yet it would still take way way too long to solve all the possible configurations.

What if you solved some tiny fraction of the configurations though? That would be a sampling of the configuration space. Then a neural network could use that sampling to interpolate to the cases not solved. That would provide a significant speedup over actually solving the problem.

So the real question is what density you need to pre-solve the configuration space to make it work? It definitely depends on what accuracy you need in the solution, as well as how good you can do with the interpolation. If I said previously that gigantic numbers of examples are not needed, then I misspoke. I am sure they would be needed. Gigantic is vague though - is it the kind of number that can be rented from AWS, or is it the kind of number that would require civilization resources?

I have no idea if the math actually works out to make it a useful approach. All I am saying is that conceptually I can see that in some cases, it could be possible.


>> So the real question is what density you need to pre-solve the configuration space to make it work?

Yes, that's the main question. I don't know the answer of course but if we're talking about an engineering problem where precision is required, intuitively the more the merrier.

The thing is, with neural nets you can do lots of things in principle and many things "in the lab". When you try to take them in the real world is the tricky bit. Anyway, another poster here is saying we'll see big things in the next five years so let's hold on to our hats for now.


Well, there are pretty convincing examples in other domains: try hardcoding rules to classify animals or objects in photos, especially an algorithm which can handle thousands of different categories. Totally impractical - but if we appropriately design the net and structure the training data, you can train a pretty accurate net on a mid-range GPU in a matter of hours to do what would take far, far longer to hardcode!

Perhaps not quite appropriate to call them heuristics in this context, but the principle is the same - you are leveraging joint probabilities of pixels to generate some conditional output. Similar principle in ML accelerated modeling.


I think I understand what you meant by heuristics. I agree that it's impractical to try and hand-code image recognition rules and all attempts to do that in the past have failed as they have in similarly complex domains (like machine translation, say). My concern is particularly about the use of neural networks (or in general machine learning models that learn to approximate a function) in domains where precision is normally required, like engineering. I mean, I know there's plenty of approximation in engineering already but of course we're not talking about computing integrals here (er, I think?).

Anyway I was especially trying to understand the OP's comment about speedup using a neural network. I'm still a bit confused about that. But thanks for the conversation.


You're on the right track. A lot of this tech is a potential goldmine and I'm sure there are many players developing in secret and not publishing yet (or ever).


The hype only started in 2015ish, how many years could you have been waiting for it to collapse?


Watson has been an IBM hype-brand for a decade. It is finally getting to the point that it can reliably achieve some parts of what I was sold on back in 2012.


IBM doesn't count, or at least HN completely disregarded it. Modern AI hype really only became a thing after GAN and NVIDIA Pascal, 2014 and 2016 respectively.


In the sense of AGI, it's all been hype. We are in an ML summer and have been for the past few years.

But "deep learning" is nothing more than that, nothing to do with AGI, we're not approaching an AGI winter except for people who were daft enough to fall for the hype.

There have been no advances in AGI in decades, it's already winter, and we've long been in it.


> it's already winter, and we've long been in it

In terms of research and innovation, yes, but in economic terms, it has not even begun. There is still huuuge VC and government money being pumped into anything with AI on it. The last AI winter started when the financiers discovered the disconnect between the money they put in and delivery on promises.

AI went from an obscure hard CS field that only a few graybeards at MIT knew anything about, to this worldwide meme. Before, the default thing your grandmother would tell you to study in college was business. Now your grandmother would tell you to study AI. I'm seeing a lot of people enter this space with the vague goal of getting rich quick. This is the same cohort that jumped into tech in the late 90s and the real estate market in the mid-2000s. It's not the AI of the Norvig and Marvin Minsky days.

I couldn't be more bearish about AI. I still love it though. I won't stop studying it when it becomes not cool anymore.


Agree, me too. It's one of the most fascinating problems in the world, and I'll be delighted when "they" decide it's dead and the spotlight moves on.


I think we're unfortunately fixated on a very literal reading of the famous Turing test (i.e. cleverly emulating humans = intelligence).

Consider language, for instance. Dolphin communication is intelligent, but does not emulate humans well; whereas the computer program ELIZA (1964) lacked intelligence, but was able to emulate humans well enough to entertain many people for quite some time.

Our current state-of-the-art NLP is - after copious research, talent, and computation - able to emulate human language somewhat better than ELIZA. But is it intelligence? There's certainly a lot of complexity involved, and neural networks show some interesting building-block patterns, but the lack of these algorithms' ability to generalize into new spaces, grow our fundamental understanding of the world around us, or really do anything besides pretend to be a human makes one wonder whether our current "AI" is just a (very good) party trick - a better version of ELIZA.


I have worked in the field since 1982, so I have experienced “the need to work on other things for a while” to earn a living.

My prediction is that we are going to see a small revolution in cost reduction: hardware for deep learning will get cheaper; great educational materials like fast.ai and Andrew Ng's lessons will increase the hiring pool of people who know enough to be useful; the large AI companies will continue to share technology and trained models to help their hiring funnel and general PR; and programmer-less modeling will really start to be a thing.


A lot of the cost in AI projects now isn't in training or education, but instead in problem solving and plumbing. Even AI/ML-free projects using things like Kafka and Flink are not cheap.

Coding up a CNN or MLP is not a big deal, but it never really was - it was work to build a C backpropagation implementation, but if I could do it in 1995 then anyone could. The question and real differentiator is in answering three problems:

- What's the problem?
- How can we get the data to the system?
- How do we frame the data and output in terms of (any) AI technology?

All of these steps are closely coupled and require expertise.
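
To make the "coding up a CNN is not a big deal" point concrete, here is roughly all the code a basic image classifier takes in a modern framework (PyTorch here; the input size and class count are arbitrary placeholders):

    import torch
    import torch.nn as nn

    # The "easy part": a small CNN classifier in a dozen lines.
    cnn = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 8 * 8, 10),  # assumes 32x32 inputs and 10 classes
    )
    logits = cnn(torch.randn(1, 3, 32, 32))  # dummy batch, just to show the shapes

    # The hard part - deciding what the classes should mean, getting labeled
    # data into this shape, and wiring the outputs into a product - is not
    # something the framework gives you.

Nothing framework-specific is load-bearing here; the same dozen lines exist in any modern library, which is the point: the model definition is the cheap part.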

On programmer-less modelling: I still have not seen a tool that is better than code for expressing a model precisely and testably, and my experience is that until we have some running code we don't really know that we understand the system.


The AI summer/winter cycle feels to me like a search algorithm: we have a phase of exploration, which seems to have slower progress, if any, and no one is very sure what the next big thing is, so people start trying many things (the winter "skepticism"); eventually someone finds some breakthrough and gets everyone into an exploitation phase, in which everyone knows where to invest and comparatively little effort is required to create progress (the summer "hype"). And eventually all the low-hanging fruit is picked, the search converges to a local maximum, and larger exploration is required again.

So maybe the winter is just as important as the summer. Each winter led to a summer with different focus points (specialist systems and logic, followed by neural networks, Bayesian models and SVMs, and finally deep learning). And after each cycle we have more and more tools, each more useful than the last. And maybe the key to avoiding this strict cycle would be to encourage more exploration during the exploitation phase, giving full support both to incremental ideas that improve on the state of the art and to (potentially) revolutionary ideas that give poor immediate results but create new avenues to investigate.

Of course that's a simplification and there are many aspects to it, including data availability, hardware and tooling that can easily prevent brilliant ideas that were had too soon.


Can't comment on every industry, but in medicine - especially the 'pattern-recognition specialties', foremost pathology and radiology - the actual implementation/usefulness/impact of "AI" (ML/DL) has not yet gained a foothold.

Yes it's hyped, but the match between even the current state of DL and what is needed and possible in these specialties is so close to being perfect, and the gain is so close at hand. What is holding us back is regulatory issues and technical implementation issues that have nothing to do with the state of DL - just basic IT problems and a lack of standards.

Investments may fall back and companies may stop advertising it as "AI", but the impact of ML/DL in medicine will not fall back.

The "AI" we see today is already effective, just not applied at scale.


There is a business opportunity here.


Oh yes - and it's seeing massive investments. Last RSNA - the largest radiology conference - saw >100 AI startups showcasing their products. (Half of which will probably be gone by December this year: bankrupt, taken over or merged.)


Why would there be an AI winter? Was there a car winter after cars became a growing product? Was there a processor winter after microprocessors became a growing product? ERP software?

Didn’t the previous AI winter happen because the hardware wasn’t advanced enough to make the technology useful to most people? Since that is no longer the case, why this consistent belief that there will be another winter?


2001 wasn't exactly great for the semiconductor industry...


Sure, there will be ups and downs like any industry. But a “winter” implies a decade at least without substantial industry or research, not just a bad year.


Research progress might slow down for a bit in some areas of machine learning but the commercialization of existing technology will keep us busy for the next 10 years.

Unlike the past two winters, deep learning is actually enabling a ton of applications that wouldn't have been possible otherwise and we now live in a world with a lot more data and computers to apply it to.


Exactly this


We don't want AI, we want systems that work better autonomously. We have lots of autonomous systems, mostly run by people (a shopkeeper for a shop owner, for instance). Now that we have reached certain limits of pure digital systems, more innovations (i.e., changes leading to better system outcomes) will happen due to human involvement in the data understanding and automation. It's just going to look more like people going to work.

The idea that AI (ML models) would be designed once is silly. The tuning and application always involves human judgment over time. We just hide the human contributions to AI/ML systems because it gets too complicated. But really, all good/practicable/in-the-wild AI systems involve a lot of people-in-the-loop!


IMO, no. Unlike the last time, things actually work this time. Perceptual things especially. People in the thread seem to be dismissive of those "single digit percentage point" gains that are being made nearly every month in some important tasks, but those last few percentage points often decide whether the system is garbage or useful. Compare Siri and Google Assistant, for example. Likely a relatively small difference on metrics which results in a _huge_ difference in usability.

Another mistake people make is that they look at model performance on academic datasets and make unsubstantiated conclusions about the usefulness of models. Guess what, practical tasks _do not_ involve academic datasets. Some academic datasets are _stupid hard_ on purpose (e.g. ImageNet, which forces your net to recognize _dog breeds_ that few humans can recognize). If your problem is more constrained, and the dataset is large enough and clean enough, you can often get very good results on practical problems, even with models that do not do all that well in published research.


I actually think that basic deep learning is well on its way into the plateau of productivity. It's not going to be used strictly as AI though, just a more robust type of model fitting than traditional ML which required cleaner data and better extracted features.


While the expectation vs. reality dichotomy is very real, the cost vs. return is just as vital and, ultimately, more easily solvable in the years to come. Curbing the expectations of the money-givers in regards to what they might get out of these ventures is always tough but using tech is going to be cheaper because, well, the price trends for tech have been downward for a while.

Personally, I'm hoping to see more of a shift toward trying new things rather than attempting to perfect the already existing models. This would, well, not solve but circumvent the need to try and improve something when the tools are not there yet. This way, a broader groundwork will be laid.


I heard that the top tech research labs are already experiencing cutbacks and hiring freezes. Can anyone confirm?


I think the total economic impact of AI will be greatest for tasks that output high-dimensional data, such as GANs. For the simple reason that it can replace a lot more human labor. A great many jobs could be augmented with such tech.

Furthermore, I think the results from GPT-2 and similar language models show that researchers have found a scalable technique for sequence understanding. They are likely to just work better and better as you throw more data and training time at them. Imagine what GPT-2 could do if trained on 1000x more data and had 1000x more parameters. It would probably show deep understanding in a great variety of ideas and if prompted properly would probably pass a lot of Turing tests. There is evidence that this type of model learns somewhat generally, that is, structures it learns in one domain do help it learn faster in other domains. I am not sure exactly what would be possible with such a model, but I suspect it would be extremely impressive and meaningful economically.

I think we are likely to see that type of progress in the next year or two, and for there to be no AI winter.


While I don't think there's going to be an AI winter either, I don't think GPT-2 will achieve sentience or anything close to it.

And that's for the same reason that no matter how much data they feed Tesla's self-driving AI, it will still try to kill you now and then. The problem space is just too big. All the people I know in this space don't think it will be solved for at least a decade and maybe not even then.

But I do suspect the 2020s will see the creation of agents combining classical algorithms with deep neural networks to do amazing things in domains that are closed and constant. But they're all going to be glorified (yet wonderful) unitaskers.

The only thing that worries me is that I don't trust FAANG to do the right thing ever anymore, and it's amazing to me that so many have opted into the panopticon of things in exchange for the ability to order stuff and turn their gadgets on and off.


Does GPT-2 really "understand" anything? I feel like this is pretty quickly going to devolve into a semantic argument, but having interacted with some trained GPT-2 models, it seems to produce only what Orwell would have called duckspeak[0]. There's very clearly no mind behind the words, so it's hard for me to credit it with understanding.

[0] http://www.orwelltoday.com/duckspeak.shtml


I think the only time a system can be truly be said to understand something is when its answers are derived from logic (such as old school symbolic AI). No matter how good current statistical approaches get, they won't meet that bar.

However, I do believe we see evidence of approximate logical reasoning in these models, as well as the concept of abstraction.

Furthermore we can take statements generated with statistical techniques and validate them mechanically with older techniques. This is basically what recent work in automated theorem proving using deep learning is about.

Generating logical statements using heuristics and then validating them mechanically also sounds like a reasonable approximation of what a human often does, speaking as a human.


> Generating logical statements using heuristics and then validating them mechanically also sounds like a reasonable approximation of what a human often does, speaking as a human.

I think I agree with that, but I might add that humans who understand a topic well can also make novel connections and uncover further implications that might seem illogical at first glance. This process of "insight" seems poorly understood by everyone, but I think it goes beyond validating heuristic intuition.


If something acts as though it understands a thing then it does. What else could understand mean? Behold the damage that Searle hath wrought, lol.

Maybe it makes more sense to talk about predicting the behavior of a system independent of the composition of the system.


> If something acts as though it understands a thing then it does.

I've had the misfortune of working on teams where the hiring manager subscribed to this philosophy.


AI winter came because the best in-market was Clippy and Naive Bayes spam detectors.


Perhaps Artificial Intelligence needs to be rebranded?

Maybe call it:

Cybernetic Research (CR)

Computational Cognition (CC)

Statistical Reasoning (SR)

Computational Reasoning (CR)


There might be another A(G)I winter, but there won't be an ML winter...


I'm not sure about the pace of progress in research, but as an ML engineer at a startup who has been following developments, even if AI research stalls out completely, we've been given a huge set of amazing tools to apply to all kinds of technical problems for years to come.

I also think that even in the absence of massive breakthroughs, there's still plenty of work to be done during a "winter" in filling in the gaps of understanding between various SOTA advances. I think it may be the nature of science to advance in a sort of unbalanced tree of breakthroughs, where we drill down on certain popular and lucrative branches for a while before coming back to fill out and balance the width of the tree, if that analogy makes sense.

Just between transformers/autoencoders, GANs, classic classifiers, and combinations thereof I think we are already poised to see neural networks change society in the next ten or so years in a way similar to the influence of the internet. Especially if hardware and cloud computing continues to scale.


Can you give some examples of applications that you think will have big impacts? I see places where current AI techniques can make incremental improvements but I just don't see any applications that really seem game changing. The ones that come closest tend to be dystopian unfortunately, like most applications of facial recognition.


I run a startup that analyzes litigation and judicial opinions. We can figure out what arguments are made, which were persuasive to the judge, which judges conform to the mean and which are outliers.

The long term potential is to make justice less expensive by being able to evaluate cases more cost effectively. It can also potentially identify judges that are outside of the norm.

I don't see a problem with this as long as the system is just identifying what is successful and normal. However, if judges themselves start using it and adjust their decisions, there is potential for it to create feedback loops that can move the norm towards something that is not necessarily just. So this system can be used for good as long as there remains an independent check on it (i.e., independent judges).

This concern is not hypothetical. Today, judges are already using tools that calculate average prison times for offenders based on previous rulings. That should not be allowed as it creates feedback loops and bakes in earlier biases. However, the public should be able to use those tools to evaluate their legal position.


My startup analyzes historical sales via LSTM and makes future predictions based on geolocation, description of products, price of products, weather and holidays. I did similar work for IBM 10 years ago, and we could hardly accomplish a fraction of what my startup can achieve today.

The right inventory at the right location in the right quantity is essential to e-commerce, and our AI is helping out a lot in this regard.
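
For readers curious what that looks like in the abstract, here is a minimal sketch of the general pattern (not their actual system; every feature, shape and name below is an illustrative assumption): an LSTM reads a window of past days, each day described by sales plus exogenous features like price, a holiday flag and weather, and predicts the next day's demand.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DemandForecaster(nn.Module):
        """Toy sketch: LSTM over a window of daily feature vectors -> next-day sales."""
        def __init__(self, n_features=5, hidden=64):
            super().__init__()
            self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):             # x: (batch, window, n_features)
            out, _ = self.lstm(x)         # out: (batch, window, hidden)
            return self.head(out[:, -1])  # predict from the last time step

    # Illustrative per-day feature vector:
    # [units_sold, price, is_holiday, temperature, location_id]
    model = DemandForecaster()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Random stand-ins for real data: 30-day windows -> next-day units sold.
    x, y = torch.randn(128, 30, 5), torch.randn(128, 1)
    for _ in range(100):
        opt.zero_grad()
        F.mse_loss(model(x), y).backward()
        opt.step()

The real work, as elsewhere in this thread, is in assembling and cleaning those feature histories per product and location, not in the model definition.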


That sounds like incremental efficiency improvements, not something that will radically change people's lives. Not that that's not valuable - it may help us keep economic growth going at a reasonable clip - but it doesn't seem like the type of thing that will change the character of most people's lives in the way that the Internet and smartphones have.


Matt: it's easy to say "it's incremental" - for example, an application that lets you order books online or communicate with groups of friends would have been seen as incremental in 1990. The CEOs of BlackBerry thought that the iPhone was good and feature-rich, but they saw it as incremental as well!


Our equivalent is Amazon FBA. I think the warehousing and logistics of Amazon have changed the character of most people's lives in the developed world. We aim to do the same for selling from your own website.


But people can already order almost anything from Amazon and have it delivered next day in many cases. No doubt it's a good thing for a business to be able to match that themselves from their own website but it's not like the change from no e-commerce to Amazon.


I am trying to de-centralize the ecommerce industry. Most people don't do it primarily because it's hard to find customers and also logistics is hard.


My startup is using AI in many forms in order to build an accurate digital twin of the world cheaply, and extract valuable insights from it.

In a few years we will have an accurate digital twin of the world, almost indistinguishable from the real world. This would have been impossible, or way too expensive, without massive automation with AI.


> In a few years we will have an accurate digital twin of the world

No, you will not, unless you redefine what "accurate" means.


Google has 3D models of cities nowadays. 20 years ago we only had 2D maps. Why do you think this trend will not continue?


Google Street View is full of artifacts, it's not even close to being accurate. The same goes for satellite imagery of rugged mountains. I'm not even mentioning the vegetation, snow cover, river levels, etc.


Accuracy is relative to the need. By accurate I mean accurate enough that we can extract actionable information from it.

For power-plant critical structures we want 0.5mm, updated every 6 months. For forest management we want 5m, updated every 2 years.

But this increase in spatial and temporal accuracy will keep on going. At some point in the future a small swarm of insect-sized drones will be able to capture a whole forest in a day for a super low cost. And a few people walking around with basic smartphones will be enough to map a whole city.

In 1980 you would have said that Google Maps 3D and Google Street View would never exist...


Well, you just redefined accurate to "accurate enough that we can extract actionable information from it". Your statement is true now.

BTW, I never say never.


Collecting more data scales far better than processing more data into information. That's one bottleneck right there.


While this is true about user data, this is definitely not true in the context of real physical world data.

Getting a picture from the tip of a wind turbine blade is done with a drone in the best case, or a rope access technician in the worst (legacy) case. Analyzing this picture with AI to identify anomalies takes a few seconds. That’s one of my company’s use cases.

In the same way, ask the google maps team about what scales better: flying real airplanes to capture photographic data, or just processing this data to get 3D models...

Edit: typo


> In a few years we will have an accurate digital twin of the world, almost indistinguishable from the real world. this would have been impossible or way too expensive without massive automation with AI

Is the rent cheaper there?


By helping with city and infrastructure planning it should help us get more resource-efficient, so hopefully yes?


Yeah, on the applied side there's still a ton of work left to be done to make productionization of machine learning systems easier. We still lack good tooling for data annotation, experiment tracking, model calibration/evaluation and monitoring.


never use the phrase "AI Winter" again. never.



