I used to do electromagnetic modeling using finite element methods (though now a...

YeGoblynQueenne · on Jan 18, 2020

Were would the speedup come from? I don't understand.

If I understand your comment correctly, essentially you have a hand-crafted simulator for some physical process and then you train a neural net model to approximate the simulator. Why would the approximated simulator have "an order or more of magnitude increase in simulation speed"? Unless the approximation has massive losses in accuracy, of course.

Honestly asking and really interested to know what you mean.

allovernow · on Jan 18, 2020

It's all about precision heuristics, derived from joint probabilities of inputs and outputs. That, by and large, is how I am increasingly coming to understand the power of neural networks.

Imagine you are given a picture of a candle, overlaid with a grid, and asked to fill in, with colored pencils, colors for the air surrounding the candle representing relative temperature. Of course a human utilizes intuition to rapidly assign high temperature to the flame and decreasing temperature with increasing distance.

A "dumb" finite method would need, even for such a relatively simple problem (for a human), to perform calculations for a series of time steps in each grid until some steady state condition to arrive at a much more precise but still overall similar coloring of the grid cells. You can do the same task much more quickly because you have developed intuition of the physics, which is to say you have learned heuristics which capture the general trends of the problem (air is hot close to a flame and cold far away).

Neural nets take the best of both worlds - by effectively learning probability relationships between input and output pixels, they internalize heuristic approaches to produce outputs approaching finite method accuracies at a fraction of the computation. There's a lot of waste that can be optimized out of finite computation by hardcoding rules (heuristics), but doing so for real problems is impractical. Neural nets learn these rules through training - a far simpler task is organizing the data to teach the net the right trends; much like designing lessons for a child to teach a predictive ability.

YeGoblynQueenne · on Jan 18, 2020

I'm skeptical of the claim that it's easier to train a neural net than to hand-code a set of heuristics _when the heuristics are already known_. For the time being, optimal results with neural nets need more data and more computing power ("more" because it's never enough) and are primarily useful when a hand-coded solution is not possible.

I also don't understand how it is possible for a neural net (or any approximator, really) to approximate a "precision heuristic" faster than a hand-coded heuristic and without a gross loss of well, precision in the order that would make the results unusable for engineering or scientific tasks. Could you elaborate?

richk449 · on Jan 18, 2020

I’m also skeptical, but after reading the explanation above, I am intrigued.

Say I have a cube with 100 x 100 x 100 mesh cells inside, and ports on opposing faces. Given enough time, I can literally run through every possible combination of PEC and air for every cell and solve the FD form of maxwells equations, then save the results. Now, a user can ask my solver for any of those cases, and I simply pull the presolved result, and give the user the answer with orders of magnitude reduction in time.

Obviously, the presolving approach doesn’t scale. More materials, more mesh cells, eventually it is impractical to presolve every case. But the beauty of neural networks is that they can be very good at generalizing from a partial sample of the problem space. In effect, they can give results close enough to the presolve solution with drastically reduced numbers of computations.

YeGoblynQueenne · on Jan 18, 2020

>> But the beauty of neural networks is that they can be very good at generalizing from a partial sample of the problem space.

That is really not the case. Neural nets generalise very poorly, hence the need for ever larger amounts of data: to overcome their lack of generalisation by attempting to cover as many "cases" as possible.

Edit: when this subject comes up I cite the following article, by François Chollet, maintainer of Keras:

The limitations of deep learning

https://blog.keras.io/the-limitations-of-deep-learning.html

I quote from the article:

This stands in sharp contrast with what deep nets do, which I would call "local generalization": the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time. Consider, for instance, the problem of learning the appropriate launch parameters to get a rocket to land on the moon. If you were to use a deep net for this task, whether training using supervised learning or reinforcement learning, you would need to feed it with thousands or even millions of launch trials, i.e. you would need to expose it to a dense sampling of the input space, in order to learn a reliable mapping from input space to output space.

allovernow · on Jan 18, 2020

Well...I think that take is a little overly cynical, and I disagree particularly with this:

>the mapping from inputs to outputs performed by deep nets quickly stops making sense if new inputs differ even slightly from what they saw at training time

In my experience that isn't really true, if you have an appropriately designed net, training data which appropriately samples the problem space, and the net is not overtrained (overfit).

You can think of training data as representing points in high dimensional space. Like any interpolation problem, if you sample the space with the right density, you can get accurate interpolation results - and neural nets have another huge advantage, in that they learn highly nonlinear interpolation in these high d spaces. So the net may be unlikely to generalize to points outside of the sampled space - although now that I think of it I'm not sure of how nets handle extrapolation - but when you're dealing with space with thousands of dimensions (like each pixel in an image) you can still derive a ton of utility from the interpolation which effectively replaces hardcoded rules about the problem you're solving.

YeGoblynQueenne · on Jan 18, 2020

I may be jumping the gun a little because I was thinking about this in the context of another thread, but a practical problem with machine learning in general is that, for a learned model to generalise well to unseen data, the training dataset (all the data that you have available, regardless of how you partition it to training, testing and validation) must be drawn from the same distribution as the "real world" data.

The actual problem is that this is very difficult, if not impossible, to know before training begins. Most of the time, the best that can be achieved is to train a model on whatever data you have and then painstakingly test it at length and at some cost, on the real-world inputs the trained model has to operate on.

Basically, it's very hard to know your sampling error.

Regarding interpolation and dense sampling etc, the larger the dimensionality of the problem the harder it gets to ensure your data is "dense", let alone that it covers an adequate region of the instance space. For example, the pixels in one image are a tiny, tiny subset of all pixels in all possible images- which is what you really want to represent. Come to that, the pixels in many hundred thousands of images are still a tiny, tiny subset of all pixels in all possible images. I find Chollet's criticism not cynical, but pragmatic and very useful. It's important to understand the limitations of whatever tool you're using.

>> although now that I think of it I'm not sure of how nets handle extrapolation

They don't. It's the gradient optimisation. Gets stuck to local minima, always has, always will. Maybe a new training method will come along at some point. Until then don't expect exrapolation.

richk449 · on Jan 18, 2020

It doesn’t need to generalize, just do sophisticated interpolation.

Basing the results on a dense sampling of the input space is exactly what I was suggesting.

YeGoblynQueenne · on Jan 18, 2020

Apologies for the misunderstanding. You said "generalizing from a partial sample of the problem space" and I thought you meant generalisation to unseen data from few examples, which is generally what we would all like to get from machine learnig models (but don't).

But, if a neural net can't _extrapolate_ to unseen instances, I don't see how it can solve problems like the one you describe with any useful precision, again unless it's trained with gigantic amounts of examples (which you say is not required). And how is this reducing computational costs with respect to hand-coded solvers?

richk449 · on Jan 18, 2020

To be clear - I have absolutely no experience in this domain. I'm just speculating.

In the example I gave, everyone agrees that if you had long enough and enough processing power, you could solve every possible configuration, and store the results. Then you could instantaneously "solve" any problem.

Unfortunately, the problem I describe is a toy problem (too simple to be useful), and yet it would still take way way too long to solve all the possible configurations.

What if you solved some tiny fraction of the configurations though? That would be a sampling of the configuration space. Then a neural network could use that sampling to interpolate to the cases not solved. That would provide a significant speedup over actually solving the problem.

So the real question is what density you need to pre-solve the configuration space to make it work? It definitely depends on what accuracy you need in the solution, as well as how good you can do with the interpolation. If I said previously that gigantic numbers of examples are not needed, then I misspoke. I am sure they would be needed. Gigantic is vague though - is it the kind of number that can be rented from AWS, or is it the kind of number that would require civilization resources?

I have no idea if the math actually works out to make it a useful approach. All I am saying is that conceptually I can see that in some cases, it could be possible.

YeGoblynQueenne · on Jan 19, 2020

>> So the real question is what density you need to pre-solve the configuration space to make it work?

Yes, that's the main question. I don't know the answer of course but if we're talking about an engineering problem where precision is required, intuitively the more the merrier.

The thing is, with neural nets you can do lots of things in principle and many things "in the lab". When you try to take them in the real world is the tricky bit. Anyway, another poster here is saying we'll see big things in the next five years so let's hold on to our hats for now.

allovernow · on Jan 18, 2020

Well, there are pretty convincing examples in other domains: try hardcoding rules to classify animals or objects in photos, especially an algorithm which can handle thousands of different categories. Totally impractical - but if we appropriately design the net and structure the training data, you can train a pretty accurate net on a mid-range GPU in a matter of hours to do what would take far, far longer to hardcode!

Perhaps not quite appropriate to call them heuristics in this context, but the principle is the same - you are leveraging joint probabilities of pixels to generate some conditional output. Similar principle in ML accelerated modeling.

YeGoblynQueenne · on Jan 19, 2020

I think I understand what you meant by heuristics. I agree that it's impractical to try and hand-code image recognition rules and all attempts to do that in the past have failed as they have in similarly complex domains (like machine translation, say). My concern is particularly about the use of neural networks (or in general machine learning models that learn to approximate a function) in domains where precision is normally required, like engineering. I mean, I know there's plenty of approximation in engineering already but of course we're not talking about computing integrals here (er, I think?).

Anyway I was especially trying to understand the OP's comment about speedup using a neural network. I'm still a bit confused about that. But thanks for the conversation.

allovernow · on Jan 18, 2020

You're on the right track. A lot of this tech is a potential goldmine and I'm sure there are many players developing in secret and not publishing yet (or ever).