
How was early Machine Learning different from statistics?

New names make things exciting for people to pick up. Who wants to estimate a multinomial regression when you can learn a shallow softmax-activated neural network!

It's all about creating hype.
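To make the equivalence concrete: a "shallow softmax-activated neural network" with no hidden layer, trained by gradient descent on the cross-entropy, is multinomial logistic regression under another name. A minimal sketch in numpy (toy data and sizes made up for illustration):

    import numpy as np

    # Toy data: 100 samples, 4 features, 3 classes (made up for the example).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    y = rng.integers(0, 3, size=100)
    Y = np.eye(3)[y]                      # one-hot targets

    W = np.zeros((4, 3))                  # "network weights" = regression coefficients
    b = np.zeros(3)                       # "biases" = intercepts

    def softmax(z):
        z = z - z.max(axis=1, keepdims=True)     # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    for _ in range(500):
        P = softmax(X @ W + b)            # "forward pass" = multinomial logit model
        grad_W = X.T @ (P - Y) / len(X)   # gradient of the cross-entropy
        grad_b = (P - Y).mean(axis=0)
        W -= 0.1 * grad_W                 # gradient descent on the negative log-likelihood
        b -= 0.1 * grad_b

The cross-entropy here is the negative log-likelihood of the multinomial logit model, so "training the network" is just maximum-likelihood estimation with a different vocabulary.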



>> How was early Machine Learning different from statistics?

Some of the very early work in machine learning, in the 1950s and '60s, was not statistical. The first "artificial neuron", the McCulloch & Pitts neuron from 1943, was a propositional logic circuit. Arthur Samuel's 1952 checkers-playing program used a classical minimax search with alpha-beta pruning.
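For flavour, a McCulloch & Pitts unit is just a threshold gate: binary inputs, fixed weights, and it fires if the weighted sum reaches the threshold. No probabilities and no learning; the weights below are chosen by hand purely for illustration.

    def mcp_unit(inputs, weights, threshold):
        # Fires (outputs 1) iff the weighted sum of binary inputs reaches the threshold.
        return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)

    # Propositional connectives as hand-wired units:
    AND = lambda a, b: mcp_unit([a, b], [1, 1], 2)
    OR  = lambda a, b: mcp_unit([a, b], [1, 1], 1)
    NOT = lambda a:    mcp_unit([a],    [-1],   0)

    assert AND(1, 1) == 1 and AND(1, 0) == 0
    assert OR(0, 1) == 1 and NOT(1) == 0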

Machine learning in the '70s and '80s was for the most part not statistical but logic-based, in keeping with the then-current trend for logic-based AI. Algorithms from that era did not use gradient descent or other statistical methods, and the models they learned were sets of logic rules, not the parameters of continuous functions.

For instance, a lot of work from that time focused on learning decision lists and decision trees, the latter of which are best remembered today. The focus on rules probably followed from the realisation of the problems with knowledge acquisition for expert systems, which were the first big success of AI.
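To see what such a model looks like, here is a toy decision list: an ordered set of if-then rules over symbolic attributes, rather than a vector of continuous parameters. The attributes and rules are invented for illustration, roughly in the spirit of what CN2 or C4.5 would output:

    # A "model" in 1980s rule learning: an ordered list of (condition, class) rules,
    # induced by searching a space of symbolic conditions rather than by fitting
    # parameters of a continuous function.
    decision_list = [
        (lambda x: x["outlook"] == "overcast",                       "play"),
        (lambda x: x["outlook"] == "rain" and x["wind"] == "strong", "dont_play"),
        (lambda x: x["humidity"] == "high",                          "dont_play"),
        (lambda x: True,                                             "play"),  # default rule
    ]

    def classify(example):
        # Rules are tried in order; the first condition that holds decides the class.
        for condition, label in decision_list:
            if condition(example):
                return label

    print(classify({"outlook": "rain", "wind": "strong", "humidity": "normal"}))  # dont_play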

You can find examples of machine learning research from that time in the work of researchers like Ryszard Michalski, Ross Quinlan (known for ID3, C4.5, and the first-order inductive learner FOIL), (the) Stuart Russell, Tom Mitchell, and others.


   How was early Machine Learning different from statistics?
I'd argue: in two ways.

First: ML's algorithmic focus. Just about everything in modern AI/ML works because it uses compute at extreme scale; neural nets, for example, only seem to work well when trained on huge amounts of data. Statisticians generally lacked the computational background to make this happen.

Second: most work in statistics assumed that the data were generated by a given stochastic data model. ML, in contrast, has used algorithmic models and treated the data-generating mechanism as unknown. In most real-world situations, the mechanism is unknown.

It's not just hype. Statistics was stuck in a local optimum, and it was ML's focus on algorithms, data structures, GPUs/TPUs, big data, ... together with the jump into 'weird' data (e.g. the proverbial cat photos), that propelled ML ahead of statistics.
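A rough sketch of the contrast in code, using scikit-learn (the data and model choices are placeholders, not a benchmark): the data-modelling culture posits a stochastic model, say a linear model with Gaussian noise, and interprets its estimated parameters; the algorithmic culture treats the mechanism as unknown, fits a flexible predictor, and judges it purely on held-out accuracy.

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    # Synthetic data with a nonlinear mechanism the analyst does not get to see.
    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(500, 5))
    y = np.sin(X[:, 0]) * X[:, 1] + rng.normal(scale=0.1, size=500)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Data-model culture: assume y = X @ beta + Gaussian noise, estimate beta, interpret it.
    lm = LinearRegression().fit(X_tr, y_tr)

    # Algorithmic-model culture: black-box predictor, validated on held-out data.
    rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

    print("linear model R^2 :", lm.score(X_te, y_te))
    print("random forest R^2:", rf.score(X_te, y_te))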


> Statistical Modeling: The Two Cultures (2001), Breiman

> There are two cultures in the use of statistical modeling to reach conclusions from data. One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown. The statistical community has been committed to the almost exclusive use of data models. This commitment has led to irrelevant theory, questionable conclusions, and has kept statisticians from working on a large range of interesting current problems. Algorithmic modeling, both in theory and practice, has developed rapidly in fields outside statistics. It can be used both on large complex data sets and as a more accurate and informative alternative to data modeling on smaller data sets. If our goal as a field is to use data to solve problems, then we need to move away from exclusive dependence on data models and adopt a more diverse set of tools.

http://www2.math.uu.se/~thulin/mm/breiman.pdf


There are completely non-statistical learning algorithms (many, at that), which is part of why the distinction is needed. Statistics is definitely crucial in parts of the ML domain, but not everywhere. Another, related reason is one of focus: ML doesn't care how it gets to a result, only that it gets the result. Statistics can be one tool to get there, along with many others.
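One concrete example of a completely non-statistical learner is Find-S from Mitchell's concept-learning framework: it generalises a conjunctive hypothesis just enough to cover each positive example, with no probabilities, losses, or gradients anywhere. A minimal sketch (the attribute values are invented for illustration):

    # Find-S: maintain the most specific conjunctive hypothesis consistent with
    # the positive examples. Purely symbolic -- no statistics involved.
    def find_s(examples):
        hypothesis = None
        for attributes, label in examples:
            if label != "yes":
                continue                       # Find-S ignores negative examples
            if hypothesis is None:
                hypothesis = list(attributes)  # start from the first positive example
            else:
                hypothesis = [h if h == a else "?"   # generalise mismatching attributes
                              for h, a in zip(hypothesis, attributes)]
        return hypothesis

    training = [
        (("sunny", "warm", "normal", "strong"), "yes"),
        (("sunny", "warm", "high",   "strong"), "yes"),
        (("rainy", "cold", "high",   "strong"), "no"),
    ]
    print(find_s(training))   # ['sunny', 'warm', '?', 'strong']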



