This tutorial originally started as an email that I wrote to one of my friends in November 2000, and is intended for people who are not working in AI, but experts in AI or related areas might also find the philosophical aspects of this article entertaining :-).
Machine Learning is a class of AI that is motivated by the most obvious definition of intelligence - humans, primates, octopuses, squid, and some other animals are not born with only the pre-programmed, simplistic behavioral and adaptation code common among simpler animals and plants for dealing with the real world, but can learn an almost infinite variety of very complex skills and tasks needed to survive in their current environment.
Similarly, machine learning based programs learn by themselves how to solve a problem; the human programmers do not tell them the truth, but only how to find it. Hand-coded heuristics belong to traditional programming and are not machine learning. By this definition, anything that can learn and adapt can be considered intelligent. As time goes on, the system can get wiser about its world, assuming that the world it interacts with does not change too rapidly or too drastically over time. But how complex a phenomenon a single intelligent system can model depends on the maximum capacity of that system. A group of such interconnected systems working in harmony and with high-bandwidth connectivity can often have a dramatically higher capacity. The maximum "potential" intelligence of such a learning system could then be defined in terms of the most complex phenomenon that it can model, with the "higher forms" of intelligence having higher "capacity" or "potential" as defined above.
Modern human society forms such a massively interconnected "system of learning systems", and has a higher modeling capacity than the smartest single individual in our society. The phrases "tail economics" and "mass collaboration" are increasingly being noticed by businesses and intellectuals, and may be the first signs of the dawn of a Gaia with a central nervous system, where the interconnectivity of social, political and economic groups is increasingly defining, driving and directing the changes in human society and perhaps even the long-term survival of the planet. There are signs now that the Gaia Earth as a whole may even be developing primitive intelligence and/or sentience, and is now aware of its mortality. Such a sentient Gaia may even be contemplating (or at least some of its neurons are) long-term active survival strategies such as moving away as the Sun gets hotter. At this point in human (individual and societal) evolution, however, it is unclear if Gaia will remember to move every 6000 years or so as the Sun gets hotter; that may depend upon whether the primitive brain Gaia is developing (represented by an increasingly integrated human society) will survive and evolve into something stable, or will decay (as in, human society fails to integrate for the greater good, implodes upon itself and eventually decays into anarchy).
However, in an optimistic scenario where our species does evolve as a society and gets beyond its petty wars, racism, politics and needless environmental destruction, such a growing group, with ever-increasing network connectivity as an entity (in animals, for example, higher connectivity and bandwidth tends to result in a smarter brain), intra-entity bandwidth (cybernetics, faster electronic networks, better human-human and human-machine interfaces, Google), and processing capacity (more computing power, better learning algorithms), can potentially reach very high intelligence levels, limited only by the speed of light (assuming that it is a barrier that cannot be broken) and the storage capacity of this universe. It is important to note here that our society as an intelligent being with real-time communication at a global scale has only been connected for a few decades, although a weak, bacterial-colony type of communication has connected us globally for a few millennia. It is hard to fathom at this point what will happen to this mass intelligent system a few millennia from now, if we do not destroy ourselves and instead mature as a single intelligent Gaia, even at a reasonable pace of technological advancement (e.g. computers, medicine, space travel and colonization) and consequently social advancement (direct universal democracy; a scientifically literate world; elimination of many evils: genetic defects, major diseases, depression, mental disorders; the ability to look the way you want to look thanks to advancements in genetics; guaranteed essentials: food, education, medicine, clothing and housing for everyone, and a meritocracy elsewhere; no national borders; no environmental pressures...).
A realistic description of the robustness and intelligence level of any advanced galactic civilization that is billions of years old is probably beyond any human brain reading this article, and I do not dare to attempt that myself (to force you into thinking more about what I mean by that, I will give one unimaginable example of the characteristics of such a super-intelligent Gaia though; some neurons in the brain survive for the full biological lifespan of the individual, although millions die every day - if a mature Gaia ever develops, clearly the lifespan of individuals will be limited by accidents and rare diseases and not by aging, leading to some surviving for billions of years - I dare say that some lucky ones walking among us today might even get to see the end of our Sun, although they may not remain very "human" 5 billion years from now. All you need to do to survive to eternity is to survive until the next technology comes along to extend your life long enough for the next breakthrough - and at some point the critical breakthrough will reverse all the biological damage in your body, make you look as young as, or whatever, you want to look like, and get you integrated into Gaia if you want - no, you will not be a Borg element, but a voluntary contributor to the Gaia society.).
The ultimate intelligence in a "God does not play dice" type Einsteinian universe would be a system that learns and knows everything that currently exists in our Universe, and can exist in our Universe, to the last minute detail. Such a system should model and predict all events that can happen at any arbitrary time in the future with infinite precision, given the current state of the Universe. Such a system will be indistinguishable from the Universe itself, and it can simulate the present and the future of the Universe perfectly. From an information processing point of view, it will be identical to the Universe itself. Physicists are beginning to model our Universe by combining information theory with quantum mechanics into ideas such as a multiverse and the holographic principle. In these newer models, a direct mapping to perceived reality is not important for predicting reality. From the point of view of all the almost infinite possible multiverse entities and worlds cloned by such a system, the system is the Universe - a phenomenon that can remind you of the movie The Matrix (minus the malevolent robots). Perhaps such a level of intelligence is impossible to create inside our Universe, since it is an infinitesimal part of the much larger Multiverse, but barring any major decline in the relentless, ever-increasing growth of human civilization's capacity to work together over the past 10,000 years, we are moving in that direction, although we may always remain infinitely far from becoming the Universe itself!
Coming back to Earth, the above discussion is important not only from a philosophical perspective but also from an information theory perspective. Since a perfect model would have to be as large as the Universe itself, any practical intelligence instead tries to build an approximate model of the reality that it is trying to predict. Another way of stating this fact is that we are trying to simultaneously optimize for two conflicting goals: (1) compress the possibilities into a simple model that can fit in its "head" and be learnable with a small number of experiences (examples), while (2) maximizing accuracy, i.e. minimizing the loss of information or "errors" on past and future situations that cannot be predicted correctly by the compressed model. This problem is closely related to function approximation (interpolation and extrapolation in numerical analysis), although many high-school numerical techniques are only the starting point for more complex algorithms for modeling many complex, real-life phenomena. In such real-life problems the only way to find the best model, exhaustive search, is often too slow. Most of the techniques in Machine Learning, the natural processes in the human brain, and evolution itself try to find practical (loaded word: approximate) solutions (another loaded word: models) that are simultaneously good enough and fast enough, but do not guarantee finding the best (yet another: optimal) solution.
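The tension between the two goals above (a small model versus small errors) can be seen in a toy experiment. The following is only a minimal sketch under my own assumptions, not a method from this article: the "reality" is a made-up noisy sine curve, the "model" is a piecewise-constant table with a chosen number of bins, and the names fit_bins, predict and make_data are hypothetical helpers invented for this illustration.

```python
import math
import random

def fit_bins(xs, ys, k):
    """Fit a piecewise-constant model with k equal-width bins on [0, 1)."""
    sums = [0.0] * k
    counts = [0] * k
    for x, y in zip(xs, ys):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    # A bin that saw no examples falls back to the overall mean.
    return [sums[b] / counts[b] if counts[b] else overall for b in range(k)]

def predict(model, x):
    """Look up the stored value of the bin that x falls into."""
    k = len(model)
    return model[min(int(x * k), k - 1)]

def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def make_data(n, rng):
    """Made-up 'reality': a sine curve observed with Gaussian noise."""
    xs = [rng.random() for _ in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

rng = random.Random(0)                   # fixed seed so runs are repeatable
train_x, train_y = make_data(40, rng)    # the few "experiences" we learn from
test_x, test_y = make_data(400, rng)     # "future situations" never seen before

results = {}
for k in (1, 2, 4, 8, 16, 32):           # k = model size (number of stored values)
    model = fit_bins(train_x, train_y, k)
    results[k] = (mean_squared_error(model, train_x, train_y),
                  mean_squared_error(model, test_x, test_y))
    print(f"bins={k:2d}  train_mse={results[k][0]:.3f}  test_mse={results[k][1]:.3f}")
```

With these nested bin sizes the error on the training examples can only go down as the model grows, while the error on the unseen examples typically stops improving at some point. That gap is the price of a model that has memorized its experiences instead of compressing them.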
Traditional machine learning comes in two flavors: supervised and unsupervised (there are also other special forms such as semi-supervised learning, active learning and reinforcement learning, but they can mostly be categorized into one of the two, or a mix of the two, irrespective of what they are actually called by their inventors). In some academic communities, such as in statistics, some researchers might even get offended if you claim their models are AI models, but that does not change the fact that their algorithms are typically used for solving the same problem: building approximate models for an isolated piece of reality - a small subset of this Universe - using raw historical data or using partially digested data preprocessed by an expert in that domain. In the most general of all machine learning algorithms, often referred to as "non-parametric" models, the programmer does not tell the machine the model or the set of rules for a given task, but just how to learn the task. In supervised learning the programmer not only tells the program how to learn, but also what to learn. In all cases of machine learning, although the degree of human supervision can vary, it is never complete. If it is, then it becomes classical programming of heuristics; there is no learning involved, and it is a simple copying of what the human expert knows. In such a situation, even though the program can "act" infinitely smart and "random" (unpredictable), it has zero adaptability, and has an "intelligence", if you can call it so, inherently inferior to that of even an alga, since it cannot evolve into something more intelligent. In my opinion, the term "intelligence" should not be used for such systems; a term such as "intelligence archive" might be more suitable. Many knowledge bases and symbolic reasoning systems fall into this category.
Here is a simple example of supervised learning from data: telling a program to learn how to tell an apple from an orange based on a few numerical features that you measure and give it - such as the color of the fruit and a number describing its deviation from a sphere. Basically, you are telling the learning program three things -
Imagine the colors are Red, Green and Blue, and you can measure the brightness values of each of these 3 colors over all the pixels in the given image. Imagine also that you have a measure between 0 and 1, extracted from the same image, that tells you how spherical the fruit was. You give the program some examples of apples and some examples of oranges, with feature values for color and "sphericalness". The example data should contain sufficient examples with all major variations of apples and oranges. Let's look at some dummy data below where -
As you can see, this is a 4-dimensional feature space (the features are R, G, B and S). When you plot these points in 4-D you will see a clear separation between oranges and apples. Oranges tend to be more spherical and to reflect less light in the blue part of the spectrum. Apples tend to be more reddish and more deformed.
By looking at the sample, a learning program will make the best hypothetical cut in that space separating the oranges from the apples in the examples. This cut will be a 3-dimensional hyperplane in the 4-d space, with all the oranges lying on one side and all the apples on the other. When you get a new fruit that you know is either an apple or an orange (since that was the problem given to you), you just check the position of this fruit in the 4-d space. If it lies on the orange side of the hyperplane, it's an orange; otherwise it's an apple. Your model is ready to automatically sort apples and oranges!
This is a simple example of how a kind of neural network called an MLP (multilayer perceptron) works. Such a network would have solved the above problem with one neuron, since we needed only one cut. Decision trees work in a similar manner.
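The single cut made by one neuron can be sketched with the classic perceptron learning rule. This is a minimal sketch under stated assumptions, not the exact procedure meant in this tutorial: the feature values below are made up purely for illustration (they are not the dummy data referred to above), all four features (R, G, B, S) are assumed to be scaled to the range 0..1, and train_perceptron and classify are hypothetical helper names.

```python
# Made-up example fruits: (R, G, B, S), each value scaled to 0..1.
APPLES = [
    (0.80, 0.20, 0.15, 0.60),
    (0.75, 0.25, 0.20, 0.55),
    (0.30, 0.70, 0.20, 0.62),   # a green apple
    (0.85, 0.15, 0.18, 0.58),
]
ORANGES = [
    (0.95, 0.55, 0.05, 0.95),
    (0.90, 0.50, 0.08, 0.92),
    (0.92, 0.60, 0.04, 0.97),
    (0.88, 0.52, 0.06, 0.94),
]

def score(weights, bias, x):
    """Signed position of x relative to the hyperplane w.x + b = 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def train_perceptron(samples, max_epochs=1000, lr=0.1):
    """Learn one hyperplane from (features, label) pairs with labels +1 / -1."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            if y * score(weights, bias, x) <= 0:   # x is on the wrong side
                # Nudge the hyperplane toward classifying x correctly.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:                          # every example is separated
            break
    return weights, bias

def classify(weights, bias, x):
    """Report which side of the learned hyperplane the fruit lies on."""
    return "orange" if score(weights, bias, x) > 0 else "apple"

# Oranges are labelled +1 and apples -1.
training_set = [(x, 1) for x in ORANGES] + [(x, -1) for x in APPLES]
weights, bias = train_perceptron(training_set)

new_fruit = (0.935, 0.575, 0.045, 0.96)   # hypothetical new measurement
print(classify(weights, bias, new_fruit))
```

The learned weights and bias are the whole model: four numbers for the orientation of the cut and one for its position. Classifying a new fruit is just a check of which side of that cut it falls on.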
Sometimes you would need multiple cuts to isolate the classes. Such problems need more than one neuron. Also, non-linear techniques like Radial Basis Function Neural Networks don't make hyperplane cuts but curved cuts. They are more complex and slower but more powerful. There are at least a hundred other techniques, but all try to achieve the same goal.
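To illustrate what a curved cut looks like, here is a minimal sketch built around a single Gaussian radial basis function unit. It is not a full RBF network and makes several assumptions of my own: the 2-D points are made up, the center is simply the mean of one class, the width is picked by hand, and the threshold is placed halfway between the two classes' activations.

```python
import math

# Made-up 2-D data: one class sits in the middle, the other surrounds it,
# so no single straight line can separate the two.
INNER = [(0.0, 0.0), (0.3, -0.2), (-0.2, 0.3), (0.1, 0.4), (-0.4, -0.1)]
OUTER = [(2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0), (1.5, 1.5), (-1.5, -1.5)]

def rbf(x, center, width):
    """Gaussian radial basis function: 1 at the center, decaying with distance."""
    d2 = sum((a - c) ** 2 for a, c in zip(x, center))
    return math.exp(-d2 / (2 * width ** 2))

# Assumption: center the unit on the mean of the inner class.
center = tuple(sum(coords) / len(INNER) for coords in zip(*INNER))
width = 1.0  # hand-picked for this toy data

# Put the decision threshold halfway between the weakest inner activation
# and the strongest outer activation.
inner_acts = [rbf(p, center, width) for p in INNER]
outer_acts = [rbf(p, center, width) for p in OUTER]
threshold = (min(inner_acts) + max(outer_acts)) / 2

def classify(x):
    """The boundary rbf(x) == threshold is a circle around the center."""
    return "inner" if rbf(x, center, width) > threshold else "outer"

print(classify((0.1, 0.1)), classify((3.0, 0.0)))
```

A real RBF network would combine many such units and learn their centers, widths and output weights from data, which is where the extra power and the extra cost come from.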
Unsupervised Learning is another major area in machine learning. In this kind of learning, the programmer gives the program the training samples with all the features, without telling it what their labels are, i.e. which features should be predicted when some others are missing. The simplest form of unsupervised learning involves partitioning the given data based on the specified set of features. Such a program is simply trying to find distinct dense regions in the feature space (more sophisticated learners are possible that would learn conditional probabilities for all combinations of features, but all available computing hardware is grossly ill-suited for such problems. A mammalian brain, for example, is excellent at such a task.). For example, you can give a mixture of apples and oranges to the program without telling it that you "think" there are two distinct fruits in it. The program will then try to figure out distinct "clusters" of the fruits in the feature space and may tell you that there are two distinct fruits, or perhaps that there are two kinds of fruit A (green and red apples) and, say, 3 variations of oranges. Note that we still use the same data. In one way, this is similar to how humans learn to make sense of the universe (although humans get supervision too in many forms - both reinforcement based and direct). We also create labels in our mind for things which look similar. This is probably related to the term stereotyping in psychology - but this is a way of approximating reality so that it can be stored in our brain. Also, the approximation lets you predict (by interpolation) values that you cannot observe at a given instant. Prediction helps in the survival of a species, and also helps in deduction among the smarter ones. Most parametric math and physics invented by humans is also a form of stereotyping of physical concepts to enable deduction to work on them.
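The search for dense regions described above can be sketched with k-means, one of the simplest clustering algorithms. This is a minimal sketch under my own assumptions: the unlabeled measurements are made up, only two features (S and B) are used to keep it readable, the function names are hypothetical, and, unlike the scenario in the text, k-means has to be told the number of clusters k in advance.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def k_means(points, k, max_iters=100):
    """Group points into k clusters; returns (labels, centroids)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids)))
    labels = None
    for _ in range(max_iters):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:        # assignments stopped changing
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, l in zip(points, labels) if l == i]
            if members:                 # an empty cluster keeps its old centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Made-up, unlabeled fruit measurements: (sphericalness S, blue brightness B).
FRUITS = [
    (0.60, 0.15), (0.95, 0.05), (0.55, 0.20), (0.92, 0.08),
    (0.62, 0.20), (0.97, 0.04), (0.58, 0.18), (0.94, 0.06),
]

labels, centroids = k_means(FRUITS, k=2)
print(labels)
print(centroids)
```

The program never sees the words "apple" or "orange"; it only reports that the points fall into groups, and it is up to a human (or a later supervised step) to give those groups names.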
Google the following topics for more info -
Perhaps this list will get shorter when our computers become more suitable for machine learning. Right now, here is the state of machine learning: human experts (PhDs in AI, statistics, machine learning, bioinformatics, information theory, optimization, OR and so on) in conjunction with domain experts pre-digest the data manually (by using preprocessing to extract the right features, removing outliers, and selecting the right learning methods and cost functions) and then vomit (like some birds and wild dogs) the half-digested data to a relatively simple learning algorithm. Sometimes there will be a series of such steps and manual validations, but eventually a workable system is engineered and deployed that typically solves one specific problem well enough, and much better than what could have been done, at a reasonable cost, without the automated learning.