Machine learning is the study of methods that allow computer programs to improve their performance on tasks by using experience instead of relying only on explicit, hand-written rules (see Arthur Samuel, 1959). It is generally considered a subfield of computer science that grew out of work on artificial intelligence.
What it does
At its core, machine learning develops and applies algorithms that learn patterns from examples and use those patterns to make predictions or decisions about new inputs. These methods take a collection of data and produce a model that captures the relationships observed in it. Although such systems still run programmed instructions, much of their behavior is determined by the model inferred from the data rather than by explicitly written rules.
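As a minimal sketch of this idea, the Python example below infers a model (a slope and an intercept) from a handful of example pairs using ordinary least squares, then applies it to an input it has not seen. The function names `fit_line` and `predict` and the toy data are illustrative assumptions, and a straight-line fit is only one of many possible kinds of model.

```python
def fit_line(xs, ys):
    """Infer a model y = slope * x + intercept from example pairs
    using ordinary least squares (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the inputs, and how inputs and outputs vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned model to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Toy examples (made up for illustration); they follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

model = fit_line(xs, ys)          # the "learning" step
prediction = predict(model, 10.0)  # a prediction for an unseen input
```

Nothing in the code states the rule "multiply by two and add one"; the relationship is recovered from the examples, which is the sense in which the model's behavior comes from the data.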
Where it helps
Machine learning is especially valuable when writing an explicit solution is difficult or impossible. Common applications include:
- spam filtering for email and messaging;
- detecting network intrusions or malicious insiders attempting data breaches;
- optical character recognition (OCR) to convert images of text into digital text;
- improving the relevance and ranking used by search engines;
- interpreting images and video in computer vision systems.
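To make the first application concrete, here is a hedged sketch of a spam filter learned from labeled messages with a naive Bayes word-count model. The training messages, the labels "spam" and "ham", and the function names `train` and `classify` are all assumptions made for illustration. Production filters use far more data and features.

```python
import math
from collections import Counter


def train(examples):
    """Count words per label from (text, label) pairs.
    Labels are assumed to be "spam" or "ham"."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Log prior: fraction of training messages with this label.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages, invented for this sketch.
examples = [
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting at noon", "ham"),
    ("lunch at noon tomorrow", "ham"),
]
model = train(examples)
label_a = classify(model, "win cheap money")
label_b = classify(model, "lunch meeting tomorrow")
```

No rule such as "messages containing 'money' are spam" is written by hand; the word statistics of the labeled examples decide the outcome.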
How models are obtained
Different approaches exist depending on the available data and the problem. Supervised methods are trained on labeled examples to learn direct input–output mappings, unsupervised methods seek structure in unlabeled data, and reinforcement learning methods learn to make decisions from sequential feedback. The output is typically a model that can be evaluated, tuned, and updated as more data become available.
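The unlabeled case can be illustrated with a small sketch that splits one-dimensional values into two groups by repeatedly assigning each value to its nearest center and recomputing the centers (a two-cluster k-means). The function name `two_means`, the choice to start from the minimum and maximum, and the sample values are assumptions for illustration. The other approaches described above are not covered here.

```python
def two_means(values, iterations=10):
    """Group 1-D values into two clusters without using any labels.
    Centers start at the smallest and largest value (an illustrative
    choice that keeps the sketch deterministic)."""
    centers = [min(values), max(values)]
    groups = [[], []]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Update step: each center moves to the mean of its group;
        # an empty group keeps its previous center.
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers, groups


# Unlabeled toy measurements (made up for illustration).
values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, groups = two_means(values)
```

The algorithm is never told which values belong together; the two groups emerge from the structure of the data alone.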
Limitations and risks
Using machine-learned models carries potential downsides. Many high-performing models behave as black boxes, making their internal reasoning hard to interpret. Models can also reproduce or amplify biases present in training data, with real-world consequences in areas such as hiring, criminal justice, and face recognition. For these reasons, practitioners emphasize careful dataset selection, testing for fairness and robustness, and techniques that improve explainability and accountability.
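One simple form of fairness testing compares how often a model gives a positive decision to members of different groups. The sketch below computes per-group selection rates and the gap between them, a quantity often called the demographic parity difference. The group names, the records, and the function names are hypothetical, and this is only one of several fairness criteria, which can conflict with one another.

```python
from collections import defaultdict


def selection_rates(records):
    """Fraction of positive predictions per group.
    records: iterable of (group, predicted_positive) pairs."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, predicted_positive in records:
        totals[group] += 1
        if predicted_positive:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def parity_gap(rates):
    """Difference between the highest and lowest selection rate."""
    return max(rates.values()) - min(rates.values())


# Hypothetical model decisions for two groups, invented for this sketch.
records = [
    ("group_a", True), ("group_a", True), ("group_a", True), ("group_a", False),
    ("group_b", True), ("group_b", False), ("group_b", False), ("group_b", False),
]
rates = selection_rates(records)
gap = parity_gap(rates)
```

A large gap does not by itself prove that a model is unfair, but it flags a disparity that warrants a closer look at the training data and the model.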