What Is Machine Learning and How Does It Work?

Machine learning is the process of using computers to detect patterns in massive datasets and then make predictions based on what the computer learns from those patterns. This makes machine learning a specific and narrow type of artificial intelligence. Full artificial intelligence refers to machines that can perform the tasks we associate with the minds of humans and intelligent animals, such as perception, learning, and problem solving.

Algorithms underpin all machine learning. In general, an algorithm is a set of specific instructions that a computer uses to solve a problem. In machine learning, algorithms are rules for analyzing data using statistics. Machine learning systems use these rules to identify relationships between data inputs and desired outputs, which are typically predictions. To begin, scientists provide a machine learning system with a set of training data. The system uses this data to train its algorithms to analyze similar inputs it will receive in the future.
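The train-then-predict idea can be illustrated with one of the simplest statistical rules, an ordinary least-squares line. This is a minimal sketch, not how any DOE system works: the function names `fit_line` and `predict` and the training values are hypothetical, chosen so the learned relationship is easy to check by hand.

```python
# Minimal sketch: "train" a one-variable linear model on example
# input/output pairs, then use it to predict outputs for new inputs.
# The training data below is made up for illustration.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) that minimize squared error (ordinary least squares)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired examples")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("inputs must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model: tuple[float, float], x: float) -> float:
    """Apply the learned relationship to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Training data: inputs paired with the desired outputs.
train_x = [0.0, 1.0, 2.0, 3.0]
train_y = [1.0, 3.0, 5.0, 7.0]

model = fit_line(train_x, train_y)
print(model)               # (2.0, 1.0)
print(predict(model, 10))  # 21.0
```

The "rule" here is the formula for the best-fitting line; the training data fixes its two parameters, and those parameters are then reused on inputs the system has never seen.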

One area where machine learning holds great promise is cancer detection in computed tomography (CT) imaging. First, researchers collect as many CT images as possible to use as training data. Some of these images show cancerous tissue, while others show healthy tissue. Researchers also collect information on what to look for in an image to detect cancer, such as what the boundaries of cancerous tumors look like. Next, they develop rules based on the relationship between the data in the images and what doctors know about detecting cancer. They then feed these rules and the training data into the machine learning system, which uses them to teach itself how to recognize cancerous tissue. Finally, the system receives new CT images from a new patient. Using what it has learned, the system determines which images show signs of cancer, faster than any human could. Doctors may be able to use the system’s predictions to help them decide whether a patient has cancer and how to treat it.
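The CT workflow above (labeled training images, a domain-informed feature, a learned rule, then predictions for a new patient) can be sketched in a few lines. This is a toy illustration under strong assumptions: each image is assumed to have already been reduced to a single hypothetical score, `boundary_irregularity`, and the "rule" is a single learned threshold (a decision stump). Real imaging systems work on raw pixels with far richer models, and all values below are invented.

```python
# Toy sketch of the CT workflow, assuming each image has already been reduced
# to ONE hypothetical feature, "boundary_irregularity" (0 = smooth, 1 = very
# irregular). All scores and labels are invented for illustration.


def learn_threshold(scores: list[float], labels: list[bool]) -> float:
    """Pick the cut-off that classifies the most training images correctly.

    Learned rule: score >= threshold  ->  flag image as possibly cancerous.
    """
    values = sorted(set(scores))
    # Candidate cut-offs: below every score, midpoints between neighbouring
    # distinct scores, and above every score.
    candidates = (
        [values[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(values, values[1:])]
        + [values[-1] + 1.0]
    )

    def correct(threshold: float) -> int:
        return sum((s >= threshold) == y for s, y in zip(scores, labels))

    return max(candidates, key=correct)


def flag_images(threshold: float, scores: list[float]) -> list[bool]:
    """Apply the learned rule to images from a new patient."""
    return [s >= threshold for s in scores]


# Training data: True = image shows cancerous tissue, False = healthy tissue.
train_scores = [0.10, 0.20, 0.25, 0.30, 0.70, 0.80, 0.85, 0.90]
train_labels = [False, False, False, False, True, True, True, True]

threshold = learn_threshold(train_scores, train_labels)
new_patient_scores = [0.15, 0.78]
print(flag_images(threshold, new_patient_scores))  # [False, True]
```

The output is only a flag for a doctor to review, mirroring the text's point that the system's predictions support, rather than replace, a clinical decision.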

Machine learning systems are divided into two types based on how their training data is organized: supervised and unsupervised. A system is supervised if its training data is labeled. The labels tell the system what each piece of data represents. For example, CT images could be labeled to distinguish cancerous lesions or tumors from the surrounding healthy tissue. In essence, a supervised system learns from labeled examples. Because training datasets must be large, labeling the data can be time-consuming.
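Supervised learning can be illustrated with a nearest-centroid classifier, one simple technique among many. This is a minimal sketch assuming each labeled example is a point described by two hypothetical image features (for instance, a brightness measure and a boundary-irregularity score); the labels `"healthy"` and `"lesion"` and all values are invented.

```python
import math

# Minimal supervised-learning sketch (nearest-centroid classifier).
# Assumption: each example is a 2-D point of hypothetical image features,
# and every training point carries a human-provided label.


def train_nearest_centroid(points, labels):
    """Average the training points of each label into one centroid per label."""
    sums, counts = {}, {}
    for (x, y), label in zip(points, labels):
        sx, sy = sums.get(label, (0.0, 0.0))
        sums[label] = (sx + x, sy + y)
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (sx / counts[label], sy / counts[label])
        for label, (sx, sy) in sums.items()
    }


def classify(centroids, point):
    """Assign the label whose centroid is closest to the new point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


train_points = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15),
                (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
train_labels = ["healthy"] * 3 + ["lesion"] * 3

centroids = train_nearest_centroid(train_points, train_labels)
print(classify(centroids, (0.3, 0.25)))   # healthy
print(classify(centroids, (0.75, 0.85)))  # lesion
```

The labels are what make this supervised: without them, the code would have no way to know which group of points corresponds to which kind of tissue.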

A machine learning system is unsupervised if its training data is not labeled. In the case of cancer scans, an unsupervised system would be given a large number of CT scans and information on tumor types, then left to teach itself what to look for to detect cancer. This eliminates the need for humans to label the training data. The disadvantage is that, without explicit labels, unsupervised learning can produce less accurate results.
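Unsupervised learning can be illustrated with k-means clustering, a standard technique that groups unlabeled points by similarity. This sketch assumes the same kind of hypothetical two-feature points as a supervised example would use, but with the labels removed; the farthest-first starting centres are one deterministic initialization choice among several.

```python
import math

# Minimal unsupervised-learning sketch: k-means clustering on unlabeled
# 2-D points of hypothetical image features. No labels are provided;
# the algorithm only discovers groups, it does not name them.


def kmeans(points, k, iterations=20):
    # Farthest-first initialisation: start from the first point, then
    # repeatedly add the point farthest from all chosen centres.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its cluster
        # (an empty cluster keeps its previous centre).
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mx = sum(x for x, _ in members) / len(members)
                my = sum(y for _, y in members) / len(members)
                new_centers.append((mx, my))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:
            break  # converged
        centers = new_centers

    assignments = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
    return centers, assignments


unlabeled = [(0.2, 0.1), (0.3, 0.2), (0.25, 0.15),
             (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
centers, assignments = kmeans(unlabeled, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

The clusters come back as anonymous indices; deciding which cluster, if any, corresponds to cancerous tissue still requires outside knowledge, which is one reason unlabeled approaches can be less accurate.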

Some machine learning systems can improve their abilities based on feedback about their predictions. These are called reinforcement learning systems. For example, the system could be told the results of other tests that doctors performed to determine whether patients actually had cancer. The system’s algorithms could then be adjusted to produce more accurate predictions in the future.
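The feedback loop can be illustrated with the simplest textbook form of reinforcement learning, an epsilon-greedy multi-armed bandit. This is a sketch under invented assumptions: the system chooses between two hypothetical decision rules, and it receives a reward of 1 when a simulated follow-up test confirms its prediction and 0 otherwise. The success rates are made up and only drive the simulation; the learner never sees them directly.

```python
import random

# Minimal reinforcement-learning sketch: an epsilon-greedy bandit.
# Assumption: two hypothetical decision rules, whose true (hidden) rates of
# being confirmed by a follow-up test are simulated below.


def run_feedback_loop(success_rates, rounds=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    n = len(success_rates)
    estimates = [0.0] * n  # learned estimate of each rule's success rate
    counts = [0] * n       # how often each rule has been used
    for _ in range(rounds):
        if rng.random() < epsilon:
            choice = rng.randrange(n)  # explore: try a random rule
        else:
            choice = max(range(n), key=lambda i: estimates[i])  # exploit best so far
        # Simulated feedback: 1 if the follow-up test confirms the prediction.
        reward = 1.0 if rng.random() < success_rates[choice] else 0.0
        counts[choice] += 1
        # Incremental running-average update driven by the feedback.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates, counts


estimates, counts = run_feedback_loop(success_rates=[0.5, 0.9])
print([round(e, 2) for e in estimates], counts)
```

Over many rounds, the feedback shifts the system toward the rule that is confirmed more often; the small exploration rate `epsilon` keeps it occasionally re-checking the alternative.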

Quick Facts

Summit, the newest DOE supercomputer at Oak Ridge National Laboratory, has an architecture that is particularly well-suited for artificial intelligence applications.
Machine learning enables scientists to analyze amounts of data that were previously out of reach.
DOE-funded researchers have used machine learning to develop new cancer screening methods, better understand the properties of water, and steer experiments autonomously.
Physics-informed machine learning solves supervised learning tasks and scientific problems using deep neural networks that can be trained to obey specific physical laws.
Machine learning algorithms are not a panacea. Developing machine learning systems is prone to human error and bias and requires the same level of care as any other software engineering.
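The core idea of physics-informed machine learning, adding a penalty for violating a known physical law to the usual data-fitting loss, can be shown at toy scale. This sketch swaps the deep neural network for a three-coefficient quadratic, h(t) = c0 + c1·t + c2·t², and assumes the law is free fall, h''(t) = -g; the function names and data points are hypothetical.

```python
# Toy physics-informed fit. Assumptions: the "model" is a quadratic
# h(t) = c0 + c1*t + c2*t**2 (standing in for a deep neural network), and the
# physical law is free fall, h''(t) = -g, i.e. the residual is 2*c2 + g.
# Loss = sum of squared data errors + physics_weight * (2*c2 + g)**2.

G = 9.81  # gravitational acceleration, m/s^2


def solve_linear(a, b):
    """Solve a small square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_physics_informed(times, heights, physics_weight=1.0):
    # Data residual rows: [1, t, t^2] . c - h
    rows = [[1.0, t, t * t] for t in times]
    targets = list(heights)
    # Physics residual row, scaled so its square is physics_weight * (2*c2 + g)^2
    w = physics_weight ** 0.5
    rows.append([0.0, 0.0, 2.0 * w])
    targets.append(-G * w)
    # Minimise total squared residual via the normal equations (A^T A) c = A^T b
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear(ata, atb)


# Only two measurements: too few to fix three coefficients from data alone.
c0, c1, c2 = fit_physics_informed([0.0, 1.0], [10.0, 10.0 - G / 2])
print(round(c0, 6), round(c1, 6), round(c2, 6))  # 10.0 0.0 -4.905
```

Two measurements alone cannot determine three coefficients; the physics term supplies the missing constraint. Physics-informed neural networks apply the same principle, penalizing the residual of a governing equation, but to far more flexible models.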

DOE Office of Science: Machine Learning Contributions

Through its Advanced Scientific Computing Research (ASCR) program, the Department of Energy Office of Science funds machine learning research. ASCR’s portfolio includes data management, data analysis, computer technology, and related research, all of which contribute to machine learning and artificial intelligence. This portfolio also includes some of the world’s most powerful supercomputers, which DOE owns.

The DOE Office of Science is committed to using machine learning to support scientific research. Big data is essential to science, and Office of Science user facilities like particle accelerators and X-ray light sources generate mountains of it. Researchers are using machine learning to identify patterns or designs in data from these facilities that humans would find difficult or impossible to detect, and to do so hundreds to thousands of times faster than traditional data analysis techniques.