Why Machine Learning Struggles With Causality
If there's one thing people know how to do, it's guess what caused something else to happen. Usually, these guesses are good, especially when they are based on a visual observation of the physical world. AI, by contrast, continues to wrestle with this kind of causal inference, and fundamental challenges must be overcome before we can have "intuitive" machine learning.
In a paper titled “Towards Causal Representation Learning,” researchers at the Max Planck Institute for Intelligent Systems, the Montreal Institute for Learning Algorithms, and Google Research discuss the challenges arising from the lack of causal representations in machine-learning models and provide directions for creating artificial intelligence (AI) systems that can learn causal representations.
This is one of several efforts that aim to explore and solve machine learning’s lack of causality, which can be key to overcoming some of the major challenges the field faces today.
Independent and Identically Distributed Data
Why do machine learning models fail at generalizing beyond their narrow domains and training data?
“Machine learning often disregards information that animals use heavily: interventions in the world, domain shifts, temporal structure. By and large, we consider these factors a nuisance and try to engineer them away,” write the authors of the causal representation learning paper. “In accordance with this, the majority of current successes of machine learning boil down to large-scale pattern recognition on suitably collected independent and identically distributed (IID) data.”
IID is a term often used in machine learning. It supposes that random observations in a problem space are not dependent on each other and have a constant probability of occurring. The simplest example of IID is flipping a coin or tossing a die. The result of each new flip or toss is independent of previous ones, and the probability of each outcome remains constant.
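The coin-flip example above can be sketched in a few lines of code. This is an illustrative simulation, not from the paper: it checks the two defining properties of IID data, namely that the outcome probability stays constant and that past outcomes don't shift the distribution of future ones.

```python
import random

# Simulate an IID process: each flip is independent of previous flips,
# and the probability of heads stays constant throughout the sequence.
random.seed(42)

def flip_coin(n, p_heads=0.5):
    """Return n independent flips; 'H' with probability p_heads."""
    return ["H" if random.random() < p_heads else "T" for _ in range(n)]

flips = flip_coin(10_000)
freq_heads = flips.count("H") / len(flips)

# The empirical frequency of heads approaches p_heads as n grows,
# regardless of the order in which outcomes occurred.
print(f"Empirical P(heads) ~ {freq_heads:.3f}")

# Independence check: the frequency of heads immediately after a head
# is roughly the same as the overall frequency -- the past does not
# change the distribution of the next outcome.
after_head = [b for a, b in zip(flips, flips[1:]) if a == "H"]
freq_after_head = after_head.count("H") / len(after_head)
print(f"P(heads | previous flip was heads) ~ {freq_after_head:.3f}")
```

Both printed frequencies land close to 0.5, which is exactly what the IID assumption predicts.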
When it comes to more complicated areas such as computer vision, machine-learning engineers try to turn the problem into an IID domain by training the model on very large corpora of examples. The assumption is that, with enough examples, the machine-learning model will be able to encode the general distribution of the problem into its parameters. But, in the real world, distributions often change because of factors that cannot be considered and controlled in the training data. For instance, convolutional neural networks trained on millions of images can fail when they see objects under new lighting conditions or from slightly different angles or against new backgrounds.
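The failure mode described above can be demonstrated with a toy experiment. The following sketch (my own hypothetical illustration, not from the paper) trains a trivial one-dimensional "model" on data from one distribution and then evaluates it on data whose feature distribution has shifted, loosely analogous to new lighting conditions for an image classifier:

```python
import random

# Toy illustration of distribution shift: a model that fits the training
# distribution well can degrade badly when the test distribution moves,
# even though the underlying task is unchanged.
random.seed(0)

def sample(n, mean):
    """Draw n points from a Gaussian with the given mean (std = 1)."""
    return [random.gauss(mean, 1.0) for _ in range(n)]

# Training data: class 0 centered at 0, class 1 centered at 3.
train = [(x, 0) for x in sample(1000, 0.0)] + [(x, 1) for x in sample(1000, 3.0)]

# A minimal "model": a threshold at the midpoint of the two class means.
threshold = 1.5
predict = lambda x: int(x > threshold)

def accuracy(data):
    return sum(predict(x) == y for x, y in data) / len(data)

train_acc = accuracy(train)
print(f"Accuracy on training distribution: {train_acc:.2f}")  # high

# Shifted test data: the same two classes, but every feature moved by +2
# (think: the same objects under new lighting). The learned threshold
# no longer separates the classes correctly.
shifted = [(x, 0) for x in sample(1000, 2.0)] + [(x, 1) for x in sample(1000, 5.0)]
shifted_acc = accuracy(shifted)
print(f"Accuracy under distribution shift: {shifted_acc:.2f}")  # much lower
```

The threshold was a perfectly good statistical fit to the training data, but because the model captured only an association (not anything about what makes the classes differ), a simple shift in the inputs breaks it.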
Efforts to address these problems mostly involve training machine-learning models on more examples. But, as the environment grows in complexity, it becomes impossible to cover the entire distribution by adding more training examples. This is especially true in domains where AI agents must interact with the world, such as robotics and self-driving cars. A lack of causal understanding makes it very hard to make predictions and deal with novel situations. This is why you see self-driving cars make weird and dangerous mistakes even after having been trained for millions of miles.
“Generalizing well outside the IID setting requires learning not mere statistical associations between variables but an underlying causal model,” the AI researchers write.
Causal models also allow humans to repurpose previously gained knowledge for new domains. For instance, when you learn a real-time strategy game such as Warcraft, you can quickly apply your knowledge to similar games such as StarCraft and Age of Empires. Transfer learning in machine-learning algorithms, however, is limited to very superficial uses, such as fine-tuning an image classifier to detect new types of objects. In more complex tasks, such as learning video games, machine-learning models need huge amounts of training (thousands of years’ worth of play) and respond poorly to minor changes in the environment (e.g., playing on a new map or with a slight change to the rules).
“When learning a causal model, one should thus require fewer examples to adapt as most knowledge (i.e., modules) can be reused without further training,” the authors of the causal machine learning paper write.