Introduction to Causality in Machine Learning

Category: IT Technology · Published: 4 years ago

Despite the hype around AI, most Machine Learning (ML)-based projects focus on predicting outcomes rather than understanding causality. Indeed, after several AI projects, I realized that ML is great at finding correlations in data, but not causation. In our projects, we try not to fall into the trap of equating correlation with causation.

This issue significantly limits our ability to rely on ML for decision-making. From a business perspective, we need tools that can understand the causal relationships between variables in the data and produce ML solutions that generalize well.

In this article, I will present the current issues we have as a company already using Machine Learning algorithms and why causality matters from a business perspective.

Issues with Machine Learning

Machine Learning-based solutions suffer from several issues. As you may know, ML algorithms in their current state can be biased, suffer from a relative lack of explainability, and are limited in their ability to generalize the patterns they find in a training data set to multiple applications. Improving generalization has therefore become important.

Generalization: A model’s ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to create the model. (1)

Moreover, current machine learning approaches tend to overfit the data. Indeed, they try to learn the past perfectly, instead of uncovering the real/causal relationships that will continue to hold over time.

In my industry (healthcare), our models simply learn that symptoms occur in the presence of disease, and that disease occurs in the presence of symptoms. They do not tell us which one causes the other.

As of today, the most successful AI systems are deep learning models that leverage bigger datasets with more examples of different possible situations. It might be tempting to simply rely on more data (big data), but it would be a mistake.

Even though we can observe correlation, it does not prove causation.
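To make this concrete, here is a minimal simulation sketch. Everything in it is hypothetical (the variables `temperature`, `ice_cream`, and `sunburn`, and all coefficients); it is not taken from any of our projects. A hidden common cause drives two variables that never influence each other. In observational data they look strongly related, but as soon as we set one of them ourselves (an intervention), the relationship disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def simulate(n, intervene, seed=0):
    """Return corr(ice_cream, sunburn) over n samples.

    If intervene is True, ice_cream is set at random by us instead of
    being driven by temperature (a stand-in for a controlled experiment).
    """
    rng = random.Random(seed)
    ice_cream, sunburn = [], []
    for _ in range(n):
        temperature = rng.gauss(0, 1)             # hidden common cause
        if intervene:
            ic = rng.gauss(0, 1)                  # set by us, ignores temperature
        else:
            ic = temperature + rng.gauss(0, 0.5)  # driven by temperature
        sb = temperature + rng.gauss(0, 0.5)      # never depends on ice_cream
        ice_cream.append(ic)
        sunburn.append(sb)
    return pearson(ice_cream, sunburn)

observed_corr = simulate(20_000, intervene=False)
intervened_corr = simulate(20_000, intervene=True)
print(f"observational correlation:      {observed_corr:.2f}")
print(f"correlation under intervention: {intervened_corr:.2f}")
```

A predictive model trained on the observational data would happily use `ice_cream` to predict `sunburn`, and it would be right as long as nothing changes. It would be wrong the moment someone acts on that prediction.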

Judea Pearl and Dana Mackenzie’s The Book of Why: The New Science of Cause and Effect highlights the main limitations of current machine learning solutions and the causal inference challenge. They note that the hype that big data will solve many of the big challenges we face is misplaced.

Because Deep Learning (DL) has focused too much on correlation without causation, data alone won’t answer the question once the problem moves away from very narrow situations. In fact, a lot of real-world data is not generated in the same way as the data we use to train AI models. In other words, deep learning is good at finding patterns in data, but can’t explain how they’re connected.

Most solutions are unable to generalize past the domain of examples present in a dataset.

For a growing number of business applications, ML’s ability to find correlations is more than enough (e.g., price prediction, object classification, better targeting). Indeed, ML systems excel at learning connections between input data and output predictions, but fall short when reasoning about cause-effect relations or changes in the environment. ML models that could capture causal relationships would be more generalizable.

Causality: The influence by which one event, process or state, a cause, contributes to the production of another event, process or state, an effect, where the cause is partly responsible for the effect, and the effect is partly dependent on the cause. (2)

The ability to uncover the causes and effects of different phenomena in complex systems would help us build better solutions in areas as diverse as health care, justice, and agriculture. These are areas where we cannot afford the risk of mistaking correlation for causation.

Causal inference and use cases

First of all, it is important to define this term more precisely. As humans, we often think in terms of cause and effect: if we understand why something happened, we can change our behavior to improve future outcomes.

In other words, the goal of causal inference is to learn causality from data (what was the cause and what was the effect). As mentioned before, correlation has sufficed so far in many use cases. However, causal inference would enable us to go one step further and figure out what would happen if we decided to change some of the underlying assumptions in our model.

Understanding cause and effect would make existing AI systems smarter and more efficient. For instance, a robot that “understands that dropping things causes them to break would not need to toss dozens of vases onto the floor to see what happens to them” (3).

Furthermore, the ability to understand causality would help us create business models as well as new startups specialized in helping companies better understand their data. For instance, we recently started a project trying to identify leads using different sources. We believe that causality would help us identify new leads based on elements we never thought about.

From a business perspective, we are thinking about the following questions/scenarios:

#1: In an e-commerce context, we could determine which specific factor has the greatest impact on the decision to purchase a product. With this information, we could better allocate resources to improve a specific KPI. We could also rank the impact of different factors on the purchasing decision. We could determine whether a given customer would have purchased a specific product if they had not bought other products over the last two years.

#2: In a broader sense, we could discover which negative impacts a given business strategy could have avoided, and how. We could also determine by how much we should expect our sales to increase if we implement a specific training program for our business developers.

#3: In the agricultural field, we often try to predict whether a farmer’s crop yield will be lower this year. However, using causal inference, it will become possible to better understand which steps we should take to increase the harvest.

Beyond these potential use cases, bringing more causal reasoning into Machine Learning is a necessary step in building more human-like machine intelligence (possibly Artificial General Intelligence).

Current & Future solutions

As of today, some solutions do exist. However, current solutions (e.g., Monte Carlo simulations, Markov chain analysis, Naïve Bayes, stochastic modeling, and a few open-source packages such as DAGitty) are not up to our expectations when it comes to business applications.

Meta-learning causal structures

In 2019, Yoshua Bengio and his team posted a research paper outlining one approach. They seem to be working on a version of deep learning capable of recognizing simple cause-and-effect relationships. They used a dataset that maps causal relationships between real-world phenomena, such as smoking and lung cancer, in terms of probabilities. They also generated synthetic datasets of causal relationships.

In other words: “The resulting algorithm essentially forms a hypothesis about which variables are causally related, and then tests how changes to different variables fit the theory”. (4)
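The "form a hypothesis, then test it against changes" idea can be illustrated with a much simpler toy than the paper's method. The sketch below is not the meta-learning algorithm itself; it is a plain invariance check that I am swapping in for illustration, with hypothetical probabilities and function names. We assume a ground truth where A causes B, then change only the distribution of A. Under the correct hypothesis (A -> B), the conditional P(B | A) should stay the same, while under the wrong one (B -> A), P(A | B) has to be relearned.

```python
import random

def sample(n, p_a, rng):
    """Ground truth assumed for this sketch: A causes B.

    p_a is P(A=1); the mechanism P(B=1 | A) never changes.
    """
    p_b_given_a = {0: 0.2, 1: 0.9}
    data = []
    for _ in range(n):
        a = 1 if rng.random() < p_a else 0
        b = 1 if rng.random() < p_b_given_a[a] else 0
        data.append((a, b))
    return data

def conditional(data, given_index, value):
    """Estimate P(other = 1 | variable at given_index == value)."""
    other = 1 - given_index
    rows = [r for r in data if r[given_index] == value]
    return sum(r[other] for r in rows) / len(rows)

def shift(before, after, given_index):
    """Total change of the estimated conditional table between two datasets."""
    return sum(
        abs(conditional(before, given_index, v) - conditional(after, given_index, v))
        for v in (0, 1)
    )

rng = random.Random(0)
before = sample(50_000, p_a=0.3, rng=rng)
after = sample(50_000, p_a=0.8, rng=rng)   # the change touches only P(A)

shift_b_given_a = shift(before, after, given_index=0)  # hypothesis A -> B
shift_a_given_b = shift(before, after, given_index=1)  # hypothesis B -> A
print(f"change in P(B | A): {shift_b_given_a:.3f}")
print(f"change in P(A | B): {shift_a_given_b:.3f}")
```

The hypothesis whose pieces need the least relearning after the change is the more plausible causal direction. As I understand it, the paper turns a related intuition (how quickly a model adapts) into a training signal.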

Structural Equation Modeling (SEM)

Another approach worth mentioning is Structural Equation Modeling. Without getting into too much detail, “the fundamental math established by Judea Pearl and the rapid evolution of graph models are helping make causality tools available” (5).
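For readers who have never seen one, a structural equation model writes each variable as a function of its direct causes plus noise. Below is a minimal linear sketch with a hypothetical chain (training hours -> skill -> sales) and made-up coefficients. It uses simple one-variable least squares rather than a dedicated SEM package, so treat it as an illustration of the idea only.

```python
import random

def slope(xs, ys):
    """Least-squares slope of ys regressed on xs (single predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var

# Hypothetical structural equations (assumed for this example):
#   skill = TRUE_A * training + noise
#   sales = TRUE_B * skill    + noise
TRUE_A, TRUE_B = 0.8, 1.5

rng = random.Random(1)
n = 20_000
training = [rng.gauss(0, 1) for _ in range(n)]
skill = [TRUE_A * t + rng.gauss(0, 1) for t in training]
sales = [TRUE_B * s + rng.gauss(0, 1) for s in skill]

a_hat = slope(training, skill)       # path training -> skill
b_hat = slope(skill, sales)          # path skill -> sales
total_hat = slope(training, sales)   # total effect of training on sales

print(f"training -> skill: {a_hat:.2f}")
print(f"skill -> sales:    {b_hat:.2f}")
print(f"indirect effect (product of paths): {a_hat * b_hat:.2f}")
print(f"total effect (direct regression):   {total_hat:.2f}")
```

In this chain there is no direct path from training to sales, so the product of the two path coefficients should match the total effect. When the two disagree in real data, it hints at a missing path or a confounder in the assumed graph.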

Causal Bayesian Network

This method estimates the relationships between all variables in a data set and can be considered a true discovery method. It enables the discovery of multiple causal relationships at the same time.

Basically, it results in an intuitive visual map showing which variables influence each other, as well as the extent of their influence. Indeed, causal graphical models make it possible to simulate many possible interventions simultaneously.

Causal Bayesian networks require a lot of data to capture all possible variables.
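Here is a minimal sketch of what "simulating an intervention" means on such a network. The three-node graph and all probabilities are hypothetical: `loyal` (is the customer loyal?) influences both `promo` (did they receive a promotion?) and `buy` (did they purchase?), and `promo` also influences `buy`. The sketch compares what we would observe, P(buy | promo), with what would happen if we sent the promotion to everyone, P(buy | do(promo)), using the standard truncated factorization.

```python
# Hypothetical conditional probability tables for the graph
#   loyal -> promo, loyal -> buy, promo -> buy
P_LOYAL = {0: 0.5, 1: 0.5}
P_PROMO1_GIVEN_LOYAL = {0: 0.2, 1: 0.8}
P_BUY1_GIVEN = {  # keys are (promo, loyal)
    (0, 0): 0.1, (1, 0): 0.3,
    (0, 1): 0.5, (1, 1): 0.7,
}

def p_promo(promo, loyal):
    """P(promo | loyal) read from the table."""
    p1 = P_PROMO1_GIVEN_LOYAL[loyal]
    return p1 if promo == 1 else 1 - p1

def observe(promo):
    """P(buy=1 | promo): what the raw data shows (conditioning)."""
    num = sum(P_LOYAL[z] * p_promo(promo, z) * P_BUY1_GIVEN[(promo, z)] for z in (0, 1))
    den = sum(P_LOYAL[z] * p_promo(promo, z) for z in (0, 1))
    return num / den

def intervene(promo):
    """P(buy=1 | do(promo)): drop the loyal -> promo edge, keep the rest."""
    return sum(P_LOYAL[z] * P_BUY1_GIVEN[(promo, z)] for z in (0, 1))

observed_lift = observe(1) - observe(0)
causal_lift = intervene(1) - intervene(0)
print(f"lift seen in the data:        {observed_lift:.2f}")  # prints 0.44
print(f"lift if we send it ourselves: {causal_lift:.2f}")    # prints 0.20
```

With these made-up numbers, the data overstates the effect of the promotion by more than a factor of two, because loyal customers both receive more promotions and buy more anyway.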

The other important element to keep in mind is that Causal AI does not operate within a black box. Researchers can check the model’s reasoning and reduce the risk of biases.

From a business point of view, this approach allows for the incorporation of expert knowledge to counter the possible limitations of a purely data-driven approach. Business experts can help:

Place conditions on the model to improve its accuracy.
Determine which variables should go into the model.
Help understand counterintuitive results.

Reinforcement Learning VS Causality

Finally, I want to mention Reinforcement Learning (RL). As you may know, RL is a method for learning incrementally through interactions with an environment.

Some leading figures in the AI community believe that RL is inherently causal, in the sense that the agent experiments with different actions and learns about how they affect performance through trial and error. This type of learning is called “model-free” because it can learn effective behaviors without having to learn an explicit model of how the world works.

In reinforcement learning, a model-free algorithm is an algorithm that does not use the transition probability distribution (and the reward function) associated with the Markov decision process, which, in RL, represents the problem to be solved. (6)

However, it is only learning about the causal relationship between actions and performance, rather than how actions affect the world directly. For example, this might involve learning that flipping over a full water bucket above a fire puts it out, without understanding the relationship between water and fire.

As mentioned by George Lawton, “if the agent was given a hose instead of a bucket of water, it would not know what to do with it without learning from scratch, since it did not learn the causal relationship between water and fire”. I believe RL is more about testing a belief to find some optimal point in the search space.
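To make the "model-free" idea above concrete, here is a minimal tabular Q-learning sketch on a hypothetical five-state corridor (the environment, rewards, and hyperparameters are all made up for illustration). The agent only calls `step` and observes what comes back; it never reads the transition rules or the reward function.

```python
import random

N_STATES = 5       # states 0..4; reaching state 4 ends the episode
ACTIONS = (0, 1)   # 0 = move left, 1 = move right

def step(state, action):
    """Hypothetical environment. Returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values purely from sampled transitions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)   # explore
            else:
                best = max(q[state])           # exploit, ties broken at random
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q

q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print("greedy action per state (1 = right):", policy)
```

The resulting table says which action scores best in each state, and nothing else. It holds no notion of why moving right works, which is the bucket-versus-hose limitation described above: change the tools or the layout, and the agent has to start learning again.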