
Explainability: Cracking Open the Black Box

What is explainability in artificial intelligence, and how can we leverage different techniques to open the black box of AI and peek inside? This practical guide offers a review and critique of the various techniques of interpretability.


Explainable artificial intelligence (XAI) is a subfield of AI that has been gaining ground in recent years. Humans have always been divided when faced with the unknown. Some of us respond with faith and worship, like our ancestors who worshipped fire, the skies, etc. Others respond with distrust. Likewise, in machine learning, some people are satisfied by rigorous testing of a model (i.e., its performance), while others want to know why and how a model is doing what it is doing. There is no right or wrong here.

Yann LeCun, Turing Award winner and Facebook’s chief AI scientist, and Cassie Kozyrkov, Google’s chief decision intelligence engineer, are strong proponents of the line of thought that one can infer a model’s reasoning by observing its actions (i.e., predictions in a supervised learning framework). On the other hand, Microsoft Research’s Rich Caruana and a few others have insisted that models should have inherent interpretability, not just interpretability derived from the model’s performance.

We can spend years debating the topic, but for the widespread adoption of AI, explainability is essential, and the industry increasingly demands it.

What is Interpretability?

Interpretability is the degree to which a human can understand the cause of a decision. In the AI domain, it is the degree to which a person can understand the how and why of an algorithm and its predictions. There are two major ways of looking at this: transparency and post-hoc interpretation.

Transparency. Transparency addresses how well the model can be understood. This is inherently specific to the model used.

One of the key aspects of such transparency is simulatability. Simulatability denotes the ability of a model to be simulated, or reasoned through step by step, by a human alone. The complexity of the model plays a big part in defining this characteristic. While a simple linear model or a single-layer perceptron is simple enough to think about, it becomes increasingly difficult to think about a decision tree with a depth of, say, five. It also becomes harder to think about a model that has a lot of features. Therefore, it follows that a sparse linear model (for example, one fit with L1 regularization) is more interpretable than a dense one.
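To make the point about sparsity concrete, here is a minimal sketch in plain Python. It assumes a hypothetical six-feature linear model whose weights and inputs are made up for illustration. The soft_threshold helper is the shrinkage step used by coordinate-descent solvers for L1-regularized (lasso) regression; it is applied here directly to the weights as a stand-in for actually refitting the model, which is a simplification.

```python
def soft_threshold(w, lam):
    """Shrink a weight toward zero by lam; weights within lam of zero become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def contributions(weights, x):
    """Per-feature terms w_i * x_i that a person must track to reproduce a prediction."""
    return {name: w * x[name] for name, w in weights.items() if w != 0.0}


# Hypothetical dense weights and one input row (illustrative values only).
dense = {"age": 0.80, "income": -0.05, "tenure": 0.03,
         "visits": 0.60, "clicks": -0.02, "region": 0.04}
x = {"age": 1.0, "income": 2.0, "tenure": 0.5,
     "visits": 1.5, "clicks": 3.0, "region": 1.0}

# Stand-in for L1-regularized fitting: small weights are driven to exactly zero.
sparse = {name: soft_threshold(w, 0.10) for name, w in dense.items()}

dense_terms = contributions(dense, x)
sparse_terms = contributions(sparse, x)

print(len(dense_terms), "terms to track in the dense model")
print(len(sparse_terms), "terms to track in the sparse model:", sorted(sparse_terms))
```

The dense model asks a reader to hold six terms in mind for every prediction, while the sparse one leaves only the two features whose weights survive the shrinkage. That smaller mental load is what simulatability refers to.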

Decomposability is another major tenet of transparency. It stands for the ability to explain each of the parts of a model (input, parameter, and calculation). It requires everything from input (no complex features) to output to be explained without the need for another tool.

The third tenet of transparency is algorithmic transparency. This deals with the inherent simplicity of the algorithm. It deals with the ability of a human to understand fully the process an algorithm takes to convert inputs to an output.

Post-Hoc Interpretation. Post-hoc interpretation is useful when the model itself is not transparent. In the absence of clarity on how the model works, we resort to explaining the model and its predictions in a variety of ways.

  • Visual Explanations: These methods try to explain a model by visualizing its behavior. The majority of the techniques in this category use tools such as dimensionality reduction to present the model in a human-understandable format.
  • Feature Relevance Explanations: These methods try to expose the inner workings of a model by computing feature relevance or importance. They are thought of as an indirect way of explaining a model.
  • Explanations by Simplification: These methods train a whole new, simpler system based on the original model and use it to provide explanations.
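Two of the categories above can be sketched in a few lines of plain Python. The example assumes a hypothetical black_box function standing in for an opaque model, with synthetic random inputs. It uses permutation importance as one example of a feature relevance method and a one-variable least-squares line as a deliberately crude surrogate for explanation by simplification. Neither is presented as the method the techniques in this guide use; they are illustrations under those assumptions.

```python
import random


def black_box(row):
    """Hypothetical opaque model: leans on feature 0, partly on feature 1, ignores feature 2."""
    x0, x1, x2 = row
    return 3.0 * x0 + (x1 if x1 > 0.5 else 0.0)


def mse(model, X, y):
    """Mean squared error of the model's predictions against targets y."""
    return sum((model(r) - t) ** 2 for r, t in zip(X, y)) / len(X)


def permutation_importance(model, X, y, col, rng):
    """Increase in error after shuffling one column, which breaks its link to the output."""
    shuffled = [r[col] for r in X]
    rng.shuffle(shuffled)
    X_perm = [r[:col] + (v,) + r[col + 1:] for r, v in zip(X, shuffled)]
    return mse(model, X_perm, y) - mse(model, X, y)


def fit_line(xs, ys):
    """Closed-form least-squares fit of ys ~ slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(xs, ys))
             / sum((a - mx) ** 2 for a in xs))
    return slope, my - slope * mx


rng = random.Random(0)  # fixed seed so the run is repeatable
X = [(rng.random(), rng.random(), rng.random()) for _ in range(200)]
y = [black_box(r) for r in X]  # targets are the model's own outputs, so baseline error is 0

# Feature relevance: how much does the model's output depend on each column?
importances = [permutation_importance(black_box, X, y, c, rng) for c in range(3)]
print("permutation importances:", importances)

# Simplification: approximate the black box with a line on its most relevant feature.
top = max(range(3), key=lambda c: importances[c])
slope, intercept = fit_line([r[top] for r in X], y)
print("surrogate on feature", top, "has slope", slope, "and intercept", intercept)
```

Because the targets here are the black box's own predictions, the importances measure how strongly the model relies on each feature rather than how well it fits real labels. The feature the model ignores scores exactly zero, and the surrogate's slope gives a one-number summary of the dominant feature's effect, at the cost of hiding the threshold behavior in feature 1.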
