Machine Learning

Explainable Machine Learning

Today’s machine learning algorithms, and in particular neural networks, mostly act as black boxes: they make very good predictions, but we don’t really understand why. This is problematic for several reasons: Why should users (e.g. physicians) trust these algorithms? Will black-box methods contribute to the advancement of science when they produce numbers, not insight? How can one legally challenge an objectionable machine decision? Explainable machine learning attempts to solve these problems by opening the black box. The material of our seminar “Explainable Machine Learning” gives a good overview of the state of the art in the field.

We will address these challenges in various ways:

  • Representation learning: Neural networks transform the raw data into an internal representation (i.e. into activations of interior neurons). We want to design training objectives and algorithms that structure these internal representations in a meaningful way, i.e. into features whose meaning a human can understand. High activation of certain features and low activation of others can then serve as an explanation of what the network thinks is happening.
  • Uncertainty analysis: Networks shall be equipped with self-diagnosis capabilities that clearly identify uncertain decisions and point out alternative outcomes that are almost equally plausible (see the first sketch after this list).
  • Explanation by similarity: Networks shall learn to point out examples that are similar to the present case, reproducing a human-like notion of similarity. Similarity should present a balanced view of the context: it should include not only instances supporting the present decision, but also ones that are easily confused with the current situation yet are actually different (see the second sketch after this list).
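
To make the self-diagnosis idea concrete, here is a minimal sketch in plain Python/NumPy of one common recipe: average several stochastic forward passes (e.g. with dropout kept active at test time) and report the predictive entropy together with the classes that are almost as plausible as the top prediction. The predictor, the thresholds and the toy example are illustrative assumptions, not taken from our papers.

    import numpy as np

    def diagnose(predict_fn, x, n_samples=50, entropy_threshold=0.7, margin=0.15):
        """Average stochastic forward passes and flag uncertain decisions."""
        # predict_fn is a hypothetical stochastic classifier: one call = one noisy
        # forward pass returning class probabilities (e.g. Monte-Carlo dropout).
        probs = np.mean([predict_fn(x) for _ in range(n_samples)], axis=0)
        entropy = -np.sum(probs * np.log(probs + 1e-12))    # predictive entropy
        ranked = np.argsort(probs)[::-1]                    # classes, most plausible first
        top = ranked[0]
        return {
            "prediction": int(top),
            "confidence": float(probs[top]),
            "uncertain": bool(entropy > entropy_threshold),  # threshold is arbitrary here
            # alternative outcomes that are almost equally plausible
            "alternatives": [int(c) for c in ranked[1:] if probs[top] - probs[c] < margin],
        }

    # Toy usage: a dummy stochastic predictor over three classes.
    rng = np.random.default_rng(0)
    def dummy_predict(x):                                   # ignores x, for illustration only
        logits = np.array([2.0, 1.8, -1.0]) + 0.3 * rng.normal(size=3)
        e = np.exp(logits - logits.max())
        return e / e.sum()

    print(diagnose(dummy_predict, x=None))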

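The similarity-based explanations from the last bullet can likewise be sketched in a few lines of NumPy: retrieve the nearest neighbours of the query in a learned embedding space and report both supporting examples (same predicted class) and contrastive ones (different class, yet very similar). The random embeddings and the cosine similarity below are placeholder assumptions; in practice the embedding would come from a network trained to reproduce a human-like notion of similarity.

    import numpy as np

    def explain_by_similarity(query_emb, query_pred, train_embs, train_labels, k=3):
        """Return supporting neighbours (same label) and contrastive ones (different label)."""
        # cosine similarity between the query and every training embedding
        q = query_emb / np.linalg.norm(query_emb)
        t = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
        order = np.argsort(t @ q)[::-1]                     # most similar first
        supporting  = [int(i) for i in order if train_labels[i] == query_pred][:k]
        contrastive = [int(i) for i in order if train_labels[i] != query_pred][:k]
        return supporting, contrastive                      # indices into the training set

    # Toy usage with random 8-dimensional embeddings and two classes.
    rng = np.random.default_rng(1)
    train_embs = rng.normal(size=(100, 8))
    train_labels = rng.integers(0, 2, size=100)
    query = rng.normal(size=8)
    print(explain_by_similarity(query, query_pred=1,
                                train_embs=train_embs, train_labels=train_labels))
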
Applications include medicine and the natural sciences, where the extraction of diverse solutions for ill-posed inverse problems is very important. Moreover, explanations derived from learned solutions may give domain experts new clues about promising directions for future explicit analysis. Very interesting initial results in this direction can be found in our paper “Analyzing Inverse Problems with Invertible Neural Networks”.
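
Invertible neural networks of the kind used in that paper are typically built from affine coupling blocks, whose forward pass can be undone exactly. The NumPy sketch below shows a single such block, with tiny random stand-ins for the learned subnetworks; the paper’s full setup (conditioning on observations, latent variables, training objectives) is not reproduced here.

    import numpy as np

    rng = np.random.default_rng(2)
    D = 6                                          # input dimension, split into two halves
    W_s = 0.1 * rng.normal(size=(D // 2, D // 2))  # stand-in for a learned scale subnetwork
    W_t = 0.1 * rng.normal(size=(D // 2, D // 2))  # stand-in for a learned shift subnetwork

    def s(u): return np.tanh(u @ W_s)              # scale
    def t(u): return u @ W_t                       # translation

    def forward(x):
        x1, x2 = x[:D // 2], x[D // 2:]
        y2 = x2 * np.exp(s(x1)) + t(x1)            # second half is scaled and shifted
        return np.concatenate([x1, y2])            # first half passes through unchanged

    def inverse(y):
        y1, y2 = y[:D // 2], y[D // 2:]
        x2 = (y2 - t(y1)) * np.exp(-s(y1))         # exactly undoes the forward pass
        return np.concatenate([y1, x2])

    x = rng.normal(size=D)
    assert np.allclose(inverse(forward(x)), x)     # invertible by construction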

We follow a theoretical approach to explaining neural networks in our project “Transport theory for Invertible Neural Networks (TRINN)”.