Building a good model is hard
Explaining how a good model works is even harder
Exhibit A: The good model
What if you could
poke around and find out
how this model works?
Introducing
XAI can help you look at
How the model reacts to different features overall using Global Interpretability Methods
How the model arrives at a prediction for a single instance using Local Interpretability Methods
There are several key local interpretability methods that are related to each other in how they approach the problem
For a given data point
Formally
a rule, or a set of predicates, that the given instance satisfies and that is a sufficient condition for \(f(x)\) (i.e. the model output) with high probability
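Following the standard formulation from the original Anchors paper (Ribeiro et al., 2018), a rule \(A\) is an anchor for the instance \(x\) if it holds on \(x\) and the model agrees with \(f(x)\) on most perturbed points that still satisfy the rule:

$$
A(x) = 1, \qquad \mathrm{prec}(A) = \mathbb{E}_{z \sim \mathcal{D}(\cdot \mid A)}\big[\mathbb{1}_{f(z) = f(x)}\big] \ge \tau
$$

where \(\mathcal{D}(\cdot \mid A)\) is a perturbation distribution around \(x\) restricted to points satisfying \(A\), and \(\tau\) is the desired precision (e.g. 0.95).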
Simply put
We are going to find a big enough bounding box in the feature space, containing other points that would get the same model prediction as the anchoring point.
Predicates are simple logical statements. In this context, a predicate is made up of a feature (e.g. Age), a logical operator (e.g. >), and a constant value (e.g. 42).
Simply put
A predicate is a boundary line that divides the values of a feature into two subsets.
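To make this concrete, here is a minimal Python sketch (illustrative only, not code from the talk) of a predicate as a small data structure and an anchor as a conjunction of predicates:

```python
from dataclasses import dataclass
from typing import Callable
import operator

@dataclass
class Predicate:
    feature: str                         # e.g. "Age"
    op: Callable[[float, float], bool]   # e.g. operator.gt for ">"
    value: float                         # e.g. 42

    def holds(self, instance: dict) -> bool:
        # True if this single boundary condition is met by the instance
        return self.op(instance[self.feature], self.value)

# An anchor is simply a conjunction (AND) of predicates
def anchor_holds(predicates: list[Predicate], instance: dict) -> bool:
    return all(p.holds(instance) for p in predicates)

# Example: the rule "Age > 42 AND Income <= 50000"
rule = [
    Predicate("Age", operator.gt, 42),
    Predicate("Income", operator.le, 50_000),
]
print(anchor_holds(rule, {"Age": 57, "Income": 38_000}))  # True
```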
This is achieved by formulating the search for rules as a multi-armed bandit problem in a pure-exploration setting over the feature space
Multi-armed bandit problems are like playing different slot machines: you want to design a strategy that earns the most reward by choosing the best machines, even when you don’t know how likely each one is to pay out.
Simply put
Imagine you are the local point, trying to gather friends who are similar to you by moving the walls of a room.
Your options are to move the north, east, west, or south walls so that your friends end up inside.
You get rewarded when the room is spacious, and you also get rewarded when the people inside are like-minded.
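Returning to the bandit framing: each candidate rule is an arm, pulling an arm means sampling perturbed points that satisfy the rule, and the reward is how often the model still predicts the same class. Below is a heavily simplified Python sketch of that idea (the real Anchors implementation uses a KL-LUCB pure-exploration bandit and builds rules incrementally; `sample_perturbation` is a hypothetical helper you would have to supply):

```python
import numpy as np

def pull_arm(model, x, rule, sample_perturbation, n_samples=100):
    """Pull one arm: sample points that satisfy `rule` and measure how often
    the model agrees with its prediction at the anchoring point x."""
    target = model.predict(x.reshape(1, -1))[0]
    z = np.array([sample_perturbation(x, rule) for _ in range(n_samples)])
    return np.mean(model.predict(z) == target)  # estimated precision

def search_anchor(model, x, candidate_rules, sample_perturbation, tau=0.95):
    """Greedy stand-in for the pure-exploration search: estimate every
    candidate's precision and return the best one if it clears tau."""
    precision = [pull_arm(model, x, rule, sample_perturbation)
                 for rule in candidate_rules]
    best = int(np.argmax(precision))
    if precision[best] >= tau:
        return candidate_rules[best], precision[best]
    return None, precision[best]
```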
Figure panels: Instance 1 (purple) and Instance 2 (yellow).
Tip
In a practical setting we can repeat this for multiple instances to build up a picture of the model’s decision boundary and an approximate understanding of the high and low feature values that drive a prediction.
The Australian bushfires, notably during the 2019/2020 season, were devastating to animals and humans alike.
Objective: Build a model that can predict the cause of a possible bushfire and use explainable AI to uncover the decision process of the model
The problem is,
Given the location, the date, the weather and human activity data, predict the most likely cause for a bushfire.
As a baseline, a random forest model was fitted on a training set covering 2000 to 2020, while the testing set contained data from 2021 to 2022; the model achieved an F1 score of 0.83 on the testing set.
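A rough sketch of what that baseline could look like in Python with scikit-learn (the file name, column names, features, and averaging choice here are assumptions, not the talk's actual code):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.preprocessing import LabelEncoder

# Hypothetical dataset with location, date, weather and human-activity columns
fires = pd.read_csv("bushfires.csv", parse_dates=["date"])
fires["month"] = fires["date"].dt.month
features = ["latitude", "longitude", "temperature", "wind_speed",
            "rainfall", "distance_to_road", "month"]

# Train on 2000-2020 ignitions, test on 2021-2022
train = fires[fires["date"].dt.year.between(2000, 2020)]
test = fires[fires["date"].dt.year.between(2021, 2022)]

# Encode the cause labels as integers (also convenient for the explainer later)
le = LabelEncoder()
y_train = le.fit_transform(train["cause"])
y_test = le.transform(test["cause"])

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(train[features], y_train)

pred = model.predict(test[features])
print(f1_score(y_test, pred, average="weighted"))  # the talk reports 0.83
```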
The easiest to explain is Anchors, which gives explanations that are high dimensional.
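One way to compute such anchors in Python is the alibi library's AnchorTabular explainer; a minimal sketch continuing from the random forest above (the threshold and discretisation defaults are assumptions, and the talk itself may have used different tooling):

```python
from alibi.explainers import AnchorTabular

# alibi expects a callable returning class predictions for a batch of rows
predict_fn = lambda X: model.predict(X)

explainer = AnchorTabular(predict_fn, feature_names=features)
explainer.fit(train[features].to_numpy())

# Explain a single test instance: which predicates anchor its prediction?
x = test[features].to_numpy()[0]
explanation = explainer.explain(x, threshold=0.95)

print(" AND ".join(explanation.anchor))  # e.g. 'temperature > 31.20 AND wind_speed > 40.00'
print(explanation.precision)             # share of perturbed points with the same prediction
print(explanation.coverage)              # how much of the data the rule applies to
```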
Challenges
Possibilities
It’s hard to see a pattern in this, as the data is noisy!
The discovered anchor has 4 dimensions in a two-dimensional feature space.
Purple dots refer to the Chinstrap species while the golden dots refer to the Adelie species.
Can you notice a clear separation, with a majority of the points being gold?
Have any suggestions or ideas?
janithwanni.quarto.pub/seeing-the-smoke-before-the-fire