A Bayesian approach to understanding data breaches
A new report from Javelin Research, Data Breach Notifications: Victims Face Four Times Higher Risk of Fraud, has some interesting information in it. In particular, it says that 11 percent of American consumers received a data breach notification letter in the past year. Of the people who received these letters, 19.5 percent claimed to have been victims of identity theft in the same period, while only 4.3 percent of those who didn't receive such a letter were victims.
Any time I see data like this, the first thing I do is try to apply Bayes' theorem to it. Here's what I get when I do that.
Let T represent the event that a person suffers identity theft and N be the event that a person receives a breach notification letter. If my interpretation of Javelin's data is correct, we're given that
P(T|N) = 0.195
P(T|not N) = 0.043
P(N) = 0.11
and
P(not N) = 0.89
From the data that the Department of Justice has, it looks like
P(T) = 0.0275
Using Bayes' theorem we have that
P(N|T) = P(T|N) P(N) / P(T)
= (0.195) (0.11) / 0.0275
= 0.78
so that
P(not N|T) = 1 - P(N|T)
= 0.22
In other words, if you're a victim of identity theft, there's about a 78 percent chance that you received a breach notification letter in the past year, and about a 22 percent chance that you didn't. I'll bet that 78 percent figure is higher than it would have been a few years ago.
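For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The variable names and the posterior helper are my own and are only illustrative. The last calculation is an extra consistency check that applies the law of total probability to Javelin's figures alone. It gives a P(T) of roughly 0.06, which is noticeably higher than the Department of Justice figure, so the 78 percent estimate depends on which value of P(T) you trust.

```python
# Figures quoted in the post (Javelin report and Department of Justice).
p_n = 0.11               # P(N): received a breach notification letter
p_t_given_n = 0.195      # P(T|N): identity theft among those notified
p_t_given_not_n = 0.043  # P(T|not N): 4.3 percent among those not notified
p_t_doj = 0.0275         # P(T): Department of Justice figure


def posterior(likelihood, prior, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Chance that an identity theft victim received a letter, and its complement.
p_n_given_t = posterior(p_t_given_n, p_n, p_t_doj)
p_not_n_given_t = 1 - p_n_given_t

# Consistency check (an addition, not part of the original calculation):
# P(T) implied by Javelin's figures via the law of total probability.
p_t_implied = p_t_given_n * p_n + p_t_given_not_n * (1 - p_n)

print(f"P(N|T)       = {p_n_given_t:.2f}")
print(f"P(not N|T)   = {p_not_n_given_t:.2f}")
print(f"implied P(T) = {p_t_implied:.3f}")
```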




