Breach

Friday, 03 February 2012

Why data breaches have a lognormal distribution

I had an interesting discussion this morning about data breaches, and in this discussion the following idea about the distribution of the size of data breaches came up.

It certainly looks like the size of data breaches follows a lognormal distribution, so that the number of records exposed in breaches doesn't follow a normal distribution, but the logarithm of the number of records exposed does.

Why should we expect this to be true?

One approach to understanding this gets fairly arcane. You might look at axioms for a reasonable metric for either security or vulnerability and then look at a maximum entropy distribution that fits the constraints that those axioms suggest.

But there's probably a simpler approach.

The size of organizations seems to also follow a lognormal distribution. Let's suppose that the amount of sensitive data that an organization has is roughly proportional to the size of the organization. If data breaches are becoming a virtual certainty, we'd then expect to see that lognormal distribution of breach sizes from that alone, wouldn't we?
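The argument above can be sketched numerically. This is only an illustration under stated assumptions: the parameters of the organization-size distribution and the proportionality constant are made up for the example, not taken from any data. If the log of organization size is normal and the number of sensitive records is a constant multiple of size, then the log of the number of records is just the log of size shifted by a constant, so it is still normal with the same spread.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical parameters (assumptions for illustration, not from the post):
# log10 of organization size is normal with this mean and standard deviation.
LOG10_MEAN_SIZE = 2.0
LOG10_SD_SIZE = 1.0
RECORDS_PER_UNIT = 25.0  # assumed proportionality constant


def sample_breach_log10_sizes(n):
    """Draw n breached organizations and return log10 of the records exposed."""
    logs = []
    for _ in range(n):
        org_size = 10 ** random.gauss(LOG10_MEAN_SIZE, LOG10_SD_SIZE)
        records = RECORDS_PER_UNIT * org_size  # data proportional to size
        logs.append(math.log10(records))
    return logs


logs = sample_breach_log10_sizes(20000)
log_mean = statistics.fmean(logs)
log_sd = statistics.stdev(logs)

# Proportionality only shifts the mean of the logs; the spread is unchanged.
expected_mean = LOG10_MEAN_SIZE + math.log10(RECORDS_PER_UNIT)
print(f"mean of log10(records): {log_mean:.3f} (expected {expected_mean:.3f})")
print(f"sd of log10(records):   {log_sd:.3f} (expected {LOG10_SD_SIZE:.3f})")
```

The sketch assumes every organization is equally likely to be breached. If larger organizations were breached more often, the sample of breached organizations would be skewed and the simple shift argument would need adjusting.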

Thursday, 19 January 2012

President's Challenge hacked

It looks like the President's Challenge web site has been hacked and users' data stolen. Here's what the email to users of the site said:

We are writing to inform you about a security issue involving the President’s Challenge website [www.presidentschallenge.org]. 

Hackers recently accessed our database, which included personal information such as your username, password, security question and answer, email address, date of birth, city and state, and, if you provided it, your name. The hackers were also able to access data such as your logged activities, your nutrition goals, what groups you are in, and messages you had sent and received within the online tracker. 

After we learned about the attack, we quickly took down the President’s Challenge website on January 11 and began the process of determining what information the hackers accessed and how it may affect you. We also contacted law enforcement to alert them to the hackers’ illegal activity.

Please note that we do not keep credit card numbers or Social Security numbers for users of our online tracker and shop. Regardless, we are alerting you so you can change your login information on any website where you might have used the same or similar username and/or password, and so you can generally monitor your personal and financial information.

We are in the process of securing the President’s Challenge website, and we expect to bring it back online within the next few days. Before you log in, you will be prompted to reset your password. You will then be able to log your activities and, for PALA+ users, your nutrition goals for the past three weeks. All of your previously logged activities and nutrition goals are still stored in the database.

We are sincerely sorry for this situation and any inconvenience or concern it causes you. We take your privacy very seriously. Before the attack, our website was routinely reviewed for security flaws. We are currently reviewing our security practices to make them even stronger and to reduce the probability of a future breach.

I haven't heard how many users were affected by this breach. The President's Challenge is somewhat popular with Boy Scouts, who can get some sort of recognition for completing it, so there may actually be lots of people affected by this breach, including lots of children.

Friday, 06 January 2012

Not those sort of logs

As I've mentioned before, the number of records exposed in data breaches seems to follow a lognormal distribution, so that the size of breaches doesn't follow a normal distribution, but the logarithm of their sizes does. This has led to more than one conversation that went roughly like this.

"The number of records exposed in data breaches follows a lognormal distribution. That means that the number of records exposed in breaches doesn't follow a normal distribution, or 'bell curve,' but the log of the number of records exposed does."

"Does that mean that we can use event logs to predict data breaches?"

"No."

Thursday, 01 December 2011

Breaches and states of the US

There have been a few very big data breaches recently, which has changed the list of the 10 biggest breaches. That means that it must be time to create a bubble chart that compares the size of the 10 biggest data breaches with the populations of the 10 largest US states. Here's what that looks like:

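[Figure: bubble chart comparing the 10 biggest data breaches with the populations of the 10 largest US states]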

Tuesday, 22 November 2011

What's more important - compliance or security?

I just came across an interesting bit of information in Kaspersky Lab's June 17, 2011 Global IT Security Survey (PDF) report. Most other surveys that I've read about say that people in IT security are more worried about regulatory compliance than about actual strong and useful security. The Kaspersky report, on the other hand, says the exact opposite. Of the 11 areas that they asked people to prioritize, "Preventing IT security breaches" was deemed to be the most important and "Complying with industry regulations and standards" was deemed to be the least important. This differs so much from the other surveys that I'm left wondering how this puzzling result could be explained.

My first thought was that the Kaspersky survey might have polled a different type of person than the other surveys did. Here's how the Kaspersky report describes their methodology:

More than 1300 senior IT professionals from 11 countries took part in the survey. All respondents had an influence on IT security policy, and a good knowledge of both IT security issues and general business matters (finance, HR, etc.). Geographically, the survey was conducted in 11 countries, including both those with developing and mature economies.

Other surveys, like the CSI Computer Crime and Security Survey and Ernst & Young's Global Information Security Survey, give a fairly detailed breakdown of the roles of the people who responded to the survey. That's something that's missing in the Kaspersky report, so it might be the case that people like CSOs tend to reply to the CSI and E&Y surveys while people responsible for getting the work done tend to reply to the Kaspersky survey. That difference might explain the different focus on what's the most important. Or it might not. In any case, I'll definitely be looking at future reports to see whether what Kaspersky found also appears in other reports.

Thursday, 17 November 2011

Data breaches caused by internal agents

According to Verizon's 2011 DBIR, average employees cause most of the breaches that involve internal agents, not people with above-average access or permissions. Here's what their data looks like. The values are the percentage of breaches caused by each type of internal agent.

[Figure: percentage of breaches caused by each type of internal agent]

Tuesday, 15 November 2011

The list of the 10 biggest data breaches changes AGAIN

The recent data breach at the on-line gaming company Steam that exposed roughly 35 million customer records is now tied for 6th place in the list of the biggest breaches ever. Here's how the list currently looks. Note that three of the top ten have actually happened in the past year. That's probably not a good sign.

Organization                           Records        Year
Heartland Payment Systems              130,000,000    2009
TJX                                     94,000,000    2007
TRW                                     90,000,000    1984
Sony Corporation                        77,000,000    2011
CardSystems                             40,000,000    2005
SK Communications                       35,000,000    2011
Steam                                   35,000,000    2011
RockYou                                 32,000,000    2009
U.S. Department of Veterans Affairs     26,500,000    2006
HM Revenue and Customs                  25,000,000    2007

Monday, 14 November 2011

#voltagelive Voltage Customer Summit Video

Tuesday, 08 November 2011

Data-centric security for a data-centric world - #voltagelive 2011 in NYC



New innovations and emerging technologies bring with them opportunities for streamlining costs, eliminating hurdles for end users and reducing risks to the business. However, implementing game-changing solutions can be unique to your environment, policies and processes.

That's why I invite you to join Voltage Security at its first customer summit in New York City on November 9, 2011. The summit will focus on data-centric security and will feature top Voltage customers such as Amex, Wells Fargo, State Street and others, who will discuss how they implemented encryption projects for mail, data and payments. Also presenting will be Eric Ouellet, research vice president with Gartner Group, who is currently developing new analyses of how companies use encryption. The summit features customers talking to customers; at last count this includes American Express, BJ's Wholesale Club, Citigroup, Deutsche Bank, Fidelity Investments, JPMorgan Chase, UBS, State Street Bank and Wells Fargo. The goal of the summit is to enable Voltage customers to network with each other and pick up valuable best practices. If you're interested in attending, please visit www.voltage.com/live.

The theme is 'Data-centric Security for a Data-centric World,' an area of huge attention. Data is the lifeblood of industry, commerce and leisure, every business and every transaction. That's why protecting it is such a serious and difficult responsibility.

Here's a quick scan of the Hot Topics we have on tap at Voltage Security Live 2011:

  • Cloud Data Security
  • Data-centric Encryption
  • Ecommerce Security
  • Email Encryption
  • Mobile Data Security
  • Payment Security

There's no question that every company continues to face a long series of challenges related to these topics. The conference is designed to tackle specific issues and help formulate achievable solutions. The areas to be covered include:

  • How to fund and integrate a data-centric strategy into your overall security program
  • Best practices for data-centric encryption based on real-world implementation at a Fortune 50 Bank
  • How to roll out encryption projects successfully across the organization and end-user community
  • Successful phases for fast and non-disruptive implementation: what you need to do before, during and after an implementation
  • Elements of key management architecture and design
  • The role of cloud and mobile data-centric security

Voltage Security Live 2011 will bring together the brightest minds in our field, all with considerable experience. There will be representatives from teams responsible for implementation, as well as enterprise and security architects looking for, and developing, best practices for data-centric encryption. 

The sessions will cover customer project case studies addressing issues such as how to maximize end-user adoption for your B2C implementations and implementing data-centric encryption projects, while the Customer Track focuses on panel discussions and presentations on topics such as protecting outsourced data, eDiscovery and Archiving and securing application emails. There's also an Architecture Track, featuring panel discussions and presentations on topics such as key management architecture, security policy, enterprise applications and the web services API and scalable design considerations. And there's the Security Panel, with a discussion and general Q&A featuring leaders from the security community—Gartner Security Analyst, Encryption Architects, QSAs. 

There's going to be a broad cross-section of security specialists attending, but some executives will find it particularly enlightening: CXOs and security leaders responsible for security strategy and programs; VPs/Directors responsible for security implementation; and architects responsible for security and application and enterprise architecture. If you have one of these roles, we think this conference is exactly right for you. 

We know there are constant demands on your time - we hope to see you there.

Register at www.voltage.com/live


Thursday, 27 October 2011

Voltage Customer Summit #VoltageLive - Only 23 Spaces left


 *** Only 23 spaces left ***

Voltage Security invites you to "Voltage Security Live 2011" at Bridgewaters in New York City on November 9, 2011. This customer summit will focus on data-centric security and will feature several leading Voltage customers, such as American Express, Wells Fargo, State Street and others, who will discuss how they have implemented encryption projects for email encryption, data-centric encryption and end-to-end payment encryption. Also presenting will be Eric Ouellet, research vice president with Gartner Group, who is currently working on a new analysis of how companies use encryption. The goal of the summit is to enable Voltage customers to network with each other and pick up valuable best practices.

Stop Press: Thanks to our sponsors - Coalfire, OpenPath, Teradata and Thales, we are able to offer registration for eligible participants at no cost. Register now at www.voltage.com/live

Join Voltage customers including: ADP, American Express, Bank of America, AT&T, Citigroup, Deloitte, Deutsche Bank, Elavon, Fidelity Investments, Heartland, JPMorgan Chase, McGraw-Hill, UBS and Wells Fargo.

Highlights of the agenda include:

  • CxOs Panel – Business dynamics for data-centric encryption security – How to get your security project funded
  • Keynote – Eric Ouellet, Vice President Research, Gartner Group
  • How to maximize customer adoption – Kim Mroczkowski, Wells Fargo
  • How to structure a data-centric encryption project – Emily Mossberg, Deloitte
  • “Birds of a Feather” Networking lunch
  • Tracks: Customer and Best Practices – American Express, State Street, Thales, PwC, Coalfire
  • Security Leadership Panel – Gartner Group, State Street, American Express, Wells Fargo


Wednesday, 26 October 2011

The entire country of Israel the victim of a data breach

Although this particular breach actually happened quite a while ago (back in 2006) and information about it has just been released to the press, it looks like the country of Israel has suffered a fairly big data breach - one that exposed the personal information of 9 million Israelis. For a country with a population of about 7.5 million, that's a lot of data to be exposed.

Here's how the Yeshiva World News described the breach:

A major break has been announced as Justice Ministry officials have been investigating an unprecedented breach in the nation’s population database registry in 2006. The information was made available via the internet in many cases, including one’s name, identification number, date of birth and family relations, a database including over 9 million names, including deceased family members. Sensitive information pertaining to adoptions and natural parents as well as kuppot cholim (HMOs) were also contained in the database.

It will be interesting to see what the fallout from this breach is going to be. It's probably not going to be nice.

Wednesday, 19 October 2011

Are successful CISOs good or just lucky?

A while ago I noted how information security is probably more like poker than craps because there's more than just chance involved. Recent research (PDF) by Steven Levitt and Thomas Miles seems to indicate that this is actually true of poker. They found that players who were identified in advance as highly skilled earned much better returns than other players did. That's something that you wouldn't expect to see in games that were just games of chance.

But not all successes that we might think of as being due to skill are really due to skill. Some research has suggested (PDF), for example, that the performance of successful mutual fund managers is more easily explained as good luck instead of a higher level of skill or superior knowledge.

What about CISOs? Do successful CISOs have superior skills or knowledge that significantly affect the performance of their organizations? Or are they just lucky?

I haven't seen any research that tries to answer this question, but I'd guess that the element of luck is getting more and more important. Today's software is extremely complicated, and with that complexity come all sorts of bugs, some of which affect security. And because all software has bugs, it's probably possible for a clever hacker to find and exploit them in any software. You might be able to find strategies that minimize the chance of hackers finding and exploiting them, but that chance never drops to zero. This means that no matter how good a CISO is, there's always a chance of their systems being hacked. And because the chance of being hacked is always there, maybe it's more luck than CISO skill that determines whether or not a particular business gets hacked.

And because so many decisions are now made for compliance reasons instead of a CISO thinking that a particular strategy is good, I wouldn't be surprised if the effects of chance are getting greater and the effects of CISO skill are getting smaller. And because software is likely to get more complicated in the future and regulatory compliance is likely to become a bigger factor in information security strategies than it is now, it might also become more and more difficult for good CISOs to make a difference in the future.

Tuesday, 18 October 2011

Gaffney v. TRICARE: legal wrangling begins over the SAIC/TRICARE data breach

It looks like a Maryland law firm has already filed a class-action suit (Virginia E. Gaffney, J.G., E.G. and Adrienne Taylor v. TRICARE Management Activity, United States Department of Defense and Leon E. Panetta, or Gaffney v. TRICARE for short) that tries to recover damages of $1,000 per person affected by the recent data breach that exposed the personal information of roughly 4.9 million TRICARE members. That's a total of $4.9 billion in damages, of which the lawyers will no doubt try to keep a significant fraction.

But courts have been very reluctant to award anything to victims of data breaches who can't show that actual financial damages resulted from a breach. Just saying that your risk of damage has increased usually isn't enough. So unless some judge somehow discovers a new interpretation of the law that applies to this particular breach, I expect to see this suit eventually dismissed.

Monday, 17 October 2011

Problems with the Ponemon data breach studies


The Ponemon data breach studies are one of the few sources of information that we have about data breaches, but their results may either overestimate or underestimate the true cost of a data breach because the breaches that are looked at in these studies aren't really representative of all breaches.

As I've noted before, the size of data breaches follows a lognormal distribution fairly closely. Historically this distribution has had a logmean (base 10) of about 3.4 and a logdeviation (base 10) of about 1.2. In other words, the base 10 logarithm of the breach size follows a normal distribution or "bell curve."

But when we look at the breaches that the Ponemon studies look at, the breaches don't seem to be representative of all breaches. The 2010 report (PDF), for example, looked at US breaches that exposed between 5,010 and 101,000 records. Here's what we get when we graph that range (of the log) of breach sizes:

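[Figure: the range of breach sizes covered by the 2010 Ponemon report, shown on the normal distribution of the log of breach size]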

So it certainly looks like that range of breaches isn't really representative of all breaches. It only includes breaches that are above-average in size but aren't too big, and it only represents about 31 percent of all breaches.
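The 31 percent figure can be reproduced from the parameters quoted above. The following is a minimal sketch that assumes the log10 of breach size is exactly normal with a logmean of 3.4 and a logdeviation of 1.2; the function and variable names are just for illustration.

```python
import math
from statistics import NormalDist

# Parameters quoted in the post: base-10 logmean and logdeviation.
LOG_MEAN = 3.4
LOG_SD = 1.2

# Model assumption: log10(breach size) is normally distributed.
log_size = NormalDist(mu=LOG_MEAN, sigma=LOG_SD)


def share_of_breaches(low_records, high_records):
    """Fraction of breaches whose size lies between the two record counts."""
    low = log_size.cdf(math.log10(low_records))
    high = log_size.cdf(math.log10(high_records))
    return high - low


# Range of breach sizes covered by the 2010 Ponemon report.
ponemon_share = share_of_breaches(5_010, 101_000)

# The median of a lognormal distribution is 10**logmean.
median_size = 10 ** LOG_MEAN

print(f"share of breaches in the Ponemon range: {ponemon_share:.1%}")
print(f"median breach size: {median_size:,.0f} records")
```

Under these assumptions the share comes out at about 31 percent, and the median breach is roughly 2,500 records, which is why even the smallest breach in the Ponemon sample is larger than a typical one.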

The Ponemon reports claim that they're carefully tailored to be representative of companies that suffer data breaches. As their 2010 U.S. Cost of a Data Breach report said,

This benchmark study examines data breach costs resulting in the loss or theft of protected personal data. As a benchmark study, Cost of a Data Breach differs greatly from the standard survey study, which typically requires hundreds of respondents for the findings to be statistically valid. Benchmark studies are valid because the sample is designed to represent the population studied. They intentionally limit the number of organizations participating and involve an entirely different data-gathering process.

A more representative sample of breaches would also include companies that suffered breaches that are both much larger and smaller than those interviewed for the 2010 report. Because those breaches weren't considered in this report, there's a good chance that the report either overestimates or underestimates the true cost of data breaches. Maybe we'll find out which one in a future report.

Tuesday, 11 October 2011

Another breach of 1M+ records

There's been yet another data breach that exposed over 1 million records. This time it was at Nemours, an organization based in Delaware, Florida, New Jersey and Pennsylvania that provides health care for children. From the official announcement of the breach:

Three unencrypted computer backup tapes containing patient billing and employee payroll data have been reported missing from a Nemours facility in Wilmington, Delaware. The tapes were stored in a locked cabinet following a computer systems conversion completed in 2004. The tapes and locked cabinet were reported missing on September 8, 2011 and are believed to have been removed on or about August 10, 2011 during a facility remodeling project.

and

The information on the tapes dates principally between 1994 and 2004 and relates to approximately 1.6 million patients and their guarantors, vendors, and employees at Nemours facilities in Delaware, Pennsylvania, New Jersey and Florida.  The missing backup tapes contained information such as name, address, date of birth, Social Security number, insurance information, medical treatment information, and direct deposit bank account information.

Thursday, 06 October 2011

Has cloutage.org ceased operations?

The people at the Open Security Foundation do some excellent work in tracking down details of data breaches and maintaining a database of these incidents. That's why I was very interested a while ago when they announced that they would also be tracking cloud security incidents and posting what they learned at cloutage.org.

But it looks like cloutage.org hasn't been updated in quite a while. The last incident that they have information for dates back to April 21 and their most recent news dates back to May 24. It looks like they might have actually abandoned the idea of tracking cloud security incidents entirely.

Maybe cloud security incidents turned out to just not be as interesting as data breaches.

The Betfair breach is the 12th breach of 1M+ this year

The recent data breach at on-line gambling web site Betfair that exposed sensitive data of 3.15 million users is actually the twelfth breach so far this year that exposed one million or more records. Breaches that big seem to be getting so common that they're almost not newsworthy any more.

I haven't been following Betfair, but they seem to have had other security problems in the past. Page 43 of their 2011 Annual Report says this:

Not withstanding Betfair's IT and data security and other systems, it has experienced a limited number of security breaches in the past (which have not had a significant effect on Betfair's reputation, operations, financial performance and prospects and in respect of which remedial action has been taken).

Tuesday, 04 October 2011

Only four states still don't have data breach reporting laws

It looks like only four US states still don't have laws that require businesses to notify victims of a data breach. The four that don't are Alabama, Kentucky, New Mexico and South Dakota. I must have lost track of the various state laws over the past few years. The last that I remember, more states than that didn't have such laws.

Monday, 03 October 2011

The SAIC TRICARE breach

Although there are so many data breaches that expose 1 million records that such breaches really aren't big news any more, a breach that exposes almost 5 million records still makes the news, and that's roughly how big the recent breach at the TRICARE (US military health care) facility in San Antonio, Texas, run by SAIC was.

From the official announcement (PDF) of the breach:

On September 14, 2011, Science Applications International Corporation (SAIC) reported a data breach involving personally identifiable and protected health information (PII/PHI) impacting an estimated 4.9 million military clinic and hospital patients. The information was contained on backup tapes from an electronic health care record used in the military health system (MHS) to capture patient data from 1992 through September 7, 2011, and may include Social Security numbers, addresses and phone numbers, and some personal health data such as clinical notes, laboratory tests and prescriptions. There is no financial data, such as credit card or bank account information, on the backup tapes.

The risk of harm to patients is judged to be low despite the data elements involved since retrieving the data on the tapes would require knowledge of and access to specific hardware and software and knowledge of the system and data structure. Considering the totality of the circumstances, we determined that potentially impacted persons or households will be notified of this incident via letter. We regret that the information required to initiate notification is not available at this time, but we will ensure that it is done in an accurate and timely manner and in compliance with all applicable DoD guidelines. Due to the large volume of individuals potentially impacted by this incident, we anticipate that individual notification will take at least 4-6 weeks; therefore, this notice is being posted in the interim. The incident continues to be investigated and additional information will be published as soon as it is available. Meanwhile, both SAIC and TRICARE Management Activity (TMA) are reviewing current data protection security policies and procedures to prevent similar breaches in the future.

It's not clear from the official announcement, but other reports have stated that the cause of this particular breach was a burglary of a car that contained a tape that was being carried from one government facility to another one. And it's not clear why the car containing the stolen tape was left unattended.

Data breaches of health care information may be some of the worst because it's impossible to undo the damage that the loss of privacy can cause. If your credit card number is compromised, it's easy enough to cancel the old card and get a new one issued. It's even possible to do that with Social Security numbers, even though the government doesn't like to do it. But it's essentially impossible to handle the exposure of sensitive health records in any way.

Tuesday, 27 September 2011

Data breaches in Massachusetts

According to the Boston Globe, the personal information of over 2.1 million Massachusetts residents has been exposed by data breaches since the beginning of 2010.

The population of Massachusetts is about 6.5 million, so that's almost one in three people.

And that's only since the beginning of 2010.

Oddly enough, according to the Federal Trade Commission, Massachusetts ranks only 25th among the 50 states for the number of identity theft complaints per capita, so your odds might be even worse in other states.

Wednesday, 21 September 2011

Another projective elliptic curve

Here's another visualization of a projective elliptic curve. This time it's the curve y^2 = x^3 - (60/13) x^2 + (71/13) x. This particular curve is just one that I happened to have lying around that shows the "barbell" shape that you can get with some elliptic curves.

[Figures: two views of the projective elliptic curve]

And as Forrest Gump might say, that's all I know about graphs of projective elliptic curves.

Thursday, 15 September 2011

Finding meaningless trends

There's lots of data available for the data breaches that have happened over the past few years. Let's look at some of this data. And when we do this, let's pretend that we either didn't take any classes in statistics in school or that we forgot everything that we learned in these classes.

One thing that we might notice is that the average size of a breach has increased over the past three years. The size of breaches has roughly a lognormal distribution, so let's plot the logmean that we see for breaches over the past three years. Here's what that looks like.

[Figure: logmean of breach size for 2008, 2009 and 2010]

That certainly looks like a trend, doesn't it? And since we're pretending that we've forgotten how to look at data carefully, we're going to ignore the fact that a more careful analysis shows that there's actually no statistically significant difference between the logmeans for those three years. Instead, let's use those three points to predict what we'll see in the future.

The logmeans from 2008, 2009 and 2010 actually fit a straight line very well. The model

logmean(year) = -198.157 + 0.100267 * year

fits the data with a correlation coefficient of R = 0.9471, or R^2 = 0.8970. And because the fit is so good, let's use that model to predict what we'll see in the future.

Here's what it predicts for how the average size of breaches will grow through the year 2020.

[Figure: predicted logmean of breach size through 2020]

Holy cow! According to this model, the size of the average breach will get bigger by a factor of about 16 between 2008 and 2020! So the breaches that we hear about that expose 1 million records could be breaches that expose 16 million records by then. And the rarer breaches that expose 100 million records or more could be exposing over a billion records in not too many years.

This interpretation makes absolutely no sense at all, of course. Although the size of a typical data breach did increase from 2008 to 2010, the increase wasn't statistically significant. That means that extrapolating what we'll see in the future from the trend that those three years suggests is totally meaningless. Don't forget this the next time that you see someone claiming that there are trends in data. An increase and a statistically-significant increase aren't the same thing.
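For anyone who wants to check the arithmetic of the naive extrapolation (not its validity), here is a small sketch that evaluates the fitted line given above. It assumes the logmeans are base 10, which is consistent with the model giving a value near 3.4 for 2010; the function names are just for illustration.

```python
# Coefficients of the fitted line quoted in the post.
INTERCEPT = -198.157
SLOPE = 0.100267


def logmean(year):
    """Base-10 logmean of breach size predicted by the fitted line."""
    return INTERCEPT + SLOPE * year


def growth_factor(start_year, end_year):
    """Ratio of typical breach sizes implied by the line between two years."""
    return 10 ** (logmean(end_year) - logmean(start_year))


for start in (2008, 2010):
    print(f"{start} -> 2020: typical breach size grows x{growth_factor(start, 2020):.1f}")
```

Taken at face value, the line implies growth of roughly 16 times from 2008 to 2020 and roughly 10 times from 2010 to 2020. None of that is meaningful, for the reason given above: three points with no statistically significant difference between them don't justify any extrapolation.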

Wednesday, 07 September 2011

The list of the 10 biggest breaches changes

Because of the recent data breach at South Korean SK Communications that exposed the personal information of 35 million people, the list of the 10 biggest breaches has changed. This new breach is now in sixth place and the 2008 breach at T-Mobile that exposed the personal information of 17 million of their customers has now fallen to 11th place.

Here's a picture of how the new top 10 breaches compare in size to a breach of only 1 million records (the black dot in the center of the picture):

[Figure: the top 10 breaches compared with a breach of 1 million records]

Thursday, 18 August 2011

DBIR vs. PCIR

Does complying with the PCI DSS help prevent data breaches? The data from Verizon's most recent Data Breach Investigations Report (DBIR) and Payment Card Industry Compliance Report (PCIR) seem to indicate that this is the case. The following graph shows the percentage of businesses that were compliant with each of the 12 requirements of the PCI DSS. The DBIR data is for businesses that suffered a breach. The PCIR data is more representative of the industry overall. For most requirements, breached organizations were less compliant.

[Figure: percentage of businesses compliant with each of the 12 PCI DSS requirements, DBIR vs. PCIR]

Thursday, 21 July 2011

Are we really seeing more data breaches this year?

The article "More Cyberattacks or Just More Media Attention?" by Robert Charette was in this month's IEEE Spectrum. Here's how this article starts:

This has been a banner year for high-profile cybersecurity disasters, with no letup in sight. So far, there have been 251 data breaches—a record-setting pace.

But is this really true?

The Open Security Foundation's data breach database seems to tell us that the number of breaches that we've seen in 2011 doesn't really exceed the number we've seen in previous years. To get an estimated number of breaches for all of 2011 I took the number of breaches in the first six months of 2011 and doubled it. Here's what the historical data really looks like when you graph it:

Breaches-by-year 

That doesn't seem to support the claim that we're seeing breaches at a record-setting pace in 2011. Instead, it looks more like 2011 is turning out to be a fairly typical year. But since the breaches in the first half of 2011 exposed over 126 million records, that's not really a good thing.

Update: An alert reader pointed out that the estimate of 684 breaches for 2011 is exactly the average of the number of breaches in each of the previous five years. So it looks like 2011 is looking even more typical than I first thought.

Wednesday, 20 July 2011

Breaches vs. European countries

Yet another data visualization. This one compares the B8, the eight largest data breaches, with the populations of the eight most populous countries of Europe.

Lots of data breaches are roughly as big as the population of some of the larger countries.

BreachesAndEurope 

Encrypted CPU stolen?

You keep using that word. I do not think it means what you think it means.

Inigo Montoya, The Princess Bride

A hospital in Spartanburg, South Carolina, has had another computer stolen. Here's how the local news reported this story:

Police are investigating the theft of a computer unit from Spartanburg Regional Medical Center.

According to an incident report, a nurse reported the CPU stolen to hospital security on Saturday.

The unit, according to the report, was taken from the emergency center major care room 50.

Hospital staff told police that the CPU was last online on Friday morning.

Police say they have no leads.

Hospital representative, Chad Lawson, told News4's Mike McCormick that the CPU was encrypted and contained no patient information. The representative said the unit was used primarily to access various programs that are password protected.

I'm fairly sure that the term "CPU" isn't being used quite correctly here.

Tuesday, 19 July 2011

Breaches vs. states of the US

Another fine data visualization. This one compares the B8, the eight largest data breaches, with the populations of the eight most populous states of the US.

Four breaches were actually bigger than the population of the entire state of California.

BreachesAndStates 

Sunday, 17 July 2011

Looking back at the size of data breaches

Verizon's recent 2011 Data Breach Investigations Report (PDF) seems to show that very few records were exposed by data breaches in 2010. The report says that all of the breaches that Verizon investigated in 2010 only added up to about 3.9 million records that were exposed.

That doesn't mean that only 3.9 million records were exposed in 2010. 

The Open Security Foundation's data breach database lists breaches in that year that exposed over 28 million records. So although the amount of data that was exposed through data breaches was lower in 2010 than it was in the previous few years, there was still a significant amount of data exposed. Much more than the 3.9 million that Verizon's investigators looked at.

A breach that exposes 5 million records doesn't really look that big when it's compared to other recent breaches. Here's a graph that I created with IBM's Many Eyes data visualization tool. It shows the relative size of recent data breaches (from the Open Security Foundation's data breach database), with a single breach of 5 million records highlighted. 

Breach5m 

This seems to tell us that a breach that exposes 5 million records really isn't very notable.

If a breach that exposes 5 million records really isn't that notable, that's a sure sign that we're losing way too much data.

Data breaches that expose 1 million or more records aren't really that rare. There have been over 50 of these since 2006, or almost one per month. And if you look at how much data has been exposed by data breaches, 1 million records doesn't really look like that many. Here's a graph that shows this. The single highlighted breach exposed 1 million records.

 

Excerpted from recent posts about data breaches by Luther Martin

Friday, 15 July 2011

Another good question about data breaches

I was just asked one of the best questions ever about data breaches. It happened roughly like this.

"You guys seem to do a lot of analysis of data breaches. I just have one question for you about it."

"OK."

"I see why you're doing it, but why isn't anyone else also doing it, too?"

That's a very good question.

Looking at the available data isn't really very hard, and there are lots of companies who you'd think would also be interested in understanding it. I have no idea why they're not.

Wednesday, 13 July 2011

How big is a data breach of 1 million records?

Data breaches that expose 1 million or more records aren't really that rare. There have been over 50 of these since 2006, or almost one per month. And if you look at how much data has been exposed by data breaches, 1 million records doesn't really look like that many. Here's a graph that shows this. The single highlighted breach exposed 1 million records.

Breach1m 
 
 
 

Friday, 01 July 2011

The likely outcome of Cristina Wong v. Dropbox

Not too long ago, start-up Dropbox had a security bug that let people access the data of other users. The bug went unpatched for about four hours. According to Dropbox, during that time the data of fewer than 100 users was accessed and only a single person exploited the bug. But that didn't stop Cristina Wong from filing a class-action suit (PDF) against Dropbox. In this suit Wong claims that

As a result of the Defendant's breach of its warranties, Plaintiff and the Class have been damaged in the amount of the purchase price of Defendant's services they purchased.

That seems like an overly-broad definition of damages to me, and one that might not stand up to much scrutiny. If this suit makes it to trial, I wouldn't be surprised if any damages are limited to the 100 or fewer people whose data was actually accessed before Dropbox's bug was patched. And that might actually disqualify this from being a class-action suit.

Tuesday, 28 June 2011

Another look at 2010 data breaches

In a previous post I noted how the apparent drop in data breaches in 2010 looks a bit different if we don't look at the really big data breaches. I previously had a graph that showed that things looked fairly unchanged from 2006 to 2010 if we didn't count breaches that exposed 1 million or more records.

What if we change the threshold for a big breach? In particular, what if we don't count breaches that exposed 5 million or more records? Here's what that looks like when we plot the total number of records exposed by data breaches for each of the past few years.  

So if we don't count really big breaches, we might even think that 2010 looked a bit worse than the previous year.

Image001 

Monday, 27 June 2011

Questions about the data breach analysis

After the recent posts about some statistical analysis of what's known about data breaches I received a couple of questions.

The first asked why I used the Kolmogorov-Smirnov test for normality instead of something more sophisticated like the Shapiro-Wilk test. The answer to this question is that the statistical analysis software that we happen to have isn't really very sophisticated.

Voltage is fundamentally a company that makes and sells encryption products, and it's hard to justify spending money on something that's not directly related to our core business. This means that we'll probably never have very sophisticated statistical analysis software and that we'll leave more sophisticated analyses to someone else. We're really more interested in overall trends and how they'll affect our customers instead of in a more careful analysis of the data, and the tools that we now have seem adequate for doing that.

The second question asked why I used base 10 logs instead of natural logs when I looked at the data breach data. The answer to this question is that I believe that it's easier for most people to understand base 10 logs than natural logs.

If you're told that the base 10 log of a data breach size is 6, it's easy to see that that corresponds to a breach that exposed 10^6, or 1 million, records. If you're told that the natural log of a data breach size is 13.82, that also corresponds to a breach that exposed 1 million records, but it's probably harder to see it. So using base 10 logs can make the math trickier in some cases, but it also probably makes things more understandable.
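As a quick check of the two numbers above (6 and 13.82), here's a minimal Python sketch that uses only the standard math module. The variable names are just for illustration.

```python
import math

records = 1_000_000  # size of the example breach

log10_size = math.log10(records)  # base 10 log of the breach size
ln_size = math.log(records)       # natural log of the same breach size

# Converting from base 10 logs to natural logs is a multiplication by ln(10),
# which is roughly 2.3026.
ln_from_log10 = log10_size * math.log(10)

print(f"log10 = {log10_size:.2f}, ln = {ln_size:.2f}")
```

Because the two scales differ only by that constant factor, the choice of base changes the numbers that get reported but not the shape of the distribution.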

Friday, 24 June 2011

Graphs of a closer look at the data

I should have expected it, but after yesterday's post, I was asked many times essentially, "Yes, but what does that look like?"

So here are graphs that show the fit of a lognormal distribution to the various types of data breaches that were listed in yesterday's table.

Biz 
Gov 
Edu 
Med 
Hack 
Lost 
Stolen 
 

Thursday, 23 June 2011

A closer look at the data

Here's some more information about the data shown in the previous two days' posts. The size of breaches appears to be lognormal, even when we just look at data for particular industry sectors or particular ways in which a breach happens. Here's a summary of this. It shows how many data points I looked at from the Open Security Foundation's data breach database, the logmean (mean of the base 10 logarithm) and log deviation (standard deviation of the base 10 logarithm) of the data for different cases, as well as the p-value that we get if we do a Kolmogorov-Smirnov test for normality with the base 10 logarithm of the data. In each case, there's a reasonable fit to a lognormal distribution (i.e., p > 0.05). In some cases, the fit is even very good.

Category  N    Logmean  Logdeviation  P-value
Biz       845  3.33     1.35          0.12
Gov       430  3.52     1.31          0.66
Edu       492  3.24     1.03          0.95
Med       419  3.53     1.15          0.57
Hack      311  3.89     1.22          0.75
Lost      264  3.49     1.33          0.82
Stolen    724  3.50     1.12          0.49
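For anyone who wants to try this kind of check without a statistics package, here's a rough sketch in plain Python of the procedure described above: take the base 10 log of each breach size, compute the logmean and logdeviation, and run a Kolmogorov-Smirnov test against a normal distribution with those parameters. It is not the software behind the numbers in the table. The data here is synthetic (drawn with a logmean of 3.4 and a logdeviation of 1.2), the function names are made up for this example, and the p-value uses the standard asymptotic approximation. Because the mean and standard deviation are estimated from the same data, a plain Kolmogorov-Smirnov p-value like this one tends to come out too high, so treat it as a rough guide only.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * math.erfc(-(x - mu) / (sigma * math.sqrt(2)))


def ks_normality(log_sizes):
    """Return (logmean, logdeviation, D, approximate p-value) for the sample."""
    n = len(log_sizes)
    mu = statistics.mean(log_sizes)
    sigma = statistics.stdev(log_sizes)
    xs = sorted(log_sizes)
    # D is the largest gap between the empirical CDF and the fitted normal CDF.
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mu, sigma)
        d = max(d, i / n - f, f - (i - 1) / n)
    # Asymptotic Kolmogorov distribution with a common small-sample correction.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    p = 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                for k in range(1, 101))
    return mu, sigma, d, min(1.0, max(0.0, p))


# Synthetic stand-in for real breach sizes (NOT data from the OSF database).
random.seed(1)
sizes = [10 ** random.gauss(3.4, 1.2) for _ in range(500)]

logmean, logdev, d_stat, p_value = ks_normality([math.log10(s) for s in sizes])
print(f"logmean={logmean:.2f} logdev={logdev:.2f} D={d_stat:.3f} p={p_value:.2f}")
```

A p-value above 0.05 means that the test doesn't reject the lognormal fit, which is how the last column of the table is being read.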

Wednesday, 22 June 2011

How is more sensitive data lost?

Hacks tend to cause bigger data breaches than incidents where data is lost or stolen. The size of data breaches is lognormal, so I looked at the data in the Open Security Foundation's data breach database and compared the logmeans of breaches caused in the different ways that are listed there. Here's what I found. A more careful analysis shows that these differences are statistically significant.

And note that because this is really looking at the log of breach sizes, the difference between 3.9 (a breach from being hacked) and 3.5 (a breach from losing a laptop, backup tape, etc.) is actually substantial - a factor of about 2.5.
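To see where that factor comes from, here's a small Python sketch. A difference of d between two base 10 logmeans corresponds to a ratio of 10^d between the typical (geometric mean) breach sizes. The variable names are just for illustration.

```python
hack_logmean = 3.9  # logmean for breaches caused by hacking
lost_logmean = 3.5  # logmean for breaches caused by lost laptops, tapes, etc.

# A difference of d in base 10 logs is a ratio of 10**d in record counts.
ratio = 10 ** (hack_logmean - lost_logmean)

# The typical size in each category is 10 raised to the logmean.
typical_hack = 10 ** hack_logmean
typical_lost = 10 ** lost_logmean

print(f"ratio = {ratio:.2f}")
print(f"typical hack = {typical_hack:,.0f} records, typical loss = {typical_lost:,.0f} records")
```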

HackLostStolen 

Tuesday, 21 June 2011

Who loses more sensitive data?

Government and health care organizations tend to have bigger data breaches than others do. The size of data breaches is lognormal, so I compared the logmeans of the typical breaches from the different sectors that are listed in the Open Security Foundation's data breach database. Here's what I found. And a more careful analysis does indeed show that these differences are statistically significant.

BizGovEduMed 

Wednesday, 15 June 2011

Visualizing 2010 data breaches

Here's an example of what you can get when you put the Open Security Foundation's information about 2010 data breaches into IBM's Many Eyes data visualization tool. The area of the circles is proportional to the size of the breaches.

2010Breaches 

Friday, 03 June 2011

Understanding the apparent drop in data breaches in 2010

In a previous post I conjectured that the apparent decrease of data breaches in 2010 was just due to the lack of really big breaches that year. A closer look at the data seems to suggest that this is true.

Because the size of data breaches follows a lognormal distribution, it's easy to get a rough idea of how surprised we should be by the lack of a really big breach in 2010.

The size of breaches has a (base 10) logmean of 3.4 and a logdeviation of 1.2. If we define a "really big breach" as being one that exposes 5 million or more records, then having a really big breach corresponds to seeing a standard normal z value of

(log 5 million - 3.4) / 1.2 =  (6.7 - 3.4) / 1.2 = 2.75

or greater. From a normal distribution table we find the probability of z ≥ 2.75 is 0.003, so that the probability of a breach not being really big is 0.997.

There are 324 breaches listed in the OSF's data breach database for 2010 for which the size is known. The probability that each of these ended up exposing fewer than 5 million records is

0.997^324 = 0.378

So there's about a 38 percent chance of seeing what we saw in 2010 - no really big breaches - just due to chance alone.

That's a probability that's too big to ignore. The usual criterion for saying that an event is rare enough to be significant is that it happens less than 5 percent of the time, and by that definition, the lack of really big breaches in 2010 doesn't really look too surprising. 
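The arithmetic above is easy to reproduce. Here's a minimal Python sketch that uses the complementary error function from the standard math module instead of a normal distribution table. Like the calculation above, it assumes that the 324 breaches are independent draws from the same lognormal distribution. The variable names are just for illustration.

```python
import math

logmean = 3.4          # base 10 logmean of breach sizes
logdev = 1.2           # base 10 logdeviation of breach sizes
threshold = 5_000_000  # a "really big breach" exposes at least this many records
n_breaches = 324       # 2010 breaches for which the size is known

# Standard normal z value that corresponds to the threshold.
z = (math.log10(threshold) - logmean) / logdev

# Upper tail probability P(Z >= z) for a standard normal variable.
p_big = 0.5 * math.erfc(z / math.sqrt(2))

# Probability that none of the breaches is really big.
p_none = (1 - p_big) ** n_breaches

print(f"z = {z:.2f}, P(big) = {p_big:.4f}, P(no big breach) = {p_none:.3f}")
```

The result should land within rounding of the 0.378 above. Any small difference comes from using unrounded values for z and the tail probability.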

So it certainly looks like we can probably explain the overall decrease in the number of records exposed by data breaches in 2010 as being caused just by good luck instead of by better security or compliance efforts. That's probably a better explanation than the one that Verizon came up with in their 2011 DBIR: "Our leading hypothesis is that the successful identification, prosecution, and incarceration of the perpetrators of many of the largest breaches in recent history is having a positive effect."

Friday, 27 May 2011

What's the real cost of a data breach?

I was just reading the 2011 ISUG Report on Data Security Management Challenges that's available from the International Sybase User Group web site. You actually need to join the ISUG to get the report, although there's a free level of membership available.

This report summarizes what Unisphere Research learned from a survey of 216 members of the ISUG. I found one of the findings hard to interpret. This is what's shown in Figure 7 on page 11 of this report. The graph is captioned "Costs of Recent Data Breaches (Among respondents aware of a breach over the past 12 months)." Here's my version of it.

Fig7 

This is a bit hard to explain.

Other studies of the costs of data breaches have come up with much bigger estimates - typically close to $200 per record exposed and a total of over $6 million per breach. From everything that I've heard, the $6 million estimate is fairly accurate, so the ISUG number, which is roughly 100 times lower, is probably way off. It's hard to imagine that it's even possible to cover the costs of the lawyers for what the ISUG estimates is the total cost of a breach. 

And you'd certainly expect more people to have an idea of how much their data breaches cost them. All they'd have to do is to remember the company meeting when the CEO said, "That data breach just cost us $8 million. Let's make sure that it doesn't happen again."

I'd say that the best explanation for this discrepancy is that the 3% in the above graph corresponds to a single person so that we're only looking at a sample of 10 people who had any idea of how much their recent breach cost them. If that's the case, such a small sample may not give very useful results. The Ponemon studies, for example, survey more people who had an idea of how much their breaches end up costing, so they probably give a better indication of how much breaches really cost, and their estimates tell us that it's really more like several million dollars per breach. 

So the bottom line is that there's definitely some useful information in the 2011 ISUG Report on Data Security Management Challenges, but the cost of data breaches isn't part of this.  

Thursday, 26 May 2011

What types of cloud security incidents are happening?

Here's the breakdown of the 505 cloud security incidents in the database at cloutage.org by the type of incident. The five incident types are defined here.

Outages are still the biggest problem with cloud computing.

Image001 

Monday, 23 May 2011

Linda Ronstadt sings about the SEC's email security

The Securities and Exchange Commission recently had a data breach that was caused by the failure of their email encryption product to actually encrypt when it was supposed to. This breach resulted in the exposure of the Social Security numbers and other payroll information of over 4,000 of their employees. 

Here's what the story in the LA Times about the breach said:

The May 4 email was sent by a contractor at the department's National Business Center, which manages payroll, human resources and financial reporting for dozens of federal agencies, Malcomb said. Interior Department policies require that sensitive personnel information be encrypted when emailed.

But the contractor neglected to encrypt the email, and the software in place to catch such errors did not work properly, Malcomb said.

"It was a twofold thing," he said. "The contractor forgot, and then the software failed or malfunctioned." 

I don't know which email encryption product the SEC was using when this incident happened, but it might be what Linda Ronstadt was singing about in this YouTube video.

Friday, 20 May 2011

Who lost more of your data in 2010?

In a previous post we saw the number of data breaches in 2010 in each of the four major industry sectors (business, education, government and medical) according to the data in the Open Security Foundation's data breach database. But that's just by the number of incidents. Here's what it looks like if we look at the number of records compromised by sector. That's a bit different, isn't it?

Image001 

Monday, 16 May 2011

Who lost your data in 2010?

Image001 

According to the data in the Open Security Foundation's data breach database, here's how the number of data breaches in 2010 breaks down by the four major industry sectors of business, education, government and medical.

Tuesday, 10 May 2011

PCI compliance after a breach - another point of view

Here's another way to look at the data in Verizon's 2011 Data Breach Investigations Report (PDF) about what fraction of their customers were found to be compliant with the requirements of the PCI DSS after a breach had occurred. I'm not sure which graph is more useful - this one or the one in yesterday's post. They each seem to tell a different story, and it's not clear to me which one is better. As in the first graph, the horizontal axis is the PCI DSS requirement and the vertical axis is the percentage of businesses found to be compliant after a breach.

Image002 

Monday, 09 May 2011

PCI compliance after a breach

Verizon's recent 2011 Data Breach Investigations Report (PDF) has some interesting information about what fraction of their customers were found to be compliant with the requirements of the PCI DSS after a breach had occurred. Here's a graph of what they found. The number on the horizontal axis is the number of the PCI DSS requirement. This makes it fairly easy to see trends, although it's probably not worth calling changes over just a few years a trend.

Image001 

Friday, 06 May 2011

Why I don't like the OSF's definition of "fringe incidents"

The Open Security Foundation doesn't keep track of all data breaches. In particular, they don't track what they call "fringe incidents." Here's how they define the incidents that they do log, which are all incidents that wouldn't count as "fringe incidents:"

The criteria has traditionally been:

  • An incident must have lost one or more of the following data types:
    • Social Security or national ID number
    • Credit card number
    • Bank account number
    • Medical record
    • Financial account number
  • AND the number of records lost/stolen/missing must be greater than 10,
  • AND the data lost must have had a steward organization.

The part that I particularly don't like about these criteria is that only incidents that affect more than 10 records end up being tracked.

It certainly looks like the size of data breaches follows a lognormal distribution. That particular distribution is symmetric (in the log) about its mean, so that you're just as likely to get a very small incident as you are to get a very big one. But to be able to see that pattern you need for those very small incidents to be part of your data set. So by filtering out the very small incidents, the OSF may be making it harder for researchers to find patterns in data that actually are there.  
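To get a rough feel for how much of the distribution a cutoff at 10 records removes, here's a small Python sketch. It assumes the base 10 logmean of 3.4 and logdeviation of 1.2 quoted in the 3 June post. Those values were themselves estimated from data that had already been filtered, so this is only a ballpark figure, and the variable names are just for illustration.

```python
import math

logmean = 3.4  # assumed base 10 logmean of breach sizes
logdev = 1.2   # assumed base 10 logdeviation of breach sizes
cutoff = 10    # incidents with this many records or fewer are not logged

# Standard normal z value that corresponds to the cutoff.
z = (math.log10(cutoff) - logmean) / logdev

# Lower tail probability P(Z <= z): the share of incidents below the cutoff.
fraction_dropped = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"z = {z:.2f}, fraction below cutoff = {fraction_dropped:.3f}")
```

Under those assumptions the cutoff sits about two logdeviations below the logmean, so what gets filtered out is the lower tail of the distribution.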

Thursday, 05 May 2011

Just like the Air Force used to be

It looks like the police in Newington, New Hampshire need a lesson or two in modern IT. A laptop that held lots of sensitive information was recently stolen from one of their police cars. But they're not worried about the loss of the data. Here's how the local news covered this event:

The police chief said he's been advised that it's unlikely anyone could access personal information stored on the stolen laptop because the battery is so old it barely functions without a companion power cord.

But it could be worse.

When I used to work for the government, I was once part of a team assessing the IT security at a large Air Force base. I was more than a bit surprised when they explained to me that their servers were perfectly secure because they were behind not one, but two locked doors. To their credit, though, I think that the Air Force has gotten a bit better since then.

Wednesday, 04 May 2011

The Threat of Data Theft to American Consumers

Earlier today, the U.S. House of Representatives Subcommittee on Commerce, Manufacturing, and Trade (a subcommittee of the Committee on Energy and Commerce) heard testimony about “The Threat of Data Theft to American Consumers.”

Eugene Spafford of Purdue University talked about the privacy principles that the U.S. Public Policy Committee of the Association for Computing Machinery (USACM) has been proposing since 2006. It's not clear how interested Congress is in adopting all or part of the USACM's principles. They haven't shown much serious interest in the past five years. There has recently been some interest in this because of the recent high-profile data breaches, but I'd expect the interest to rapidly decrease as people forget that the large breaches happened. Perhaps in as little as a month or two.

Spafford also recommended that there be no safe harbor provisions for PII at all. This means that if hackers somehow manage to steal encrypted data, you'd need to treat the incident just as if they had stolen the unencrypted data. And because tokenization is equivalent to encryption (as Ramon Krikken of the Burton Group discussed at the recent Key Management Summit), that means that if hackers manage to steal tokenized data, you'd need to treat the incident just as if they had stolen the untokenized data.

I'm not sure that this is really a good idea. 

When you decide whether or not to use any security solution, you always need to balance costs and benefits. Removing safe harbor provisions for encrypted or tokenized data would dramatically change that calculation by greatly reducing the benefits of using these technologies, which would then make their use much less attractive than it is now. So this would end up actually discouraging the use of encryption or tokenization, and that's not good.

Safe harbor provisions seem to have been a very useful way to encourage businesses to protect more sensitive information than they would have otherwise, and I think that removing them would actually be a fairly big mistake.
