July 2009

Friday, 31 July 2009

Calibrating experts

Not all expert opinions are equally useful. Some expert opinions aren't even very accurate. So in cases where you need some sort of expert advice, you need to be able to somehow calibrate the opinions of experts to make them more useful. I do this when I decide which books to buy.

There are so many books in print that it's impossible to read a page or two from each of them to see if they're the sort of thing that appeals to you. Even in relatively small niches like genre fiction, there are so many books published each year that one person can't really keep up with all of them. Because of this, a useful alternative is to find someone whose job is to review books full-time and to use their opinions to help you make your decisions. There are also as many opinions as there are book reviewers, so you also need a way to filter the opinions of reviewers. What I've done in the past is to find people whose opinions tend to agree with mine and to then use their reviews to help me decide which books to get.

You have a similar problem with any specialized field, and information security is no exception to this. Fortunately, there are techniques that have been devised that let you calibrate the opinions of experts. These essentially generalize my technique for deciding which books to consider buying: you get expert opinions for which you know the right answers and use the accuracy of these answers to develop a way to interpret the opinions that you get when you don't know the right answer. If you formalize this technique using Bayesian statistics, you get the Mendel-Sheridan model that was first described in 1989 by Max Mendel and Thomas Sheridan in "Filtering Information from Human Experts," and you can probably use this approach to create a way to help interpret what consultants tell you.

If you're working with a consultant, it's likely that the consultant's knowledge will be better than yours in the area of his specialization. Because of this, you probably can't ask him meaningful questions about it. Using the Mendel-Sheridan approach, however, you'd ask questions that you do know the answers to and use the accuracy of his responses to form your opinion of the answers that you don't know. That's apparently the best way to handle the situation. You can probably even apply the same principle to finding the right consultant to begin with.
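Here's a minimal sketch of the calibration idea in Python. It is not the Mendel-Sheridan model itself, just a simplified Bayesian illustration: it assumes a uniform Beta prior on the expert's accuracy, yes/no claims, and an expert who is equally reliable on true and false claims. The function names are made up for this example.

```python
def posterior_accuracy(correct, asked, prior_a=1.0, prior_b=1.0):
    """Posterior mean of an expert's accuracy after `asked` test questions
    with known answers, `correct` of which the expert got right.
    Assumes a Beta(prior_a, prior_b) prior (uniform by default)."""
    return (prior_a + correct) / (prior_a + prior_b + asked)


def weigh_claim(prior_true, accuracy):
    """P(claim is true | expert says it is), by Bayes' rule.
    Assumes the expert asserts a true claim with probability `accuracy`
    and a false claim with probability 1 - accuracy."""
    numerator = accuracy * prior_true
    return numerator / (numerator + (1.0 - accuracy) * (1.0 - prior_true))


# The expert answered 9 of 10 calibration questions correctly.
accuracy = posterior_accuracy(correct=9, asked=10)

# With no opinion of our own (prior of 0.5), our belief in the expert's
# claim ends up equal to the estimated accuracy.
print(round(accuracy, 3), round(weigh_claim(0.5, accuracy), 3))
```

The more calibration questions you ask, the less the prior matters, which matches the intuition that a long track record is worth more than a short one.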

We probably already do this, of course, but it's sometimes comforting to know that there's a reasonable basis for how we do things.

Thursday, 30 July 2009

Uses for the Axiom of Choice

The Axiom of Choice is one of those things that causes a lot of trouble if you think about it for too long. It says that if you have a collection of non-empty sets then you can choose a member of each set in the collection. This sounds perfectly plausible, but it turns out that you can assume either that the Axiom of Choice holds or that it doesn't hold without causing any logical inconsistencies. I've never really cared much about the Axiom of Choice myself. The only thing that it's done for me is create a really bizarre memory from graduate school.

When I was in graduate school, there was a fellow graduate student named Mike who was fascinated with the Axiom of Choice. It was definitely his favorite topic of discussion, although I never quite understood why. Mike once told me about a date that he could tell just wasn't going well. Apparently he and the woman that he had asked out ended up having absolutely nothing in common. After a while, he decided to stop trying to find common interests and to talk about things that interested him instead. In this case, it was the Axiom of Choice. According to Mike, this caused a sudden and dramatic change in his date, who couldn't keep her hands off him for the rest of the night.

I don't know if this story is true or not, but it certainly is one of the more unusual applications for the Axiom of Choice that I've heard of. Another interesting use may be in economics.

According to a paper by Christopher Ayres, all of the foundations of economic theory rely on the Axiom of Choice. Because the Axiom of Choice seems to be something that you can either accept or not accept without really affecting anything, I'm not sure how much Ayres' results will withstand a careful look, but I'm sure that I can think of at least one person who'd be interested in reading his paper. Or at least he would have been several years ago.

Wednesday, 29 July 2009

Why unary is impractical

I've been asked lots of questions about unary representations of numbers recently. Usually, this is stuff that absolutely nobody cares about, but because of the connection to Gentry's homomorphic encryption scheme, it's received lots of interest recently. Here's why the need to represent numbers in unary makes a cryptographic scheme totally impractical.

Unary is a way to represent numbers without a base at all. Suppose that we have a number that's 165 in base 10. If we write it in base 16 it's 0xa5. In binary it's 10100101. In unary, however, we use a number of 1s that equals the number. So for 165, we use 165 1s: 1111...111 (you'll have to imagine a string of 165 1s here - the software that runs this blog apparently has trouble displaying a string that long).

That clearly doesn't scale well. In a number system using a base that's two or greater, it takes roughly log n digits to represent the number n, while unary takes n of them. And because n increases much more rapidly than log n does, unary isn't really very practical, particularly when you're dealing with numbers like those used in cryptography.

Consider a DES key. Ignoring the parity bits, this key has only 56 bits, which is trivial to store on a modern computer. If we use unary to represent a 56-bit number, however, we need to write 2^56 1s in a row. If we store chunks of eight bits of this number in 1 byte, this takes 2^53 bytes to store, or about 9 petabytes. That's a lot of storage.

If you try to use unary to represent a number like those often used in public-key cryptography, it's even worse. If we have a 1,024-bit key and try to represent it in unary, it takes 2^1021 bytes, or over 10^307 bytes. I'm not sure how accurate estimates like these are, but I've read that physicists estimate that there are roughly 10^78 protons in the entire universe. So if you're thinking of implementing a public-key scheme that uses unary, it's going to be totally impractical.
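The arithmetic above is easy to check with Python's arbitrary-precision integers. This is just a sketch: the helper name is made up, and it rounds the largest key value up to a full power of two, as the estimates above do.

```python
def unary_bytes(bits):
    """Bytes needed to write the largest `bits`-bit number in unary,
    packing eight 1s into each byte (rounding 2**bits - 1 up to 2**bits)."""
    ones = 2 ** bits
    return ones // 8


des = unary_bytes(56)
print(des)                  # 9007199254740992, i.e. 2**53, about 9 petabytes

rsa = unary_bytes(1024)     # 2**1021 bytes
print(len(str(rsa)) - 1)    # 307, so the count is over 10**307
print(rsa > 10 ** 78)       # True: far more bytes than the proton estimate
```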

Tuesday, 28 July 2009

One fallacy down, several more to go

The Internet is good for some things. It certainly makes some types of research much easier than they once were. You once had to look up reference materials in a card catalog, find the material on your library's shelves, and then read through it to see if it contained the information you were looking for. This often took quite a while. It certainly took more time than just typing a few words into Google and clicking on "Google Search."

The Internet is also very useful when you start teaching your kids about logical fallacies. Pick almost any blog that discusses politics and you'll see more examples of logical fallacies than you used to see in your entire life in the pre-Internet days. When I stumble across examples of these fallacies, I often feel the urge to post things like "This is a good example of what's often called a 'false dilemma' or 'bifurcation fallacy.' Please refer to your college textbook on logic for more information, or click on this link to learn why your argument makes no sense whatsoever."

Maybe I'll actually do it some day.

One of the common logical fallacies is the so-called genetic fallacy, which is accepting or rejecting an idea based on its origin instead of on its merit. I suspect that a careful analysis of this particular fallacy would show that it's not really a fallacy, and this is because of the connection to Bayesian reasoning.

As I've mentioned before, Bayesian reasoning leads us to weighing people's opinions based on what we know (or think that we know) about them. Liberals are likely to misrepresent and distort the facts when talking about conservatives and their points of view and conservatives are likely to misrepresent and distort the facts when talking about liberals and their points of view, for example. Because of this, we know that we can't trust what we hear, so the reasonable thing to do is use Bayesian reasoning that evaluates the chances of what we hear being true given everything else that we know (or think that we know). This means that the genetic fallacy is really nothing more than Bayesian reasoning at work.

Now it seems that Bayesian reasoning is a generalization of the usual Aristotelian logic that reduces to it in the special case that the hypotheses are either true or false. There's even an interesting book by E. T. Jaynes, Probability Theory: The Logic of Science, that describes exactly how this works. So if Bayesian reasoning is consistent with logic and the genetic fallacy is consistent with Bayesian reasoning, I'm inclined to believe that the genetic fallacy isn't really a fallacy after all. A logical fallacy, after all, is an error in reasoning, and it looks to me like the genetic fallacy really isn't an error. Instead, it's just taking advantage of all the available information to put new information into a useful context.

That just means that I won't feel compelled to point out a small fraction of the logical fallacies that I see on the Internet. Luckily, there are still enough others out there to keep me entertained for the foreseeable future.

Monday, 27 July 2009

The lamest spam ever

A few days ago, I received what must be the lamest spam ever. Here's what it said:

Hi Sir, You have a wonderfull offer to getting world wide companies. Please. If you are increase your bussiness then me. Regard's Thomes

It's somewhat amazing that this message made it past our spam filter, but what's even more amazing is the fact that someone actually thought that he could make money off his email campaign that sent out this message.

I've seen some poorly-designed spam over the past few years. Some spammers didn't even take the time and effort to make the content of their spam consistent, like messages that claim to be from one bank yet tell you that your account at an entirely different bank has been suspended and that you'll be unable to access your balance unless you click on a link that installs a virus on your computer.

At least that spam made a better effort. It at least had flashy graphics that made a reasonable effort at making the message look like it really came from a big commercial bank. This message from "Thomes," however, makes no effort at all to even try to look legitimate. It must be the lamest spam that I've ever seen.

Friday, 24 July 2009

What you can and can't do in 39 days

The court order for the sale of the assets of General Motors went into effect a few days ago, 39 days after GM filed for Chapter 11 protection, paving the way for the new and improved GM to start business. That's a remarkably short time. Particularly when you compare it to how long it takes to do some other things, like creating, editing and issuing a technology standard. If GM can do it in 39 days, why can't standards groups do it in anything less than a few years?

Apparently it's possible to get a huge company through bankruptcy in only 39 days. It's too bad you can't complete a standard in the same amount of time.

Thursday, 23 July 2009

Vanish: self-destructing data

There’s been lots of talk recently about “self-destructing data,” or data that loses its ability to be decrypted over time. I’ve been asked about this enough times to make it worth putting here so that I can refer future questions to this post, so here it is.

Roxana Geambasu, Tadayoshi Kohno, Amit Levy and Henry M. Levy, all of the University of Washington, recently published a paper that described a system that they called “Vanish.” This system creates a way to encrypt data that makes it possible to decrypt the data for a while, but after enough time passes, this ability goes away. Here’s how Vanish works. It's based on a clever application of Shamir secret sharing.

Shamir secret sharing is a way to split a key into n parts, any m of which allow you to reconstruct the key. It works by encoding the n pieces of the key as points on the curve of a polynomial of degree m - 1. Recall that any d + 1 points uniquely determine a polynomial of degree d, so that any two points uniquely determine a line, any three points uniquely determine a quadratic equation, and so on.

With Shamir secret sharing we use the key that we want to split for the constant coefficient of a polynomial of degree m - 1 and create n points on the curve of the polynomial which then act as the split parts of the key. When we do this, we can then find the polynomial's coefficients from any m of these parts. One of these coefficients is the key that was split, so we can also find the key that we need from any m of these parts.
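Here's a minimal sketch of that splitting and recovery in Python. A few things in it are assumptions that aren't from the Vanish paper or the description above: the arithmetic is done modulo a prime (the usual way to implement Shamir's scheme), the prime 2^127 - 1 was picked only because it's convenient, and the names split and recover are made up for the example.

```python
import secrets

PRIME = 2 ** 127 - 1  # assumed field size; the key must be smaller than this


def split(key, n, m, prime=PRIME):
    """Split `key` into n parts, any m of which recover it."""
    # The key is the constant coefficient; the other m - 1 are random.
    coeffs = [key] + [secrets.randbelow(prime) for _ in range(m - 1)]

    def evaluate(x):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % prime
        return y

    # The parts are the points (1, f(1)), ..., (n, f(n)) on the curve.
    return [(x, evaluate(x)) for x in range(1, n + 1)]


def recover(parts, prime=PRIME):
    """Recover the constant coefficient by Lagrange interpolation at x = 0."""
    key = 0
    for i, (xi, yi) in enumerate(parts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(parts):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        key = (key + yi * num * pow(den, -1, prime)) % prime
    return key


parts = split(123456789, n=5, m=3)
print(recover(parts[:3]) == 123456789)   # any three parts are enough
print(recover(parts[2:]) == 123456789)
```

With fewer than m parts the interpolation still returns a number, but it isn't tied to the key, which is why losing access to enough parts makes the key unrecoverable. The modular inverse via pow(den, -1, prime) needs Python 3.8 or later.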

Vanish ties the ability to look up the parts of a split key to information that changes over time. Dynamic IP addresses, for example, change fairly often, so they can be used for this. If you use a user’s IP address to look up part of a split key, then when a user’s IP address changes, you’ll also lose the ability to look up part of the split key.

When you initialize this scheme, you’ll be able to get all n of the n possible parts of the split key, but eventually you’ll get down to having only m - 1 of them available as the information that you need to look up the parts gradually disappears or changes. When that happens, you’ll no longer be able to get enough parts of the split key to recover it. Note that this happens very suddenly. One minute you can decrypt your data just fine and the next minute you can't. There's no slow degradation in the ability to decrypt.

Vanish seems like a simple and elegant scheme, but I’m not convinced that it’s really that important. In the business world, instead of having data disappear, it’s more important to have guaranteed access to encrypted data. That’s something that people will pay for. Vanish probably isn’t.

Wednesday, 22 July 2009

When is an IV not an IV?

I just saw an interesting exchange on a security web site. Someone asked how to securely encrypt small blocks of data, like credit card numbers, using the CBC mode of a block cipher. The reply reminded me of why many people find cryptography annoying: there are lots of details to keep track of, and if you make a mistake on any one of these, you're not as secure as you think you are. In particular, the reply talked about using the same initialization vector (IV) for an entire row or column of data in a database. This is a mistake. It's such a common mistake that it even has its own Common Weakness Enumeration number assigned to it: CWE-329.

The problem is that the security model that the CBC mode of an algorithm relies on assumes that IVs are random. If they're not random, then using CBC mode isn't guaranteed to be secure. The proof of security for CBC mode says that IF an IV is random THEN CBC mode is secure. If an IV isn't random, then this tells you nothing, and there can be ways for an adversary to recover plaintext data that wouldn't be possible if the IV was actually random.

There are secure ways to use a non-random input in addition to the key that a block cipher uses, but they're different than how CBC mode is implemented. The way to do this securely is to use a tweaked block cipher, and there's some discussion of this in a previous post. The way that a tweaked mode is implemented is different from how CBC mode is implemented because of the different assumptions about the inputs. CBC mode is secure if the IV is random. A tweaked mode is secure even if the tweaks are not random, and it takes a different structure to do this.

So it's definitely possible to securely encrypt small blocks of data using a block cipher, but using the same IV for an entire row or column of a database isn't a good way to do this.
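To see why a shared IV is a problem, here's a small Python sketch of the first block of CBC encryption. The toy_encrypt_block function is only a stand-in for a real block cipher (it's a truncated hash, so it can't even be decrypted), and the key, card number and function names are made up for the example. The point is just that CBC feeds IV XOR plaintext into the cipher, so a repeated IV makes repeated plaintexts visible.

```python
import hashlib
import secrets

BLOCK = 16  # block size in bytes, as for AES


def toy_encrypt_block(key, block):
    # Stand-in for a block cipher's encryption direction. NOT a real cipher.
    return hashlib.sha256(key + block).digest()[:BLOCK]


def cbc_first_block(key, iv, plaintext_block):
    # CBC encrypts (IV XOR plaintext) to get the first ciphertext block.
    xored = bytes(a ^ b for a, b in zip(iv, plaintext_block))
    return toy_encrypt_block(key, xored)


key = secrets.token_bytes(16)
card = b"4111111111111111"  # one 16-byte block

# One IV for the whole column: equal card numbers give equal ciphertexts.
fixed_iv = bytes(BLOCK)
same = cbc_first_block(key, fixed_iv, card) == cbc_first_block(key, fixed_iv, card)

# A fresh random IV for each record hides the repetition.
iv1, iv2 = secrets.token_bytes(BLOCK), secrets.token_bytes(BLOCK)
differ = cbc_first_block(key, iv1, card) != cbc_first_block(key, iv2, card)

print(same, differ)
```

In a real system you'd use a vetted cryptographic library rather than anything like this, generate each IV from a cryptographically secure random source, and store it alongside the ciphertext.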

Tuesday, 21 July 2009

More ancient format-preserving encryption

In addition to the way of doing format-preserving encryption that FIPS 74 describes, there's another way that has been around for quite a while. This is the technique that Michael Brightwell and Harry Smith describe in their paper "Using Datatype-Preserving Encryption to Enhance Data Warehouse Security," which they originally presented at the 1997 National Information Systems Security Conference. You can get their presentation here and a more carefully written paper here. Here's how their scheme works (ignoring an initial permutation).

Suppose that we want to encrypt a sequence of digits i1, i2, …, in in a way that maps each digit to another digit. We use the DES algorithm with a key K to do this. We first hash K to get the bytes A = (a1,…,a8) and use this value as shown in this diagram to produce the ciphertext z1 from the plaintext i1. In this diagram, the addition is done modulo 10 to ensure that we get another digit when we encrypt.

Old FPE 1  

To get the next ciphertext digit, we shift the inputs into the DES encryption function, shifting in the plaintext i1 as we do this. This diagram shows how to get the ciphertext z2 from the plaintext input i2.

Old FPE 2   

To get the next ciphertext digit, we again shift the input into the DES encryption function, this time shifting in i2. Here's how this step works when we get the ciphertext z3 from the plaintext i3.

Old FPE 3  

We repeat this process as many times as we need to encrypt the entire sequence of plaintext digits, shifting in a new plaintext digit at each step.

Brightwell and Smith claim that their scheme is as secure as DES, but that's probably not really true. This is because adding the output of a DES encryption to a plaintext digit and then reducing the sum modulo 10 produces a bias in the ciphertext. If we're adding a random number from 0 through 255 and reducing the sum modulo 10, we'll be adding each of 0 through 5 26/256 of the time, and adding each of 6 through 9 only 25/256 of the time. Because of this, the ciphertext that we get from encrypting the digit 0 will have a different distribution than the one that we get from encrypting a 1, for example. This means that it's easy to distinguish encryption using the Brightwell-Smith scheme from a random permutation, so the scheme doesn't really meet the definition of a secure one, at least by today's standards. It's still interesting, however, at least from a historical perspective.
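The bias is easy to verify by counting. This Python sketch assumes, as in the argument above, that the value being added is a uniformly random byte from 0 through 255; the helper name is made up for the example.

```python
from collections import Counter

# How often each value 0 through 9 gets added when a byte is reduced mod 10.
counts = Counter(byte % 10 for byte in range(256))
for addend in range(10):
    print(addend, counts[addend])   # 26 for 0 through 5, 25 for 6 through 9


def ciphertext_distribution(plain_digit):
    """Number of byte values (out of 256) that map `plain_digit`
    to each ciphertext digit 0 through 9."""
    return [counts[(z - plain_digit) % 10] for z in range(10)]


print(ciphertext_distribution(0))   # [26, 26, 26, 26, 26, 26, 25, 25, 25, 25]
print(ciphertext_distribution(1))   # [25, 26, 26, 26, 26, 26, 26, 25, 25, 25]
```

The two lists are rotations of each other, so where the less likely ciphertext digits fall tells you something about the plaintext digit.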

Monday, 20 July 2009

Neurosecurity?

Neuroeconomics is a new area of economics that might be interesting to information security practitioners. It tries to understand how our brains affect how we make decisions. Economists have apparently realized that our brains are very complicated and don't make decisions in a way that's easily modeled, and neuroeconomics tries to take these complexities into account. It essentially realizes that we're not rational and tries to understand the implications of that fact.

Psychologist Daniel Kahneman shared the 2002 Nobel Prize in Economics "for having integrated insights from psychological research into economic science, especially concerning human judgment and decision-making under uncertainty," which may indicate that neuroeconomics may have interesting and useful implications. It might even give us some insights that we can apply to the field of information security.

Microeconomics tries to explain why people make the decisions that they do. It typically tries to understand decisions as a tradeoff between two or more choices, and assumes that people will pick the choice that they like the most. Measuring exactly how much people like various alternatives can be tricky because it almost always comes down to more than just the dollar value of what you get from a particular outcome. To model other factors, economists talk about "utility," which is just a way to quantify things that have value but aren't easily measured in dollars. People who live in Silicon Valley, for example, might like being able to make a day trip to Yosemite National Park, but they'd be hard pressed to quantify exactly how much this is worth to them.

Having Yosemite nearby has utility even if it doesn't actually give us any money that we can spend on other things. And just like utility is a better way to measure how much we like things, it also might be a better way to measure how much information is worth. The utility of information might be more than its value. If that's the case, we might want to protect it more than we might think is necessary. Or it might be less than its value. In that case, we might want to protect it less than we ought to. In any case, it probably pays to understand the difference between the information's utility and its value.

It's hard to put an accurate value on information, but an equally hard part of information security is understanding how often the bad things happen. In particular, our brains systematically overestimate very low probabilities and systematically underestimate very high probabilities. We might estimate a probability that's really 0.0001 to be 0.1, for example. Or we might estimate a probability that's really 0.9999 to be 0.9. If these probabilities represent the chances of bad things happening, then the bias that we have can make a big difference.

We should expect people to spend more to address a risk of $10 million than a risk of $10,000, but if the way our brains work tends to make us want to deal with a $10 million risk as if it's really a $10,000 risk, we might be heading for trouble because we probably won't be trying hard enough to mitigate the risk in some way. Similarly, if we deal with a $10,000 risk as if it's really a $10 million risk, we'll probably spend too much on mitigating it, and that's money that could be put to a better use somewhere else.
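To get a feel for how much a misjudged probability can matter, here's a small Python sketch. It assumes that a risk is sized as probability times impact, which is a common simplification rather than anything specific to neuroeconomics, and the $10 million impact is just an illustrative figure.

```python
def expected_loss(probability, impact):
    """Expected loss, assuming risk is simply probability times impact."""
    return probability * impact


impact = 10_000_000        # illustrative impact in dollars
true_p = 0.0001            # the real probability
perceived_p = 0.1          # what we might estimate it to be

print(round(expected_loss(true_p, impact)))        # 1000
print(round(expected_loss(perceived_p, impact)))   # 1000000
```

Under these assumptions, misjudging the probability by a factor of 1,000 changes the size of the risk by the same factor, which is about the gap between a $10,000 risk and a $10 million one.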

So the bottom line is that our brains don't do a good job handling the type of data that we need to make good decisions about information security. Maybe neuroeconomics will one day be able to give us some useful insights into how to do this better. We know that we're not rational; we just haven't found the patterns in our irrationality yet.

Friday, 17 July 2009

PCI and small businesses

The business of America is business.

Calvin Coolidge

In his column on the storefrontbacktalk.com web site, David Taylor said that "small business owners may be too ignorant to ever be PCI compliant." His choice of the word "ignorant" is, at least in part, probably due to a desire to be somewhat controversial. Discussions that aren't controversial in some way really aren't very interesting, so he was probably trying to make his comments sound more interesting than they might have been otherwise. I don't think that the word "ignorant" was really appropriate in this context, however.

People in the payments industry seem to forget that small business owners aren't in the payments business. They're in the business of selling books, plastic tubing, greeting cards, or whatever they sell. Keeping their business running takes all of their time, and they shouldn't have to worry about the details of how payments are processed. The job of the payments processing vendors is to make sure that this is as easy as possible, and that it's done in a way that doesn't compromise any sensitive information.

A big part of how to process payments securely involves the use of encryption. And just like you really can't expect a small business to be an expert in payments processing, you really can't expect payments vendors to be experts in encryption. That also requires a level of understanding that's not really that relevant to their business. What payments vendors need from encryption vendors is an easy-to-use solution that they can integrate into their offerings as easily as possible. They don't want to worry about the arcane details behind how the encryption works; they just want to use it. Worrying about all of the details that are needed to do the encryption securely is the job of encryption vendors.

It wouldn't make much sense for an encryption vendor to say that payment processors are too ignorant to ever use encryption securely. If payments vendors can't use encryption, it's the fault of the encryption vendors who aren't making the products that their customers need. Similarly, if small business owners can't become PCI compliant, it's not entirely their fault. Most of the blame should probably go to the acquiring banks and card brands that are creating requirements that aren't practical for small businesses to meet and to payment processing vendors who aren't providing small businesses the tools that they need to process payments securely.

The business of America is business, not payment processing. Let's not lose sight of that.

Friday, 10 July 2009

Gresham's law

[Image: Thomas Gresham]

There's a little-known observation called "Gresham's law" that may or may not have some relevance to today's security market. Gresham's law says roughly that the introduction of debased currency will tend to make non-debased currency disappear from circulation, because people tend to hold onto the currency with more intrinsic value and spend the rest. It's named for Thomas Gresham, an advisor to Queen Elizabeth, but Gresham wasn't the first to note this behavior. Nicole Oresme described it in his 1357 book De origine, natura, jure et mutationibus monetarum.

This principle doesn't apply to all cases where low-quality and high-quality alternatives compete in the marketplace. It also needs some sort of regulation to make it happen. In the case of coins, there are laws that say that both the non-debased coins and the debased ones are worth the same amount, so the non-debased ones tend to disappear from circulation. In cases where there is no requirement that the low quality alternative be worth the same as the high quality alternative, Gresham's law doesn’t predict that the high-quality alternative will disappear.

In the case of the PCI DSS, however, we may have a situation where Gresham's law does hold. This is because compliance officers are often looking for a solution that lets them pass their PCI DSS audit instead of a solution that actually provides strong and useful security. The PCI DSS now acts like a regulation that makes the high-quality and low-quality products equal because they both will let their users pass their PCI DSS audit. If this is the case, then we would expect high-quality security products to disappear, leaving their low-quality competitors as the only alternatives. This hasn't happened yet. Should we expect it to happen soon?

According to "Gresham's Law or Gresham's Fallacy," a paper recently written by Arthur Rolnick and Warren Weber of The Federal Reserve Bank of Minneapolis, Gresham's law isn't as true as we might think. Here's the abstract for their paper that sums up what they found:

The claim that bad money drives out good is one of the oldest and most cited in economics. Economists refer to this claim as Gresham’s law. Yet despite its seemingly universal acceptance, this claim does not warrant its status as a law. We find it has no convincing explanations and many overlooked exceptions. We propose an alternative hypothesis based on the costs of using a medium of exchange at a nonpar price: small-denomination currency undervalued at the mint tends to disappear from circulation while large-denomination currency usually circulates at premium. Examining a variety of historical episodes when market and legal prices were different, we find our “law” can explain history much better than Gresham’s.

Like most things, the applicability of Gresham's law turns out to be more complicated than you might first think, and it takes a more careful understanding of a particular situation to predict exactly what will or will not happen. If this is the case, it looks like we may not have to worry about high-quality security products disappearing because of the PCI DSS.

Thursday, 09 July 2009

The hard part of the PCI DSS

Most of the requirements of the PCI DSS are really just information security "best practices." The only real exception is Requirement 3: protect stored cardholder data. The easiest way to meet this requirement is by using encryption, but many businesses that need to handle sensitive cardholder data seem to have trouble doing that. That's not too surprising. Encryption is legendarily hard and expensive to use, and there are still some encryption technologies out there for which this is true. On the other hand, there are also lots of encryption technologies for which this isn't true. Voltage makes some of these. A few other vendors do also.

Because there are encryption technologies that make it easy to meet PCI DSS Requirement 3, I was surprised to read a recent report, "Lessons Learned: Top Reasons for PCI Audit Failure and How To Avoid Them" by QSA VeriSign. In the PCI DSS assessments that VeriSign does for its clients, Requirement 3 is the area that people fail the most frequently: a full 79 percent fail it. I found that surprising. Maybe those people should talk to someone at Voltage.

Wednesday, 08 July 2009

Friendly names for certificates

The naming of keys is a difficult matter,

It isn't just one of your holiday games;

You may think at first that I'm mad as a hatter

When I tell you a key needs to have a good name.

It should be the one that its users use daily,

Such as "This one's for email" or "For VPN,"

Such as "This one's for signing," or "Intranet access,"

All of them sensible everyday names.


If you use X.509 certificates, one problem that you might encounter is figuring out which certificate to use. Every certificate that I have has the same name for me in them, so unless I know the expiration date or the serial number of a particular certificate (which I never do), it's always a matter of trial and error when I try to figure out which one I need to use.

In Internet Explorer, however, there's a field for "friendly name" in the Certificates window that you can open by going to Tools→Internet Options→Content→Certificates. Once you get there, however, it's not obvious how to change the friendly name. You can do this if you select the Personal tab and then select a particular certificate. Then go to View→Details→Edit Properties, and you'll see a field where you can enter both a "friendly name" and "details" for your certificate. If you do that, you'll be able to easily tell the difference between your certificates without having to know their expiration dates.

In Firefox, it doesn't seem as easy. If you go to Tools→Options→Advanced→View Certificates, you can see which certificates you have, but I can't find a way to assign a friendly name to them. It looks like you'll just have to remember whether you need to use the one with serial number 48:CB:F9:8D:65:9E:B5:84:5F:AF:A4:A8:B4:08:E9:D1 or the one with serial number 22:4D:02:80:BC:DE:AD:E7:73:81:BF:6C:74:8A:B4:BF when you want to encrypt email. Or you can just try to remember the expiration date. Neither way seems to be very useful.

Tuesday, 07 July 2009

Names for big numbers

Cryptographic keys don't sound that big because we usually talk about how many bits they have, which is really just the logarithm of a number. The numbers that we're dealing with in cryptography are big. Really big. They're so big that most people probably haven't heard of the names for them.

A 128-bit key, for example, represents a number that's roughly 3 × 10^38, or a few hundred undecillion. A 256-bit key represents a number that's roughly 10^77, or about a hundred quattuorvigintillion. The numbers that are used in public-key algorithms are even bigger. A 1,024-bit key represents a number that's about 10^308, or a hundred thousand centillion.

If you're using a 15,360-bit RSA key, like you need to do to get the same strength as a 256-bit AES key, you have a number that's roughly 10^4,623, or a trecentillion trecentillion trecentillion trecentillion trecentillion quintrigintillion. At that point, the words don't seem to work very well and it's probably better to just think of the number as having 15,360 bits.
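
The conversions from key sizes to decimal magnitudes above can be checked with a few lines of Python. This is only a sketch: the function name `decimal_magnitude` is made up for illustration, and it works with logarithms rather than printing the full integers, since the larger ones run to thousands of digits.

```python
import math

def decimal_magnitude(bits: int) -> tuple[float, int]:
    """Return (mantissa, exponent) such that 2**bits is about mantissa * 10**exponent."""
    log10_value = bits * math.log10(2)   # log10 of 2**bits
    exponent = math.floor(log10_value)   # power of ten
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent

# Key sizes mentioned in the post.
for bits in (128, 256, 1024, 15360):
    mantissa, exponent = decimal_magnitude(bits)
    print(f"2^{bits} is about {mantissa:.1f} x 10^{exponent}")

# 2^128 is about 3.4 x 10^38
# 2^256 is about 1.2 x 10^77
# 2^1024 is about 1.8 x 10^308
# 2^15360 is about 6.6 x 10^4623
```

The exponent is what determines the name: an undecillion is 10^36, so 10^38 falls in the hundreds of undecillions, and so on up the scale.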

Monday, 06 July 2009

Why do people work on open-source software?

As every individual, therefore, endeavours as much as he can both to employ his capital in the support of domestic industry, and so to direct that industry that its produce may be of the greatest value; every individual necessarily labours to render the annual revenue of the society as great as he can. He generally, indeed, neither intends to promote the public interest, nor knows how much he is promoting it. By preferring the support of domestic to that of foreign industry, he intends only his own security; and by directing that industry in such a manner as its produce may be of the greatest value, he intends only his own gain, and he is in this, as in many other cases, led by an invisible hand to promote an end which was no part of his intention.

Adam Smith, An Inquiry into the Nature and Causes of the Wealth of Nations

It's not hard to create a plausible economic model that explains why open-source software exists. One argument is that enterprise software has a minimum cost associated with developing and marketing it. These costs include the engineers that write the software, the people that test it, the sales engineers that install it at customer sites, the sales people who help customers through the sales cycle, the marketing people who let customers know what's available to solve their problems, etc. The total cost of all of these isn't cheap, so if a particular application isn't worth more than that fixed cost, it can't be the basis for a profitable business.

But if there's a demand for something at a lower cost, someone will probably find a way to make it happen. It's much like minimum-wage laws. There are some jobs that just aren't worth the minimum wage, and when this is the case, people find ways to get those low-value jobs done, even if it involves breaking the law. They might hire illegal immigrants for less than the minimum wage. Or they might agree to pay someone cash to avoid the taxes that, from the point of view of the employer, are also part of their cost of labor.

On the other hand, an argument like this only describes market forces, Adam Smith's invisible hand that makes things happen. It might explain why open-source software exists, but doesn't really tell us why any particular person would make a decision to work on open-source software. That may require a different explanation. Here's one, and it's based on modeling contributing to open-source software as a tournament. It's much like the model that Steven Levitt and Stephen J. Dubner used in their book Freakonomics to explain why so many drug dealers earn roughly the equivalent of the minimum wage.

It turns out that almost all drug dealers don't make very much money. These are the ones that actually sell the drugs on the streets. The real money is in managing an organization of drug dealers, and Levitt and Dubner describe how the entry-level drug dealers tolerate the low pay because they hope to eventually become one of the managers. In this sense, drug dealing can be modeled as a tournament that selects the most fit drug dealers and promotes the winners into the more lucrative jobs.

Maybe this model also applies to open-source software. After all, being a recognized contributor to a big, successful open-source project is also a good way to get a high-paying programming job. So it might be the case that the programmers who donate their time to open-source projects do this in the hope of becoming an open-source superstar one day. This doesn't sound obviously false, and it does give you a good way to start a conversation: "Did you know that open-source programmers are like drug dealers?"

Thursday, 02 July 2009

Is AES secure enough?

There's been a lot of discussion in the past day or so about the security of the AES encryption algorithm. There's a paper by Alex Biryukov and Dmitry Khovratovich that describes an attack against AES-256 that can be done in much less time than brute-force exhaustion: 2^119 trial encryptions instead of 2^256. That's a huge difference. Is AES now so weak that we need to worry about it?

Absolutely not.

The attack that Biryukov and Khovratovich found also takes lots of data for it to work. Their attack that can be done in 2^119 time also takes the same amount of data: 2^119. That's a lot of ciphertexts.

The best estimates that I've seen say that the entire world produces a few exabytes of data per year. This estimate is actually from a few years ago, so it isn't that current. Let's suppose that the amount of data being created doubles each year. If that's the case, we probably have a few zettabytes of data being created per year right now.

A zettabyte is 10^21 bytes, or about 2^70 bytes. That's a lot of data, but it's still a long way from 2^119 ciphertexts. This means that an attack that takes that much data is totally impractical. Even if we assume that all of the data in the world is being used in an attack that's trying to recover a single AES key, it's still not enough. It would take roughly the amount of data that the entire world will produce in the next 50 years or so to get the amount of data that we'd need. And even then, the amount of time required is still prohibitive.
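
The 50-year figure can be reproduced with a rough back-of-the-envelope calculation. The sketch below makes several simplifying assumptions that are not in the paper: the world produces one zettabyte in the first year, output doubles every year, and each ciphertext counts as a single byte (which is generous to the attacker). The function name `years_to_accumulate` is hypothetical.

```python
import math

ZETTABYTE = 10**21      # bytes produced in the first year (assumed)
DATA_NEEDED = 2**119    # ciphertexts required by the attack, counted as one byte each

# A zettabyte expressed as a power of two: a little under 70.
print(f"1 zettabyte is about 2^{math.log2(ZETTABYTE):.1f} bytes")

def years_to_accumulate(needed: int, first_year: int) -> int:
    """Count the years until cumulative output reaches `needed`,
    assuming yearly output starts at `first_year` and doubles each year."""
    total, yearly, years = 0, first_year, 0
    while total < needed:
        total += yearly
        yearly *= 2
        years += 1
    return years

print(years_to_accumulate(DATA_NEEDED, ZETTABYTE))  # prints 50
```

Starting from a few zettabytes instead of one, or counting a full 16-byte AES block per ciphertext, shifts the answer by only a few years in either direction, so the conclusion doesn't depend much on the exact starting estimate.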

It's interesting that Biryukov and Khovratovich found a significant weakness in AES, and their work may give useful insights into how to design better symmetric encryption algorithms, but it's not the sort of weakness that anyone can actually use to recover data that's encrypted with AES.

Wednesday, 01 July 2009

The Virginian goes to the RSA Conference

Owen Wister's 1902 novel The Virginian was one of the first books that might be called a "western." It essentially defined the western genre and established many of what are now its clichés. One of my favorite parts of this book is when the Virginian ends an uprising by disgruntled cowboys by beating their leader in a tall tales contest. I'm often reminded of this showdown when I hear claims made by the marketing departments of security vendors, and it's entertaining to think of how a similar epic battle might take place today.

Imagine we're at next year's RSA Conference, drinking the free beer that some generous vendor has provided. A CISO from a big company is here. He's never been to the show before and doesn't realize that he'll be swarmed by vendors if he attends an event like this one. To get his attention, the sales and marketing people from lots of security vendors make more and more outlandish claims about their technology.

There's someone there from a vendor that makes products that are designed to counter the insider threat. After a beer or two, the people at the party have forgotten that there's absolutely no basis for the claims that most attacks come from insiders, so they listen to him. He quotes some statistics from analyst reports that nobody has heard of and ends up with the estimate that over 150 percent of attacks come from insiders.

People are impressed, but take a quick break to get another beer. Surely someone can do better than that.

Next is someone from a tokenization vendor who claims that tokenization is actually more secure than encryption. Encryption is hard to understand when you've had a good night's sleep and a couple cups of coffee, and the free beer has made sure that nobody at the party is able to even come close to understanding it now. The lone cryptographer who's at the party is impressed by the daring that it took to make that claim, even to a room full of people drinking free beer, so he doesn't challenge it.

Unable to think of a way to one-up this, the other vendors gradually walk away, leaving the tokenization vendor alone with the CISO.
