« May 2009 | Main | July 2009 »

June 2009

Tuesday, June 30, 2009

Why businesses aren't profitable

Conventional wisdom tells us that the global recession that we're seeing now is the result of the recent problems with financial markets. A closer look at IT industry analyst reports, however, might tell us that there's another reason that businesses aren't doing as well as they'd like to, and that's because of their use of IT. In particular, if you add up the TCO estimates for all of the IT products that a typical business uses, you'll find that the cost of their IT systems is much greater than their revenue. In other words, there's absolutely no way that a business can both use IT systems and be profitable at the same time.

This leads me to believe that one of two things has to be true. One possibility is that the TCO estimates that we often see aren't really very meaningful. Another possibility is that it's just impractical to use modern IT products because they cost more that they're worth. If the first of these two is true, then we have nothing to worry about, except perhaps wasting lots of time on TCO estimates that don't really tell us anything useful. If the second is true, then the global economy is doomed until we revert to more primitive, pre-dot-com-era technologies. Which one is more likely?

Monday, June 29, 2009

Hardware Engineers and Their Software

Over the years I have dealt with a lot of different security hardware, from tokens (PCMCIA cards, USB devices, etc.) and coprocessors, to crypto accelerators and "HSMs" (Hardware Security Modules). With the hardware product comes the software interface. If you're building a product that wants to take advantage of the chip or device, your code will call the supplied set of functions that will actually send the messages to the hardware informing it of the operations to perform.

Over the years I have complained bitterly about some of the software interfaces I have come across. I have seen some software that I thought was awful. I would be angry that a company would build a device, invite people to use it, then make the software interface so terrible.

Let me interject something here. I write software for a living. Most of the software I have written has gotten good reviews, but I must admit, there have been some bad reviews in there. Or sometimes the review was not so much bad, as the reviewer was a bit frustrated by complexity. For example, I read part of a book about Crypto APIs, and one of the APIs was BSAFE. Although I was not the original designer of BSAFE, I was a major contributor to that product. Furthermore, I'm on board with the design. I think it is a great design. (In fact, the Voltage toolkit is my design and the similarities to BSAFE are very evident, although I think I did make some improvements). Anyway, this book said that one of the best things about BSAFE is that it could do anything you could possibly want to do in crypto. The worst thing about it is that it can do anything you could possibly want to do in crypto. That made it complicated. Simple tasks were not necessarily simple. The point is, when I complain about someone else's software, I fully admit that someone can look at my designs and complain about aspects of it. I'm not saying my code has no flaws or problems for the customer.

But some of that software from the hardware vendors was terrible.

Well, now I might not be so quick to label it "terrible". Recently, I was talking with a hardware engineer. He said something that I think explains a lot of the "awfulness". He said the H/W person will design the software interface to mimic what the hardware does. Put another way, the difference between a hardware engineer and a software engineer is that the H/W engineer designs software based on what the hardware can do, the S/W engineer will design software based on what the user wants to do.

Who is the user? For the security hardware, the user is the application developer. That's the same user of the Voltage toolkit. So let's look at an example of two interfaces performing SHA-256.

Let's say you have an app that needs to perform SHA-256. Maybe it's for a digital signature, or maybe for a checksum, or maybe it's for a key derivation function. You need to digest something.

The Voltage toolkit says, "build an object that can perform SHA-256, then call Init, Update, Final. Provide the data to digest as a byte array. If you need to digest something else, keep the object around and reuse it, just call Init again to start over. If you have so much data that you want to perform the operation in parts, or maybe the data is from multiple sources and you don't want to collect it into a single buffer, call Update, Update, and so on until you have no more data, then call Final." In the end you have 32 bytes, the digest.

The interface to a hardware device might say, "provide 64 bytes and the eight h-values and the return is the updated h-values".

Great, what does that mean?

It turns out there is some rationality to that interface. You see, SHA-256 works this way.

  • Start with 32 bytes, represented as eight 32-bit words, these are the h-values. Give them an initial value.

  • Take the first 64 bytes of input and perform a function on them, call this function SHA256Transform. This SHA256Transform combines the h-values with the 64-byte input to update the h-values.

  • Now operate on the next 64 bytes of input, and the next, and so on. Each call to SHA256Transform will use the h-values that were the result of the last call and combine them with the new input to create a new set of h-values.

  • When there's no more output, apply padding. If the input data was not a multiple of 64 bytes, you need to pad so that you have a complete block on which the algorithm can operate (SHA256Transform can only operate on 64 bytes, no more, no less). Even if the input was a multiple of 64 bytes, you must apply padding (the padding scheme increases the security of the algorithm).

  • After the padded block has been processed, the result is the 32 bytes that make up the last set of h-values.

Suppose you're a hardware vendor and you want to make a chip that can perform SHA-256. You'd really only do the SHA256Transform in hardware. Your chip would not do the work of breaking the input data into 64-byte blocks, nor would it perform the padding. The time-consuming and compute-intensive operation is the SHA256Transform. All the chip would be built to perform is the Transform operation.

When it comes time to build the software interface, what do you do? Do you build something that takes in byte array input and returns a digest (as the Voltage toolkit does)? Or do you put together a function call that is a "mirror image" of the hardware, namely the SHA256Transform?

A software engineer would probably say that while a SHA256Transform can be useful, that alone does not generate a digest. Most users will want to simply call a routine that takes in input and generates a digest. So the right thing to do is to perform all that "bookkeeping" as well as the transform.

The hardware engineer might say, "What do you mean? The hardware does the Transform function, therefore the interface should be the Transform function. Why should the interface do the bookkeeping if the hardware doesn't?"

The answer is that someone will have to do the bookkeeping. If the hardware interface does not do it, then the customer will. Or more precisely, each customer will. Suppose the hardware vendor has 50 customers. Each customer will write their own version of the bookkeeping code. That's 50 versions of the same code, quite a duplication of effort. Everyone has to "re-invent the wheel". Doesn't it make much more sense to do that work once?

This does not explain all the awful software I've seen come from the hardware vendors, but it does explain some of it. I don't agree with the philosophy, "The software interface is designed to do what the hardware can do, no more, no less." I suspect most people would think it is not the right philosophy a business venture should take. But at least I can understand it.

Challenges with point-of-sale devices

The point-of-sale devices that retail operations use need to be very easy to use because they're often used by relatively untrained people. I saw an example of this a few years ago when I went to the local sporting goods store to pick up a package of white athletic socks.

I handed my credit card to the clerk in the store, who swiped it through his POS device, keyed in the amount of the transaction and waited for an authorization code to appear. There was apparently a problem with the connection of the POS device to the authorization service. What appeared on the LCD screen of the POS device was a pattern of random LCD segments that looked something like this:

Image001  

The clerk didn't recognize this as anything other than a valid authorization code, and proceeded to carefully copy this pattern into the space for the authorization code onto the credit card payment slip. Apparently, nobody had told him to expect a series of numbers for the authorization code.

I left the store wondering what other types of mistakes had happened when the store processed credit cards. I doubt that I saw the worst mistake that an untrained person ever made.

Friday, June 26, 2009

Ancient format-preserving encryption in FIPS 74

Although the format-preserving encryption algorithms that Phil Rogaway invented are very useful for encrypting data in a way that makes integrating encryption with legacy applications and complex IT environments particularly easy, it turns out that he wasn't the first to do this. An old, retired NIST document that describes how to use the DES encryption algorithm actually described a way to do it.

The NIST document that does this is the retired FIPS 74, "Federal Information Processing Standards Publication 1981 Guidelines for Implementing and Using the NBS Data Encryption Standard." This document is so old that it even refers to NIST as the National Bureau of Standards (NBS), a name that hasn't been used since 1988. FIPS 74 was withdrawn in 2005, so this technique should probably be considered obsolete. Here's how it works.

Suppose that we want to encrypt a credit card number that has 16 digits. The FIPS 74 technique encrypts one digit at a time. To do this it creates four bits that are XORed with the four bits that represent a digit. These four bits come from the lowest four bits of the output of a DES encryption. If the result of the XOR operation isn't a valid digit, then the next lowest four bits of the DES output are tried, etc. If all 16 of the four-bit values in the DES output don't give a valid digit, then the binary value 1001 is XORed with the plaintext input to get the ciphertext output. A new output of a DES encryption is used for each digit. Here's a diagram that shows how this works. This can easily be extended to more than digits. FIPS 74 describes techniques for encrypting both alphanumerics and arbitrary alphabets.

DES FPE   

This technique seems to be fairly slow. Each digit it will take an average of (10/16)(1) + (6/16)(10/16)(2) + (6/16)2(10/16)(3) + ... + (6/16)14(10/16)(15) + (6/16)15(16) = 1.6 DES encryptions to encrypt, so that it will take an average of 25.6 DES encryptions to encrypt a single 16-digit credit card number.

Curiously, it looks like NIST hasn't defined a similar way to do format-preserving encryption with AES. Maybe there's really no need to do that because there exist other techniques that have strong proofs of security. The fact that a format-preserving encryption technique was included in FIPS 74 makes me think that there was an interest in the technology well before the PCI DSS increased the level of interest in encryption. Is the FIPS 74 technique used anywhere these days?

Thursday, June 25, 2009

R. C. Matheson gets a certificate

Web site.

Find page.

Click "Login."

Enter Username.

Enter Password.

Click "OK."

Click "certificates."

Click "request certificate."

Click "request."

Click "request."

Yes, again.

Doesn't work.

IE8 problem.

Try Firefox.

Repeat steps.

First 10.

It works.

Click "next."

Select email.

Click "next."

Click "next."

Another duplicate.

Click "accept."

Click "next."

Click "finish."

Wait.

Check email.

Open email.

Click link.

Click "OK."

Certificate installed.

Now what?

Wednesday, June 24, 2009

Gentry's homomorphic encryption

There's been a lot of talk lately about Craig Gentry's new homomorphic encryption scheme that’s based on lattice cryptography. The discussion that I've heard has been essentially one of two types. Some people are saying, "Wow, that's incredibly cool. And his proof is cool too." Others are saying, "What the heck is homomorphic encryption, anyway?" What follows is for the people who are saying the latter of these two.

The term "homomorphic" is used in homomorphic encryption because of the connection to a homomorphism, which is a function that roughly lets us switch the order of doing a mathematical operation and evaluating the function.

For an example of this, consider the function f(x) = ex. This function has the property that ex+y = exey, or that f(x + y) = f(x) * f(y), which means that we can roughly switch the order of implementing the mathematical operations and applying the function. In the first case, we apply the operation + first and then the function f, and that's equal to what we get from applying the function f first and then the operation *. A function that behaves that way is called a "homomorphism."

Now suppose that f is an encryption function that happens to be a homomorphism. If that's the case, you could implement an operation on plaintext by doing a different operation on ciphertext, and you could even do this without having to decrypt the ciphertext. From the point of view of most uses, this is a bad property because it lets an adversary change a ciphertext into one that decrypts into an entirely different plaintext, but that doesn't stop such functions from being inherently interesting.

If we have an RSA public key (n,e) and encrypt a message m by calculating c(m) = me mod n, then we have that c(xy) = (xy)e = xeye = c(x)c(y). This means that we can switch the order of encryption and multiplication and get the same result, so RSA is a form of homomorphic encryption. We can take advantage of this fact, for example, by changing a ciphertext c(x) into one that decrypts into 2x by multiplying c(x) by c(2), and we can do this without decrypting the ciphertext c(x).

On the other hand, this only works for multiplication and doesn't work for addition. For RSA encryption, we don't have that c(x + y) = c(x) + c(y). If we had that, we could do even more tricks with ciphertext and have something useful happen to the plaintext.

The reason that Gentry's work is interesting is that it lets you do just that. It's homomorphic for both addition and multiplication. His scheme lets you do an arbitrary number of additions and multiplications, with the only requirement being that a limit on the number of multiplications needs to be fixed when you generate a user's public key.

It also appears that Gentry's scheme is totally impractical. This means that we won't be seeing a start-up that makes products based on Gentry's homomorphic encryption scheme any time soon, but that shouldn't stop us from admiring the elegance of his invention.

Tuesday, June 23, 2009

NIST gets in the Cloud Computing game

Wall_cloud_with_lightning_-_NOAA

It looks like NIST is now getting interested in cloud computing. I found it somewhat interesting that the information on their web site about Cloud Computing is provided by their Computer Security Division, possibly indicating that security is one of the most important issues that need to be addressed before Cloud Computing can become more accepted.

Apparently, it's even tricky for NIST to sort through some of the claims that people make about Cloud Computing. According to the NIST's "Presentation on Effectively and Securely Using the Cloud Computing Paradigm v20", for example, there's a fairly wide range of estimates for the potential cost savings that Cloud Computing can allow: from 18 percent to 90 percent. That's a fairly big range.

Something that saves you 90 percent is certainly a much bigger deal than something that can save you 18 percent. If I had to bet, I'd say that the estimates of 90 percent are way off, and the real savings that are possible are much lower.

Monday, June 22, 2009

It could be worse

Heavy_Rain

Reading academic papers on cryptography is hard. It's not like reading music, where you can sit down at the piano and hack your way through a new piece of music. Sure, you'll make lots of mistakes your first time through, and it may sound so bad that you want to do it well away from other people, but you'll get through it fairly quickly. Reading papers on cryptography is much worse, at least it is for me. Getting through a paper and understanding what it really says usually takes me at least a few days. Maybe more if I don't get a good night's sleep or drink enough coffee.

Some papers take much longer than that to understand. Back when Andrew Wiles proved Fermat's Last Theorem, I realized that I'd never understand Wiles' proof, so I tried something much less ambition: to read and understand a paper that explained exactly why what Wiles proved, the Taniyama-Shimura conjecture for semistable elliptic curves, implies that Fermat's Last Theorem is true. Even that turned out to be too hard, and I eventually gave up on it.

I recently learned that it's not just papers on cryptography that are hard to understand, but papers on other topics may hard to understand for an entirely different reason, and that's because they're poorly written. There are apparently so many poorly-written academic papers that the editors of Philosophy and Literature felt the need to run a bad writing contest that showcased some of the worst academic writing around.  Here are some of the past winners of this contest:

The move from a structuralist account in which capital is understood to structure social relations in relatively homologous ways to a view of hegemony in which power relations are subject to repetition, convergence, and rearticulation brought the question of temporality into the thinking of structure, and marked a shift from a form of Althusserian theory that takes structural totalities as theoretical objects to one in which the insights into the contingent possibility of structure inaugurate a renewed conception of hegemony as bound up with the contingent sites and strategies of the rearticulation of power.

Judith Butler, "Further Reflections on the Conversations of Our Time,” Diacritics

Total presence breaks on the univocal predication of the exterior absolute the absolute existent(of that of which it is not possible to univocally predicate an outside, while the equivocal predication of the outside of the absolute exterior is possible of that of which the reality so predicated is not the reality, viz., of the dark/of the self, the identity of which is not outside the absolute identity of the outside, which is to say that the equivocal predication of identity is possible of the self-identity which is not identity, while identity is univocally predicated of the limit to the darkness, of the limit of the reality of the self). This is the real exteriority of the absolute outside: the reality of the absolutely unconditioned absolute outside univocally predicated of the dark: the light univocally predicated of the darkness: the shining of the light univocally predicated of the limit of the darkness: actuality univocally predicated of the other of self-identity: existence univocally predicated of the absolutely unconditioned other of the self. The precision of the shining of the light breaking the dark is the other-identity of the light. The precision of the absolutely minimum transcendence of the dark is the light itself/the absolutely unconditioned exteriority of existence for the first time/the absolutely facial identity of existence/the proportion of the new creation sans depth/the light itself ex nihilo: the dark itself univocally identified, i.e., not self-identity identity itself equivocally, not the dark itself equivocally, in “self-alienation,” not “self-identity, itself in self-alienation” “released” in and by “otherness,” and “actual other,” “itself,” not the abysmal inversion of the light, the reality of the darkness equivocally, absolute identity equivocally predicated of the self/selfhood equivocally predicated of the dark (the reality of this darkness the other-self-covering of identity which is the identification person-self).

D.G. Leahy, Foundation: Matter the Body Itself

So I suppose that it could be worse. Instead of having to read and understand papers on cryptography, I could have to read and understand papers in some other field. Based on some of the entries into Philosophy and Literature's bad writing contest, I may be fairly lucky.

Friday, June 19, 2009

Security through simplicity

If you're trying to estimate the security provided by a particular key management technique, you can essentially ignore the chances of any cryptographic algorithm being defeated. The chances of a peer-reviewed algorithm being weak, particularly when it has a rigorous proof of security is extremely small. On the other hand, the chances of a weakness in some component of a key management system is significant. It's very hard to implement and keep a secure configuration, and even if you can do this, you'll still exposed to bugs in the software and hardware that implements the key management system. You can formalize this if you really want to by writing an expression for the security of a system in terms of the security of each of its components, but even without explicitly writing that down, it should be fairly clear that simpler systems are inherently more secure than more complex ones.

Using this principle, let's look at two architectures and see which is more secure: one that implements X.509-based PKI or one that implements identity-based encryption. Here's a diagram that shows the components of a PKI system and how they interact when Alice sends an encrypted message to Bob. For this system to operate securely, each of its components needs to operate securely.

PKI  

Now here's a diagram that shows the components of an IBE system and how they interact. Alice, who wants to send a message to Bob, just needs to get a set of public parameters from a secure key server that's usually called the private key generator, or PKG. Once she has these, she can encrypt messages to Bob. She can actually encrypt them to anyone else also, so she actually only needs to do that once. Once Bob receives Alice's message, he gets the key needed to decrypt it from the PKG. You don't need to do key look-up with IBE, so there's no component that acts like the LDAP directory does for a PKI, and because IBE keys are calculated as needed, there's also no need for a component that acts like a key archive server.

IBE 

This system has only one-fourth the number of components of a system that implements PKI, which means that it has fewer ways to fail, which means that's it's more secure. Simpler is definitely better.

Thursday, June 18, 2009

SSNs versus CCNs

Social_security_card

There's lots of talk these days about securing credit card numbers. As I've mentioned before, people should have more of an interest in keeping information like their Social Security number protected. If your credit card number is compromised you can always have the old card canceled and a new one issued to take its place. With your Social Security number, however, you can't do that. Once it's compromised, there's no way to get it back.

It turns out that there's another big difference between credit card numbers and Social Security numbers, and that concerns how often they're exposed in data breaches. According to the information in the OSF's database of data breaches, incidents that expose a Social Security number are far more common than data breaches that expose credit card numbers. The last time I looked at the data in this database, for each incident that exposed credit card numbers there were almost seven incidents that exposed Social Security numbers.

Maybe the government should start a program like the PCI DSS to ensure that anyone handling Social Security numbers needs to have some basic security mechanisms in place.

Wednesday, June 17, 2009

The 2010 Key Management Summit

The web site for the 2010 IEEE Key Management Summit is now up. This event will be held on May 4-5, 2010 in Lake Tahoe, Nevada, and will be collocated with the IEEE Symposium on Mass Storage  Systems and Technologies.

We're already looking for speakers for this event and hope to finalize the list in roughly November of this year. So if you're interested in presenting at this event, it's not too early to send in your submissions.

Tuesday, June 16, 2009

More Bayesian thoughts

Encryption isn’t as widely used as it could be, and we might be able to understand this from the point of view of Bayesian statistics. Here’s why.

Much of the negative feelings that people have about encryption can probably be explained by the bad reputation that encryption has, and this reputation is probably justified. Many people who used some of the encryption products that were available back in the dot-com era are probably still suffering from the trauma that using encryption caused them. Older encryption products were hard to use, which made them expensive to support.

Even security specialists couldn’t use some of the early encryption products. When I worked at one of those dot-com era companies that made those early encryption products, I remember being stunned by a demo of an application that we had developed to let users authenticate to a server using the WTLS protocol. It was terrible, and this was from the point of view of a person who thought that using X.509-based client certificates was easy.

Many people suffered through using the early encryption products, and they’ve probably developed a significant bias against using encryption because of this. This is perfectly understandable. From a Bayesian point of view, they’ve built a knowledge base that tells them that encryption is hard and expensive, and this will affect how much they believe that newer products are actually easier to use.

Suppose that you tell a person who used older encryption technologies that the newer ones are much easier to use. If they're using Bayesian reasoning, they probably won't believe you, and this is because of what they're really estimating when you tell them this.

Let E represent how much someone believes that encryption is actually easy to use, N represent the claim that newer encryption technologies are easier to use than earlier ones, and K represent experience with the older technologies. If people are using Bayesian reasoning, instead estimating P(E|N), they're really estimating P(E|KN). If you’re really motivated, you can work through calculating P(E|KN), much like we did in an earlier post, and if you do this you’ll find that we can’t expect people to believe that the newer technologies are actually usable.

The problem is that they are.

Encryption has become much easier to use in the past five years or so, and it’s now feasible to use in many situations where it wasn’t practical at all in the past. That means that there’s now a solid business case for its use, even though people may not actually believe it. After all, the people who are making decisions about using encryption today are the same people who had to suffer through using the bad implementations of it that we had in the dot-com era. And because their experience has told them that encryption is hard and expensive, we can expect them to not believe that newer technologies are really any better.

One way to address this problem may be for researchers to look at the newer technologies and try to measure their usability. The famous paper “Why Johnny Can’t Encrypt” was followed by “Why Johnny Still Can’t Encrypt” a few years later, but the later work didn’t look at the newer technologies. Instead, it looked at how much the same technology used in the earlier study had improved. A more useful study might look at products like Voltage’s SecureMail, which is much easier to use than the previous generation of products.

The participants in the study that was described in “Why Johnny Can’t Encrypt” had to do some fairly low-level work with their keys. A user of SecureMail, on the other hand, doesn’t need to worry about those details at all. If they can click on the “Send Secure” button instead of the “Send” button in their email client then they can send encrypted messages. I can’t believe that any study would find that particular task too difficult for most people to do.

The problem may be finding a way to get people to believe that it’s really that simple. If they’re using Bayesian reasoning, however, they might not even believe it after they’re shown it. How do you work around that?

Monday, June 15, 2009

How to stop some data breaches

I just read an interesting paper, "Nobody Sells Gold for the Price of Silver: Dishonesty, Uncertainty and the Underground Economy," by Cormac Herley and Dinei Florêncio of Microsoft Research. Herley and Florêncio describe an interesting way to fight data breaches, and that's by contaminating the data that's available for sale by cyber-criminals. The reason that what they propose would probably reduce data breaches is based on the work that won George Akerlof the Nobel Prize in Economics in 1991.

Akerlof's insight was that if buyers can't tell the quality of what they're buying, then markets can fail. His favorite example of this is a hypothetical market for used cars.

Suppose that used cars are worth $10,000 if they're in good condition, but half of them require $1,000 in repairs and that buyers can't tell which cars are which. If this is the case, then a rational buyer would only pay $9,500 (= 0.5 * $10,000 + 0.5 * $9,000) for a used car. But at this price, the owners of the high-quality cars won't sell, because theirs are worth $10,000. This leaves only the lemons on the market, which will further depress the price of used cars, and will lead to the failure of the market for used cars unless buyers have some way to distinguish the good cars from the lemons.

Apparently, there are also quality uncertainty issues in the market for stolen credit card numbers, and this leads to a way to make this market fail. If we can add enough bogus credit card numbers into the market for stolen credit card numbers, if Akerlof's model is right, this will lead to a downward spiral in how much cyber-criminals can get stolen credit card numbers. If we can reduce this low enough, it will no longer be worth their effort to sell stolen credit card numbers and the market will essentially disappear. So flooding the underground market with bogus credit card numbers might be a way to stop some data breaches, in particular, the ones that are caused by hackers actively compromising a system to steal credit card numbers. 

That's an interesting idea, but I doubt that we'll see credit card companies doing it any time soon.

Friday, June 12, 2009

Signed malware

Associating a digital signature with software can be useful because it gives you a reasonable assurance of the source of the code. But because it’s easy to misuse a code signing certificate by using it to sign malicious code, there’s absolutely no reason to believe that all signed code isn’t malicious in some way. Exactly how common is signed malware?

The Microsoft Security Intelligence Report gives lots of interesting statistics about malware, and version 5 of this document, the one that covered January through June 2008, had some information about signed malware. Here’s how the Microsoft report described what they saw:

In the first six months of 2008, the MMPC received reports of about 22 million instances of distinct malware files, of which about 173,000 were distinct malware files with code signatures. Of this malware with code signatures, about 38,000 had signatures that were not valid for signing code, so approximately 135,000 validly signed malware files were reported to Microsoft. In total, approximately 0.6 percent of detected malware was validly signed.

That’s useful information, but what would be even more useful is an estimate of the probability of having malware given that software has a valid digital signature. That information isn’t available in the Microsoft report.

Thursday, June 11, 2009

Security as entropy

It turns out that there may be a connection between security and entropy. Here’s why this is the case.

Let’s assume two things. First, let’s assume that security is a function S of the exploitability of the technologies that are used to implement it, where exploitability is some sort of probability measure P. Next, let’s assume that security follows these general rules:

  1. The security provided by two technologies when they’re both used is greater or equal to the security of each of the components when they’re used by themselves
  2. If two technologies are independent then the security provided by the two technologies when they’re used together is equal to the sum of the security provided by each of the technologies
  3. The security provided by any technology is non-negative

If security follows these rules, it turns out that it has to be a function that looks like S(P) = -log(P). We can make the base of the logarithms anything that we want, but it has to be of this general form.

Now suppose that a particular technology X can perform n different security functions with probabilities p1,p2,…,pn. If this is the case, then the average or expected level of security provided the technology X is 

Image001

which is just another way of writing entropy.

Some people that I've mentioned this connection to tell me that it's is obvious. Others don't think that it's that obvious at all. Just like you can think about Shannon entropy as what you don't know about information, you might be able to think of security as what adversaries don't know about your information, although that might be going a bit too far.

I doubt that there's anything profound about this observation, but because of this connection to entropy, some of the things that we observe about data breaches may turn out to not be coincidences. There are some things that seem to make sense if we think of them as coming from situations where entropy is maximized, for example. Maybe that'll be the topic of a future post.

Wednesday, June 10, 2009

A Bayesian approach to understanding tokenization

I had an interesting discussion last week about why most security experts don’t believe them when tokenization vendors claim that their technology is more secure than encryption. It seems that understanding this is good example of an application of Bayes’ theorem. Here’s why.

Let E represent the belief that encryption is the most secure way to protect sensitive information. Let K represents the knowledge that we have of the effectiveness of security technologies and T represent the belief that tokenization is more secure than encryption.

What we’re interested in is the probability that encryption is the most secure way to protect sensitive information given the claim made by tokenization vendors and our prior understanding of security technologies: P(E|TK). We can use Bayes’ theorem to write this as

P(E|TK) = P(E|K) P(T|EK) / P(TK)

= P(E|K) P(T|EK) / (P(E|K) P(T|EK) + P(NOT(E))|K) P(T|NOT(E)∩K))

Let’s assume that before we hear any claims about the security of tokenization, we’re fairly confident that encryption is the best way to protect sensitive information, so that P(E|K) = 0.99. If we don’t believe that encryption is the most secure way to protect sensitive information then it's reasonable to assume that we might believe that tokenization is a good alternative. To reflect this let’s assume that P(T|NOT(E)∩K)) = 0.9. On the other hand, you’d be hard pressed to find a single cryptographer who agrees that tokenization is more secure that encryption, so it’s probably the case that we should doubt the claim that tokenization is more secure than encryption. To reflect this, let’s assume that P(T|EK) = 0.1.

With these values we can now calculate P(E|TK) as

P(E|TK) = (0.99)(0.1) / ((0.99)(0.1) + (0.01)(0.9)) = 0.9167

so that Bayes’ theorem tells us that there’s no reason to assume that the claims that tokenization is more secure than encryption should really change our belief that encryption is the best way to protect sensitive information.

Tuesday, June 09, 2009

A better way to define cryptographic security

Olives

The level of protection that using cryptography actually provides can be confusing to many people. An 80-bit Skipjack key provides the same level of security as a 160-bit elliptic curve key or as a 1,024-bit RSA key, but understanding why this is true requires knowing more math than most people care to learn.

When it comes to quantifying the strength of cryptographic hash functions it gets even worse, because the strength of a hash function is very dependent on how it used. Finding a collision is the easiest attack on hash functions, and is the usual metric that's used to quantify the strength of one. But even if you can find a collision in a hash function, that doesn't mean that you can actually use that collision to defeat a cryptographic protocol. It's almost always more complicated than that.

This means that an accurate answer to exactly how secure something is can be so full of arcane crypto-details that only a specialist can understand it. This also means that the answer is useless to anyone other than specialists, which includes most of the almost 7 billion or so people that are currently alive.

Maybe there's a better way to quantify security that giving it in terms of bits of security. Maybe an arcane government regulation that dates back to 1967, 32 FR 11467, provides an example of a better way to do this.

32 FR 11467 defines the United States Standards for Grades of Green Olives. It's where the olive sizes like Large, Jumbo, Giant, Colossal, Mammoth, etc., are defined. It's also why olives always sound bigger that they are. If you open a can of Large olives, you'll probably be surprised by how small they really are. You need to get Colossal or Mammoth olives to really impress your holiday guests.

But despite the way in which it tricks people into buying a can of Large olives at least once in their life, the olive standard that's defined in 32 FR 11467 has been fairly successful and people seem to understand it better than they understand the way in which cryptographic security is quantified. Because it seems to work so well, why not model a standard for cryptographic security on it?

Instead of using bits of strength to quantify the strength of cryptography, the government could define several security levels that are defined by words instead of numbers. We could have levels like Secure, Very Secure, Highly Secure, Extremely Secure, etc. The level defined by "Secure" would have to be roughly the same as a fairly low level of security, of course, so that we'd be following the model that's worked so well with the United States Standards for Grades of Green Olives.

Some people might be tricked into buying something that is only Secure, but that wouldn't happen too often. People quickly learn that Large olives aren't really very big, and they'd learn quickly that something that's called "Secure" doesn't really provide that much protection against hackers.

I'll have to mention this idea to the people at NIST the next time that I talk to them.

Monday, June 08, 2009

Why use Secure Mail?

I predict that some day we'll notice that very few consultants encrypt their email. This will be because of the secure email products that they were forced to use when they worked at larger companies. My experience with ice cream leads me to believe this.

When I was a kid, we would frequently get ice cream on the way home from school. I always wanted to get a clown cone, a scoop of ice cream in a sugar cone that's decorated with candy to look like a clown. Clown cones don't actually taste that good because they've usually been sitting out for quite a while before you buy them, but they certainly look good, and that's why I always wanted one.

My parents, however had different ideas. They knew that clown cones don't actually taste very good, and they never let me get one. After years of being cruelly denied clown cones, now I'm the dad, and nobody (except my wife of course) can tell me that I can't get one. I do this every time I take my kids out for ice cream, cheerfully ignoring the fact that the clown cones really don't taste that good as I do it.

Now consider the people who have to use email encryption on the job. Many of these people aren't lucky enough to be using Voltage's SecureMail, which is extremely easy to use. Some of them even use PKI and S/MIME to protect their email. They probably hate every minute of this, but are forced to use S/MIME anyway.

Some of these people are going to become consultants one day. They'll be their own boss and won’t have a security department to force them to use S/MIME. I expect that many of these people will rebel against their terrible experiences with S/MIME by intentionally avoiding using encrypted email as much as they can, ignoring that fact that they really should be protecting sensitive information.

What's the solution to this? Use Voltage's SecureMail, of course.

Friday, June 05, 2009

A Journey to the Center of the Earth

Journey

Jules Verne's book A Journey to the Center of the Earth might be able to give us some insights into what some people put up with some of the more complicated and less user-friendly information security products.

In Verne's book, Professor Lidenbrock hires an Icelandic man named Hans Bjelke as a guide. Bjelke turns out to be unflappable, and tolerates all of the dangers of the book's subterranean journey without complaining. The only thing that he insists on is getting paid three rix-dollars every Saturday. As long as he gets paid, he cheerfully puts up with hungry prehistoric creatures, lightning storms and travel by geyser through a volcano.

People who work in the 21stcentury seem to be inspired by Bjelke to some degree, particularly when it comes to putting up with using some information security products. Some of these products don't even come close to being user-friendly, but people put up with them because they need to use them to do their jobs. And much like Bjelke didn't complain when he had to put up with the dangers that he found below ground, many people tolerate using difficult products because they're getting paid to do it. Maybe it's just because they feel they have to. Not many people will walk away from a job just because they have to use really annoying security products.

Because of that, I hope one day to do an interview for a job that I have no interest at all in getting that will go something like this:

Me: So how do you protect sensitive information that you send to customers?

HR person: We use encrypted email for that. Everyone has a "digital certificate," you see…

Me: Stop! I've heard enough! There's no way that I'll do that! You'll have to find someone else to fill this position.

That ought to create an interesting discussion in the HR department.

Thursday, June 04, 2009

Look out for yourself

Complying with the PCI DSS is definitely one of the biggest concerns that many businesses have these days. The PCI DSS isn't perfect, but there's probably a good reason for its existence. Data breaches expose millions of credit card numbers each year, and many of these are used for fraudulent transactions. Consumers, however, typically don't end up paying for these fraudulent transactions, at least not directly. It's still there, though, because the cost of the fraud gets built in to the prices that we pay.

According to the Federal Trade Commission's 2006 Identity Theft Survey Report (the most recent version), the median out-of-pocket loss cost that consumers experience from all forms of identity theft is actually ZERO dollars. Instead, it's the merchants and banks who really end up absorbing the losses and paying fines when they lose credit card numbers. So if you're a merchant that accepts credit cards, you have a serious interest in reducing the number of data breaches that expose credit card numbers. If you're a consumer, you probably have better things to worry about.

If you're a consumer and your credit card number gets stolen, your bank will cancel the old card and issue you a new one. You may suffer a little inconvenience, but having your credit card number stolen probably won't affect you that much. You're probably better off worrying about other sensitive information that can't be just cancelled and replaced. This includes things like your credit history, your Social Security number and your medical history. If any of these gets compromised, it can't be cancelled and reissued. Once it's exposed, your privacy is gone for the rest of your life and it's probably impossible to get it back. If you're a consumer, you probably have more of an incentive to worry about the security and privacy of this type of information.

The PCI DSS has attracted most of the attention recently, but consumers really should be more concerned about the protection of information other than credit card numbers. If you're going to lobby your government to do something, look out for your interests. Let the banks and merchants worry about reducing credit card fraud. You should worry the most about the loss of sensitive information that affects you the most.

Wednesday, June 03, 2009

Cloud computing security incidents

There's lots of talk about cloud computing and the security issues that surround it these days. There are some issues that deal with regulatory compliance that get tricky when data's in a cloud, but how many actual security vulnerabilities have been discovered that relate specifically to cloud computing? It's easy to answer this question if we look at the Cloud Computing Incidents Database. It's not clear how reliable or comprehensive the information in the CCID is, but it currently has information on 18 separate incidents that have taken place in the past two years.

All but one of these incidents are the loss of service or the loss of data. The single incident that's an actual exploitable vulnerability is a bug in the way Google used SAML 2.0 in their single sign-on service for Google Apps. This particular vulnerability even has its own listing in the National Vulnerability Database, which you can find here and you can find the original paper that described this vulnerability here. Here's how the abstract of this paper summarizes the vulnerability:

Single-Sign-On (SSO) protocols enable companies to establish a federated environment in which clients sign in the system once and yet are able to access to services offered by different companies. The OASIS Security Assertion Markup Language (SAML) 2.0Web Browser SSO Profile is the emerging standard in this context. In this paper we provide formal models of the protocol corresponding to one of the most applied use case scenario (the SP-Initiated SSO with Redirect/POST Bindings) and of a variant of the protocol implemented by Google and currently in use by Google's customers (the SAML-based SSO for Google Applications). We have mechanically analysed these formal models with SATMC, a state-of-the-art model checker for security protocols. SATMC has revealed a severe security aw in the protocol used by Google that allows a dishonest service provider to impersonate a user at another service provider. We have also reproduced this attack in an actual deployment of the SAML-based SSO for Google Applications. This security flaw of the SAML-based SSO for Google Applications was previously unknown.

Many people think of the confidentiality of data when they think of information security, but the history of cloud computing should be a good reminder that the integrity of data and the availability of data are just as important.

Tuesday, June 02, 2009

The Philosophy of Tokenization

In 1846, Edgar Allen Poe published the essay "The Philosophy of Composition" in which he described how he came to write The Raven. What Poe says in this essay probably isn't true, but it almost certainly tells you how Poe wanted people to believe how he came to write The Raven. In other words, Poe may have invented more than the detective story. He may also have invented marketing.

Over 160 years after Poe wrote "The Philosophy of Composition" we see security vendors telling us things about their products that aren't really true but they want you to believe are true. What some tokenization vendors are claiming about their technology is a good example of this.

A tokenization system implements two functions: tokenize and detokenize. When a requesting application calls the tokenize function with a plaintext input, a tokenization system creates a token from the plaintext using a proprietary tokenization algorithm. The tokenization system then encrypts the plaintext using a cryptographic key that it gets from a key server. It then archives a copy of both the resulting ciphertext and the key used to encrypt it, and returns the token to the requesting application. When a requesting application calls the detokenize function with a token input, a tokenization system retrieves the archived ciphertext that corresponds to the token from its encrypted data archive and the key that was used to encrypt the ciphertext from its key archive. It then decrypts the ciphertext and returns the decrypted value to the requesting application.

Note that the secure operation of a tokenization system relies on the secure operation of several components: the tokenization server itself, as well as the key server that it uses, the key archive system and the encrypted data archive. The failure or compromise of any one of the components of this system will compromise the entire tokenization system.

The chances of an application being inappropriately implemented, configured or maintained is much greater that the chances of an adversary being able to defeat modern cryptography, so the reliability of a security system is limited by the chances of the system failing rather than an attacker defeating any type of cryptography. This means that tokenization systems are inherently less secure than systems with fewer components, which includes systems that implement encryption.

The security of a tokenization system also relies on the security of a proprietary tokenization algorithm. The operation performed by this algorithm is identical to the operations performed by well-known cryptographic algorithms like hash functions or encryption algorithms, which transform a message into a form from which it's infeasible to recover the original message. Oddly, the fact that its output is called a "token" instead of a "ciphertext" or a "message digest" has let tokenization algorithms avoid the careful public review that other cryptographic algorithms are subject to.

"Security through obscurity" has been known to be a bad system design principle for over 125 years - ever since Auguste Kerckoffs wrote "La cryptographie miltaire" in 1883. There's no reason why tokenization algorithms should be exempt from this general principle, and anyone thinking of using tokenization to protect sensitive data should ensure that the technology that they're considering is secure enough to withstand careful public review.

Tokenization really isn't more secure than encryption. Tokenization vendors just want us to think that it is.

Monday, June 01, 2009

Locking Doors

A lot of people use the locked door as a metric for the quality of a community. They will say something like, "I grew up in a small town. It was full of good people, it was safe. We didn't even lock our doors." I've heard it said of big city neighborhoods as well. "We lived in this neighborhood where everyone knew everyone else. In our building there were maybe 20 apartments and no one had to lock their doors."

In a documentary once, some residents of a large, non-US city were proclaiming proudly that they don't lock their doors. Incidentally, I looked up some statistics on crime in that city (my sources were reports put out by that country's government), and it looks like they suffer more crimes in total and fewer violent crimes than the US average. Hence, it appears there are more property crimes per capita than in the US. I'd like to see a comparison between that city and US cities, but the report only included the US averages.

Sometimes when people talk about doors and locks, I get the impression they are saying it is morally superior to leave the door unlocked. That is, the unlocked door is not just a metric on the honesty and goodness of the population, but the character of the door owner.

By the way, I don't get this impression with all people who don't lock their doors. But every once in awhile it seems to me that an open-door person is saying, "You are so mistrusting. You have such a low opinion of humanity. But me, I choose to believe that people are mostly good." Or the subtext is a weird counter projection, "You believe people are dishonest and will steal from you because you yourself are dishonest. You expect other people to do what you would do. Only a dishonest person would lock their door."

I suspect some places where people leave their doors unlocked do so because the neighbors would be offended (or the door owner thinks they might be offended) if they did lock. "What? You think I might steal from you, so you lock your door? You think I'm a thief? I'm offended."

Another reason to leave the door unlocked is convenience, maybe you never want to get locked out. Perhaps you figure that a professional thief will have no problem getting past a locked door. If so, then locking doors won't prevent crime, it will only cause you to get locked out every so often. So why bother? Of course, the locked door might not stop the professional, but it might deter the amateur or the opportunistic. Or maybe the thieves simply go after the easiest targets ("I don't have to outrun the bear, I only have to outrun you"). They know enough doors are unlocked so they don't have to bother with the locked ones.

If you leave your door unlocked, it's possible you don't get robbed because no one tries your door. A common security measure is "do nothing and hope nothing bad happens". It often works.

I believe that there are thieves in this world. Some of them live in small towns, some in large cities (large cities have more targets and more opportunities to be anonymous, so thieves gravitate to larger metropolitan areas). Some of them are drug addicts trying to get money for a fix (and they might live in small towns or in the supposedly safe apartment building). Some of them are kleptomaniacs or teenagers looking for a thrill or see it as a game. I think it is a wise policy to try to protect yourself from the thieves as best you can. Locking your doors is easy. It's not an indictment of your neighbors, the town, or your own character to lock doors.

Time really isn't money

The saying that "time is money" probably isn't true. It may sound true, but peoples' behavior tells us that they really don't believe it. We tend to have a strong bias in favor or things that we get right now at the expense of things that we'll get later. This is why people are easily tempted by low introductory interest rates on credit cards, for example, even if the rate becomes unattractive after six months or so. We'd apparently rather pay less now, even when we know that it will cost us more later. This means that time really isn't money. It also may explain why open-source software is so popular.

Some open-source software is very good. Linux is a very good operating system and MySQL is a fairly good database (at least for some applications). On the other hand, some open-source software isn't quite as good. It's cheap, but it can be much harder to use than commercial alternatives. Sometimes it's much harder to use. OpenSSL, for example, is free, but can be very tricky to use. Some people tell me that I should use harsher language to describe it, but I'll let them do that themselves.

RSA's BSAFE SSL-C, on the other hand, is fairly easy to use, but isn't free. It doesn't actually cost that much, but it's definitely not free. You'd think that people would prefer BSAFE SSL-C over OpenSSL when they look at how many hours of engineering time they'd save by using BSAFE SSL-C, but this doesn't seem to happen very often. They'd prefer to pay the lower costs today that are measured in money, even though they'll have higher costs in the long run that are measured in time. The fact that time apparently really isn't money seems to be why this happens.

Voltage Data Breach Index

  • Grab the Voltage Data Breach Index

March 2010

Sun Mon Tue Wed Thu Fri Sat
  1 2 3 4 5 6
7 8 9 10 11 12 13
14 15 16 17 18 19 20
21 22 23 24 25 26 27
28 29 30 31