Current State of the MD5 Hash Function

I’ve had a couple of people ask me recently if the MD5 hash function is still OK to use in web applications, so I thought I would write a brief summary on the subject. They gave excuses such as “the hash is hidden away in a database, end users will never see it…”, which is not a good thought process for security.

Short Answer: No, don’t use it any more.

Long Answer:


MD5 has various well-documented attacks against it. I will explain these and what they actually mean in terms of its security.

Firstly, and this is often miscited by misinformed ‘security people’, MD5 still has practical preimage resistance: if I give you a 128-bit MD5 hash value, there is no feasible algorithmic attack to find an input message that produces it (ignoring brute force and rainbow tables). What it does not have is collision resistance. The difference between the two is fine but significant.

First preimage attack: Given a hash h, find a message m such that hash(m) = h.

Second preimage attack: Given a fixed message m1, find a different message m2 such that hash(m2) = hash(m1).

Collision Attack: Find any two different messages m1 and m2 such that hash(m1) = hash(m2).

Chosen-prefix Collision Attack: Given two chosen prefixes a1 and a2, find suffixes b1 and b2 such that hash(a1 || b1) = hash(a2 || b2).

The difference to be aware of here is that in a second preimage attack, the attacker has to work with a fixed input m1. In a collision attack, he has full control over both inputs, which makes collision searching significantly easier. Note that a chosen-prefix attack is at least as difficult as a classic collision attack, and that a chosen-prefix attack implies a collision attack but not the reverse.
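To make the gap between the two attacks concrete, here is a rough sketch that brute-forces both against a toy hash: MD5 truncated to 16 bits. The truncation, the BITS constant and the function names are assumptions made purely for illustration. Real attacks on full MD5 exploit its internal structure rather than brute force.

```python
import hashlib
from itertools import count

BITS = 16  # toy digest size (assumption, for illustration only)

def toy_hash(message: bytes) -> int:
    """Return the first BITS bits of the MD5 digest as an integer."""
    digest = hashlib.md5(message).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - BITS)

def find_collision():
    """Collision attack: the attacker is free to choose both messages."""
    seen = {}  # toy hash value -> message that produced it
    for i in count():
        m = b"msg-%d" % i
        h = toy_hash(m)
        if h in seen:
            return seen[h], m, i + 1
        seen[h] = m

def find_second_preimage(m1: bytes):
    """Second preimage attack: m1 is fixed, only m2 can be varied."""
    target = toy_hash(m1)
    for i in count():
        m2 = b"try-%d" % i
        if m2 != m1 and toy_hash(m2) == target:
            return m2, i + 1

a, b, collision_tries = find_collision()
m2, preimage_tries = find_second_preimage(b"fixed message")
print("collision found after", collision_tries, "tries")
print("second preimage found after", preimage_tries, "tries")
```

By the birthday bound, the collision search should need on the order of 2^(BITS/2) tries, while the second preimage search should need on the order of 2^BITS.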

The first theoretical collision attacks came in the mid-1990s and developed slowly with increasing efficiency (Schneier named MD5 broken around this time). In 2008 came the famous chosen-prefix collision attack on MD5 in the X.509 certificates used for SSL, after which Certificate Authorities promptly switched from MD5 to a stronger hash function.

The most recent collision attack (I think) is from a 2006 paper entitled “Fast Collision Attack on MD5”, which has a complexity of 2^34.1 and can generate collisions in just a few minutes (with certain restrictions on the input).

Today, however, it remains hard to find the original m given h = hash(m); this is first preimage resistance. Rainbow tables weaken this statement somewhat, but they are easily defeated by using a suitably sized salt (Always Use Salts!! >64 bits if possible) such that h = hash(m || salt). Some would therefore say that it is OK to store hash(password || salt) in a database. I say why risk it when there are significantly better alternatives at the cost of a few bits?
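As a minimal sketch of the h = hash(m || salt) construction, the snippet below uses SHA-256 rather than MD5, in line with the advice in this post. The function names and the 16-byte salt length are my own assumptions. It only illustrates the construction described above and is not a complete password-storage scheme.

```python
import hashlib
import hmac
import os

SALT_BYTES = 16  # 128-bit salt (assumption), above the 64 bits suggested

def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest) where digest = SHA-256(password || salt)."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.sha256(password.encode("utf-8") + salt).digest()
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the salted hash and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Because each password gets its own random salt, two users with the same password end up with different stored hashes, so a table precomputed over unsalted hashes is of no use.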

A recent paper from Eurocrypt 2009 presents a preimage attack on full MD5 with a complexity of 2^123.4 (2^116.9 for a pseudo-preimage), which is completely infeasible. So although a preimage attack exists, it is not possible in practice at this complexity level.
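To put these exponents in perspective, here is a back-of-the-envelope calculation. The hash rates (10^8 per second for a single machine, 10^12 per second for a very well funded attacker) are assumptions picked only to show the scale, not measurements.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def attack_seconds(log2_work: float, hashes_per_second: float) -> float:
    """Time to perform 2**log2_work hash evaluations at the given rate."""
    return 2.0 ** log2_work / hashes_per_second

# Collision search at 2^34.1, single machine at an assumed 1e8 hashes/s
collision_minutes = attack_seconds(34.1, 1e8) / 60

# Exhaustive 2^128 preimage search at an assumed 1e12 hashes/s
brute_force_years = attack_seconds(128, 1e12) / SECONDS_PER_YEAR

print("collision search: %.1f minutes" % collision_minutes)
print("2^128 preimage search: %.2e years" % brute_force_years)
```

Under these assumptions the collision search finishes in a few minutes, while the 2^128 search takes on the order of 10^19 years. Shaving a handful of bits off the exponent, as the preimage attack does, barely changes that picture.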

In my opinion, 128-bit hashes are a little on the small side today. If you put your paranoid hat on for a second and imagine worst-case brute-force power, massive rainbow tables are most likely in existence (there are many good ones free on the web!). These tables likely cover the hashes of vast dictionaries, common passwords and languages, as well as a large share of short arbitrary inputs.

In conclusion, MD5 has ‘decent-enough’ preimage resistance but has some serious collision vulnerabilities, which may or may not affect the application it’s deployed in. Even though MD5 might be safe under certain scenarios, don’t use it unless you absolutely have to for reasons such as backwards compatibility or interoperability. There is no point, and as Schneier always says, “Attacks never get worse, they only get better”. There are numerous alternatives available today: SHA-256, SHA-512 and Whirlpool, for example, are all sensible choices.  :)
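For a sense of what switching costs in storage, this short sketch prints the digest sizes of MD5, SHA-256 and SHA-512 using Python's hashlib. Whirlpool is left out because it is not guaranteed to be available in every hashlib build. The sample input is arbitrary.

```python
import hashlib

sizes = {}
for name in ("md5", "sha256", "sha512"):
    h = hashlib.new(name, b"example input")
    sizes[name] = h.digest_size * 8  # digest size in bits
    print(name, sizes[name], "bits,", len(h.hexdigest()), "hex characters")
```

Moving from MD5 to SHA-512 means storing 64 bytes per hash instead of 16.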


5 Responses to “Current State of the MD5 Hash Function”

  1. SmallGoda says:

    SHA512 for the win!

    Seriously though, you can “acquire” some seriously chunky Rainbow Tables (> 80GB easily) without too much difficulty in the general interwebs; taking off the tinfoil for a second, it isn’t hard to imagine that anyone could take these and expand them without much effort or serious commercial storage space – not just government types but also the rather more malicious criminal types as well..

    As usual, the most sensible encryption to use is the strongest one that your performance requirements allow for. ;-)

    And for gods sake people, patch your damn servers! ^_^

  2. Phillip says:

    MD5 is still cool to use for general integrity checks though yer?

  3. Jack says:

    @SmallGods: Too true! What’s going from 128 bits to 512 bits really going to cost you in resources in the long run given the extra safety margin? I’ve heard of people selling terabyte hard drives packed with tables, scary stuff. Patching and strongest ftw as you say :)

    Incidentally, the new competition currently in progress for 2012 is for 1024-bit monster hashes :)

    @Phillip: Kind of. If you just want a really fast integrity checksum then yes, it’s great. Collision attacks make it dangerous to use for anything sensitive though. There are examples of malicious PDFs, for instance: two very different files with the same MD5. So use it carefully…

  4. [...] of hash function. If you pick a weak hash like MD4 or MD5 with known collision attacks then you are asking for trouble Use something with decent strength > 256bits and you will be significantly better off (SHA-512, [...]

  5. Pennie Whack says:

    Like the site.
