The Weaknesses of MD5, SHA1 and SHA-256 — The Length Extension Attack
The foundation of trust in cybersecurity is layed by the simple concept of data hashing, and where we take data and create a fixed-length hash for the data. If we cannot trust our hashing methods, we are in trouble. When we creating the perfect message hash, we thus need to make sure we have:
- Collision resistance. This is where it is extremely difficult to find two messages which have the same hash. Thus we should not be able to find the has of two messages (M1, and M2) that are the same, within a reasonable time: H(M1)=H(M2).
- Pre-image resistance. If we already have a hash value (h), it should be extremely difficult to find a message which will give the same hash. Thus for a given hash (h), it is difficult to find a message (M1) for H(M1)=h.
The original hash methods were often based on the Merkle-Damgård (MD) construction. With this, we create a hash function using blocks of data. Based on the MD constuct, Ron Rivest created the MD5 hashing method, and it was widely adopted in the industry. It works by taking a static initialisation vector (IV) and then feeding this into a one-way function (f), along with a block of the message. We feed this output into the next stage, and so on until we get to a message pad at the end:
Figure 1: The MD construct
The one-way function (f) will generally compress the data and produce fewer bits out than are fed in. Unfortunately, the MD construct has many weaknesses, and one of the most serious is the length extension attack. With this an adversary (Eve) can take a hash for an unknown message, and then add additional data to produce a new valid hash.
So Bob could take a hash of a password that he and Alice know (“qwerty123”) and the append with a message (“hello”) to produce:
H(Password || Message)
an where “||” identifies the appending of one string onto another. Thus when Bob sends a message to Alice, she will prepend the message with the shared password, and generate the same hash. In this way, Bob has proven the message and that he knows the secret password. This is a message authentication code (MAC) and validates that Bob knows a shared secret and the message.
But, the MD method is flawed as it is possible for Eve to take a previous hash for a known message, and then append a new method to produce:
H(Password || Original Message || New Message)
In this way, Eve does not know the password but can still generate a valid hash, and add her message onto it. An outline of the code is:
addition = b'addition'if (len(sys.argv)>1):
addition=(sys.argv).encode()# Compute a previous hash for H(Password || Message)m = hashlib.sha1()
print ("Previous hash: ",rtn)# Compute a hash for H(Password || Message || Addition)rtn = hashpumpy.hashpump(rtn, message, addition, len(password))
print ("New hash: ",rtn)
print ("New message: ",rtn)m = hashlib.sha1()
rtn=m.hexdigest()print ("Computing new hash (password+newdata): ",rtn)
A sample run for a message of “message” and with the addition of the text of “addition” is:
Previous hash: 22583ca8f00efff6296b4b571b9c2e1bcf22a99a
New hash: dd448d0874b738ca1b85bc00e151fbf16393ce4a
New message: b'message\x80\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00xaddition'Computing new hash (password+newdata): dd448d0874b738ca1b85bc00e151fbf16393ce4a
In this case, the hash of H(Password || “message”) is ‘22583ca8f00efff6296b4b571b9c2e1bcf22a99a’, and Eve can now use this to generate a new valid hash, without the knowledge of the password. We can see that Eve can generate some extra bytes in the message, and then add a new message and create a valid hash. Here is the code:
And there you go. Don’t use MD5 and SHA1! When producing a MAC, make sure it is HMAC, and which does not suffer from the length extension attack.