Spotting Randomness

You are gambling in a casino and there seem to be rather too many occurrences of a certain number. So how can you tell whether the casino is cheating? Well, we could measure the entropy of the outcomes, and so determine how random the numbers really are.

Encrypted content tends not to have a magic number (other than one added so that the content can be detected within a disk partition). If we analyse fragments of both compressed and encrypted files, we will see high degrees of randomness in each.

An important method for detecting compressed and encrypted files is thus to measure the randomness of the bytes in the file. This measure is known as entropy, and was defined by Claude E. Shannon in his 1948 paper, "A Mathematical Theory of Communication". The maximum entropy occurs when there is an equal distribution of all byte values across the file, at which point it is not possible to compress the file any further, as it is effectively random.

We determine the frequency (fᵢ) of each byte value and then compute the Shannon entropy, in bits per byte:

H = − Σ fᵢ × log₂(fᵢ)

For example, “00 01 02 03” gives f₁=0.25, f₂=0.25, f₃=0.25 and f₄=0.25, which gives H = −4 × 0.25 × log₂(0.25) = 2 bits per byte. In Python, the summation is computed as:

```python
ent = 0
for freq in freqList:
    ent = ent - freq * math.log(freq, 2)  # note the minus sign in the definition
```
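Putting this together, a minimal self-contained version of the measurement might look as follows (an illustrative sketch in Python 3, not the original en.py script used below):

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Return the Shannon entropy of a byte string, in bits per byte."""
    counts = Counter(data)       # frequency count of each byte value
    total = len(data)
    ent = 0.0
    for count in counts.values():
        freq = count / total     # relative frequency f_i
        ent -= freq * math.log(freq, 2)
    return ent

# Four equally likely byte values give 2 bits per byte:
print(shannon_entropy(b"\x00\x01\x02\x03"))
```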

If we measure the Shannon entropy of a TrueCrypt volume, we get:

```
C:\Python27>python en.py "c:\1.tc"
File size in bytes:3145728
Shannon entropy (min bits per byte-character):7.99994457357
Min possible file size assuming max theoretical compression efficiency:
25165649.6435 in bits
3145706.20544 in bytes
```

We can see that the file size is 3,145,728 bytes and that the minimum number of bits needed per byte is 7.99994457357, which is extremely close to the perfect rating of 8 bits per byte. The efficiency is thus 99.999307% (3145706.20544/3145728 × 100%).
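The efficiency figure is simply the ratio of the theoretical minimum size to the actual size, which is the same as the entropy divided by 8. Reproducing the arithmetic for the TrueCrypt volume (the figures are taken from the output above):

```python
entropy = 7.99994457357            # bits per byte, from the output above
file_size = 3145728                # bytes
min_bits = entropy * file_size     # theoretical minimum size in bits
min_bytes = min_bits / 8           # theoretical minimum size in bytes
efficiency = min_bytes / file_size * 100
print(efficiency)                  # roughly 99.9993%
```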

If we now try a compressed file (DOCX, which derives from the PKZip file format), we get:

```
File size in bytes:318724
Shannon entropy (min bits per byte-character):7.98787618412
Min possible file size assuming max theoretical compression efficiency:
2545927.84891 in bits
318240.981113 in bytes
```

And we now get an efficiency of 99.84% with an entropy of 7.98787618412 bits per byte. A measurement of the entropy of a DOC file (a file format which is neither compressed nor encrypted) gives:

```
File size in bytes:62464
Shannon entropy (min bits per byte-character):4.64286159485
Min possible file size assuming max theoretical compression efficiency:
290011.706661 in bits
36251.4633326 in bytes
```

This gives an efficiency of 58.03577%. A typical characteristic, then, is that encrypted content produces the highest levels of Shannon entropy, followed closely by compressed file formats. An efficiency of over 98% (an entropy above roughly 7.84 bits per byte) is likely to identify compressed or encrypted content.
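That threshold suggests a simple detection heuristic: flag a file as probably compressed or encrypted when its entropy efficiency exceeds 98%. A sketch of this idea (the 98% cut-off is the article's figure; the helper names are invented here for illustration):

```python
import math
import os
from collections import Counter

def entropy_bits_per_byte(data):
    """Shannon entropy of a byte string, in bits per byte."""
    total = len(data)
    return -sum((c / total) * math.log(c / total, 2)
                for c in Counter(data).values())

def looks_random(data, threshold=0.98):
    """True if the entropy efficiency (entropy / 8) exceeds the threshold."""
    return entropy_bits_per_byte(data) / 8.0 > threshold

print(looks_random(os.urandom(4096)))  # random bytes: almost certainly True
print(looks_random(b"A" * 4096))       # one repeated byte value: False
```

In practice this test is run over file fragments rather than whole files, since a container may mix compressed and uncompressed regions.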

The following are some examples:

• 00 FF 00 FF 00 FF 00 FF 00 FF 00
• 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
• First 256 bytes of a TrueCrypt volume
• First 256 bytes of a PKZip file (notice the magic number: 50 4B 03 04)
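Working the first two examples by hand: the "00 FF…" pattern uses only two byte values, so its entropy is at most 1 bit per byte, while the sixteen distinct values 00–0F give exactly log₂(16) = 4 bits per byte. A quick check (re-using the entropy function sketched earlier so the snippet runs on its own):

```python
import math
from collections import Counter

def entropy_bits_per_byte(data):
    total = len(data)
    return -sum((c / total) * math.log(c / total, 2)
                for c in Counter(data).values())

two_values = bytes.fromhex("00FF00FF00FF00FF00FF00")  # 6 x 00, 5 x FF
sixteen_values = bytes(range(16))                     # 00 01 ... 0F

print(entropy_bits_per_byte(two_values))      # just under 1 bit per byte (~0.994)
print(entropy_bits_per_byte(sixteen_values))  # 4.0 bits per byte
```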

--

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. Based in Edinburgh. Old World Breaker. New World Creator. Building trust.