Lucky 13 Attack Explained
Oct 4, 2019 · 7 min read
This attack is applicable with CBC mode of encryption and with MAC-then-Encrypt scheme. This is more of a theoretical attack because of delicate nature of differential timing of server response over the network. However researchers did demonstrated the attack in lab. The TLS MAC calculation includes 13 bytes of header information (5 bytes of TLS header plus 8 bytes of TLS sequence number) and that’s why it is called Lucky 13. Lucky 13 exploits the flaw mentioned in RFC 5246.
This leaves a small timing channel, since MAC performance depends to some extent on the size of the data fragment, but it is not believed to be large enough to be exploitable, due to the large block size of existing MACs and the small size of the timing signal.
Before digging into attack, let’s first understand the basic building blocks.
CBC(Cipher Block Chaining):
Plain tex t is divided into block size (depend on underlying symmetric encryption algorithm; 8 for DES and 16 for AES). Plain text Block Pᵢ is first XOR with previous Cipher block and then fed to DES/AES with the negotiated key.
First MAC of plain text is calculated, the result is appended to plain text, and then padding bytes are added such that it become integral multiple of block length.Last byte of last block tells how much padding is there and all the padding byte must contain same numeric value. Padding must consist of p+1 bytes of same value p. For example, if last byte is 0x00, then their will be only 1 byte of 0x00 and that will be the byte itself. If padding byte is 0x01 , then their will be 2 bytes 0x01 0x01 of same value 0x01.
MEE(Mac-then-Encode-then-Encrypt): Here the “Encode” step takes care of any padding that might be needed prior to the encryption step. This is more of a technical term used instead of MAC-then-Encrypt.
Encryption in TLS: For a message M, first a HDR||SQN (13 bytes) is appended and then a tag T is calculated as per negotiated algorithm (MD5/SHA1/SHA256). After that padding is added to make it integral multiple of blocksize(8 bytes for DES and 16 for AES).
HMAC: A brief understanding of HMAC is also required to understand the attack. Typically HMAC is computed of 64 bytes block, with 8 bytes header, at least 1 byte of padding, so effectively 55 bytes of data is maximum a tag can be computed in 1-block. If the data is more than 55 bytes, it take 2 block.The inner hmac function uses compression, which means if the data is more it would take more runtime for execution and hence server response is also delayed.In general an extra 64-bytes of data requires extra compression cycle.The size of the MAC tag is 16 bytes (HMAC-MD5), 20 bytes (HMAC-SHA-l), or 32 bytes (HMAC-SHA-256).
Point to remember is
55 bytes takes 4 CPU cycles.
56 or more take 5 CPU cycles.
I will explain for AES, whose blocksize is 16 bytes and HMAC-SHA1, which computes 20 bytes of tag.
An attacker modify & truncate this cipher text.
First let’s see how much to truncate. This is depend on above factor.
- From HMAC, we know MAC computation of 55 bytes of data can be distinguished from 56 or more bytes of data, since 56 or more take a complete extra cycle of compression. So our aim is to find cipher block such that we get this 55 bytes of data.
- We know that MAC computation also uses 13 bytes of HDR(5 bytes) & SQN(8 bytes). Since cipher block won’t contain these thing , let’s subtract to get actual bytes cipher block should have 55–13 = 42.
- Now add tag size, since we are using SHA-1, its 20 bytes so 42+20 = 62.
- Nearest block size(16) multiple of 62 is 64 ( 4 block of 16 bytes each). We have to add up 2 byte extra to get 64 .
Hence for HMAC-SHA1, we have to truncate at 4 blocks and modify 2 bytes.
These 2 bytes will be our playing ground to retrieve plain text .
What if MD5 is being used for tag computation?
Step 3 & 4 will change.
3. MD5 tag size is 16 bytes, so add 42 + 16 = 58 bytes.
4. Nearest multiple of 16 for 58 is 64. Hence again 4 blocks for truncation but 6 bytes for modification.
For SHA-256, same logic will hold but we will have 5 blocks for truncation.
Attack become less efficient for these cases but there are workaround suggested. For example if one can modify the input URL such that 5 bytes are known. Something similar to what was done in BEAST attack.
Let’s come back to SHA-1 with AES cases.
We deduce that truncation should happen at 4 cipher block and 2 bytes to be modified.
For decryption attacker submit.
HDR || C₀ || C₁||C₂||C₃||C₄
HDR is TLS header.
C₀ is Initialization Vector.
C₁ -C₄ are the truncated cipher block.
For TLS, SQN is not sent but sender and receiver maintain themselves, however for DTSL, SQN is also transmitted.
C₄ is the target block, whose plain text( P₄) attacker want to determine.Attacker can replace this C₄ with any other cipher block as well.
Attacker will modify previou block i.e. C₃ with some chosen 16 bytes value denoted as Δ.
Let’s draw the Block cipher decryption diagram.
P₄ = Dec(C₄) ⊕ C₃ ; under normal circumstances.
P* = Dec(C₄) ⊕ (C_3 ⊕ Δ) ; because C₃ is modified with Δ.
Above 2 equation can we re-written as
P* = P₄ ⊕ Δ
This is a important relation between modified plain text and plain text attacker is interested in.
Receiver first decrypt the packet, strips off padding bytes, removes the MAC, and the compute the MAC of decrypted text and then compare with received MAC.There can be these possibility on receiver side after decryption.
- Padding check failed. This means the last bytes and padding bytes mismatched. For example last byte is 0x75 , which means all previous 0x75 number bytes should also contain numeric value of 0x75, but it fails. So in this TLC RFC suggest that entire data should be used for MAC calculation which means MAC would be calculated for MAC(HDR||SQN||C₁..C₄). This would take long time as this is > 55 bytes.
- Response(which would be bad MAC or something )returned in less time. This interesting case this means, MAC computation used 55 bytes of lesser for MAC computation. This can only be possible if last 2 bytes are 0x01 0x01 or even 0x02 0x02 0x02, in all these cases 64 bytes of cipher block (after removing padding(2) and 20 bytes of tag)= 42 + 13 bytes of HDR||SQN = 55 bytes.
Hence only Case #2 take least time and that means P* last 2 bytes are 0x01 0x01 and since we have a relation between P₄ and P*. Attacker can retrieve last 2 bytes of P₄.
P* = P₄ ⊕ Δ
- Write the code in such a way that success and error cases consume same time so that an attacker cannot measure the time difference.
- Use AEAD Ciphers like AES-GCM, these cipher calculate MAC while encrypting the message itself.
- Use Encrypt-then-MAC.
This attack is applicable with CBC mode of encryption and with MAC-then-Encrypt scheme. This is more of a theoretical attack because of delicate nature of differential timing of server response over…