Cryptopals: Set 1 Challenge 1
The Backstory
The Cryptopals Challenges introduce people to cryptography through a series of programming exercisese that increase in complexity. The challenges start with encoding & decoding in hex/binary/base64, and they build up to a variety of topics including an implementation of Wang's Attack, which exploits a flaw in the md4 hashing algorithm.
I plan to follow along with my own efforts to implement the challenges, and provide detailed explanations of concepts that helped me understand and implement the code.
This, the first post in the series, will be much more detailed than subsequent posts because it provides some broad background info regarding data representation in computers.
The Challenge: Set 1 Challenge 1
Convert hex to base64
Convert the Hex string:
49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
Into its Base64 encoded equivalent:
SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
Cryptopals rule: Always operate on raw bytes, never on encoded strings. Only use hex and base64 for pretty-printing.
View the challenge on cryptopals.com
The Solution
The cryptopals rule helps break this challenge down into two steps:
- Step 1 - convert hex string into raw bytes
- Step 2 - convert the raw bytes into base64
An understanding of number systems and bit-wise operations will help you understand the solution. You can find a deeper discussion of these topics in these other blog posts:
Step 1 - how to decompose a hex string into bytes
The hard way - packing bits
As noted in Primer on Data Encoding, each hexadecimal symbol can be represented by four bits. If two hexadecimal digits are taken together, they can be "packed" into an 8-bit byte by putting the first character's four bits in one half of the byte, and the second character's four bits in the other half of the byte. In code it'd look something like this:
string hex = "cf";
char hex0 = hex.at(0); // 'c'
char hex1 = hex.at(1); // 'f'
// NOTE: it's important to convert the _ascii character_ 'c' into
// the corresponding _hexadecimal integer value_ 0x0c
// i.e. convert 'c' into 0x0c
uint8_t highNibble = convertHexCharToBits(hex0); // 'c' = 0x0C = 12
uint8_t lowNibble = convertHexCharToBits(hex1); // 'f' = 0x0f = 15
// Bit-shifting and the bitwise-OR operator are used here
// to pack the two nibbles into a single byte.
//
// First, every bit in the first nibble is shifted 4 spots
// to the left... moving c into the "high nibble" position:
// 0x0c << 4 0xc0
//
// Second, the "high nibble" bits and the "low nibble"
// bits are joined together using the bitwise OR operator:
// 0xc0 | 0x0f = 0xcf
uint8_t result = (highNibble << 4) | lowNibble;
The easy way - lookup table
Because there are only 255 unique bytes, it's easy to make a lookup table from "pairs of hex chars" into "bytes". This is faster and easier than packing the bits one character at a time.
std::map<string, uint8_t> hex_to_bytes { {"00", 0x00} , ... {"cf", 0xcf},... {"ff", 0xff} };
std::string hex = "cf";
auto result = hex_to_bytes[hex];
Step 2 - converting a sequence of bytes into base64
This step is trickier, and requires building on all the concepts covered so far.
Base64 has, you guessed it, 64 symbols... A..Z, a..z, 0..9, +, and /. See the full table here.
64 symbols can be represented by 6 bits.
Now... there's a mismatch. Two hex characters fit perfectly into a single byte. When you try to fit base64 symbols into a byte, you can only fit one... with two bits left over.
But base64 decoding a sequence of bytes assumes the symbols are encoded densely, meaning no 2-bit gaps in the data. Therefore, symbols/data are grouped together into the smallest chunks that are evenly divisible both by 8 and by 6 (i.e. a chunk of data that can be chopped up into either bytes or base64 symbols. That's 24-bits (which can be evenly divided into four 6-bit base64 symbols, or three 8-bit bytes).
0b00000000 00000000 00000000 = [0x0, 0x0, 0x0]
0b000000 000000 000000 000000 = [0, 0, 0, 0]
In code, this mean breaking data apart across bytes using a "bit mask" and bit shifting, which is not particularly natural. In code, we end up with something like this:
uint8_t input_bytes[4] = {0x49, 0x27, 0x6d};
// the first of 4 base64 symbols (aka.. the first easy one....)
// Shift the first byte to the right by 2 bits, and then take the 6 lowest bits
uint8_t i1 = (bytes[1] >> 2) & 0b00111111
// the second of 4 base64 symbols (aka the first hard one)
// * take the lowest two bits from the first byte
// * shift them to the left 4 positions
// * take the first four bits from the second byte
// * shift those bits to the _right_ 4 positions
// * bitwise-OR them together with the first two bits
uint8_t i2 =
((bytes[1] & 0b00000011) << 4) |
((bytes[2] & 0b11110000_ >> 4);
// the third of 4 base64 symbols (aka the second hard one)
// * take the lowest four bits from the first byte
// * shift them to the left 2 positions
// * take the first two bits from the second byte
// * shift those bits to the _right_ 6 positions
// * bitwise-OR them together with the first four bits
uint8_t i3 =
((bytes[2] & 0b00001111) << 2) |
((bytes[3] & 0b11000000_ >> 6);
// the _last_ of 4 base64 symbols (aka the other easy one)
// grab the last byte, and take the 6 lowest bits
uint8_t i4 = bytes[3] & 0b00111111
// now you have 4 uint8_t that are in the range of 0..64.
// These can be used as keys into a base64 symbol lookup table
std::map<uint8_t, char> base64_symbols = { {0x00, '0'} , ... {0x40, '/'};
string result = // concatenate the chars together
base64_symbols[i1] +
base64_symbols[i2] +
base64_symbols[i3] +
base64_symbols[i4];
Working Code Solution
View on GitHub git clone https://github.com/barely-software/cryptopals-challenges cd cryptopals-challenges git checkout set1challenge1-rev2 mkdir build cd build cmake .. make ./set1_challenge1expected output:
--- Set 1 Challenge 1 ---
Description: Convert Hex to Base64:
input: 49276d206b696c6c696e6720796f757220627261696e206c696b65206120706f69736f6e6f7573206d757368726f6f6d
expected output: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
actual output: SSdtIGtpbGxpbmcgeW91ciBicmFpbiBsaWtlIGEgcG9pc29ub3VzIG11c2hyb29t
SUCCESS!