Computing parity using CRC
It is a party trick rather than anything practically useful, but the following four functions all compute the same thing:
uint8_t parity_b(const uint64_t* data, size_t n) {
uint8_t acc = 0;
for (size_t i = 0; i < n; ++i) {
for (size_t j = 0; j < 64; ++j) {
if (data[i] & (1ull << j)) acc ^= 1;
}
}
return acc;
}
uint8_t parity_p(const uint64_t* data, size_t n) {
uint8_t acc = 0;
for (size_t i = 0; i < n; ++i) {
acc += _mm_popcnt_u64(data[i]);
}
return acc & 1;
}
uint8_t parity_x(const uint64_t* data, size_t n) {
uint64_t acc = 0;
for (size_t i = 0; i < n; ++i) {
acc ^= data[i];
}
return _mm_popcnt_u64(acc) & 1;
}
uint8_t parity_c(const uint64_t* data, size_t n) {
uint32_t acc = 0;
for (size_t i = 0; i < n; ++i) {
acc = _mm_crc32_u64(acc, data[i]);
}
return _mm_popcnt_u64(acc) & 1;
}
Going from parity_b
to parity_p
is an easy step: replace the inner loop with a popcnt
. Going from parity_p
to parity_x
is also an easy step: lift the popcnt
out of the loop. Hence it is not too surprising that parity_b
and parity_p
and parity_x
all compute the same thing. The parity_c
function is like parity_x
, except replacing ^=
with crc
, and the surprising thing is that parity_c
computes the same thing as all the other functions. To understand why, we need to look at the polynomial used by the crc
instruction (the CRC32-C polynomial):
This polynomial can be expressed as the product of two separate polynomials:
Accordingly, CRC32-C isn't really a 32-bit checksum, it is really two separate things packed into 32 bits: a 1-bit checksum and a 31-bit checksum. The method of packing is slightly funky, and thus the method of unpacking is also slightly funky: from a mathematical point of view, to get the 1-bit checksum out of the 32-bit value, we need to view the 32-bit value as a Z2 polynomial and then compute the remainder modulo x + 1
. This is equivalent to iteratively applying the reduction rule x = 1
until nothing more can be reduced. If we were to start with the 4-bit polynomial b3 * x3 + b2 * x2 + b1 * x + b0
and iteratively apply x = 1
, we'd get b3 * 1*1*1 + b2 * 1*1 + b1 * 1 + b0
, which is just b3 + b2 + b1 + b0
, which is just another way of saying popcnt
(mod 2). The same thing is true starting with a 32-bit polynomial: the remainder modulo x + 1
is just popcnt
(mod 2). In other words, we can use popcnt
to unpack the 1-bit checksum out of the 32-bit value, and a 1-bit checksum is just parity. Hence why parity_c
computes the same thing as all the other functions.
This combination of a 1-bit checksum and a 31-bit checksum can be seen as a good thing for checksumming purposes, as the two parts protect against different problems. Notably, every bit-flip in the data being checksummed will invert the 1-bit checksum, and hence any odd number of bit-flips will cause the 1-bit checksum to mismatch. The (ahem) flip-side is that the 1-bit checksum will match for any even number of bit-flips, which is where the 31-bit checksum comes in: it needs to protect against as many even number of bit-flips as possible.