Sean, Profound thanks for a GREAT piece of editing.
>30 mins down to 16' 35" takes some doing! Dave.
ProfDaveB thanks for the great vid!
ProfDaveB You are a great teacher. Love watching your videos no matter the topic. Greetings from Germany.
Absolutely love your teachings and your explanations!
I don't mind watching a 30-minute-long explanation video of yours though :p
Love your work sir!
Ahhhh! He didn't show us the error correction! Correct this shortcoming immediately! This guy is my favourite presenter. Thanks so much guys.
Please film the video on the Hamming code error correction method.
Thanks.
Take an encoded word, flip a single bit, and see if you can figure out the method for determining the place the error occurred. It ends up being super simple.
Soof Golan because they have to give uneven sums, bits and their parity checkbits have to be different (so 0,1 or 1,0). so if b2=b3 one of them is wrong, same for b4=b5. if that's the case, you can then check the bits b1+b3+b5 if their sum is an uneven number. if it is, bits b1,b3 and b5 are correct and one of the parity bits was faulty (b2 or b4). if they do not add to an uneven number, one of them is wrong (so if b2=b3, b3 has to be the wrong one, b4=b5 means b5 was wrong and if neither of those was the case, only the parity bit b1 was incorrect).
Agreed, more Hamming code videos, loved it!
*As a quick example:*
Say we receive the code 11000
(1) Looking at bits 1, 3 & 5:
bit3=0 and bit5=0 which should mean bit1=0 therefore either bit1, bit3 or bit5 is wrong.
(2) Looking at bits 2 & 3:
bit2=1 and bit3=0 indicates that either bit2 or bit3 is wrong, as we do not have even parity.
(3) Looking at bits 4 & 5:
bit4=0 and bit5=0 indicates that both bit4 & bit5 are correct as we have even parity.
Since we are assuming that only 1 bit is incorrect, taking (1) and (2) together implies that bit3 must be wrong and so the code should have been 11100=CLOUDY.
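In case it helps, here's a minimal sketch of those three steps in Python, following the even-parity convention used in this example (bit 1 checks bits 1, 3, 5; bit 2 checks 2, 3; bit 4 checks 4, 5; positions are 1-based):
```python
# A sketch of the decoding steps above. The neat trick: the positions of
# the failed parity checks sum to the position of the flipped bit.

def correct(word):
    """word: list of 5 bits; returns the corrected word (at most 1 flip assumed)."""
    b = [None] + word                     # pad so b[1]..b[5] match the discussion
    checks = {1: [1, 3, 5], 2: [2, 3], 4: [4, 5]}
    failed = [p for p, cover in checks.items()
              if sum(b[i] for i in cover) % 2 != 0]   # odd sum = check fails
    if failed:
        b[sum(failed)] ^= 1               # failed check positions sum to the error
    return b[1:]

print(correct([1, 1, 0, 0, 0]))           # -> [1, 1, 1, 0, 0], i.e. 11100 = CLOUDY
```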
There is a simple one amazingly from 3 years ago
here : /watch?v=5sskbSvha9M
0:23 missing n in "dimesion"
Thanks for spotting. Sorry! >Sean
No, nak is in the right place. The challenge was to find a third point that was at least three moves away from *both* ack and nak >Sean
NAK should be written 1110 on the hypercube diagrams; 1111 would be on the inner cube.
Yes indeed 1111 logically belongs on the inner cube. But the labelling of the corners isn't the issue here. Rather, the question is: can you locate 4 red blobs on any 4 of the 16 available corners such that all 4 of them are at least distance-3 from one another. No - you can't.
The 1111 point at 6:16 is in the wrong location; it is drawn at 1110, not 1111.
I'm surprised you didn't go over Hamming(7, 4). It has such a nice graphical representation which makes encoding and decoding really intuitive.
Agreed, it's a really nice example - and easy to remember.
Would be nice to see more about the generalization here. How does this work if you want to encode a three-bit message? Or, conversely, if you add a 6th bit what does it do? And what happens once you get up to the next highest power of 2? What are the general rules that guide this process?
en.wikipedia.org/wiki/Hamming_code#Hamming_codes
Look at the table in the middle of the article :)
With m parity bits, you can error-correct a message of up to (2^m)-1 bits in total, which means you have (2^m)-m-1 bits left for data. With 3 parity bits you can add a 6th and 7th bit for data and still be fine.
The general principle is that each parity bit encodes the parity of all the bits that have that particular binary digit set in their position. That allows you to pinpoint the exact position of any single error. If parity bits 1, 2 and 8 don't match up, it means that bit 1+2+8=11 has the error. If parity bit 8 alone doesn't match up, it means that bit 8 itself is wrong. To encode a position "i" in an "n"-bit string you only need log2(n) bits. The length of the message rises linearly, but the number of parity bits rises only logarithmically, so the fraction of space spent on error correction keeps shrinking as the message gets longer.
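A small sketch of that pinpointing rule, assuming the even-parity convention (positions are 1-based, and it works for any codeword length):
```python
# The parity bit at position 2^j covers every position whose binary
# expansion contains 2^j; the positions of the failed checks sum to the
# position of the flipped bit.

def error_position(bits):
    pos, p = 0, 1
    while p <= len(bits):
        covered = sum(bits[i - 1] for i in range(1, len(bits) + 1) if i & p)
        if covered % 2 != 0:      # this parity check fails
            pos += p              # failed check positions sum to the error
        p <<= 1
    return pos                    # 0 means every check passed

word = [0, 1, 1, 0, 0, 1, 1]      # a valid 7-bit codeword
word[4] ^= 1                      # flip bit 5 in transit
e = error_position(word)          # -> 5
word[e - 1] ^= 1                  # flipped back; codeword restored
```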
The '0000' and '1111' codes at 6:17 (and onwards) are actually marked incorrectly on the hypercube. You can see that the edge distance between them is clearly 3, however it should be the same as the corresponding Hamming distance - 4. The correct way to mark these codes would be, if we keep '0000' where it already is, to place the '1111' in the opposite corner of the INNER cube, rather than the outer. You can then convince yourself that the edge distance between '0000' and '1111' becomes 4, as it should be.
Note that the author's point is NOT invalidated by this. It is still correct that you cannot have three 4-bit codes all of which are at least 3-away from each other in Hamming distance.
The amount of idiocy, arrogance and lack of interest in the matter is absurd in the comment section. Great video as always, my thanks to the professor.
Most people don't even know what coding theory is; it's a very difficult course, related to abstract (modern) algebra.
Many thanks to Professor Brailsford and the team
Is the part at 4:00 an example of applied error correction, live? Or did you shoot the footage with a VHS recorder :D
Can't find the playlist that was mentioned in the beginning :/
Link in the description.
To be fair, that model is probably as accurate as the typical weather forecast.
Can you cover the Byzantine Generals problem? It's a fascinating problem with a huge impact.
I was wondering if this *was* a possible solution--or at least a workaround--to the B2G problem. +1
Definitely would like to see the demo of correcting the errors
How does this work for n-bits?
If you look at it closely, you're just adding a bit for a parity check and then doubling all the other bits.
So if a pair of those doubled bits doesn't match, the parity bit tells you whether the sum of the data bits was supposed to be even or odd, so you can correct it.
If the damaged bit is the parity one, there's no problem, since all the other bits are perfectly doubled, so you don't need to check anything.
3-bit example:
000 -> 0 00 00 00
001 -> 1 00 00 11
010 -> 1 00 11 00
011 -> 0 00 11 11
100 -> 1 11 00 00
101 -> 0 11 00 11
110 -> 0 11 11 00
111 -> 1 11 11 11
Let's say 0111100 gets damaged and becomes 0011100.
The first pair after the parity bit is 01, but the parity bit says that the sum must be even, so it was 11.
If instead the received signal was 1011100:
The first pair is 01, but the parity bit says that the sum must be odd, so it was 00.
But I think there is a little problem, because some of the words only differ in 2 bits, like 001 and 010.
The video says you need a 3-bit difference...
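Here's a quick sketch of the scheme you describe (with the parity bit taken as the XOR of the data bits, matching your table), just to make the correction step concrete:
```python
def encode(data):
    p = 0
    for d in data:
        p ^= d                       # parity bit = XOR of the data bits
    word = [p]
    for d in data:
        word += [d, d]               # each data bit sent twice
    return word

def decode(word):
    p, pairs = word[0], [word[i:i + 2] for i in range(1, len(word), 2)]
    data = [a for a, b in pairs if a == b]             # copies that agree
    bad = [i for i, (a, b) in enumerate(pairs) if a != b]
    if bad:                           # at most one pair can disagree
        fixed = p
        for d in data:
            fixed ^= d                # solve p = XOR of all data bits
        data.insert(bad[0], fixed)
    return data

assert encode([1, 1, 0]) == [0, 1, 1, 1, 1, 0, 0]      # the 110 row of the table
assert decode([0, 0, 1, 1, 1, 0, 0]) == [1, 1, 0]      # the damaged word, corrected
```
On the last point: the distance that matters is between whole codewords, not the raw messages. 001 and 010 encode to 1000011 and 1001100, which differ in 4 places; in fact no two codewords of this scheme differ in fewer than 3, so single-error correction is safe.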
With m parity bits you can cover a codeword of up to (2^m)-1 bits, leaving (2^m)-m-1 bits of room for the message itself. To get the number of parity bits required for an n-bit message, just solve (2^m)-m-1 >= n for m.
As KohuGaly mentioned, Hamming codes in general have codeword length (2^r)-1, where r is the number of parity symbols. Each codeword has (2^r)-1 symbols, but since r of them are parity symbols, there are only (2^r)-r-1 information symbols. Hamming codes can correct up to 1 error. There is also something called BCH codes, which in some sense are a generalized version of Hamming codes. BCH codes can correct up to t errors.
Coding theory goes far beyond these codes, though. Hamming codes are linear codes, which means that the code can be considered a vector space in which the codewords are vectors (which basically means that you can add them to each other). These kinds of codes are quite nice to deal with mathematically, but they are only the tip of an enormous iceberg of codes.
Too bad none of these responses tell you how to actually construct a Hamming Code for three bit messages. Allow me to correct that problem.
To know how many parity bits to add, the rule is this: every power-of-two position (1, 2, 4, 8, ...) must hold a parity bit, so you add an extra parity bit whenever the total number of bits reaches the next power of two. For example, if the message were 5 bits long, adding 3 parity bits would get you to 8 bits, and position 8 is a power of 2, so it must be a parity bit too. Therefore, a 5-bit message requires 4 parity bits, for a total of 9 bits in the final code word.
However, you were talking about checking for a 3-bit message, and this requires only 3 added parity bits, for a total of 6 bits:
1 2 3 4 5 6
Once again, bits 1, 2 and 4 are the parity bits, whereas bits 3, 5 and 6 are the information bits. The rule for which parity bits check which bits in the code word is essentially the same as with 2-bit messages / 5-bit code words:
Bit 1 checks bits 1, 3 and 5
Bit 2 checks bits 2, 3 and 6
Bit 4 checks bits 4, 5 and 6
So, for the 8 possible 3-bit messages with an even-parity Hamming Code we have (the parity bits are in [] brackets):
000 --> [0] [0] 0 [0] 0 0
001 --> [0] [1] 0 [1] 0 1
010 --> [1] [0] 0 [1] 1 0
011 --> [1] [1] 0 [0] 1 1
100 --> [1] [1] 1 [0] 0 0
101 --> [1] [0] 1 [1] 0 1
110 --> [0] [1] 1 [1] 1 0
111 --> [0] [0] 1 [0] 1 1
You can check: the "distance" _[aka number of bits that differ]_ between any 2 code words is at least 3. In fact, I think you'll find that it's either 3 or 4. And since any error of n bits can be corrected by a code which has a Hamming Distance of at least 2n + 1, the code I generated above qualifies.
I think it'd be better to wait for _Computerphile_ to make a _How to correct 1-bit errors using the Hamming Code_ video to explain how to actually go about checking for and correcting any errors.
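In the meantime, if anyone wants to check that table for themselves, here's a small sketch that rebuilds it from the stated rules and verifies the distance claim:
```python
from itertools import product, combinations

def encode(a, b, c):
    w = [0, 0, a, 0, b, c]     # data bits in positions 3, 5, 6 (1-based)
    w[0] = w[2] ^ w[4]         # bit 1 gives bits 1,3,5 even parity
    w[1] = w[2] ^ w[5]         # bit 2 gives bits 2,3,6 even parity
    w[3] = w[4] ^ w[5]         # bit 4 gives bits 4,5,6 even parity
    return w

codewords = [encode(*m) for m in product([0, 1], repeat=3)]
dists = [sum(x != y for x, y in zip(u, v)) for u, v in combinations(codewords, 2)]
print(min(dists), max(dists))  # prints "3 4", matching the claim above
```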
Ahsim Nreiziev - So I'm gathering you can send the data out in one long signal, with a parity bit at each power-of-two position, and have error checking built in?
Hi Computerphile! Would it be possible to add links to the earlier videos that the Professor is referring to, into the description?
Dealing efficiently with binary asymmetric error correction, i.e. where 1s are more likely to be turned into 0s than 0s into 1s, seems like it would be very complex and interesting.
The No. 1 ESS Switch program store used Hamming and parity check methods to correct single read errors. Each 44-bit word contained 5 Hamming bits and 1 parity bit, as I recall.
Excellent lecture! Thank you professor
Hamming codes are brilliantly clever, but they're not that practical. In reality, bit-flip errors rarely occur in singles like this. Generally, if there is some external phenomenon causing noise in the signal, it will occur in bursts that mess up large sequences of bits. Also, Hamming codes are really expensive, as they waste a lot of bandwidth. For these reasons, generally CRC is used for error detection, and correction is done when needed by re-transmission.
Hamming codes are really only useful in situations where error rates are very low and the cost of re-transmission is quite large. One possible example is communicating with things way out in space, where it takes a long time to send a message round trip; however, the large amount of potential interference can still push the error rate up, making Hamming codes impractical there too.
The hypercube is wrong. The diametrically opposite point should be on the inside cube.
I was going to mention the same thing. Luckily I found your comment first.
Shapeshifter5 UNIMATRIX 01
??
@@Triantalex the opposite point on a hypercube won't be on the same 3D cube face.
I didn't really understand but I like listening to Professor Brailsford
Amazing..... Is that BCH code or RS code?
I'd love to see the video on how the error correction is done, and maybe if you could work in how parity can be scaled up such as .par files you used to get downloading large multi-part archives on Usenet.
Erasure codes can also be explained. They are used in huge storage systems.
Prof. Brailsford is the best.
Great video!!! Question: is it the same for error control with a checksum or CRC?
Does the Hamming method hold up if you need to send more than 2 information bits? I imagine you'll need exponentially more parity bits the more information bits you have.
You won't need exponentially more parity bits. Even if you did come up with a code that scaled in such an exponential way, there are far more clever alternatives out there. Hamming codes in general have codewords with (2^r)-1 symbols, where r is the number of parity symbols. Thus, Hamming codes have (2^r)-r-1 information symbols, but only r parity symbols. That is, the information symbols grow exponentially, while the parity symbols grow linearly. There's also something called BCH codes that correct up to t errors. BCH codes are in some sense a generalized version of Hamming codes. But of course, coding theory goes way beyond these codes.
So, bits 3 and 5 contain the message, bits 2 and 4 contain exact copies of that, and bit 1 contains the parity bit of the message. If 2 and 3 are the same, then that is the first bit of the message; if bit 4 and 5 are the same then that is the second bit; if something differs, then the first bit can be used to determine which versions of the message bits are the correct ones. Easy enough.
Think of this on a larger scale; that logic doesn't always hold.
I believe it does generalise to sending an N-bit message in 2N+1 bits, assuming there are zero or one bit errors: send the message twice, plus one parity bit. If the two messages agree, then ignore the parity (it might be wrong); if the messages don't agree then they differ in one bit, so use the parity bit to determine which message version is the correct one. Admittedly, for larger N this is sub-optimal, but it still works. The proper, more efficient generalization is described well in e.g. en.wikipedia.org/wiki/Hamming_code
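Here's a minimal sketch of that 2N+1 scheme (assuming at most one flipped bit per transmission, as above):
```python
def decode_2n1(word):
    """Recover the N-bit message from 2N+1 received bits (at most 1 flip)."""
    n = (len(word) - 1) // 2
    a, b, p = word[:n], word[n:2 * n], word[-1]
    if a == b:
        return a                        # copies agree; ignore the (maybe wrong) parity
    return a if sum(a) % 2 == p else b  # parity picks the uncorrupted copy

msg = [1, 0, 1]
sent = msg + msg + [sum(msg) % 2]     # message twice, plus its parity
sent[2] ^= 1                          # corrupt one bit in transit
assert decode_2n1(sent) == [1, 0, 1]
```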
Binary calculations are hard to get one's mind wrapped around at first, but this really holds up: en.wikipedia.org/wiki/Hamming_code. When I started computer science I didn't believe for one second that every finite number can be created as a sum of powers of two, but lo and behold: they can. This is similar.
The disconnect is that you won't always be sending 2N+1 bits; for higher numbers of information bits you'll actually be sending fewer than 2N+1 bits.
You have to pay attention to the very brief generalization that he gave when explaining the Hamming Code. The parity bits are the bits that correspond to exact powers of 2, and each will be the parity of the bits whose position is a sum of those powers of 2.
So if we wanted to send 3 bits of information instead of 2, we would use 6 bits. The parity bits would still be 1, 2, and 4, and the information bits would be 3, 5 and 6, and the parity rules would be as follows:
bit 1: 1,3,5
bit 2: 2,3,6
bit 4: 4,5,6
(basically since we added bit6 to the message and 6 is 2+4, we include it in those parity bits)
In fact, we can even send 4 bits of information and still only use 3 parity bits, the 4th information bit would be bit 7, and it would simply be included in the parity of all 3 parity bits (because 7 is 1+2+4).
And with 4 parity bits we can send up to 11 bits of information. So you see the generalized procedure is far more efficient than simply 2N+1.
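A small sketch of that coverage rule, for anyone who wants to see it laid out (each parity position 2^j checks exactly the positions whose binary expansion includes 2^j):
```python
def coverage(n_total):
    """Which positions each power-of-two parity bit checks (1-based)."""
    cov, p = {}, 1
    while p <= n_total:
        cov[p] = [i for i in range(1, n_total + 1) if i & p]
        p *= 2
    return cov

for p, cover in coverage(7).items():   # the 7-bit case: 4 data bits, 3 parity bits
    print(p, cover)
# 1 [1, 3, 5, 7]
# 2 [2, 3, 6, 7]
# 4 [4, 5, 6, 7]
```
Note how position 7, being 1+2+4, shows up in all three checks, exactly as described above.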
Yeah, it gets crazy. To send a megabyte of information, you would only need to dedicate 3 bytes to the parity data... but again, that only allows you to recover from an error in a single bit.
For the 2-bit case the following would be more intuitive: if the bits A and B need to be transmitted, send the combination ABABC, with C as a checksum for AB.
You can then compare the A's to each other as well as the B's. If any don't match, use the check bit in combination with the matching bit.
(So basically the same as in the video, but with parity bits in positions 3, 4, 5 and using even sums instead of uneven ones, because A, B, A, B, C is more intuitive than C, not A, A, not B, B.)
Excellent lessons. I’m going to have to watch it again though.
Am I correct in understanding that this gives you room to correct 1 error for any of those 4 codewords? What if we want to correct multiple errors?
Thanks for the great content and knowledge; before watching this channel I never thought about computer philosophy.
Great video, but it would be nice to know a general idea beyond 2 bit (5 bit) words, and error correction on the other end.
ProfDaveB wow, great! I love your content by the way, you are a great educator!
yes! please do an episode on correction!
What happens when the parity bits themselves get clobbered?
What if you wanted more than 4 weather conditions?
Well, if you move from a 5-bit to a 6-bit overall length, then that brings an extra message bit into play, i.e. bit 6. So you now have three bits (3, 5 and 6) to hold your message info, giving you 2^3 = 8 combinations to represent 8 weather states.
Can we do concatenated error correcting codes for next lesson?
1:10 Ahh, well, if there's an electrical disturbance on the line, you know it must be either cloudy or rainy 😌
0:24 dimesion
Nice. I'd like to see a follow up video about measuring the relative detection/correction effectiveness of various codes, e.g. why might somebody prefer Hamming(7, 4) over the presented code with 5-bits, 2 of which are for data?
Extended Hamming [8,4] (Hamming [7,4] plus an overall parity bit) is a SECDED code, so as well as being able to correct a single-bit error it can detect a second error (but not solve it).
Error correction is going to be important when the internet goes all the way to Mars. I think we'll have to have more than just one bit correction, it's worth giving up some bandwidth for the *crazy* ping time. :p
It seems to me that this way of error correction is just sending the message 2 times plus a parity bit, because bits 2 and 4 are always copies of bits 3 and 5, and bit 1 acts sort of as a parity bit. Am I wrong?
Aren't electrical disturbances only able to flip a bit to 1, as with a power surge? What could cause a 0 during transport? In the case of only flipping to 1, you need 4 bits for 4 states, as parity can be encoded as 00 and 11 and the rest in the remaining two, so you can detect an error in either the parity or the code.
In dynamic RAM, bits need to be refreshed or they will lose their charge.
How does one determine the number of bits required to encode x distinct choices in a code that tolerates up to n errors?
7:28 A little mistake found: the 1111 (NAK) must be on the inner cube.
What happens when the computer can't tell the difference between
ACK (0000) and NAK (1111)?
And is this 'system' viewed "IN-side" or/and "OUT-side" the 'system'?
0:23 "Higher dimesion"
Very good, thanks. I now understand how to add error DETECTION bits to a signal, but the title says error CORRECTION, and for that there are just verbal comments like "apply various techniques".
So in conclusion: fine and good, but mis-titled. I would like to know how to correct the error too please, perhaps in a subsequent video.
Is there any reason for having the check bits in position 1, 2 and 4? It seems to me that the order of bits shouldn't matter, as long as you are consistent. So why not just append the check bits to the end of the message, instead of having them in positions of powers of two?
Putting them into positions 1, 2, 4, 8, 16 etc ties in very nicely with doing things "by hand" because it reminds you that bit 4 (say) can only contribute in parity and sums-of-powers-of-two terms to bits that lie to the right of it. However you are quite right in what you say. The codewords matrix represents a linear vector space and so you can shift the exact-power-of-two-columns to the right hand end of the matrix (say) if that's what you want to do. In fact doing so may make it easier to "discard" those bits, via bit-shift operations, to leave just the message bits.
That makes a lot of sense, the pattern does become much clearer that way. Thank you for the quick reply.
So it's kind of like... instead of storing all of the information as part of the machinery of the transmission, you keep all that machinery at the other end as a kind of cypher, allowing you to reconstruct data you didn't have to send, because it was implied by the positioning of the data you DID send? Data within data, with no extra bandwidth spent?
Can we have a quick video on an actual algorithm like CRC32? Or (brain fart) on what Ethernet or T1 or E1 uses? IIRC SCSI also has a CRC, which might be easier to draw (not sure)...?
I wish Prof. Brailsford was my dad
He's everyone's dad here.
I can dream Hamming codes. What I want to learn about in a video is how Reed-Solomon works, as I struggle with the polynomials over a field.
but how do you know you only had a 1 bit error?
We just have to assume we can have only one error: the probability of a single error is sufficiently low that the probability of having two errors (roughly the square of that probability) would be insanely small.
You don't, really, but the assumption is that even a single error is fairly rare, so getting two would be very unlikely. Also, if for some reason you expected there to be 2 errors, you could still detect them with a Hamming code; you would just be unable to correct them. Basically, it depends on what you estimate the probability of an error to be.
It is the most likely error. Hamming code allows you to detect and correct a 1-bit error and detect a 2-bit error. With a 3-bit error and above, the message gets scrambled, potentially undetectably. The probability of a 2-bit error is roughly the probability of a 1-bit error squared, and a 3-bit error roughly cubed. If a 1-bit error has probability 0.01 (aka 1%), a 2-bit error has just 0.0001 (roughly one such error per 1 kB downloaded), and the probability of getting a wrong message (aka a 3-bit error) is 0.000001, roughly once per 100 kB downloaded.
Can you talk about how coding theory is connected to what I study (cryptography and network security)? Why is coding theory important for computer security?
Furthermore, can you explain CRC (cyclic redundancy check) and how error correction works in modern technologies, such as hard drive error correction and channel coding in 4G networks?
This video is more interesting than what I learned, but coding theory is a very difficult course (mathematical group theory or abstract algebra).
What happens when more than 2 bits are corrupted? I guess such a condition is possible.
If 00000 corrupted to 01100, then surely it would "correct" to 11100 and return ACK. Are we saying we don't care about detecting 2 errors, or have I missed something? Or is this just a case of better than nothing?
Patrick Nunn: Hamming only works when you get 1 bit wrong; when 2 or more bits are bad, the whole thing doesn't work.
Ok, this Hamming stuff is incredibly clever. Like... just... so insanely clever...
false.
What wasn't clearly explained was how you arrived at those specific parity rules for the Hamming code.
What parity rules? You mean evenness? That was covered in the last video in the series.
icedragon769 No, I mean the rules about which bits are parity for which positions. How would you figure that out for, say, an 8-bit code?
Also 'hamming' is restricted from use in any theatrical environment.
I find it interesting that you can't get a third "proper" state in four bits. Each "proper" state maps onto five real states (itself plus its four 1-bit neighbours), so with 16 states available the counting says there is actually enough room; the geometry just doesn't allow it.
This video made me ask, can a 1D thing roll? So a 2D thing too?
Are you asking if you can rotate something that exists only in 1 dimension? :P
For one bit of error correction the message is 2.5 times longer now.
Not only that, since it is more than twice as long, it is at least twice as likely to have two errors in the code word... which this error correction method cannot handle.
One part of coding theory is of course the keep the rate k/n as close to 1 as possible, where k is the number of information symbols and n is the total number of symbols. When generalizing Hamming codes to arbitrary length, the number of information symbols grows exponentially while the number of parity symbols only grows linearly.
That's quite intuitive thinking, but if you work out the algebra, you'll find you get a much better success rate in decoding. More precisely, if the probability of a bit-flip is p, the probability of success when naively sending the two bits is (1-p)^2, while the probability for the Hamming code is (1+4p)*(1-p)^4, which wins for any p below roughly 0.36. At p = 0.01, for example, the failure rate drops from about 2% to under 0.1%.
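A quick numerical check of that comparison (assuming independent bit flips with probability p):
```python
for p in (0.1, 0.01, 0.001):
    naive = (1 - p) ** 2                           # both naked bits must survive
    coded = (1 - p) ** 5 + 5 * p * (1 - p) ** 4    # 0 or 1 flips in 5 bits
    print(f"p={p}: naive={naive:.6f}  5-bit code={coded:.6f}")
```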
lol who th ell neds eror coretion tht stufs for lossers!
Time taken and space taken is a common trade-off in computer systems. By adding these extra bits for correction you do take up more space, but you save on transmission time by not having to retransmit messages. In modern networks there is so much bandwidth that the extra length doesn't really matter anyway.
You are my youtube grandpa
6:45
Aaah. That... explains things! Thanks! :D
Watched this years ago and it was too confusing for me, now I’m about to take computer engineering courses at uni and it’s time to see if I can understand this now
For some reason I can't hear anything on this video. Everything works fine with other youtube videos though.
I got one post letters . I asset some year ago
I guess the next step would be a video on polynomial error correction.
Yeah, it would be great if we could see the actual error correction procedure :D
Am I the only one who really feels like sunny ought to have been 00, and foggy 01?
I thought the exact same thing! That would make more sense as a progression from sunny to rainy.
Can't you just repeat the 2 bit message twice and add 1 parity bit and pretty much get the ability to correct 1 bit errors from such 5 bit message (if message copies are equal, it's that, if they are different, it's the one with proper parity)?
What if the message was longer than just 2 bits?
Let's say the message was 1024 bits long. By your logic, you would have to send 2*1024 + 1 bits of data, while with a Hamming code you would have to send only 1024 + 11 (the parity bits must cover themselves too, so 10 isn't quite enough).
Generally, instead of sending 2n+1 bits for an n-bit message, we only need to send roughly n + log2(n) bits, which for large messages (as they typically are over the internet) is much, much better.
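A tiny sketch of that count, for the curious:
```python
def parity_bits_needed(n):
    """Parity bits r for n data bits: need 2**r >= n + r + 1, since the
    syndrome must distinguish every bit position plus the no-error case."""
    r = 0
    while 2 ** r < n + r + 1:
        r += 1
    return r

print(parity_bits_needed(2))     # 3  (the 5-bit weather code)
print(parity_bits_needed(1024))  # 11
```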
If one of the duplicates has an error, how will you know which is the correct one, as both will be good codes? A single parity bit will only tell you that an error occurred, not how to correct it.
So, with the simplified states 00, 01, 10, and 11, I send the message the way you propose with the format
two-digit code, duplicate two-digit code, even parity bit (c1 c2 d1 d2 p), and you receive the message:
01000
What was the original message supposed to be? 00000? 01010? You don't know. You know an error occurred, but can't correct it.
You can, but you get a poor transmission rate: for a 1-bit message that repeat-and-parity scheme sends 3 bits, so only 1/3 of the sent message is useful data, and it never gets above 1/2. With a Hamming code you get a rate of 1 - log2(n+1)/n for an n-bit codeword, with the same amount of protection: 0.333 for a 3-bit codeword, 0.571 for 7 bits, and 0.839 for 31 bits, which is already a significant improvement.
But what if ALL the bits are wrong? :(
If you know all the bits are wrong you'll only need to xor it with a bunch of 1s, and all the bits will be correct :)
This method assumes you would realistically have at most 1 error per message. If you think you could have more, you can just increase the number of parity bits, but as said, 3 parity bits are safe in 99.99% of cases; otherwise it's a sign of a bad, inefficient and unreliable transmission system. Think about it: what are the chances that both bits in a 2-bit message are corrupted?
You can detect about twice as many errors as you can correct. So if enough errors occur, you may be able to detect them but not correct them. Also, there could of course be enough errors to transform one codeword into another codeword, in which case you wouldn't even be able to detect that errors occurred!
I wonder if the professor has heard of physicist James Gates' discovery of an error-correcting code within the physics of the universe.
I remember it was a resemblance, but I don't remember anyone debunking it either.
So basically you have one parity bit for the sum of the data bits and then you have one parity bit for each individual data bit.
more on this plz
Love these videos!
I don't see why you couldn't just repeat the 2 bit code and make a 4 bit code out of it. Couldn't that also check and fix the 1 bit corruption with even less space?
That's funny, I studied this at school the same day the video came out.
This is like a constellation diagram in quadrature amplitude modulation or OFDM.
He's fantastic!
Thank you
Why not just spend 64 bits, with the code repeating. No need to worry about it being wrong 32 times in a row.
Goodbye bandwidth lol
+faiAre Worse than that. In the era this was developed, you measured bandwidth costs in minutes of long-distance time at the given baud rate (300 Bd was normal consumer level in the early '80s), with a floor of 1 minute. For the oversimplified 2-bit weather code used in this subseries it wouldn't be an issue (64 bits isn't going to take anything like a minute), but in meatspace we're going to be sending text that takes up tens to thousands of characters at 7 bits per character (which was the standard in that era), or binary files of a few thousand bits with lower fault tolerance than text has.
As an example here, consider +Horsecock 's post above. There are 110 characters in the text of his post, so 770 bits times 32 repeats at 300 bps (bps and Bd aren't always the same, but they are for this era of modems), presuming a flat $6 in modern value per minute (long distance was never a flat rate, and the $6 per minute is a ballpark inflation adjustment of the average 1980 daytime-weekday LD rates in the US). That works out to about $8.21 in today's money to send his post using the schema he describes in that post. Sending it at night, which as far as I can find reduced the cost for LD to around 1/3 the daytime rate, would cost $2.74. Using the schema described in the video, meanwhile, encodes the text of 7-bit character codes with error detection/correction bits as 11 bits per character, which works out to 4 seconds to send at 300 bps, so it would be a flat $6 in inflation-corrected LD rates during weekday daytime and $2 at night, with enough time (56 seconds or so) to send more data without increasing those rates.
Jeez, if you guys are so worried about it, then maybe only send it 16 times. Is that better?
Meh. If I was really worried I'd have included the late '80s through early '90s costs for a specific BBS network I was familiar with in that era that bypassed local long distance costs (which were getting more expensive than cross country LD at the time) by sharing data every few hours via purely local phone calls.
Explaining bit-level (hard) correction is a bad idea, since it is not used any more.
Better to explain soft decoding, block decoding, LDPC or turbo codes.
A Hamming code is just a small hard-decoded LDPC block.
Seems to me you're making it less robust with multiple bits. Using 3 bits to encode 1 bit means you can handle 1 flipped bit, i.e. 33.3% of the transmission. Using 5 bits to encode 2 bits still only lets you handle 1 flipped bit, i.e. 20% of the transmission.
Yes but it gives you the opportunity to error correct. If you are using only 3 bits you have no way of knowing which bit to correct.
That's not true. With 3 bits encoding a single bit, you can correct the
error at the receiving end. Here, they're using 5 bits to encode two
bits, to get that same error correction ability - but it's less robust
as the risk of more than 1 bit error is greater in the total 5 bit
transmission than in a 3 bit transmission.
awesome, more on that please!
I love error correction, especially forward error correction.
Assuming that double errors don't occur isn't actually helping.
SF and Sac'to *are* miles apart! 90 miles, in fact.
In general, huge distances are thought of as hundreds of miles apart. It doesn't take long to send something 100 miles, but when you near 600 you're looking at a second or so. Any longer and it increases exponentially due to interference.
So basically, instead of sending the same message many times, you just send one message that contains the same information at least twice.
You make ECC sound like inverse lossless compression.
Not really. You send the n-bit message just once, but you add log2(n) bits, which allow the person on the other side to pinpoint where the error occurred, because you only need log2(n) digits to encode a position in an n-bit message.
+kingpopaul
It only looks like it's twice as long because the original messages were so short (2 bits, to be exact)
For, say, a 100 bit message, you'd only need a "Code Word" of 107 bits. so that would be 7 parity bits. That improves the situation quite a bit, doesn't it?
Except, sending 5 bits instead of 2 makes it more than twice as likely to get 2 errors :@
ahh this is what I call advanced K-maps lol
You don't have to pretend Sacramento and San Francisco are miles apart, they are.
Well, this is apropos of what I'm working on today.
awesome vid.
"reliable payload".
Nice!
Repeatedly switching between filmed footage and the graphic was very disorienting. Some people may learn better when you explain things slowly with repetition, and others learn better by watching the process. I, for one, became very distracted when I couldn't watch the graphic progress and had to wait for the explanation to get through the repetitive parts.
Not complaining, just saying these videos could be made a lot more interesting (= more views) with better planning and editing.