Exploring the impact of Flipped Bits.

Following a few interesting conversations recently, I got interested in the idea of the 'bit flip' – an event in which a single binary bit inside a file changes state, from a 0 to a 1 or from a 1 to a 0.

I wrote a very inefficient script that sequentially flipped every bit in a jpeg file, saved the new bitstream as a jpeg, attempted to render it with the [im] Python library and, if successful, calculated an RMSE error value for the new file.
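
For the curious, the heart of that script looks something like the sketch below. This is a minimal reconstruction rather than the original code – it assumes the Pillow and NumPy libraries (the real script used a different imaging library), and the file name is a placeholder.

    import io

    import numpy as np
    from PIL import Image

    def rmse(a, b):
        # Root-mean-square error between two same-shaped pixel arrays.
        return float(np.sqrt(np.mean((a.astype(float) - b.astype(float)) ** 2)))

    def flip_every_bit(path):
        data = open(path, "rb").read()
        reference = np.asarray(Image.open(io.BytesIO(data)).convert("RGB"))
        rendered, failed = [], []
        for bit in range(len(data) * 8):
            corrupted = bytearray(data)
            corrupted[bit // 8] ^= 1 << (bit % 8)      # flip exactly one bit
            try:
                img = Image.open(io.BytesIO(bytes(corrupted))).convert("RGB")
                arr = np.asarray(img)
                if arr.shape != reference.shape:
                    raise ValueError("decoded size changed")
                rendered.append((bit, rmse(reference, arr)))
            except Exception:
                failed.append(bit)                     # render failure
        return rendered, failed

    rendered, failed = flip_every_bit("source.jpg")    # placeholder file name
    print(len(failed), "bit positions prevented rendering")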

I've not really had much time to take this further at the moment, but it's an academic notion I'd be interested in exploring some more.

I'm not sure if a bit flip is a theoretical or 'real' threat on modern storage devices – in the millions of digital objects that have passed through my hands in the past 10 years, I've never knowingly handled a file damaged by a random bit flip. I'd be interested in any thoughts / experiences / observations on the topic.

Please see the attached file for some pretty pictures.

Feel free to get in touch if you want any more data – images, RMSE data or scripts.


17 Comments

  1. Jay Gattuso
    February 14, 2013 @ 7:04 pm CET

    Hey Paul,

    As ever, I completely agree.

    Underneath this work, there is a question in my mind about knowing where the critical parts of files are. Some file types will naturally localise errors into chunks, others will spread them evenly throughout the file object, and some will not tolerate any errors at all (depending on how the bitstream is organised in the file object, and how the file object is organised on the storage medium).

    If a single bit in a txt file is errored, the damage is only ever localised to the affected byte – any burst errors would be dispersed at byte level throughout the file, and any cluster errors the same.
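
    A toy illustration of that point (nothing to do with the jpg experiment itself):

        # One flipped bit in plain text damages exactly one byte:
        buf = bytearray(b"DAMAGE")
        buf[1] ^= 0b00000010          # flip bit 1 of 'A' (0x41 -> 0x43)
        print(buf.decode("ascii"))    # -> "DCMAGE"; every other byte is intact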

    If a single bit in an mp3 file is errored, then as long as the bit is not in the critical setup/declaration parts of the header, the damage is confined to the frame that contains the errored bit. Any burst errors would be dispersed at frame level throughout the file (without doing an impact study on mp3, I'm not sure what size of error mp3 frames can tolerate).

    It’s not hard to imagine there are some files in which any damage to any bit results in a complete render failure.

    What we can see from these jpgs is that some errors will cause critical failure (it follows that if a one-bit error is capable of preventing the object from being rendered, then at some offsets an error of any size can do the same).

    A quick count of the files that failed to render: 1,930 failed in the script. From that we can estimate that, for a jpg of comparable size, any single distributed bit error has a ~1.37% chance of disabling the file. Without accounting for location bias, a full byte-flip error has a ~10% chance of disabling the file.
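
    That ~10% figure is just the per-bit rate compounded across the 8 bits of a byte (assuming the flips act independently):

        # Back-of-the-envelope check of the byte-flip figure above:
        p_bit = 1930 / 140112            # per-bit render-failure rate (~1.37%)
        p_byte = 1 - (1 - p_bit) ** 8    # file survives a flipped byte only if
                                         # it survives all 8 individual bit flips
        print(f"{p_bit:.2%} per bit, {p_byte:.1%} per byte")   # ~1.38%, ~10.5%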

    This raises some questions: (1) What is the location bias? I can see from my data that the first 6 bytes of a jpg are critical – an error to any bit there gives a 91.6% chance that the whole file fails to render (see the sketch below). (2) If I throw big enough errors at the file, when does it start to behave differently? (3) What happens if I spread those errors around the file (transmission or storage cluster errors) rather than concentrating them in a single block (transmission or write errors)?
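
    On (1), here's a quick illustration of why those opening bytes are so fragile (the file name is a placeholder):

        # A JPEG must begin with the SOI marker FF D8, typically followed
        # by an APPn marker such as FF E0 (JFIF); flip a bit here and most
        # decoders reject the entire file.
        with open("source.jpg", "rb") as f:
            header = f.read(4)
        print(header.hex())               # typically 'ffd8ffe0' or 'ffd8ffe1'
        assert header[:2] == b"\xff\xd8", "SOI marker damaged - not a valid JPEG"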

    So yup, I totally take your point! It is a hypothetical threat, but it has already helped to start some interesting discussions, so I'm very happy with that!

  2. pixelatedpete
    February 14, 2013 @ 4:37 pm CET

    …this is a pretty nice art project.

  3. andy jackson
    February 14, 2013 @ 3:22 pm CET

    Some of the previous work is covered in the Heydegger paper referenced from here.

    Note that the reason we don't see bit-level damage is precisely because all of our systems are very carefully engineered to address it. There are error detection and correction protocols working for us at every moment, at the lowest levels of our systems.
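
    For a flavour of how that works, here's a toy Hamming(7,4) corrector – purely illustrative, real storage stacks use much stronger codes:

        # Toy Hamming(7,4): 4 data bits + 3 parity bits lets us locate
        # and silently repair any single flipped bit in a 7-bit word.
        def encode(d1, d2, d3, d4):
            p1 = d1 ^ d2 ^ d4
            p2 = d1 ^ d3 ^ d4
            p3 = d2 ^ d3 ^ d4
            return [p1, p2, d1, p3, d2, d3, d4]

        def correct(c):
            s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
            s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
            s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
            pos = s1 + 2 * s2 + 4 * s3     # 1-based index of the bad bit, 0 if clean
            if pos:
                c[pos - 1] ^= 1
            return c

        word = encode(1, 0, 1, 1)
        word[5] ^= 1                                  # simulate a bit flip at rest
        assert correct(word) == encode(1, 0, 1, 1)    # the flip is repaired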

    Which is why things mostly go wrong at the higher levels, where we haven't fully understood the classes of threat to the data and so haven't engineered management protocols to compensate.

  4. andy jackson
    February 14, 2013 @ 3:16 pm CET

    Great stuff, but I had trouble understanding this part:

    This source image was reduced to 180 x 120 pixels in size, which results in a 117,514 byte image. This is equal to 140,112 bits of data per image, which results in 140,112 new images being created.

    Surely, at 8 bits to the byte, this should be 940,112 bits of data? Where did 140,112 come from?

  5. paul
    February 14, 2013 @ 1:55 pm CET

    Hi Jay,

    Nice work, but I'm not aware of any specific examples of disk/storage failure type bitrot with single bit flips. From what I've heard from people with experience of disk failures, this doesn't tend to happen very much. Although I'd love to see some better evidence on this.

    What we do know is that processes to manage files (move, replicate, migrate, etc) do sometimes go wrong. S**t does indeed happen. Software tools are buggy. Networks drop out. Humans press the wrong button. That's life. And this means that quality assurance across the lifecycle is pretty important.

    There are a few examples of damage to files (scroll down to the bit rot sections) that we've collected in our mashups, and these tend to be caused by an array of different issues. This rather interesting example with TIFFs is single-bit damage, but it seems likely to have been caused by the creating software, as it's consistent across a lot of files (although this has not been confirmed).

    I think there was a paper published by some of the Planets partners who conducted a similar experiment to yours, but I can't locate it. I'm sure another reader will know it…?

    Cheers

    Paul
