Exploring the impact of Flipped Bits.

14 February 2013

Following a few interesting conversations recently, I got interested in the idea of a 'bit flip' – an occasion where a single binary bit inside a file changes state from a 0 to a 1 or from a 1 to a 0.

I wrote a very inefficient script that sequentially flipped every bit in a JPEG file, saved each new bitstream as a JPEG, attempted to render it with the [im] Python library and, if that succeeded, calculated an RMSe value for the new file.
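For anyone who wants to try something similar, here is a minimal sketch of that loop. It is not the original script: the function names (`flip_bit`, `rmse`, `sweep`) are made up for illustration, and the decoder is passed in as a callable so the sketch does not depend on any particular imaging library. Treating every decoder exception, and any change in pixel count, as "failed to render" is also an assumption.

```python
import math
from typing import Callable, Iterator, Optional, Sequence, Tuple


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (bit 0 = MSB of byte 0)."""
    byte_index, offset = divmod(bit_index, 8)
    flipped = bytearray(data)
    flipped[byte_index] ^= 0x80 >> offset
    return bytes(flipped)


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error between two equal-length pixel sequences."""
    if len(a) != len(b):
        raise ValueError("pixel sequences differ in length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def sweep(
    data: bytes, decode: Callable[[bytes], Sequence[float]]
) -> Iterator[Tuple[int, Optional[float]]]:
    """Yield (bit_index, RMSE or None) for every single-bit flip of data.

    None means the damaged bitstream could not be decoded, or decoded to a
    different number of pixels, so no RMSE could be computed (assumption).
    """
    reference = decode(data)
    for i in range(len(data) * 8):
        try:
            yield i, rmse(reference, decode(flip_bit(data, i)))
        except Exception:
            yield i, None


# Hypothetical decoder, assuming Pillow is installed:
#   import io
#   from PIL import Image
#   decode = lambda b: list(Image.open(io.BytesIO(b)).convert("L").getdata())
```

Calling `list(sweep(jpeg_bytes, decode))` gives one result per bit, which is why this approach is so slow: a 100 KB file means roughly 800,000 decode attempts.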

I've not really had much time to take this further at the moment, but it's an academic notion I'd be interested in exploring some more.

I'm not sure if a bit flip is a theoretical or 'real' threat on modern storage devices – in the millions of digital objects that have passed through my hands in the past 10 years, I've never knowingly handled a file damaged by a random bit flip. I'd be interested in any thoughts / experiences / observations on the topic.

Please see the attached file for some pretty pictures.

Feel free to get in touch if you want any more data – images, RMSe data or scripts.

Bit rot

17 Comments

pixelatedpete
February 18, 2013 @ 9:57 am CET

You know, I've been thinking (dangerous I know) and wondered if my thought was a good idea, a bad idea, been done before, etc. and this seemed a good place to find out! 🙂

Flipping a bit produces a broken image. If you flip the bits on lots of images you get lots of broken images, and broken images – particularly JPEGs – seem to exhibit very similar artifacts – at least the broken images seem familiar somehow.

We could use this technique then to create a large body of broken images quite quickly.

Now, my question is, will we see any similarity in that breakage?

I'm not expecting a direct correlation between the bit and the damage (though it'd be neat if flipping bit 17 always resulted in a cyan swathe across the image for example) but rather that images that are broken may all produce similar artifacts/shapes?

If (big if probably) we can extract features from each of the broken images (Matchbox?) we may then be able to cluster around these features and start to answer that question – is there any similarity in the breakages?

Why?

If we can spot similarity, we can use that cluster data as another measure of whether or not an image is broken in the absence of any "ground truth" – i.e. we've not migrated the image and are checking against the original, we're just handling an image in isolation – say from a CD-ROM we're ingesting?

Could also do something similar with images identified as broken on the Atlas, but I'm not sure the corpus is big enough yet…

Having thought it all through, I think I'll go get on with it! 🙂
andy jackson
February 15, 2013 @ 1:44 pm CET

Just asked a colleague, and they said over the last six years of operation of the main store, which has a current total of 50 million files containing about half a petabyte of data (replicated totals), the BL has seen spontaneous bitstream damage once (i.e. only one file has ever been repaired for this reason). There have been other errors, but they have been down to systematic sources like faulty hardware or workflow problems, rather than true spontaneous 'bit rot'.

So yes, it happens, but it is certainly rare.
Jay Gattuso
February 14, 2013 @ 7:27 pm CET

Indeed. When I get round to it, I'm going to loop them all into a movie. Even at full frame rate it's going to be a very long and dull movie!
Jay Gattuso
February 14, 2013 @ 7:13 pm CET

Yup, totally agree – I was following a couple of strands when I did this: one is the comments in the reply to Paul, and the other was to see what the resulting images look like!

I've really only seen file construction errors (where a filestream is created incorrectly) or truncation errors (where files haven't been written fully post tx or write).
Jay Gattuso
February 14, 2013 @ 7:16 pm CET

Good spot,

Fixed now, thanks.




Open Preservation Foundation

All content on this website is licensed under
CC BY-SA 4.0 unless stated otherwise.