The problem
We have a large volume of content on floppy disks that we know are degrading but whose value we don't know.
Considerations
- We don't want to waste time/resources on low-value content.
- We don't know the value of the content.
- We want to be able to back up the content on the disks to ensure it doesn't degrade any more than it already has.
- Using unskilled students to do the work is cost-effective.
- Unskilled students have often never seen "floppy" disks, let alone learned to distinguish between different floppy disk formats. So we need a solution that doesn't require them to tell the formats apart (e.g. Apple, PC, Amiga, etc.).
Solution
- Create KryoFlux stream files using the KryoFlux hardware and software.
- Use the KryoFlux software to create every variant of disk image from those streams.
- Use the mount program on Linux to try mounting each disk image with each variant of file-system parameter.
- Keep the disk images that can mount in Linux (as that ability implies that they are the right format).
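The mount-probing steps above could be scripted roughly as follows. This is a minimal sketch, not the author's program: the list of filesystem types to try is an assumption, the image file names are hypothetical, and it presumes Linux, root privileges and an existing empty mount point. Creating the image variants with the KryoFlux software is not shown.

```python
import subprocess
from itertools import product
from typing import Callable, Iterable, List, Tuple

# Assumption: Linux filesystem types worth probing for floppy images.
# vfat/msdos for PC disks, hfs for HFS-formatted Mac disks, affs for Amiga disks.
CANDIDATE_FS_TYPES = ["vfat", "msdos", "hfs", "affs"]


def mount_command(image: str, fs_type: str, mount_point: str) -> List[str]:
    """Build a read-only loopback mount command for one (image, fs type) pair."""
    return ["mount", "-o", "loop,ro", "-t", fs_type, image, mount_point]


def try_mount(image: str, fs_type: str, mount_point: str = "/mnt/probe") -> bool:
    """Try a real mount (needs root); unmount again if it succeeded."""
    result = subprocess.run(mount_command(image, fs_type, mount_point),
                            capture_output=True)
    if result.returncode != 0:
        return False
    subprocess.run(["umount", mount_point], capture_output=True)
    return True


def find_mountable(images: Iterable[str],
                   fs_types: Iterable[str] = CANDIDATE_FS_TYPES,
                   mounter: Callable[[str, str], bool] = try_mount
                   ) -> List[Tuple[str, str]]:
    """Return every (image, fs type) pair that mounts.

    A successful mount suggests, but does not prove, the right format:
    mount can give false positives, so results still need checking.
    """
    return [(img, fs) for img, fs in product(images, fs_types) if mounter(img, fs)]
```

With hypothetical file names, a run might be `find_mountable(["disk001_mfm.img", "disk001_amiga.adf"])`. The `mounter` parameter lets the selection logic be tested without root by passing a stub instead of a real mount.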
Very rough beginnings of a program to perform the automatic format identification using the KryoFlux software and Mount are available here.
Issues with the solution
- When you use the KryoFlux to create raw stream files, it only seems to make one pass over each sector, whereas when you specify the format, it will re-read sectors that it identifies as "bad sectors" in the first pass. Those re-reads can succeed where a single pass would not. So relying on the KryoFlux stream files alone may preserve less content than you would get by specifying the disk's format before imaging begins. I'm trying to find out whether using "multiple" in the output options in the KryoFlux software might help with this.
- Mount doesn't support all file systems, though as support improves in the future the process could be re-run.
- Mount can give false positives
- I don't know whether disk images created with KryoFlux differ depending on whether many of the optional parameters are set or left at their defaults. For example, there doesn't appear to be a difference in mountability between disk images created with the number of sides specified and those where it is left to default to both sides (e.g. for MFM images, both seem to mount successfully).
- Keeping the raw streams is costly. A disk image for a 1.44 MB floppy is ~1.44 MB, whereas the stream files run to tens of MB.
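The difference between a single raw pass and a format-guided dump (the first issue above, and the KryoFlux team's reply in the comments) can be modelled as a retry loop. This is a conceptual sketch under stated assumptions, not how the KryoFlux software is implemented: `read_raw` stands in for one read of a track, and each decoder is a hypothetical function that returns `None` when its format does not match the data, or the decoded bytes plus an "all sectors OK" flag when it does.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical decoder signature: None if the format doesn't match the raw read,
# otherwise (decoded_data, all_sectors_ok).
Decoder = Callable[[bytes], Optional[Tuple[bytes, bool]]]


def dump_track(read_raw: Callable[[], bytes],
               decoders: Sequence[Decoder],
               max_reads: int = 5) -> Tuple[bytes, Optional[bytes], int]:
    """Read one track, re-reading while a known format matches but decodes badly.

    Returns (last raw read, decoded data or None, number of reads made).
    """
    raw = b""
    for attempt in range(1, max_reads + 1):
        raw = read_raw()
        matches = [r for r in (decode(raw) for decode in decoders) if r is not None]
        if not matches:
            # No guide format recognised the track: keep the single raw pass.
            return raw, None, attempt
        for data, ok in matches:
            if ok:
                return raw, data, attempt
        # A format matched but had bad sectors: loop round for another read.
    return raw, None, max_reads
```

In this model, a format that accidentally decodes part of a track as something else (the first caveat in the quoted reply) looks like a matching decoder reporting bad sectors, so it would trigger unnecessary re-reads.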
Other observations:
- It might be worth developing signatures for use in tools such as DROID to identify the format of the stream files directly in the future. I believe some emulators can already work with the stream files directly.
- The stream files might provide a way of overcoming bad-sector-based copy protection (e.g. the copy protection used in Lotus 1-2-3 and Lotus Jazz) by enabling the use of raw stream files (which, I believe, contain the "bad" sectors as well as the good ones) in emulators.
Thoughts/feedback appreciated
ecochrane
July 17, 2014 @ 2:15 pm CEST
The KryoFlux team got in contact with me after reading this post and gave the following really useful feedback:
"One thing that comes to mind: Yes, KF can not redump a badly dumped track if it does not know about it. But you can supply a format that will be used as a guide, e.g. for PC disks you would dump RAW and IMG (-i4; MFM) at the same time. You can add as many formats as you like (suspect), and then dump against these. If one of them matches, and the data read is bad, it will force another read of the track. You can omit file output, like this:
A problem that you might encounter is if one of the formats accidentally decodes as something else partially, obviously resulting in a bad track.
This is most likely to happen with copy-protected C64 formats though, so usually should not be an issue for you.
Another possible problem is 40 vs 80 track written disks.
40 track disks often have crosstalk between the tracks partially or sometimes even fully readable, depending on the alignment.
This is data that you do not want normally (but it’s quite common protection on e.g. C64 games).
Again, a partially bad read of such a track containing crosstalk would trigger a re-read, as it is bad track as far as the software is concerned.
One possible way to try and find out whether it is crosstalk or actual track data is to see the track numbers coming up from the decoded data.
68.0 : CBM DOS: OK*, trk: 035[034], sec: 17, *T
While this is protection, it still demonstrates the information relayed to the user.
Here track 68.0 is the physical track being read (in 40 track mode due to user request), the expected track number encoded would be 35 (CBM disks start with track 1, not 0 like FM or MFM), but the track number encoded in the data is actually 34. Hence you get a *T warning, which means the track number found does not match the expected value.
A 40 track disk read in 80 track mode (default -k1 parameter and 80 track drive) would look like this regarding track numbering:
00.0: 000
01.0: <unformatted, rubbish, or 0>
02.0: 002[001]
03.0: <unformatted, rubbish, or 1>
…
You will start to get *T warnings from track 2, as the expected number would be 2, but the number found would be 1.
Again, this information is only useful assuming the disk is not copy-protected, since anything goes in a copy-protected disk.
An 80 track disk read in 40 track mode (40 track drive or -k2 parameter) would behave the opposite of the above:
00.0: 000
01.0: 002[001]
02.0: 004[002]
…
Again, this will give you a *T warning, but you won’t get errors due to bad tracks quite simply because the track data itself is correct.
You normally want to read all the disks with an 80 track drive for simplicity if possible.
A 40 track disk read in 40 track mode (using -k2 parameter or a 40 track drive) or an 80 track disk read in 80 track mode (default, -k1 and 80 track drive) will always have track numbers matching their expected values, hence you won't get *T warnings and [mismatched track number values]."
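The track-number reasoning in the quoted reply can be expressed as a small check. This is a sketch, assuming you have already extracted, for each physical track, the track number decoded from the data (e.g. the `trk: 035[034]` part of the output; parsing that output is not shown). `step` mirrors the -k1/-k2 distinction (1 or 2 physical positions per logical track) and `first_track` is 1 for CBM disks and 0 for FM/MFM; all names here are illustrative. As the reply stresses, none of this is meaningful for copy-protected disks.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackReading:
    physical: int          # physical head position, e.g. 68 for "68.0"
    found: Optional[int]   # track number decoded from the data; None if unreadable


def expected_track(physical: int, step: int = 1, first_track: int = 0) -> int:
    """Track number the data should claim: physical // step, offset for CBM (starts at 1)."""
    return physical // step + first_track


def has_t_warning(r: TrackReading, step: int = 1, first_track: int = 0) -> bool:
    """Mirror of the *T flag: a decoded track number that differs from the expected one."""
    return r.found is not None and r.found != expected_track(r.physical, step, first_track)


def guess_layout(readings: List[TrackReading], first_track: int = 0) -> str:
    """Heuristic for a non-protected disk read in 80-track mode (step 1)."""
    decoded = [r for r in readings if r.found is not None]
    if not decoded:
        return "no data"
    if not any(has_t_warning(r, 1, first_track) for r in decoded):
        return "matched"        # e.g. 80-track disk in 80-track mode
    # Odd positions of a 40-track disk hold rubbish or crosstalk, so judge even ones.
    even = [r for r in decoded if r.physical % 2 == 0]
    if even and all(r.found == r.physical // 2 + first_track for r in even):
        return "40-in-80"       # 40-track disk read in 80-track mode
    return "inconclusive"       # possibly copy-protected or damaged
```

Only the 40-track-disk-in-80-track-mode pattern is checked here, because the reply spells it out explicitly (expected 2, found 1 from track 2 onwards); the opposite case could be added along the same lines.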