Working with a database such as PRONOM feels like a never ending task… There are so many file formats out there, so many improvements that could be made and so much data tidying up to do. Our website could be more up to date, our search functionality could be far far better and there are just so many improvements to be made. Which makes it both a tiring and exciting project to work on. That is why it is nice to sit down once a year and write down all the things that everyone, not just the staff at The National Archives (UK), has achieved towards this project.
There is a lot here to be proud of, and mostly from the amazing community that supports PRONOM.
First Some Statistics…
This year we had one major release at the end of April followed by a smaller update in July. This is less than usual but we are currently focussing on some infrastructure upgrades so hope to increase the frequency next year. There were a total of 64 new PUIDs, 79 updates and 73 new signatures. As can be seen in our release notes.
As many people may already be aware, each file format entry in PRONOM has a PUID (persistent unique identifier) which normally starts with the code fmt (for format). It was in April this year we reached entry fmt/2000, which we assigned to the Husqvarna Embroidery Stitch Format. A HUS file is an embroidery design format found on floppy disks storing embroidery patterns for sewing machines. It was created by the company Husqvarna to work with their specific brand of sewing machine. The Husqvarna Embroidery Stitch format is placed next to four other embroidery formats. Reaching a milestone PUID is often an excuse to work on formats that wouldn’t normally fall in our remit and it was a lot of fun learning about embroidery formats. In 2017 it was the TZX Format (a cassette image format often associated with 80s video games) that got the limelight. The embroidery stitch formats were all researched and submitted by Tyler Thorsted, one of the most prolific file format researchers we know, and you can read more about them in Tyler’s blog. Stay tuned for fmt/3000 coming to you in maybe 6-8 years time.
Finally speaking of statistics Ross Spencer created a web page specifically for PRONOM stats. It contains figures on what the latest release is and how many file formats are in PRONOM. Not only that, but for anyone unsure of where to start with file format research it has statistics on which formats still require signatures or descriptions.
As well as this, Ross went into a deep dive of PRONOM’s data in this blog for PRONOM research week: ‘PRONOM’s Dustiest Records’. It is a very well explained and interesting look at past PRONOM data. When you have a database that multiple people have worked on over 22 years then there will be inconsistencies. Do we come to any conclusions about what to do about these inconsistencies?Yes for a few things. But there is still lots to think about!
PRONOM Goes to iPres
The September 2024 iPres was held in Ghent, and was a wonderful event with the opportunity to meet and collaborate with digital preservation colleagues across the world. David Clipsham, Preservica, had the idea to host a workshop with Ross Spencer, Tyler Thorsted, Brigham Young University myself at The National Archives (UK),. The workshop was called ‘What’s in the Box: Container-based File Format Identification’
Whilst previously we have done workshops that have been an entry level guide to file format research, this one was pitched at a more complicated topic. Container signatures. These are file formats that while they may appear to be one singular entity are actually files within files. Therefore when researching them for the first time you have to be aware of their structure. Many files that are containers may surprise you, for example if you use the Microsoft suite, a word document is a container file with many different files inside. To learn more about container signatures I would recommend this blog post explaining DROID container signature files.
Due to the complications of the topic, the workshop was really hard to pitch at the right level for everyone. Some people were new to file format research, whereas others had years of experience and lots of previous technical ability. Despite this there was positive feedback and the workshop was incredibly popular. The numbers were so large that we were placed in a very grand room with tapestries and a huge chandelier. Due to the popularity, we are hoping at some point to repeat the workshop for those who could not attend iPres in person. Possibly expanding the time so participants can spend longer on the interactive exercises and discussions.
PRONOM Research Week
This year PRONOM Research Week was a huge success, and a big thank you to everyone who participated. The week kicked off at our fortnightly PRONOM Open Drop-In session that happened to fall on World Digital Preservation Day. This was far more popular than normal and it was amazing to see so many new faces. As usual we allowed for multiple ways of contributing to the week.
Starting with our PRONOM Descriptathon! This is a chance to help us fill out any file format entries that do not yet have a description and can be contributed to all year round, of course! However during PRONOM Research Week we got 20 new descriptions from the Getty, Library of Congress, Duke University Libraries, the Digital Research Alliance of Canada- FRDR and Brigham Young University.
We had lots of submissions via our GitHub account and also in the PRONOM inbox for that week. Many people had heard about the week via Mastodon which was really interesting to learn. There were about 20 formats submitted and we are looking forward to researching these further, crediting those involved, and publishing them on the website.
In addition to all this, we got some pull requests on our GitHub to improve the repository and many people used the week to discuss new ideas and improvements to make to PRONOM. There was a longer discussion on Mastodon about open source icons and some data reviews
We will be publishing as many of the PRONOM Research Week submissions as possible in our next release which should be around the start of 2025.
The PRONOM GitHub
We are constantly trying to improve our GitHub repository. One of our goals is to make this a hub for community resources and ideas.
One way of doing this has been to convert our starter pack into markdown. We are hoping that by moving away from formats such as pdf we can encourage pull requests on our guides to make our resources as beneficial to everyone as possible. We are a small team and while we have some expert knowledge it is not in any way definitive. If this works well, we are hoping to produce more of our guidance in markdown or easily editable formats.
We are also trialing GitHub discussions. To start with, we will move some of our longer standing issues to discussion topics. We are also hoping to put other questions up there in the future.
PRONOM Translated
We are really excited to announce that the PRONOM Starter Pack has now been translated into Spanish. Paquete de inicio is now available in markdown on the PRONOM GitHub pages and would love everyone to be able to share this with anyone who might benefit from this work. Full credit for this should go to Jhon Gonzalez who volunteered his time to translate this resource for the Spanish speaking community. Please share with all your Spanish speaking friends!!!!!
Exciting Plans for Next Year…
This is always really difficult. Priorities often change throughout the year but we are hoping to be able to increase and update public documentation we provide and create more community resources. We are hoping that the GitHub page can become a hub to link to, or hold as many PRONOM resources as possible. We would like to make the resources as easy to follow and comprehensive as possible so please do suggest any improvements that we can make.
We also want to continue our work on the PRONOM descriptions and cleaning up of our past data whilst creating more PUIDs.
Thank you!
The following institutions or people in no particular order all contributed to PRONOM releases this year:
Brigham Young University, Library of Congress, Buddhist Digital Resource Center, Igor Nechaev, Institute of History of Charles University and Archive of Charles University, J. Paul Getty Trust, National Archives and Records Association, National Archives Australia, National Library of Australia, Preservica, Ross Spencer, The Church of Jesus Christ of Latter-day Saints, The Endangered Language Archive, Archaeological Data Service, Archives nationales (French National Archives), Archives New Zealand, Bibliothèque nationale de France, Digital Research Alliance of Canada – Federated Research Data Repository / Alliance de recherche numérique du Canada – Dépôt fédéré de données de recherche, ETH LIbrary and R74n.
Special thanks as well to the Open Preservation Foundation and the Digital Preservation Coalition for all the work they do promoting the project. This ranges from providing a platform for blog posts, giving advice and support and sharing news every time there is a release.
As always this is just a small fraction of the work and help we have received this year. There are many more entries to be published and many many more people we need to credit for hard work in future releases.