SF Technotes

Information Entropy

By Michael Castelluccio
November 22, 2022

Wander around in a recent news cycle, and you’ll likely find any number of new ingenious ways people have added to our burgeoning store of data about ourselves and our environs.

 

We’re carpeting the planet with data, yet as reassuring as this accumulation might seem, we haven’t yet solved the problem of information entropy (the gradual decline into disorder). Over time, just as ink on paper fades, the digital impressions saved on discs degrade, even when you make the reflective layer on your CDs out of something as stable as gold. Kodak did that with its Preservation CDs in 2008 but was only able to promise a 100-year safe life for your information before oxidation caused failure. That is not even as durable as the oak-gall inks and parchment on Ben Franklin’s desk.

 

We’ve never been better able to save and archive almost any kind of information, from print, to paintings, to performances. Science saves procedures and studies, history stores the primary sources that are the culture’s record, and commerce logs assets and liabilities. And this is all now accomplished with the speed of electronic files that require microscopic storage space. You can carry a university library around in your back pocket, but like the ill-fated Great Library at Alexandria, which held the accumulated knowledge of the world of its day, the information could still be lost in a sudden catastrophic event, or it will eventually and inevitably fade on its own.

 

THREE PROBLEMS

 

In a news roundup on the last Wednesday in October 2022, an interesting article titled “Everything dies, including information” was posted by Erik Sherman on MIT’s technologyreview.com. After introducing the idea that, like people, even information has a life span, Sherman complained, “Surely, we’re at a stage technologically where we might devise ways to make knowledge available and accessible forever.”

 

To last forever, however, you’d need hardware formats, software to run the hardware, and systems to search the archives that would all be permanent. Remember floppy disks, the spreadsheet program Quattro, or the search engine AltaVista? Each was good, but definitely not permanent. Sherman points out, “Digital storage systems can become unreadable in as little as three to five years. Librarians and archivists race to copy things over to newer formats. But entropy is always there, waiting in the wings.”

 

That race to maintain and continually update a functional information system is becoming more difficult as the digital world grows. Sherman cites an estimate from the market research firm IDC that claims the amount of data generated by companies, governments, and individuals in the next few years will equal twice “the total of all digital data generated previously since the start of the computing age.” That sounds like a negative Moore’s Law for data.

 

To illustrate the problem of impermanent formats, Sherman describes a situation encountered by NASA. The space agency had stored about 170 tapes of data on lunar dust collected over the Apollo Era (1961–1972). In the mid-2000s, the scientists took out the tapes to review the findings, but they didn’t have a 1960s IBM 729 Mark 5 machine to read the tapes. After an extensive search, they located one at the warehouse of the Australian Computer Museum. It was in poor condition and had to be refurbished before it could be used.

 

 

Like hardware, software also shows up with undisclosed use-by dates. New programs nudge the established ones aside, or vendors discontinue support for older applications, which then fade away. Sherman notes two experiments designed to keep old programs viable: a project from 2015 called the Open Library of Images for Virtualized Execution (Olive) archive and the Internet Archive’s Wayback Machine. The Olive Executable Archive describes itself as “a collaborative project seeking to establish a robust ecosystem for long-term preservation of software, games, and other executable content. Born at Carnegie Mellon University, Olive addresses the current gap in preservation technology by providing a curated environment for the preservation and distribution of executable content.” You can visit olivearchive.org and fire up Microsoft Office 4.3 or play some Oregon Trail 1.1, but the offerings aren’t very extensive. The Wayback Machine at archive.org, on the other hand, offers 761 billion web pages, an arcade of games from the 1970s through to the 1990s, films, recordings, and more. There are more than 700,000 programs there, some of which can be run in emulation in your browser. See the October 2021 TechNotes, “The Library of Everything,” in Strategic Finance for more specifics on this unusual archive.

 

PERMANENCE

 

The notion that information is ephemeral seems more the fault of our devices and software than the content or value of the information. But since we aren’t likely to have enduring formats and programs in our near future, there are procedures that might help.

 

By standardizing the formatting of text, you can keep one important element from changing and require new software and hardware to be able to read text in that standard. According to Wikipedia, the Text Encoding Initiative (TEI) is “a text-centric community of practice in the academic field of digital humanities. The TEI Guidelines collectively define a type of XML format and are the defining output of the community of practice.” The applications might be limited to one field, but the format has served as that field’s standard continuously since the 1980s.
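To make the idea concrete, here is a minimal sketch of why a standardized text format travels well: a small TEI-style document can be read with nothing more than a general-purpose XML parser. The sample document, its title, and the helper name read_tei are hypothetical and written only for this illustration; the one detail taken from the TEI Guidelines is the namespace http://www.tei-c.org/ns/1.0.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents; the "tei" prefix is just a local label.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# A tiny hypothetical TEI-style document, invented for this example.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample note (hypothetical)</title></titleStmt>
      <publicationStmt><p>Unpublished illustration.</p></publicationStmt>
      <sourceDesc><p>Written for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>Ink has been a serviceable medium for a long time.</p>
    </body>
  </text>
</TEI>"""


def read_tei(xml_text):
    """Return the title and the body paragraphs of a TEI-style document."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//tei:titleStmt/tei:title", namespaces=TEI_NS)
    paragraphs = [p.text for p in root.findall(".//tei:body/tei:p", TEI_NS)]
    return title, paragraphs


if __name__ == "__main__":
    title, paragraphs = read_tei(SAMPLE)
    print(title)
    for paragraph in paragraphs:
        print(paragraph)
```

Because the structure is plain, documented XML, a reader written decades from now would need only the published guidelines and an XML parser, not the program that originally produced the file.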

 

 

A second approach could include an expansion of individual rights regarding data. The MyData organization of Helsinki, Finland, has as its mission “To empower individuals by improving their right to self-determination regarding their personal data.” Teemu Ropponen at MyData explains, “We should have real-time rights, for example to ask for data deletion, data download, or data portability—to take the data from one source to another.” Ropponen says there’s already an effort within the EU to enshrine data portability in the laws. Having a legal say in how your data is stored, used, deleted, or kept would help stabilize our personal data, for a start.

 

And from another direction there’s the suggestion from Paul Royster, Coordinator of Scholarly Communications at the University of Nebraska, about taking some personal responsibility in deciding what data is most important to us and what we should save. Royster would memorialize the effort by setting aside one day of the year “when we all go through our data—Data Preservation Day.”

 

Perhaps we should also keep in mind that every time we hit the “Save” key, the copy is made by electrons in an unstable environment on a drive that could crash or begin degrading. If the information is important, back it up, and keep track of where you have the copies. You might even print a copy or two. Ink has been a serviceable medium for about 4,500 years now and the hard copies, spared from fire or flood, will probably outlive all of us.
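One way to keep track of backup copies is to record a checksum for each file and recheck it from time to time. The sketch below is an illustration only, not a method described in the article: it uses Python’s standard hashlib module, and the function names sha256_of, build_manifest, and find_damaged are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record the current checksum of every file worth keeping."""
    return {str(path): sha256_of(path) for path in paths}


def find_damaged(manifest):
    """List files that are missing or whose contents no longer match."""
    damaged = []
    for name, expected in manifest.items():
        path = Path(name)
        if not path.exists() or sha256_of(path) != expected:
            damaged.append(name)
    return damaged
```

Build the manifest when you make the backup, store it alongside the copies, and run find_damaged on your own Data Preservation Day. Any file it reports has either gone missing or changed since the manifest was written, so it should be restored from another copy while one still exists.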

 



Michael Castelluccio has been the technology editor for Strategic Finance for 26 years. His SF TechNotes blog is in its 23rd year. You can contact Mike at mcastelluccio@imanet.org.

