Digital preservation: Difference between revisions
Frank McCown (Initial article brought in from Wikipedia)
Frank McCown (Major changes to slim it down; note that I wrote most of the content in this article when it was on Wikipedia)
Revision as of 17:07, 19 July 2007
Digital preservation is defined as the set of processes and activities that ensure long-term, error-free storage of digital information, with means for retrieval and interpretation, for as long as the information is required.
Challenges
Jeff Rothenberg once wrote:[1][2]
"Digital information lasts forever—or five years, whichever comes first."
Preservation of digital information is widely considered to require more sustained attention than the preservation of other media.[3] The continual investment of effort, time, and money needed to keep pace with rapid technological and organizational change is considered the main obstacle to preserving digital information. While we can still read written records from several thousand years ago, digital information created merely a decade ago is in serious danger of being lost.
Strategies
There are several strategies that individuals and organizations may use to combat the loss of digital information:[4]
Refreshing
Refreshing is the copying of data onto newer media or systems. Examples include transferring census data from an old tape to a new one and transferring an MP3 file from a hard drive to a CD. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or cannot interpret the data's format. Refreshing will likely always be necessary because physical media deteriorate.
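A refresh is normally paired with a fixity check. This confirms that the new copy is bit-for-bit identical to the source before the old medium is retired. The Python sketch below assumes both media are mounted as ordinary file paths. The helper names `sha256_of` and `refresh_copy`, and the choice of SHA-256, are illustrative assumptions rather than features of any particular preservation system.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_copy(source: Path, target: Path) -> str:
    """Copy `source` onto a new medium at `target` and verify the copy.

    Hypothetical helper: raises OSError if the two checksums differ, so the
    old medium should only be retired after this call returns.
    """
    expected = sha256_of(source)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, target)
    actual = sha256_of(target)
    if actual != expected:
        raise OSError(f"fixity check failed for {target}")
    return actual


if __name__ == "__main__":
    # Simulate an old and a new medium as two directories in a temporary area.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old_tape" / "census.dat"
        old.parent.mkdir()
        old.write_bytes(b"census records")
        new = Path(tmp) / "new_tape" / "census.dat"
        print(refresh_copy(old, new) == sha256_of(old))  # True
```

The checksum is computed on both sides of the copy, not just the source. A copy can fail silently on the new medium, and that failure is exactly what refreshing is meant to guard against.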
Migration
Migration is the transfer of data to newer system environments. This may include converting resources from one format to another (e.g., Microsoft Word to PDF or OpenDocument), from one operating system to another (e.g., Solaris to Linux), or from one programming language to another (e.g., C to Java), so that the resource remains fully accessible and functional. Migrated resources risk losing some functionality. Newer formats may be unable to capture everything the original format supported, or the converter itself may fail to interpret all the nuances of the original format. The latter is often a concern with proprietary data formats.
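One simple way to detect the kind of loss described above is to check whether a migration can be reversed without change. The Python sketch below treats character encodings as the "formats." Text stored in a legacy encoding (cp1252, assumed here for illustration) is migrated to a target encoding, then decoded again and compared with the original. The function name `migrate_text` is hypothetical. Real migrations, such as Word to PDF, need far richer comparisons than a plain-text round trip.

```python
from __future__ import annotations


def migrate_text(data: bytes, source_encoding: str,
                 target_encoding: str) -> tuple[bytes, bool]:
    """Re-encode `data` from `source_encoding` into `target_encoding`.

    Characters the target cannot represent are replaced with '?', mimicking a
    converter that silently drops features. The returned flag reports whether
    the migrated bytes still decode to exactly the original text.
    """
    text = data.decode(source_encoding)
    migrated = text.encode(target_encoding, errors="replace")
    lossless = migrated.decode(target_encoding) == text
    return migrated, lossless


if __name__ == "__main__":
    legacy = "naïve café".encode("cp1252")  # hypothetical legacy file contents
    print(migrate_text(legacy, "cp1252", "utf-8")[1])  # True
    print(migrate_text(legacy, "cp1252", "ascii"))     # (b'na?ve caf?', False)
```

Migrating to UTF-8 preserves every character. Migrating to ASCII quietly substitutes `?` for the accented letters, and the round-trip comparison flags that loss.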
Replication
Creating duplicate copies of data on one or more systems is called replication. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental catastrophes such as fire and flooding. Digital data is more likely to survive if it is replicated in several locations. However, replication can complicate refreshing, migration, versioning, and access control, since the data is located in multiple places.
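Keeping several copies also makes it possible to detect and repair damage. If most replicas agree, a copy that disagrees can be restored from the majority. The Python sketch below holds replicas in memory as a dictionary keyed by a location name. The helper names `audit_replicas` and `repair_replicas`, and the simple strict-majority rule, are illustrative assumptions. Production systems such as LOCKSS use considerably more elaborate polling protocols.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority SHA-256 digest and the sorted locations that disagree.

    Raises ValueError if no digest is held by a strict majority of replicas,
    because a simple vote cannot then tell which copy is correct.
    """
    if not replicas:
        raise ValueError("no replicas to audit")
    digests = {loc: hashlib.sha256(data).hexdigest() for loc, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes * 2 <= len(replicas):
        raise ValueError("no strict majority among replicas")
    damaged = sorted(loc for loc, digest in digests.items() if digest != winner)
    return winner, damaged


def repair_replicas(replicas: dict[str, bytes]) -> list[str]:
    """Overwrite each damaged copy with a majority copy; return the repaired locations."""
    winner, damaged = audit_replicas(replicas)
    good_copy = next(data for data in replicas.values()
                     if hashlib.sha256(data).hexdigest() == winner)
    for loc in damaged:
        replicas[loc] = good_copy
    return damaged


if __name__ == "__main__":
    copies = {"site-a": b"thesis v1", "site-b": b"thesis v1", "site-c": b"thesXs v1"}
    print(repair_replicas(copies))  # ['site-c']
    print(copies["site-c"])         # b'thesis v1'
```

With only two copies, any disagreement is a tie, and the sketch refuses to guess. This is one reason preservation networks favour keeping more than two replicas.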
Emulation
Emulation is the replication of the functionality of an obsolete system.[5] Examples include emulating an Atari 2600 on a Windows system or WordPerfect 1.0 on a Macintosh. Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems. The feasibility of emulation as a catch-all solution has been debated in the academic community.[6]
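At its core, a hardware emulator reproduces the original machine's fetch-decode-execute cycle in software. This lets programs written for the old machine run unchanged on a new one. The Python sketch below emulates an invented 8-bit accumulator machine. Its four-instruction set, opcode values, and 256-byte memory are assumptions made purely for illustration. They do not correspond to the Atari 2600 or any other real system.

```python
# Opcodes of an invented, purely illustrative 8-bit accumulator machine.
HALT, LOAD, ADD, STORE = 0x00, 0x01, 0x02, 0x03


def run(memory: bytearray, max_steps: int = 10_000) -> bytearray:
    """Emulate the hypothetical machine until HALT and return its memory image.

    Every instruction is two bytes (opcode, address). Memory is 256 bytes and
    the accumulator is 8 bits wide, so arithmetic wraps around modulo 256.
    """
    if len(memory) != 256:
        raise ValueError("the hypothetical machine has exactly 256 bytes of memory")
    acc, pc = 0, 0
    for _ in range(max_steps):
        opcode = memory[pc]                      # fetch
        if opcode == HALT:
            return memory
        addr = memory[(pc + 1) & 0xFF]           # operand byte
        if opcode == LOAD:                       # decode and execute
            acc = memory[addr]
        elif opcode == ADD:
            acc = (acc + memory[addr]) & 0xFF
        elif opcode == STORE:
            memory[addr] = acc
        else:
            raise ValueError(f"illegal opcode {opcode:#04x} at address {pc}")
        pc = (pc + 2) & 0xFF                     # advance, wrapping at 256
    raise RuntimeError("step limit reached without HALT")


if __name__ == "__main__":
    image = bytearray(256)
    image[0:7] = bytes([LOAD, 14, ADD, 15, STORE, 16, HALT])  # mem[16] = mem[14] + mem[15]
    image[14], image[15] = 200, 100
    print(run(image)[16])  # 44, since (200 + 100) mod 256 == 44
```

Reproducing quirks such as the 8-bit wraparound is the point: an emulator must mimic the old system's behaviour, including its limitations, for legacy programs to produce the same results.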
Raymond A. Lorie has suggested that a Universal Virtual Computer (UVC) could be used in the future to run any software on a platform not yet known.[7] The UVC strategy combines emulation and migration, but it has not yet been widely adopted by the digital preservation community.
Trustworthy digital objects
Digital objects that can speak to their own authenticity are called trustworthy digital objects (TDOs). Henry M. Gladney proposed TDOs as a way for digital objects to maintain a record of their change history, so that future users can know with certainty that the contents of an object are authentic.[8] Other preservation strategies, such as replication and migration, are still necessary for the long-term preservation of TDOs.
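The "record of change history" idea can be illustrated with a hash chain. Each history entry stores the digest of the entry before it, so altering any earlier entry breaks the links that follow. The Python sketch below is a deliberately simplified toy, not Gladney's design. It omits the digital signatures that a real authenticity scheme would need, so someone able to rewrite the entire chain and the content could go undetected. The entry fields (`action`, `content_sha256`, `prev`) and helper names are hypothetical.

```python
from __future__ import annotations

import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous digest" for the first entry


def entry_digest(entry: dict) -> str:
    """SHA-256 of an entry serialized as canonical JSON (sorted keys, no spaces)."""
    canonical = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_change(history: list[dict], action: str, content: bytes) -> None:
    """Record a change, linking it to the digest of the previous history entry."""
    prev = entry_digest(history[-1]) if history else GENESIS
    history.append({
        "action": action,
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "prev": prev,
    })


def verify(history: list[dict], content: bytes) -> bool:
    """Check every link in the chain and that the newest entry matches `content`."""
    expected_prev = GENESIS
    for entry in history:
        if entry["prev"] != expected_prev:
            return False
        expected_prev = entry_digest(entry)
    return bool(history) and (
        history[-1]["content_sha256"] == hashlib.sha256(content).hexdigest())


if __name__ == "__main__":
    log: list[dict] = []
    append_change(log, "create", b"draft")
    append_change(log, "edit", b"final")
    print(verify(log, b"final"))   # True
    log[0]["action"] = "forged"    # tamper with an early entry
    print(verify(log, b"final"))   # False
```

The history travels with the object itself. This is why, as noted above, replication and migration are still needed: a chain that is lost or unreadable can no longer vouch for anything.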
Examples of digital preservation initiatives
- National Digital Information Infrastructure and Preservation Program. The Library of Congress's National Digital Information Infrastructure and Preservation Program (NDIIPP) is dedicated to ensuring that the digital information that conveys our history and heritage is available and accessible for generations to come. As a pioneer in the field of digital information, the Library has continued to provide digitized access to its vast collections, especially through sites such as American Memory, America's Library, and Exhibits.
- Portico. Portico, originally launched by JSTOR in 2002, is an electronic archiving service which provides "a permanent archive of electronic scholarly journals".
- FDsys. FDsys is a system being developed by the United States Government Printing Office to authenticate, preserve, and provide access to government information from all three branches of the Federal government.
- Elsevier Science digital archive. In 2002, the Koninklijke Bibliotheek became the official digital archive for 7 terabytes of Elsevier Science journals.
- LOCKSS. The LOCKSS Program ("Lots Of Copies Keep Stuff Safe"), under the auspices of Stanford University, develops and supports open-source software for digital preservation. The software is based on a distributed network of preservation appliances running a sophisticated voting protocol. Originally designed to preserve scholarly journals, the LOCKSS technology is now being used to preserve electronic theses and dissertations, government documents, books, blogs, websites, image collections, and other material. The LOCKSS Program also runs its own preservation network.
- MetaArchive Project. Six universities (Emory University, the Georgia Institute of Technology, the Virginia Polytechnic Institute and State University, Florida State University, Auburn University and the University of Louisville) and the Library of Congress are developing "a cooperative for the preservation of at-risk digital content [about] the culture and history of the American South" in a private LOCKSS network.
- ASERL ETDs. Eight universities of the Association of Southeastern Research Libraries (Florida State University, the Georgia Institute of Technology, North Carolina State University, the University of Kentucky, the University of Miami, the University of Tennessee, Vanderbilt University and the Virginia Polytechnic Institute and State University) are preserving each other's collections of electronic theses and dissertations (ETDs) in a private LOCKSS network.
- GPO LOCKSS Pilot. Using LOCKSS technology, the Government Printing Office conducted a pilot program to "manage, disseminate, and preserve access to Web-based Federal Government e-journals that are within the scope of the FDLP and IES" (Federal Depository Library Program and International Exchange Service). Pilot participants included 18 universities, the German National Library, the United States National Agricultural Library and the Government Printing Office.
- Alaska State Publications Program. Alaska state statutes oblige the Alaska State Library to "make state publications freely available to Alaskans by distributing them to local depository libraries". To continue meeting that obligation, the Library is expanding its depository program to preserve Web-only Alaska State publications by making them available for collection by LOCKSS.
- CLOCKSS. CLOCKSS ("Controlled LOCKSS") is "a not-for-profit community partnership among publishers and libraries that is developing a distributed, validated, comprehensive archive that preserves and ensures continuing access to electronic scholarly content" using a private LOCKSS network. It mobilizes the resources of twelve large publishers (American Chemical Society, American Medical Association, American Physiological Society, Blackwell Publishing, Elsevier, Institute of Physics, Nature Publishing Group, Oxford University Press, SAGE Publications, Springer Science+Business Media, Taylor and Francis and John Wiley & Sons). It also draws on seven institutions (Indiana University, the New York Public Library, OCLC, Rice University, Stanford University, the University of Virginia and the University of Edinburgh).
- New media art preservation. Arts organizations have been collaborating on various research initiatives in new media art preservation. These include the Solomon R. Guggenheim Museum, the Berkeley Art Museum, the Daniel Langlois Foundation for Art, Science and Technology, the New Museum of Contemporary Art's Rhizome.org and the Franklin Furnace Archive, amongst others. Such initiatives include the Variable Media Network and the Archiving the Avant Garde project.
- NDHA. The National Digital Heritage Archive (NDHA) Programme is a partnership between the National Library of New Zealand, Ex Libris Group and Sun Microsystems to develop a digital archive and preservation management system.
See also
- Data Format Management
- Digital curation
- Digital obsolescence
- File format
- Universal Virtual Computer
- Web archiving
References
- [1] Rothenberg, Jeff (1995). "Ensuring the Longevity of Digital Documents". Scientific American 272 (1).
- [2] Rothenberg, Jeff (1999). "Ensuring the Longevity of Digital Information". Expanded version of "Ensuring the Longevity of Digital Documents".
- [3] Lifecycle Information for E-literature. LIFE. Retrieved on 2007-06-14.
- [4] Garrett, J., D. Waters, H. Gladney, P. Andre, H. Besser, N. Elkington, M. Hedstrom, P. Hirtle, K. Hunter, R. Kelly, D. Kresh, M. Lesk, M. Levering, W. Lougee, C. Lynch, C. Mandel, S. Mooney, A. Okerson, J. Neal, S. Rosenblatt, and S. Weibe (1996). "Preserving digital information: Report of the task force on archiving of digital information". Commission on Preservation and Access and the Research Libraries Group.
- [5] Rothenberg, Jeff (1998). Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Washington, DC, USA: Council on Library and Information Resources. ISBN 1-887334-63-7.
- [6] Granger, Stewart (2000). "Emulation as a Digital Preservation Strategy". D-Lib Magazine 6 (10).
- [7] Lorie, Raymond A. (2001). "Long Term Preservation of Digital Information". Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '01), Roanoke, Virginia, USA: 346-352.
- [8] Gladney, H. M. (2004). "Trustworthy 100-year digital objects: Evidence after every witness is dead". ACM Transactions on Information Systems 22 (3): 406-436.