Digital preservation: Difference between revisions

From Citizendium
imported>Frank McCown
(Initial article brought in from Wikipedia)
 
imported>Frank McCown
(Major changes to slim it down- note that I wrote most of the content in this article when it was on Wikipedia)
Line 1: Line 1:
<div class="messagebox cleanup">Some of the information in this {{{1|article or section}}} has '''not been [[Wikipedia:Verifiability|verified]]''' and might not be reliable. It should be checked for inaccuracies and modified as needed, '''[[Wikipedia:cite sources|citing sources]]'''.</div>
'''Digital preservation''' is defined as the set of processes and activities that ensure long-term, error-free storage of digital information, with means for retrieval and interpretation, for as long as the information is required.  


'''Digital preservation''' refers to the management of [[digital]] information over time. [[Preservation]] of digital information is widely considered to require more constant and ongoing attention than preservation of other media.<ref>{{cite web |url=http://eprints.ucl.ac.uk/archive/00001854/|title=Lifecycle Information for E-literature |accessdate=2007-06-14 |publisher=[http://www.ucl.ac.uk/ls/life/ LIFE]}}</ref> This constant input of effort, time, and money to handle rapid technological and organisational advance is considered the main stumbling block for preserving digital information. Indeed, while we are still able to read our written heritage from several thousand years ago, the digital information created merely a decade ago is in serious danger of being lost.
==Challenges==


Digital preservation can therefore be seen as the set of processes and activities that ensure continued access to information and all kinds of records, scientific and cultural heritage existing in digital formats. In the language of digital imaging and electronic resources, preservation is no longer just the product of a program but an ongoing process. In this regard, the way digital information is stored is important in ensuring its longevity.
Jeff Rothenberg once wrote:<ref>{{cite journal | author = Rothenberg, Jeff | year = 1995 | title = Ensuring the Longevity of Digital Documents | journal = Scientific American | volume = 272 | issue = 1}}</ref><ref>{{cite journal | author = Rothenberg, Jeff | year = 1999 | title = [http://www.clir.org/pubs/archives/ensuring.pdf Ensuring the Longevity of Digital Information]}} Expanded version of ''Ensuring the Longevity of Digital Documents''.</ref>


Digital preservation is defined as: long-term, error-free storage of digital information, with means for retrieval and interpretation, for all the time span that the information is required for. "Retrieval" means obtaining needed digital files from the long-term, error-free digital storage, without possibility of corrupting the continued error-free storage of the digital files. "Interpretation" means that the retrieved digital files, files that, for example, are of texts, charts, images or sounds, are decoded and transformed into usable representations. This is often interpreted as "rendering", i.e. making it available for a human to access. However, in many cases it will mean able to be processed by computational means.
<blockquote>"Digital information lasts forever—or five years, whichever comes first." </blockquote>
 
[[Preservation]] of digital information is widely considered to require more constant and ongoing attention than preservation of other media.<ref>{{cite web |url=http://eprints.ucl.ac.uk/archive/00001854/|title=Lifecycle Information for E-literature |accessdate=2007-06-14 |publisher=[http://www.ucl.ac.uk/ls/life/ LIFE]}}</ref> This constant input of effort, time, and money to handle rapid technological and organizational advance is considered the main stumbling block for preserving digital information. While we are still able to read our written heritage from several thousand years ago, the digital information created merely a decade ago is in serious danger of being lost.


==Strategies==
==Strategies==


There are several strategies which individuals and organizations may use to combat the loss of digital information.
There are several strategies which individuals and organizations may use to combat the loss of digital information:<ref>{{cite journal | author = Garrett, J., D. Waters, H. Gladney, P. Andre, H. Besser, N. Elkington, H. Gladney, M. Hedstrom, P. Hirtle, K. Hunter, R. Kelly, D. Kresh, M. Lesk, M. Levering, W. Lougee, C. Lynch, C. Mandel, S. Mooney, A. Okerson, J. Neal, S. Rosenblatt, and S. Weibe| year = 1996 | title = Preserving digital information: Report of the task force on archiving of digital information | journal = Commission on Preservation and Access and the Research Libraries Group |  url=http://www.rlg.org/legacy/ftpd/pub/archtf/final-report.pdf}}</ref>


===Refreshing===
===Refreshing===
Line 17: Line 19:
===Migration===
===Migration===


''Migration'' is the transferring of data to newer system environments (Garrett et al., 1996). This may include conversion of resources from one format to another (e.g., conversion of [[Microsoft Word]] to [[PDF]] or [[OpenDocument]]), from one operating system to another (e.g., [[Solaris Operating Environment|Solaris]] to [[Linux]]) or from one [[programming language]] to another (e.g., [[C Programming Language|C]] to [[Java (programming language)|Java]]) so the resource remains fully accessible and functional.  Resources that are migrated run the risk of losing some type of functionality since newer formats may be incapable of capturing all the functionality of the original format, or the converter itself may be unable to interpret all the nuances of the original format. The latter is often a concern with proprietary data formats.
''Migration'' is the transferring of data to newer system environments. This may include conversion of resources from one format to another (e.g., conversion of [[Microsoft Word]] to [[PDF]] or [[OpenDocument]]), from one operating system to another (e.g., [[Solaris Operating Environment|Solaris]] to [[Linux]]) or from one [[programming language]] to another (e.g., [[C Programming Language|C]] to [[Java programming language|Java]]) so the resource remains fully accessible and functional.  Resources that are migrated run the risk of losing some type of functionality since newer formats may be incapable of capturing all the functionality of the original format, or the converter itself may be unable to interpret all the nuances of the original format. The latter is often a concern with proprietary data formats.


===Replication===
===Replication===
Line 25: Line 27:
===Emulation===
===Emulation===


''Emulation'' is the replicating of functionality of an obsolete system (Rothenberg, 1998).  For example, emulating an [[Atari 2600]] on a [[Microsoft Windows|Windows]] system or emulating [[WordPerfect|WordPerfect 1.0]] on a [[Apple Macintosh|Macintosh]].  [[Emulator]]s may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems. The feasibility of emulation as a catch-all solution has been debated in the academic community (Granger, 2000).
''Emulation'' is the replicating of functionality of an obsolete system<ref>{{cite book | first = Jeff | last = Rothenberg | year = 1998 | id = ISBN 1-887334-63-7 | title = [http://www.clir.org/PUBS/reports/rothenberg/contents.html Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation] | publisher = Council on Library and Information Resources | location = Washington, DC, USA }}</ref>.  For example, emulating an [[Atari 2600]] on a [[Microsoft Windows|Windows]] system or emulating [[WordPerfect|WordPerfect 1.0]] on a [[Apple Macintosh|Macintosh]].  [[Emulator]]s may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems. The feasibility of emulation as a catch-all solution has been debated in the academic community<ref>{{cite journal | author = Granger, Stewart | year = 2000 | title = Emulation as a Digital Preservation Strategy | journal = D-Lib Magazine | volume = 6 | issue = 10 | url=http://www.dlib.org/dlib/october00/granger/10granger.html}}</ref>.


Raymond A. Lorie has suggested a [[Universal Virtual Computer]] (UVC) could be used to run any software in the future on a yet unknown platform (Lorie, 2001).  The UVC strategy uses a combination of emulation and migration.  The UVC strategy has not yet been widely adopted by the digital preservation community.
Raymond A. Lorie has suggested a [[Universal Virtual Computer]] (UVC) could be used to run any software in the future on a yet unknown platform<ref>{{cite conference | author = Lorie, Raymond A.| year = 2001 | title = [http://doi.acm.org/10.1145/379437.379726 Long Term Preservation of Digital Information] | booktitle = Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '01) | location = Roanoke, Virginia, USA | pages = 346-352}}</ref>.  The UVC strategy uses a combination of emulation and migration, but it has not yet been widely adopted by the digital preservation community.


===Trustworthy digital objects===
===Trustworthy digital objects===


[[digital object|Digital objects]] that can speak to their own authenticity are called ''trustworthy digital objects'' (TDOs). TDOs were proposed by Henry M. Gladney to enable digital objects to maintain a record of their change history so future users can know with certainty that the contents of the object are authentic (Gladney, 2004). Other preservation strategies like replication and migration are necessary for the long-term preservation of TDOs.
[[digital object|Digital objects]] that can speak to their own authenticity are called ''trustworthy digital objects'' (TDOs). TDOs were proposed by Henry M. Gladney to enable digital objects to maintain a record of their change history so future users can know with certainty that the contents of the object are authentic<ref>{{cite journal | author = Gladney, H. M. | year = 2004 | title = Trustworthy 100-year digital objects: Evidence after every witness is dead | journal = ACM Transactions on Information Systems | pages = 406-436 | volume = 22 | issue = 3 | url=http://doi.acm.org/10.1145/1010614.1010617}}</ref>. Other preservation strategies like replication and migration are necessary for the long-term preservation of TDOs.


== Examples of digital preservation initiatives ==
== Examples of digital preservation initiatives ==
Line 52: Line 54:
* [[Digital curation]]
* [[Digital curation]]
* [[Digital obsolescence]]
* [[Digital obsolescence]]
* [[Digital object identifier|Digital object identifier]]
* [[Enterprise content management]]
* [[File format]]
* [[File format]]
* [[Library of Congress Digital Library project]]
* [[Metadata]]
* [[National Digital Information Infrastructure and Preservation Program]]
* [[New media art preservation]]
* [[Universal Virtual Computer]]
* [[Universal Virtual Computer]]
* [[Web archiving]]
* [[Web archiving]]
* [[Web crawler]]


==References==
==References==
 
{{reflist|2}}
* {{cite journal | author = Garrett, J., D. Waters, H. Gladney, P. Andre, H. Besser, N. Elkington, H. Gladney, M. Hedstrom, P. Hirtle, K. Hunter, R. Kelly, D. Kresh, M. Lesk, M. Levering, W. Lougee, C. Lynch, C. Mandel, S. Mooney, A. Okerson, J. Neal, S. Rosenblatt, and S. Weibe| year = 1996 | title = Preserving digital information: Report of the task force on archiving of digital information | journal = Commission on Preservation and Access and the Research Libraries Group |  url=http://www.rlg.org/legacy/ftpd/pub/archtf/final-report.pdf}}
 
* {{cite journal | author = Gladney, H. M. | year = 2004 | title = Trustworthy 100-year digital objects: Evidence after every witness is dead | journal = ACM Transactions on Information Systems | pages = 406-436 | volume = 22 | issue = 3 | url=http://doi.acm.org/10.1145/1010614.1010617}}
 
* {{cite journal | author = Granger, Stewart | year = 2000 | title = Emulation as a Digital Preservation Strategy | journal = D-Lib Magazine | volume = 6 | issue = 10 | url=http://www.dlib.org/dlib/october00/granger/10granger.html}}
 
* {{cite journal | author = Hedstrom, M., Ross, S., Ashley, K., Christensen-Dalsgaard, B., Duff, W., Gladney, H., Huc, C., Kenney, A.R., Moore, R., Neuhold, E.| year = 2003 | title = [http://delos-noe.iei.pi.cnr.it/activities/internationalforum/Joint-WGs/digitalarchiving/Digitalarchiving.pdf Invest to Save: Report and Recommendations of the NSF-DELOS Working Group on Digital Archiving and Preservation] | journal = NSF/DELOS | location = Pisa & Washington DC, USA }}
 
* {{cite journal | author = Jantz, R. & Giarlo, M.J. | year = 2005 | title = Digital preservation: Architecture and technology for trusted digital repositories | journal = D-Lib Magazine | volume = 11 | issue = 6 | url=http://dx.doi.org/10.1045/june2005-jantz}}
 
* {{cite conference | author = Lorie, Raymond A.| year = 2001 | title = [http://doi.acm.org/10.1145/379437.379726 Long Term Preservation of Digital Information] | booktitle = Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '01) | location = Roanoke, Virginia, USA | pages = 346-352}}
 
* {{cite book | first = S| last = Ross | year = 2000 | id = ISBN 0-7123-4717-8 | title = [http://portico.bl.uk/services/npo/pdf/wigan.pdf Changing Trains at Wigan: Digital Preservation and the Future of Scholarship] | publisher = National Preservation Office (British Library)  | location = London, UK }}
 
* {{cite book | author = Ross, S. and  Gow, A. | year = 1999 || id = ISBN 1-900508-51-6 | title = [http://www.ukoln.ac.uk/services/elib/papers/supporting/pdf/p2.pdf Digital archaeology? Rescuing Neglected or Damaged Data Resources] | publisher = British Library and Joint Information Systems Committee | location = Bristol & London | }}
 
* {{cite book | first = Jeff | last = Rothenberg | year = 1998 | id = ISBN 1-887334-63-7 | title = [http://www.clir.org/PUBS/reports/rothenberg/contents.html Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation] | publisher = Council on Library and Information Resources | location = Washington, DC, USA }}
 
* {{cite journal | author = Rothenberg, Jeff | year = 1995 | title = Ensuring the Longevity of Digital Documents | journal = Scientific American | volume = 272 | issue = 1}}
 
* {{cite journal | author = Rothenberg, Jeff | year = 1999 | title = [http://www.clir.org/pubs/archives/ensuring.pdf Ensuring the Longevity of Digital Information]}} Expanded version of ''Ensuring the Longevity of Digital Documents''.
 
==External links==
{{External links}}
* [http://www.si.umich.edu/CAMILEON/ CAMiLEON project] — Emulation research conducted at the [[University of Michigan]]  and University of Leeds in 2000–2002
* [http://www.lockss.org/ LOCKSS official site] (see also [http://www.lockss.org/clockss/ CLOCKSS official site])
* [http://www.library.cornell.edu/iris/tutorial/dpm/ Digital Preservation Tutorial] - [[Cornell University Library]]
* [http://www.dpc.delos.info/ DELOS Digital Preservation Cluster]
* [http://www.digitalpreservationeurope.eu/ DigitalPreservationEurope]
* [[Digital Preservation Coalition]] (UK). http://www.dpconline.org/
* [http://europa.eu.int/comm/secretariat_general/edoc_management/dlm_forum/ DLM-Forum] of the European Commission
* [http://www.docam.ca DOCAM - Documentation and Preservation of the Media Arts Heritage] - international research alliance on the development of new methodologies and tools to address the issues of preserving and documenting digital, technological and electronic works of art
* [http://www.erpanet.org/ ERPANET]
* [http://unesdoc.unesco.org/images/0013/001300/130071e.pdf Guidelines for the Preservation of Digital Heritage]. UNESCO, March 2003.
* [http://www.interpares.org/ International Research on Permanent Authentic Records in Electronic Systems (InterPARES)]
* [http://www.digitalpreservation.gov/ Library of Congress, National Digital Information Infrastructure and Preservation Program]
*[http://www.digitalpreservation.gov/formats/ The Library of Congress, Sustainability of Digital Formats]
* [http://www.chin.gc.ca/English/Digital_Content/ Managing and Preserving Digital Content] - Canadian Heritage Information Network (CHIN)
* [http://www.nla.gov.au/padi Preserving Access to Digital Information (PADI)]
* [http://ahds.ac.uk/preservation/ UK Arts and Humanities Data Service]
* [http://www.dcc.ac.uk/ UK Digital Curation Centre (DCC)]
* [http://www.esds.ac.uk/ UK Economic and Social Data Service (ESDS)]
* [http://www.hatii.arts.gla.ac.uk/ Humanities Advanced Technology and Information Institute (HATII)]
* [http://www.casparpreserves.eu/ CASPAR]
* [http://www.planets-project.eu/ PLANETS]
* [http://roda.iantt.pt/ RODA] — An initiative from the Portuguese National Archives that aims at creating a digital repository system capable of preserving authentic digital objects.
* [http://crib.dsi.uminho.pt/ CRIB] — A Service Oriented Architecture designed to assist cultural heritage institutions in the implementation of migration-based preservation interventions.
* [http://www.variablemedia.net/ Variable Media Network]
 
 
 
[[Category:Digital libraries]]
 
[[de:Elektronische Archivierung]]
[[fr:Archivage électronique]]

Revision as of 17:07, 19 July 2007

Digital preservation is defined as the set of processes and activities that ensure long-term, error-free storage of digital information, with means for retrieval and interpretation, for as long as the information is required.

Challenges

Jeff Rothenberg once wrote:[1][2]

"Digital information lasts forever—or five years, whichever comes first."

Preservation of digital information is widely considered to require more constant and ongoing attention than preservation of other media.[3] The continual investment of effort, time, and money needed to keep pace with rapid technological and organizational change is considered the main obstacle to preserving digital information. While written records several thousand years old can still be read, digital information created merely a decade ago is in serious danger of being lost.

Strategies

There are several strategies which individuals and organizations may use to combat the loss of digital information:[4]

Refreshing

Refreshing is the copying of data onto newer media or systems, for example transferring census data from an old tape to a new one or transferring an MP3 from a hard drive to a CD. This strategy may need to be combined with migration when the software or hardware required to read the data is no longer available or cannot interpret the format of the data. Because physical media deteriorate, refreshing will likely always be necessary.
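As a minimal sketch of refreshing, the following Python copies a file to a new location and checks that the copy is bit-identical. The function names `sha256_of` and `refresh`, and the choice of SHA-256 as the integrity check, are assumptions made for illustration; the sources cited here do not prescribe a particular procedure.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and confirm the copy is bit-identical.

    Returns the digest so it can be stored and checked at the next refresh.
    """
    expected = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    if sha256_of(destination) != expected:
        raise IOError(f"refreshed copy of {source} does not match the original")
    return expected
```

Keeping the returned digest alongside the data allows a later refresh to detect deterioration that occurred between copies, not only errors introduced by the copy itself.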

Migration

Migration is the transfer of data to newer system environments. This may include conversion of resources from one format to another (e.g., conversion of Microsoft Word to PDF or OpenDocument), from one operating system to another (e.g., Solaris to Linux) or from one programming language to another (e.g., C to Java) so the resource remains fully accessible and functional. Migrated resources risk losing functionality, since a newer format may be unable to capture every feature of the original format, or the converter itself may be unable to interpret all the nuances of the original. The latter is often a concern with proprietary data formats.
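The risk of loss during migration can be illustrated with a small Python sketch. Both formats here are hypothetical: a made-up legacy record of `key=value` pairs separated by semicolons, and a target JSON schema (`TARGET_FIELDS`) that can hold only three fields. The converter reports whatever the target format cannot capture instead of dropping it silently.

```python
import json

# Hypothetical target schema: the only fields the newer format can hold.
TARGET_FIELDS = {"title", "author", "year"}


def parse_legacy(record: str) -> dict[str, str]:
    """Parse a made-up legacy format of 'key=value' pairs separated by ';'."""
    fields = {}
    for pair in record.split(";"):
        if pair.strip():
            key, _, value = pair.partition("=")
            fields[key.strip()] = value.strip()
    return fields


def migrate(record: str) -> tuple[str, list[str]]:
    """Convert a legacy record to JSON and list the fields that were lost."""
    fields = parse_legacy(record)
    kept = {key: value for key, value in fields.items() if key in TARGET_FIELDS}
    lost = sorted(set(fields) - TARGET_FIELDS)
    return json.dumps(kept, sort_keys=True), lost
```

For example, `migrate("title=Census 1990; author=Bureau; macro=AutoOpen")` keeps the title and author and reports `["macro"]` as lost, which gives an archivist the chance to retain the original alongside the migrated copy.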

Replication

Creating duplicate copies of data on one or more systems is called replication. Data that exists as a single copy in only one location is highly vulnerable to software or hardware failure, intentional or accidental alteration, and environmental catastrophes such as fire or flooding. Digital data is more likely to survive if it is replicated in several locations. Replication may complicate refreshing, migration, versioning, and access control, since the data is located in multiple places.
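A minimal Python sketch of replication with periodic auditing follows. It assumes the replicas are ordinary file paths and that a replica disagreeing with the majority is the damaged one; the majority rule and the names `replicate` and `audit_and_repair` are illustrative assumptions, not a description of any particular system. If no digest holds a clear majority, the choice made here is arbitrary.

```python
import hashlib
from collections import Counter
from pathlib import Path


def replicate(data: bytes, replicas: list[Path]) -> None:
    """Write the same bytes to every replica location."""
    for path in replicas:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)


def audit_and_repair(replicas: list[Path]) -> list[Path]:
    """Overwrite replicas that disagree with the majority; return those repaired."""
    digests = {}
    for path in replicas:
        if path.exists():
            digests[path] = hashlib.sha256(path.read_bytes()).hexdigest()
    if not digests:
        raise FileNotFoundError("no replica is readable")
    # The most common digest is assumed to identify the undamaged content.
    majority_digest, _ = Counter(digests.values()).most_common(1)[0]
    good = next(p for p, d in digests.items() if d == majority_digest)
    good_bytes = good.read_bytes()
    repaired = []
    for path in replicas:
        # Missing replicas have no digest and are recreated as well.
        if digests.get(path) != majority_digest:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(good_bytes)
            repaired.append(path)
    return repaired
```

In practice the replica paths would sit on separate systems or sites, so that a single failure or catastrophe cannot affect all copies at once.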

Emulation

Emulation is the replication of the functionality of an obsolete system,[5] for example emulating an Atari 2600 on a Windows system or WordPerfect 1.0 on a Macintosh. Emulators may be built for applications, operating systems, or hardware platforms. Emulation has been a popular strategy for retaining the functionality of old video game systems. The feasibility of emulation as a catch-all solution has been debated in the academic community.[6]
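The idea behind a hardware emulator can be sketched in a few lines of Python. The machine below is entirely made up for illustration (a single accumulator and five invented opcodes: `LOAD`, `ADD`, `PRINT`, `JNZ`, `HALT`); real emulators model an actual instruction set, memory, and devices in far greater detail.

```python
def emulate(program: list[tuple[str, int]]) -> list[int]:
    """Run a program for a made-up accumulator machine and return its output."""
    accumulator = 0
    counter = 0  # index of the next instruction to execute
    output = []
    while counter < len(program):
        opcode, operand = program[counter]
        counter += 1
        if opcode == "LOAD":      # set the accumulator to a constant
            accumulator = operand
        elif opcode == "ADD":     # add a constant to the accumulator
            accumulator += operand
        elif opcode == "PRINT":   # emit the accumulator; operand is ignored
            output.append(accumulator)
        elif opcode == "JNZ":     # jump to operand if the accumulator is not zero
            if accumulator != 0:
                counter = operand
        elif opcode == "HALT":    # stop execution
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return output


# An "old" program that counts down from 3; it runs unchanged on the emulator.
COUNTDOWN = [("LOAD", 3), ("PRINT", 0), ("ADD", -1), ("JNZ", 1), ("HALT", 0)]
print(emulate(COUNTDOWN))  # prints [3, 2, 1]
```

The preserved program itself is never converted; only the emulator has to be rewritten when the host platform changes, which is the main contrast with migration.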

Raymond A. Lorie has suggested that a Universal Virtual Computer (UVC) could be used to run any software in the future on an as-yet-unknown platform.[7] The UVC strategy uses a combination of emulation and migration, but it has not yet been widely adopted by the digital preservation community.

Trustworthy digital objects

Digital objects that can speak to their own authenticity are called trustworthy digital objects (TDOs). TDOs were proposed by Henry M. Gladney to enable digital objects to maintain a record of their change history, so that future users can know with certainty that the contents of the object are authentic.[8] Other preservation strategies, such as replication and migration, remain necessary for the long-term preservation of TDOs.
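One way to picture an object that carries its own change history is a hash-chained log, sketched below in Python. This is not Gladney's design: the class `RecordedObject`, its fields, and the use of SHA-256 are assumptions chosen for illustration. A chain like this only detects tampering with part of the record; establishing authenticity would also require a trusted anchor, such as a digital signature over the latest entry.

```python
import hashlib
import json


def _entry_digest(content_hash: str, note: str, previous: str) -> str:
    """Digest of one history entry, bound to the entry before it."""
    payload = json.dumps([content_hash, note, previous]).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class RecordedObject:
    """Content plus a hash-chained log of every change made to it."""

    def __init__(self, content: bytes, note: str = "created") -> None:
        self.content = content
        self.history: list[dict[str, str]] = []
        self._append(note)

    def _append(self, note: str) -> None:
        previous = self.history[-1]["digest"] if self.history else ""
        content_hash = hashlib.sha256(self.content).hexdigest()
        self.history.append({
            "content_hash": content_hash,
            "note": note,
            "previous": previous,
            "digest": _entry_digest(content_hash, note, previous),
        })

    def update(self, content: bytes, note: str) -> None:
        """Replace the content and record why it changed."""
        self.content = content
        self._append(note)

    def verify(self) -> bool:
        """Check that the log is unbroken and matches the current content."""
        previous = ""
        for entry in self.history:
            expected = _entry_digest(entry["content_hash"], entry["note"], previous)
            if entry["previous"] != previous or entry["digest"] != expected:
                return False
            previous = entry["digest"]
        current = hashlib.sha256(self.content).hexdigest()
        return self.history[-1]["content_hash"] == current
```

Because each entry's digest covers the previous entry's digest, altering an earlier note or content hash invalidates every later entry, so `verify()` returns `False`.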

Examples of digital preservation initiatives

See also

References

  1. Rothenberg, Jeff (1995). "Ensuring the Longevity of Digital Documents". Scientific American 272 (1).
  2. Rothenberg, Jeff (1999). "Ensuring the Longevity of Digital Information". Expanded version of Ensuring the Longevity of Digital Documents.
  3. Lifecycle Information for E-literature. LIFE. Retrieved on 2007-06-14.
  4. Garrett, J., D. Waters, H. Gladney, P. Andre, H. Besser, N. Elkington, H. Gladney, M. Hedstrom, P. Hirtle, K. Hunter, R. Kelly, D. Kresh, M. Lesk, M. Levering, W. Lougee, C. Lynch, C. Mandel, S. Mooney, A. Okerson, J. Neal, S. Rosenblatt, and S. Weibe (1996). "Preserving digital information: Report of the task force on archiving of digital information". Commission on Preservation and Access and the Research Libraries Group.
  5. Rothenberg, Jeff (1998). Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. Washington, DC, USA: Council on Library and Information Resources. ISBN 1-887334-63-7. 
  6. Granger, Stewart (2000). "Emulation as a Digital Preservation Strategy". D-Lib Magazine 6 (10).
  7. Lorie, Raymond A. (2001). "Long Term Preservation of Digital Information". Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '01), 346-352.
  8. Gladney, H. M. (2004). "Trustworthy 100-year digital objects: Evidence after every witness is dead". ACM Transactions on Information Systems 22 (3): 406-436.