User talk:Thomas Wright Sulcer/sandbox7: Difference between revisions

From Citizendium
imported>Thomas Wright Sulcer
(Added diagram)
m (removing empty ref)
 
(45 intermediate revisions by 3 users not shown)
Temporary file: Intron; (possible suggestions for new material March 5, 2010)
==={{pl|Panton Principles}}===
Already started. Further possible additions below.


[[Image:DNA splicing introns off to make proteins.jpg|thumb|right|alt=Diagram showing how DNA is translated into proteins by splicing off introns.|When DNA makes a protein, introns are spliced off during this process, so only the exons remain.]]
==Principles==
An '''intron''' is the '''''int'''''ervening, non-coding sequence of [[nucleic acid]] that is between the '''''ex'''''pressed sequences ([[exon]]s) in a [[gene]]. It is removed from the primary [[RNA]] transcript by [[RNA splicing|splicing]] and is a common feature of [[eucaryote|eucaryotic]] genes. Introns are the spacer regions of DNA that separate the information-coding parts of a gene.<ref name=twsMAR02p>{{cite news
But the big incentive for the declaration comes from scientists who come across data which has been used in previous studies. Is the data acceptable to use? Are there limitations on what it can be used for? Will using a specific set of data generate legal issues? Since many of these issues are unclear, scientists coming across data in the [[public sphere]] may face uncertainty about whether they are allowed to access it, use it, study it further, or base new studies on it. It is enough of an issue that a group of scientists has issued a statement known as the ''Panton Principles''.
|author= Nicholas Wade
 
|title= As Scientists Pinpoint the Genetic Reason for Lactose Intolerance, Unknowns Remain
What the drafters of the Panton Principles want is that when scientists release data, they attach a statement or marker which describes the wishes of the originating scientist regarding the future use of the data. The idea is to come up with an easily understood tag applicable to all data they choose to release, so that others who come across the data will be able to understand what the data creator's intentions were when releasing the data. The hope is, of course, that all data might be freely used for any purpose, but the tag enables this to be more readily understood.
|quote= The authors of the new report say the two DNA units that switch off the lactase gene are in the 9th and 13th introns in a neighboring gene whose role strangely has nothing to do with lactose metabolism. Introns are the spacer regions of DNA that separate the information-coding parts of a gene. Because the cell cuts out and discards the introns when a gene is activated, these disposable pieces of DNA have long been ignored. Now it seems they play unexpected roles in gene control.
|publisher= The New York Times
|date= January 14, 2002
|url= http://www.nytimes.com/2002/01/14/us/scientists-pinpoint-genetic-reason-for-lactose-intolerance-unknowns-remain.html?pagewanted=1
|accessdate= 2010-03-02
}}</ref>


Generally introns are found in the [[DNA]] of more advanced species. Scientists believe that the only living creatures on earth billions of years ago were [[bacteria]] which belonged to a group called [[prokaryotes]].<ref name=twsMAR02m>{{cite news
There are concerns that current license formats such as the Public Domain Dedication and License (PDDL) and the Creative Commons CC0 are complex from a legal standpoint. And the movement favoring the Panton Principles is, in some respects, a way to simplify matters. One scientist explained that the benefit of declaring data "open" is that it makes it possible for subsequent researchers to use it freely, without [[fear]] or [[anxiety]] or uncertainty:
|author= Carl Zimmer
<blockquote>The biggest danger is NOT making the assertion that the data is Open. There may be second-order problems from CC0 or PPDL but they are nothing compared to the uncertainty of NOT making this simple assertion. Do not try to be clever and use SA, NC or other restricted licenses. Simply state the data are Open.</blockquote>
|title= From Bacteria to Us: What Went Right When Humans Started to Evolve?
[[Image:Panton Arms Pub in Cambridge UK.jpg|thumb|350px|right|alt=A building.|The [[Panton Arms]] pub in [[Cambridge]], [[United Kingdom]] where the Panton Principles were drawn up.]]
|quote= Eukaryotes can do more with their genes, too. They can switch genes on and off in complex patterns to control where and when they make proteins. And they can make many proteins from a single gene. That is because eukaryote genes are segmented into what are called exons. Exons are interspersed with functionless stretches of DNA known as introns. Human cells edit out the introns when they copy a gene for use in building a protein. But a key ability is that they can also edit out exons, meaning that they can make different proteins from the same gene. This versatility means that eukaryotes can build different kinds of cells, tissues and organs, without which humans would look like bacteria.
|publisher= The New York Times
|date= January 3, 2006
|url= http://www.nytimes.com/2006/01/03/science/03zimm.html
|accessdate= 2010-03-02
}}</ref> But about two billion years ago, a group called [[eukaryotes]] branched off from the prokaryotes and evolved into much more complex organisms, including animals, plants, fungi, and some protozoans, with bigger and more complex DNA.<ref name=twsMAR02m/> A ''New York Times'' science reporter explains:
{{quote|The eukaryote genome is downright baroque. It is typically much bigger and carries many more genes. Eukaryotes can do more with their genes, too. They can switch genes on and off in complex patterns to control where and when they make proteins. And they can make many proteins from a single gene.<ref name=twsMAR02m/>}}
Why is the DNA more advanced? It's segmented. The DNA alternates between what are called [[exons]] and [[introns]]. This allows human cells to "edit out introns" when copying genes to build proteins, and lets cells make a wide variety of different proteins from the same gene.<ref name=twsMAR02m/> It was long believed that the introns were functionless stretches whose only purpose was to break up stretches of exons, but this view has been questioned.<ref name=twsMAR02m/> Scientist Michael Lynch suspected that the addition of introns into DNA was at first a harmful accident. When an intron was wedged into the middle of a gene, cells had to be able to recognize the boundaries and "skip over" the introns when making a protein.<ref name=twsMAR02m/> Lynch hypothesized that this first led to defective proteins, but that the overall effect of interspersed introns was a phenomenon called [[genetic drift]], which ultimately helped evolution, since it created new opportunities for adaptations to be successful.<ref name=twsMAR02m/>


There is no single mechanism for splicing an intron from the primary RNA transcript. Introns are classified into four groups based on the mechanism of splicing. In general, splicing can be either autocatalytic or catalyzed by a large complex of proteins known as the [[spliceosome]]. Splicing can also be viewed as a form of gene regulation, since [[alternative splicing]] can result in different combinations of exons in the mature RNA transcript. In this way one gene can code for distinct open reading frames and hence has the potential to code for more than one protein.
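The idea that one gene can yield several transcripts can be illustrated with a toy model. The sketch below is only an illustration under stated assumptions: the sequence, the exon coordinates, and the function names `splice` and `alternative_transcripts` are all made up, and real alternative splicing is regulated by the cell, so it does not produce every possible exon combination.

```python
from itertools import combinations

def splice(pre_mrna, exons):
    """Join the exon segments of a transcript, discarding the introns between them.

    `exons` is a list of (start, end) half-open index pairs, in transcript order.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

def alternative_transcripts(pre_mrna, exons):
    """Yield one transcript for every non-empty, order-preserving subset of exons.

    This is an upper bound for illustration only; a real cell uses far fewer
    combinations, and which ones it uses is tightly regulated.
    """
    for size in range(1, len(exons) + 1):
        for subset in combinations(exons, size):
            yield splice(pre_mrna, subset)

# Hypothetical primary transcript: exons in upper case, introns in lower case.
pre_mrna = "AUGGCC" + "guaagu" + "UUUAAA" + "guccag" + "GGGUGA"
exons = [(0, 6), (12, 18), (24, 30)]

full_length = splice(pre_mrna, exons)                  # all three exons kept
exon_skipped = splice(pre_mrna, [exons[0], exons[2]])  # middle exon skipped
variants = list(alternative_transcripts(pre_mrna, exons))

print(full_length)
print(exon_skipped)
print(len(variants))
```

With three exons the toy model allows at most seven distinct exon combinations, which shows how the number of possible products can grow much faster than the number of genes.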
Walter Jessen explained:
<blockquote>Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open. By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.<ref name=twsMAR23a544>{{cite news
|author= Walter Jessen
|title= The Panton Principles for Open Data in Science
|publisher= Next Generation Science
|date= February 19, 2010
|url= http://www.nextgenerationscience.com/open-access/the-panton-principles-for-open-data-in-science/
|accessdate= 2010-03-23
}}</ref></blockquote>


The human DNA helix, if unraveled, would be about three feet long, and the sections of introns inside it have been referred to as "mostly chaotic" and an "indecipherable wilderness".<ref name=twsMAR02n>{{cite news
As of March 2010, the Panton Principles are an Internet-based initiative calling on scientists to make an "explicit and robust statement" regarding their wishes for their data by using a "recognized waiver or license that is appropriate for data."<ref name=twsMAR23av33>{{cite news
|title= DNA Evidence
|author= Bill Hooker
|quote= DNA probe analysis grew out of basic genetic research, with far different aims. A kind of serendipitous gift to police science, it takes advantage of a peculiarity within the human genetic code. Along the three feet of the double helix in each complete DNA molecule there exists, in addition to the tens of thousands of protein-coding genes, a so-far indecipherable wilderness called the intron. The intron, although it seems mostly chaotic, nevertheless contains certain repetitive sequences of the genetic alphabet, which geneticists sometimes call "stutters" or "burps."
|title= Panton Principles for Open Data in Science
|publisher= The New York Times
|publisher= Science Commons Symposium
|date= June 18, 2009
|date= 2010-03-23
|url= http://topics.nytimes.com/topics/reference/timestopics/subjects/d/dna_evidence/index.html
|url= http://www.sennoma.net/main/archives/2010/02/panton_principles_for_open_dat.php
|accessdate= 2010-03-02
|accessdate= 2010-03-23
}}</ref> The call for a set of principles stems in part from a sense that "many widely recognized licenses are not intended for ... data". Licenses such as "Creative Commons" have been described as unsuitable for handling issues such as scientific data.<ref name=twsMAR23>{{cite web
|title= The Panton Principles for Open Data in Science
|publisher= Connected Knowledge
|quote= Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged.
|date= February 19th, 2010
|url= http://www.connected-knowledge.com/?p=583
|accessdate= 2010-03-23
}}</ref>
}}</ref>


Sometimes the [[DNA]] that codes for introns is classified as [[junk DNA]] but this is an oversimplification since introns can contain functional RNA or DNA sequences; these include transfer RNA ([[tRNA]]) or microRNA ([[miRNA]]) sequences. In the chromosome, DNA sequences that code for an intron can also include enhancers that are important for gene expression.
[[Image:Panton Arms Signers.jpg|thumb|350px|left|alt=People in front of a building.|The signers of the Panton Principles in September 2009 included (from left) Jenny Meyer, Jordan Hatcher, Rufus Pollock, John Wilbanks, Cameron Neylon, Peter Murray-Rust, Carolina Rossini.]]
 
An Internet search of about twenty major newspapers and magazines, including the ''[[New York Times]]'' and ''[[BBC News]]'', using the search term "Panton Principles" did not find any results on March 23, 2010, although Internet sites dealing with scientific issues have posted comments about the initiative.


The true role of introns is unclear but one hypothesis is that introns allow [[gene shuffling]] to occur, resulting in the creation of new exon combinations and novel proteins. The presence of an intron can also enhance gene expression since the process of RNA splicing seems to facilitate the trafficking of the mRNA out of the eucaryotic nucleus.
Here are the four principles:


==Discoveries==
# When publishing data make an explicit and robust statement of your wishes.
# Use a recognized waiver or license that is appropriate for data.
# If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
# Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.


After completing the human genome in 2001, scientists found 22,000 genes, but noticed there were more than 100,000 different proteins. If each gene could make only one protein, then how could there be so many different proteins? A ''Washington Post'' reporter explained:
Why are the principles necessary? A post-doctoral student in [[Sweden]] described in a [[blog]] post being perplexed after finding useful [[data]] that came without any explicit information about what could be done with it. Contacting the creators of the data for permission is cumbersome and slow, and there is the possibility that the initial author of the data is "missing in action". An explicit statement is much preferred.<ref name=twsMAR234qqw>{{cite news
{{quote|When a gene is activated, it is first transcribed into an intermediate molecule called mRNA. The introns are clipped out and the exons spliced together, and the whole thing is then translated into the protein. Biologists used to think one gene produced one protein. Now it's clear that one gene can produce many different proteins. Under certain conditions, a cell clips out not only the intron fillers but also one or more of the exons. This is like taking a speech and removing many of the sentences. Done in different ways, it can produce many different messages.<ref name=twsMAR03vgff>{{cite news
|author= Egon Willighagen
|author= David Brown
|title= Panton Principles
|title= How Science Is Rewriting the Book on Genes
|publisher= Egon Willighagen's Blog
|publisher= Washington Post
|date= February 19, 2010
|date= November 12, 2007
|url= http://chem-bla-ics.blogspot.com/2010/02/open-data-panton-principles.html
|url= http://www.washingtonpost.com/wp-dyn/content/story/2007/11/11/ST2007111101076.html
|accessdate= 2010-03-23
|accessdate= 2010-03-03
}}</ref> But the idea of using a "waiver or license appropriate for data" was, in the view of this blogger, "debatable", particularly when it came to the possibility of mixing data sets, and he prefers the [[copyleft]] license approach. He did not like non-commercial restrictive clauses since, in his view, they do not make things easier, and he prefers [[public domain]] dedication via the PDDL or CCZero licenses.<ref name=twsMAR234qqw/>
}}</ref>}}


Scientists think that about five percent of the human genome has a "message" of one sort or another, with a particular order of nucleotide letters (A, G, C, T) being of utmost importance in determining important aspects of a human's body chemistry. Any addition, deletion, or change can have a big effect, including death.<ref name=twsMAR03vgff/> Still, there are large stretches of genes that had been thought to be "junk DNA" but are turning out to have an important role. These conserved "non-coding elements" include insulators, micro-RNAs, exon-splicing enhancers, e'-untranslated hairpins and other molecules which are "emerging from the shadows."<ref name=twsMAR03vgff/> They regulate the activity of genes that ''do'' make the proteins by turning them on and off, tweaking them, and coordinating the sequential action of their effects.<ref name=twsMAR03vgff/> How these processes operate is important not only for geneticists but also for scientists who study evolution, since "more of evolution's survival-of-the-fittest battles occurred in writing the instruction manual for running the genes than in designing the genes themselves," according to a ''Washington Post'' report in 2007.<ref name=twsMAR03vgff/>
The statement grew out of discussion between many scientists, although one source credits the launch of the Panton Principles to [[Jonathan Gray]].<ref name=twsMAR239iio>{{cite news
|author= Cameron Neylon
|title= The Panton Principles: Finding agreement on the public domain for published scientific data
|publisher= Science in the Open (blog)
|quote= The launch of the Panton Principles, many months after they were first suggested is really largely down to the work of Jonathan Gray. This was one of several projects that I haven’t been able to follow through properly on and I want to acknowledge the effort that Jonathan has put into making that happen.
|date= 22 February 2010
|url= http://cameronneylon.net/blog/the-panton-principles-finding-agreement-on-the-public-domain-for-published-scientific-data/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+ScienceInTheOpen+(Science+in+the+open)
|accessdate= 2010-03-23
}}</ref> Cameron Neylon described how the principles came about:
<blockquote>The Principles came out of a discussion in the [[Panton Arms]] a pub near to the [[Chemistry]] Department of [[Cambridge University]] ... Where we found agreement was that for science, and for scientific data, and particularly science funded by public investment, that the [[public domain]] was the best approach and that we would all recommend it. ... placing data explicitly, irrevocably, and legally in the public domain satisfies both the Open Knowledge Definition and the Science Commons Principles for Open Data was something that we could all personally sign up to. The end result is something that I have no doubt is imperfect ... Above all, it is a start.<ref name=twsMAR239iio/></blockquote>


==The naming of intron==
==References==
The names ''intron'' and ''exon'' were described as "less catchy" but nevertheless "novel" names, and the naming was against a trend of using "dull and pedantic" impenetrable Latin names.<ref name=twsMAR02o>{{cite news
{{reflist}}
|author= Stephen S. Hall
|title= Scientists Find Catchy Names Help Ideas Fly
|quote= Novel though less catchy are the terms "intron" and "exon," used to distinguish different regions of the genetic material. It used to be thought that a stretch of DNA forming a gene was simply copied onto a strand of RNA, which directed synthesis of a protein.
|publisher= The New York Times
|date= October 20, 1992
|url= http://www.nytimes.com/1992/10/20/science/scientists-find-catchy-names-help-ideas-fly.html?pagewanted=all
|accessdate= 2010-03-02
}}</ref> In 1977, three scientists at a cafeteria in Basel, [[Switzerland]], discussed possible names. Dr. Walter Gilbert was credited for thinking up the name ''intron'', while Dr. Melvin Cohn coined the term ''exon'', although he was originally thinking of the large oil firm [[Exxon]]. Dr. Cohn explained:
{{quote|I was actually thinking of 'exxon' with two x's like the oil company -- it was a joke -- I was making fun of Gilbert because I didn't think those were the best possible terms. They were too slangy and didn't best describe what was going on.<ref name=twsMAR02o/>}}
Nevertheless, Dr. Gilbert wrote down the terms on a napkin, and both terms appeared in a 1978 commentary in the journal ''Nature''.<ref name=twsMAR02o/>


==Recent research involving introns==
----
==Possible further articles==


According to one source, scientists are exploring ways of using data from introns when doing DNA analysis regarding police investigations. While each intron seems chaotic, there are repetitive sequences of the genetic alphabet sometimes called "stutters" or "burps" which can be analyzed.<ref name=twsMAR02n/> When multiple probes are used, it's possible to analyze introns in the DNA to identify its uniqueness and produce an identification "as reliable as a human fingerprint."<ref name=twsMAR02n/>
==={{pl|Data sharing}}===


Scientists discovered that two DNA units that switch off the lactase gene are in "the 9th and 13th introns of a neighboring gene whose role strangely has nothing to do with lactose metabolism."<ref name=twsMAR02p/> Scientists are discovering that the spacer-regions in DNA called introns "play unexpected roles in gene control."<ref name=twsMAR02p/>
[[Image:JKepler.png|thumb|240px|left|alt=Picture of a portrait of a man.|[[Johannes Kepler]] used data measurements from [[Tycho Brahe]] to develop three fundamental laws of planetary motion.]]
There's a well-known example from the history of [[astronomy]] in which one scientist took the data from another in a totally new direction. The [[Denmark|Danish]] scientist [[Tycho Brahe]] (1546-1601) worked tirelessly to make measurements of [[planet|planetary]] [[parallax]] that were accurate to the arcminute. By systematic and rigorous observation, night after night, Brahe amassed a comprehensive set of data detailing the positions of the planets and [[star (astronomy)|stars]]. But Brahe was unable to fit his data into a comprehensive [[scientific theory|theory]]. After Brahe's death, fellow scientist [[Johannes Kepler]], serving as ''[[Mathematicus Imperialis]]'' (imperial mathematician) at the court of emperor [[Rudolph II]] in [[Prague]], [[Bohemia]], used Brahe's data to figure out the three laws of planetary motion, including the fact that planets move in [[ellipse|elliptical]] [[orbit|orbits]], not [[circle|circular]] ones.


Researchers are studying the relationship between introns and gender-related differences in colon cancer.<ref name=twsMAR02q>{{cite news
Scientists may have reasons &mdash; actual or perceived &mdash; to withhold or delay the release of data: For instance, they may need time to investigate data fully to remove artifacts or to prevent valuable information from being overlooked. The data may also contain the seed for a [[scientific paper|publication]], [[patent]] or [[business model]], or private information about patients. It is possible, as well, that a scientist may selectively pick and choose data which supports a given conclusion while ignoring outliers, perhaps to make a case for a specific hypothesis. In such an instance, revealing the entire data set may allow other researchers to use their own data to prove them wrong. Conversely, making data available as they arise lends additional credit to the researchers involved, and making it available in a reusable form (i.e. in some standard format and with proper annotations) may allow others to build upon their work even ahead of formal publication.  
|author= Reuters Health
|title= Gene effect on colon cancer differs by gender
|quote= The research team at the Keck School of Medicine in Los Angeles, headed by Dr. Heinz-Josef Lenz, studied two variant forms of EGFR. One of the variants involved a change at a spot called codon 497 and the other involved a change in an area known as intron 1.
|publisher= Reuters
|date= April 15, 2008
|url= http://www.reuters.com/article/idUSCOL24375320080502
|accessdate= 2010-03-02
}}</ref>


There are suggestions that the composition of a specific intron, a three-SNP haplotype in the intron 1 of OCA2, is related to "human eye color variation", and scientists believe that two major genes and several minor ones account for the "tremendous variation in human eye color."<ref name=twsMAR02r>{{cite news
==={{pl|Scientific data}}===
|author= April Holladay
Started.
|title= Hazel is in the eye of the beholder; more on memory
|quote= D.L. Duffy, G.W. Montgomery, W. Chen, Z.Z. Zhao, L. Le, M.R. James, N.K. Hayward, N.G. Martin, R.A. Sturm. A three-SNP haplotype in the intron 1 of OCA2 explains most human eye color variation. American Journal of Human Genetics, 80: 241-252 (2007).
|publisher= USA Today
|date= 2007-03-19
|url= http://www.usatoday.com/tech/columnist/aprilholladay/2007-03-19-hazel-eye-memory_N.htm
|accessdate= 2010-03-02
}}</ref>


There was speculation that introns inside the dysbindin gene may have a role in [[schizophrenia]].<ref name=twsMAR02rxxx>{{cite news
==={{pl|Open data}}===
|author= Nicholas Wade
|title= Schizophrenia May Be Tied To 2 Genes, Research Finds
|quote= Despite years of false leads, setbacks and unsustained claims, researchers hope they are now starting to close in on some of the genes that go awry in schizophrenia, a devastating mental disease that affects two million Americans... Dr. Straub found genetic variations in the dysbindin gene that were more common in the schizophrenic patients. Curiously, they are all in introns, the spacer regions of the DNA that lie between the working parts of the dysbindin gene. The Richmond team is not sure that any of the intron changes is the causative mutation of schizophrenia and is analyzing the working parts more closely.
|publisher= The New York Times
|date= July 4, 2002
|url= http://www.nytimes.com/2002/07/04/us/schizophrenia-may-be-tied-to-2-genes-research-finds.html?pagewanted=1
|accessdate= 2010-03-02
}}</ref>


==References==
Sometimes data can be used for different purposes by different scientists. While data is often released on the Internet, it's sometimes unclear what guidelines apply as to how the data can be used or whether there are [[copyright]] restrictions. Accordingly, in September 2009 a group of scientists meeting in a pub called the [[Panton Arms]] in [[Cambridge, U.K.|Cambridge]], [[United Kingdom|U.K.]], wrote a set of guidelines called the [[Panton Principles]]. The idea behind this effort is that a scientist releasing data to the public can attach a tag to the data indicating that the data is free to use and is not subject to copyright restrictions. The hope is that this will enable future scientists to use data freely without anxiety about any possible [[law|legal]] repercussions.
{{Reflist}}

Latest revision as of 02:31, 6 October 2024

Stub Panton Principles

Already started. Further possible additions below.

Principles

But the big incentive for the declaration comes from scientists who come across data which has been used in previous studies. Is the data acceptable to use? Are there limitations on what it can be used for? Will using a specific set of data generate legal issues? Since many of these issues are unclear, scientists coming across data in the public sphere may face uncertainty about whether they are allowed to access it, use it, study it further, or base new studies on it. It is enough of an issue that a group of scientists has issued a statement known as the Panton Principles.

What the drafters of the Panton Principles want is that when scientists release data, they attach a statement or marker which describes the wishes of the originating scientist regarding the future use of the data. The idea is to come up with an easily understood tag applicable to all data they choose to release, so that others who come across the data will be able to understand what the data creator's intentions were when releasing the data. The hope is, of course, that all data might be freely used for any purpose, but the tag enables this to be more readily understood.

There are concerns that current license formats such as the Public Domain Dedication and License (PDDL) and the Creative Commons CC0 are complex from a legal standpoint. And the movement favoring the Panton Principles is, in some respects, a way to simplify matters. One scientist explained that the benefit of declaring data "open" is that it makes it possible for subsequent researchers to use it freely, without fear or anxiety or uncertainty:

The biggest danger is NOT making the assertion that the data is Open. There may be second-order problems from CC0 or PPDL but they are nothing compared to the uncertainty of NOT making this simple assertion. Do not try to be clever and use SA, NC or other restricted licenses. Simply state the data are Open.

The Panton Arms pub in Cambridge, United Kingdom where the Panton Principles were drawn up.

Walter Jessen explained:

Science is based on building on, reusing and openly criticising the published body of scientific knowledge. For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open. By open data in science we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published science should be explicitly placed in the public domain.[1]

As of March 2010, the Panton Principles are an Internet-based initiative calling on scientists to make an "explicit and robust statement" regarding their wishes for their data by using a "recognized waiver or license that is appropriate for data."[2] The call for a set of principles stems in part from a sense that "many widely recognized licenses are not intended for ... data". Licenses such as "Creative Commons" have been described as unsuitable for handling issues such as scientific data.[3]

The signers of the Panton Principles in September 2009 included (from left) Jenny Meyer, Jordan Hatcher, Rufus Pollock, John Wilbanks, Cameron Neylon, Peter Murray-Rust, Carolina Rossini.

An Internet search of about twenty major newspapers and magazines, including the New York Times and BBC News, using the search term "Panton Principles" did not find any results on March 23, 2010, although Internet sites dealing with scientific issues have posted comments about the initiative.

Here are the four principles:

  1. When publishing data make an explicit and robust statement of your wishes.
  2. Use a recognized waiver or license that is appropriate for data.
  3. If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
  4. Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.
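One way to picture the "explicit and robust statement" the principles call for is a small machine-readable tag published alongside a data set. The following is a minimal sketch, not a format defined by the Panton Principles: the field names, the helper functions `make_data_tag` and `is_open`, and the example values are all hypothetical, and the identifiers `CC0-1.0` and `PDDL-1.0` are borrowed from the SPDX license list as an assumed naming convention.

```python
import json

# Assumption: the two public-domain instruments named in principle 4,
# written here with SPDX-style identifiers.
OPEN_WAIVERS = {"CC0-1.0", "PDDL-1.0"}

def make_data_tag(title, creator, license_id, statement):
    """Build a hypothetical metadata record stating the creator's wishes."""
    return {
        "title": title,
        "creator": creator,
        "license": license_id,
        "statement": statement,
    }

def is_open(tag):
    """Return True if the tag names one of the recommended public-domain waivers."""
    return tag.get("license") in OPEN_WAIVERS

# Example values are invented for illustration.
tag = make_data_tag(
    title="Example measurements",
    creator="A. Researcher",
    license_id="CC0-1.0",
    statement="These data are dedicated to the public domain.",
)
restricted = make_data_tag(
    title="Example measurements",
    creator="A. Researcher",
    license_id="CC-BY-NC-4.0",
    statement="Non-commercial use only.",
)

print(json.dumps(tag, indent=2))
print(is_open(tag), is_open(restricted))
```

A later user who finds the data can check the tag before reusing it, rather than having to contact the original author for permission.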

Why are the principles necessary? A post-doctoral student in Sweden described in a blog post being perplexed after finding useful data that came without any explicit information about what could be done with it. Contacting the creators of the data for permission is cumbersome and slow, and there is the possibility that the initial author of the data is "missing in action". An explicit statement is much preferred.[4] But the idea of using a "waiver or license appropriate for data" was, in the view of this blogger, "debatable", particularly when it came to the possibility of mixing data sets, and he prefers the copyleft license approach. He did not like non-commercial restrictive clauses since, in his view, they do not make things easier, and he prefers public domain dedication via the PDDL or CCZero licenses.[4]

The statement grew out of discussion between many scientists, although one source credits the launch of the Panton Principles to Jonathan Gray.[5] Cameron Neylon described how the principles came about:

The Principles came out of a discussion in the Panton Arms a pub near to the Chemistry Department of Cambridge University ... Where we found agreement was that for science, and for scientific data, and particularly science funded by public investment, that the public domain was the best approach and that we would all recommend it. ... placing data explicitly, irrevocably, and legally in the public domain satisfies both the Open Knowledge Definition and the Science Commons Principles for Open Data was something that we could all personally sign up to. The end result is something that I have no doubt is imperfect ... Above all, it is a start.[5]

References

  1. Walter Jessen. The Panton Principles for Open Data in Science, Next Generation Science, February 19, 2010. Retrieved on 2010-03-23.
  2. Bill Hooker. Panton Principles for Open Data in Science, Science Commons Symposium, 2010-03-23. Retrieved on 2010-03-23.
  3. The Panton Principles for Open Data in Science. Connected Knowledge (February 19th, 2010). Retrieved on 2010-03-23. “Many widely recognized licenses are not intended for, and are not appropriate for, data or collections of data. A variety of waivers and licenses that are designed for and appropriate for the treatment of data are described here. Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged.”
  4. Egon Willighagen. Panton Principles, Egon Willighagen's Blog, February 19, 2010. Retrieved on 2010-03-23.
  5. Cameron Neylon. The Panton Principles: Finding agreement on the public domain for published scientific data, Science in the Open (blog), 22 February 2010. Retrieved on 2010-03-23. “The launch of the Panton Principles, many months after they were first suggested is really largely down to the work of Jonathan Gray. This was one of several projects that I haven’t been able to follow through properly on and I want to acknowledge the effort that Jonathan has put into making that happen.”

Possible further articles

Stub Data sharing

Johannes Kepler used data measurements from Tycho Brahe to develop three fundamental laws of planetary motion.

There is a well-known example from the history of astronomy in which one scientist took another's data in a totally new direction. The Danish scientist Tycho Brahe (1546–1601) worked tirelessly to make measurements of planetary parallax that were accurate to the arcminute. By systematic and rigorous observation, night after night, Brahe amassed a comprehensive set of data detailing the positions of the planets and stars, but he was unable to fit the data into a comprehensive theory. After Brahe's death, Johannes Kepler, who succeeded him as imperial mathematician (Mathematicus Imperialis) at the court of Emperor Rudolph II in Prague, used Brahe's data to work out the three laws of planetary motion, including the fact that planets move in elliptical rather than circular orbits.

Scientists may have reasons, actual or perceived, to withhold or delay the release of data. For instance, they may need time to investigate the data fully, to remove artifacts, or to prevent valuable information from being overlooked. The data may also contain the seed for a publication, patent or business model, or private information about patients. It is also possible that a scientist selectively picks data that support a given conclusion while ignoring outliers, perhaps to make a case for a specific hypothesis; in such a case, revealing the entire data set may allow other researchers to use that scientist's own data to prove the conclusion wrong. Conversely, making data available as they arise lends additional credit to the researchers involved, and making them available in a reusable form (i.e. in some standard format and with proper annotations) may allow others to build upon their work even ahead of formal publication.

Stub Scientific data

Started.

Stub Open data

Sometimes data can be used for different purposes by different scientists. While data is often released on the Internet, it is sometimes unclear what guidelines govern its use or whether copyright restrictions apply. Accordingly, in September 2009 a group of scientists meeting at the Panton Arms, a pub in Cambridge, U.K., wrote a set of guidelines called the Panton Principles. The idea is that a scientist releasing data to the public can attach a tag indicating that the data is free to use and not subject to copyright restrictions. The hope is that this will enable future scientists to use the data freely, without anxiety about possible legal repercussions.