Talk:NoSQL: Difference between revisions

imported>Pat Palmer
(history of NOSQL movement (a rant))
imported>Pat Palmer
(→‎More progress: I'd prefer to wait a bit before seeking Approval status)
 
(22 intermediate revisions by 3 users not shown)


==Pat's review of this article==
This is a great beginning.  The article as it stands today covers a lot of ground, and I especially appreciate the DOI in the reference list, plus the spare, lean and precise technical language employwed in many sections.  Some comments, questions, and ideas for additional development are detailed in the following subsections (to be added shortly):
This is a great beginning.  The article as it stands today is written at a high level of expertise--I learned a lot from reading it--and I appreciate the DOI in the reference list, plus the spare, lean and precise technical language employed in many sections.  Some comments, questions, and ideas for additional development are detailed in the following subsections:


===intended readers===
This article is the product of one or more writers who appear to have a high level of technical expertise with databases in general, and who write with admirable clarity and conciseness about technical matters.  The article seems to assume that readers are already familiar with the capabilities of conventional relational database management systems, and also with map-reduce algorithms.  While the technology being described is sufficiently complicated that much of the article may necessarily be beyond the scope of what an expert lay reader might understand, IMO it is still important to strive for satisfying both intelligent lay experts and deep subject experts. Perhaps some kind of statement about this might be added in the overview at the beginning.  It might also be useful to provide some history and market descriptive material near the top, before launching heavily into the tech speak. [[User:Pat Palmer|Pat Palmer]] 20:09, 19 August 2010 (UTC)
:P. S., While I was able to follow most of the article, I am forced to admit that I am not sure what is meant by "ad hoc query".  I could just Google it.  It would also be great if the article briefly defined it (and other terms) on first use, or else pointed off to another article that contains a definition.[[User:Pat Palmer|Pat Palmer]] 20:38, 19 August 2010 (UTC)


===intro===
The nice intro would possibly be even better by providing a quick summary of when, where, why, who, etc.  See next comment below about the history of this technology.[[User:Pat Palmer|Pat Palmer]] 20:25, 19 August 2010 (UTC)
The nice intro would possibly be even better by providing a quick summary of when, where, why, who, etc.  See next comment below about the history of this technology.  Also, the surge of interest in the open source community as a result of recently created cloud computing technologies would fit well into the opening overview.  I would consider moving the very last section up into the end of the introductory overview. [[User:Pat Palmer|Pat Palmer]] 20:25, 19 August 2010 (UTC)
 
===code snippets===
It is important that the two code snippets identify the language that they are written in.  I can pretty much surmise that one is Java, and the other C (or C++), but it needs to be made explicit somewhere.[[User:Pat Palmer|Pat Palmer]] 20:36, 19 August 2010 (UTC)
 
===semi-structured data===
Wow, use of JSON to identify rather free-form data.  After just reading the Semantic Web article, I am very curious to know whether there is (or could be some day) a relationship between semi-structured data and Semantic Web technologies.  Or is there already, and I am just too new to these ideas to get it? [[User:Pat Palmer|Pat Palmer]] 20:41, 19 August 2010 (UTC)
 
===data held redundantly on multiple servers===
Although there is a diagram on distributed hash tables included in the article, its notes are in Japanese.  I am sorely missing the elegant explanation given during the class presentation on how this works, and I urge that this explanation be added to the article, since understanding it makes the cloud computer implementations less mystifying.[[User:Pat Palmer|Pat Palmer]] 21:05, 19 August 2010 (UTC)
 
===references===
I noted the following reference (whose link is broken) on the Wikipedia NOSQL article.  I believe it would provide useful background for this article: ''Agrawal, R., et al. (2009). The Claremont Report on Database Research. Association for Computing Machinery. Communications of the ACM, 52(6), 56.  Retrieved August 19, 2010, from ABI/INFORM Global. (Document ID: 1753261191).''  The abstract for it is:
 
:"A group of database researchers, architects, users, and pundits met in May 2008 at the Claremont Resort in Berkeley, CA, to discuss the state of database research and its effects on practice. This was the seventh meeting of this sort over the past 20 years and was distinguished by a broad consensus that the database community is at a turning point in its history, due to both an explosion of data and usage scenarios and major shifts in computing hardware and platforms. This article explores the conclusions of this self-assessment. The theme of the Claremont meeting was that database research and the data-management industry are at a turning point, with unusually rich opportunities for technical advances, intellectual achievement, entrepreneurship, and benefits for science and society. Given the large number of opportunities, it is important for the database research community to address issues that maximize relevance within the field, across computing, and in external fields as well." [[User:Pat Palmer|Pat Palmer]] 21:16, 19 August 2010 (UTC)


===history of NOSQL movement (a rant)===
While the recent surge of activity in the NOSQL movement is a product of the emergence of cloud computing platforms by Google, the needs of social networking sites such as LinkedIn and Facebook, and the open source movement's interest in low-cost software, the technology has its roots in an earlier, highly commercially successful product: Lotus Notes, first released in 1989 by a small group of developers (led by Ray Ozzie) in Boston.  The product was eventually bought up by IBM and still enjoys widespread utilization today.  I consulted for a few years on this really fantastic product, and what I see now is that much of the writing in Wikipedia and elsewhere in the NOSQL movement "glosses over" the fact that Lotus Notes DID IT FIRST, and in fact, it was a first in many other respects, such as role-based security, use of a totally optimistic non-locking strategy, provision for replication and off-line use of databases, automatic indexing, and automatic rendering of all database contents onto web pages.  The article cited for the history buries Lotus Notes in the 1980's with a 2-line blurb.  This is a glaring historical inaccuracy.  The open source community is possibly overlooking Lotus Notes because it is not free (in fact, just like Oracle or SQL Server, one has to pay a fair amount for it), but this is no excuse to wrongly imply, for example, that Berkeley DB (charmingly described in Wikipedia as created by "old school" and which dates, I believe, to 1996) is one of the first.  Lotus Notes definitely broke the mold and needs to be more prominently mentioned in any accurate history of a non-relational, distributed database architecture THAT ACTUALLY WORKS.  The advent of today's cloud technologies such as Google's MapReduce platform has, of course, brought things to a new level.  There, end of my rant. How could any younger person who wasn't there possibly know this?  I'm telling you now.  
This article, to be really good, needs to address this glaring, well, injustice.[[User:Pat Palmer|Pat Palmer]] 20:25, 19 August 2010 (UTC)
I noted with interest (and disappointment, though not surprise) that Lotus Notes is not currently shown in the list of "popular document-based databases".  While the recent surge of NOSQL development is a product of the emergence of cloud computing platforms by Google, LinkedIn, Facebook, and the open source movement's interest in low-cost software, the technology has its roots in an earlier, highly commercially successful product: Lotus Notes, first released in 1989 by a small group of developers (led by Ray Ozzie) in Boston.  The product was eventually bought up by IBM and still enjoys significant sales today.  Currently, much of the writing in Wikipedia and elsewhere in the NOSQL movement misses that Lotus Notes DID NOSQL FIRST.  It was a first in many other respects, such as role-based security, use of an optimistic non-locking strategy, provision for replication and off-line use of databases, automatic indexing and rendering of all database contents as HTML, and it has fantastic administration and support tools.
 
The article cited for the history buries Lotus Notes in the 1980's with a 2-line blurb.  This is a glaring historical inaccuracy.  The open source community is possibly overlooking Lotus Notes because it is not free (in fact, just like Oracle or SQL Server, one has to pay a fair amount for it).  Lotus Notes achieves its "distributed" claim to fame not with cloud computing, but with fast replication to local disks, along with fast re-merging of local replicates later.  The product deserves prominent mention as a non-relational, distributed database architecture THAT ACTUALLY WORKS (but which is not "open source").  The advent of today's cloud technologies such as Google's MapReduce platform has, of course, brought things to a new level in terms of scalability. How could any younger person who hasn't had a chance to work with Lotus Notes possibly know its technology, given that it is proprietary?  I'm mentioning this, though, because I do know about LN and feel that this article, to be really accurate and fair, should consider addressing Lotus Notes more thoroughly in its written history of NOSQL.
 
IMO, there are other NOSQL movement "history" issues as well.  Wikipedia charmingly describes Berkeley DB as created by "old school" (hello?).  DaaS (Database as a Service) is not particularly new--only the use of the phrase is; web hosting services were offering hosted relational databases and Lotus Notes databases "for hire" well before the phrase came into common use.  I fear I am privileged to notice this kind of thing simply by being very experienced, which really means just getting old (big sigh here). [[User:Pat Palmer|Pat Palmer]] 20:25, 19 August 2010 (UTC)
 
== More progress ==
 
Tom, I appreciate your additions. While I made some edits, they were nonsubstantive in a way that would not preclude my nominating this for Approval. Of course, if Pat and one other Computers editor were to co-nominate, we could make substantive edits.
 
After making the [[ACID properties]] link non-red, I commented out the definition to which one can link. (An aside -- reservation systems are a better example than surgery. Maybe not a facelift, but there are quite a few situations where there are multiple simultaneous surgical procedures, variously one to support another (e.g., harvesting graft material), or because there are simultaneous traumas that all have to be fixed or the patient will die).
 
We should coordinate (including graphics) with  the existing  [[linked list]] article. Your graphic of a singly-linked list doesn't suggest the horrible overhead of having to go back to the head and do a linear search, although inserting into a doubly-linked list isn't the cleanest thing in the world. --[[User:Howard C. Berkowitz|Howard C. Berkowitz]] 19:05, 17 September 2010 (UTC)
 
:Howard I know very little about everything, especially NoSQL, and I'm only trying to learn stuff, please fix things to your heart's content, and chop or amplify or change as you see fit. Even though this article is getting longer, it can get much better and I trust that persons such as yourself and Pat are highly competent to take it to infinity and beyond. What's cool is how when you get on to something, it improves and improves and improves and this article needs heavy hitters such as yourself. I'll only stand by at this point and perhaps make suggestions on the talk pages from now on.--[[User:Thomas Wright Sulcer|Thomas Wright Sulcer]] 22:00, 17 September 2010 (UTC)
 
::Howard, I will not have time to work on this for approval anytime soon, so from a strictly personal standpoint, I'd prefer to leave it open for improvement for the time being.  I am still exploring this topic myself.[[User:Pat Palmer|Pat Palmer]] 05:43, 22 September 2010 (UTC)

Latest revision as of 23:43, 21 September 2010

This article is developing and not approved.
Definition: A number of non-relational distributed database architectures that usually store data as key-value pairs.
Workgroup category: Computers
Talk archive: none
English language variant: American English

Just a question on an interesting topic

Only a personal interest, but to what extent do these use XML, or are the key-value relationships implemented more as type-length-value or something even simpler? --Howard C. Berkowitz 14:54, 28 July 2010 (UTC)
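For readers unfamiliar with the term, here is a minimal sketch of what a type-length-value (TLV) encoding of one key-value pair could look like. It is purely illustrative and does not describe the on-disk or wire format of any particular NoSQL store; the type codes `TYPE_KEY` and `TYPE_VALUE`, the one-byte type field, and the two-byte big-endian length field are all assumptions chosen for the example.

```python
import struct

# Hypothetical type codes, chosen only for this illustration.
TYPE_KEY = 1
TYPE_VALUE = 2

def tlv_encode(type_id: int, payload: bytes) -> bytes:
    """Pack one record as: 1-byte type, 2-byte big-endian length, payload."""
    return struct.pack(">BH", type_id, len(payload)) + payload

def tlv_decode(buf: bytes) -> list:
    """Walk the buffer and return a list of (type, payload) tuples."""
    records = []
    offset = 0
    while offset < len(buf):
        type_id, length = struct.unpack_from(">BH", buf, offset)
        offset += 3  # size of the type and length header
        records.append((type_id, buf[offset:offset + length]))
        offset += length
    return records

# One key-value pair becomes two TLV records laid end to end.
encoded = tlv_encode(TYPE_KEY, b"user:42") + tlv_encode(TYPE_VALUE, b"Pat")
decoded = tlv_decode(encoded)
```

Compared with XML, a layout like this carries no markup and can be parsed with simple offset arithmetic, which is presumably the kind of trade-off the question is getting at.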

Useful article?

You may want to use ACID properties. Howard C. Berkowitz 02:29, 8 August 2010 (UTC)

Pat's review of this article

This is a great beginning. The article as it stands today is written at a high level of expertise--I learned a lot from reading it--and I appreciate the DOI in the reference list, plus the spare, lean and precise technical language employed in many sections. Some comments, questions, and ideas for additional development are detailed in the following subsections:

intended readers

This article is the product of one or more writers who appear to have a high level of technical expertise with databases in general, and who write with admirable clarity and conciseness about technical matters. The article seems to assume that readers are already familiar with the capabilities of conventional relational database management systems, and also with map-reduce algorithms. While the technology being described is sufficiently complicated that much of the article may necessarily be beyond the scope of what an expert lay reader might understand, IMO it is still important to strive for satisfying both intelligent lay experts and deep subject experts. Perhaps some kind of statement about this might be added in the overview at the beginning. It might also be useful to provide some history and market descriptive material near the top, before launching heavily into the tech speak. Pat Palmer 20:09, 19 August 2010 (UTC)

P. S., While I was able to follow most of the article, I am forced to admit that I am not sure what is meant by "ad hoc query". I could just Google it. It would also be great if the article briefly defined it (and other terms) on first use, or else pointed off to another article that contains a definition. Pat Palmer 20:38, 19 August 2010 (UTC)

intro

The nice intro would possibly be even better by providing a quick summary of when, where, why, who, etc. See next comment below about the history of this technology. Also, the surge of interest in the open source community as a result of recently created cloud computing technologies would fit well into the opening overview. I would consider moving the very last section up into the end of the introductory overview. Pat Palmer 20:25, 19 August 2010 (UTC)

code snippets

It is important that the two code snippets identify the language that they are written in. I can pretty much surmise that one is Java, and the other C (or C++), but it needs to be made explicit somewhere. Pat Palmer 20:36, 19 August 2010 (UTC)

semi-structured data

Wow, use of JSON to identify rather free-form data. After just reading the Semantic Web article, I am very curious to know whether there is (or could be some day) a relationship between semi-structured data and Semantic Web technologies. Or is there already, and I am just too new to these ideas to get it? Pat Palmer 20:41, 19 August 2010 (UTC)

data held redundantly on multiple servers

Although there is a diagram on distributed hash tables included in the article, its notes are in Japanese. I am sorely missing the elegant explanation given during the class presentation on how this works, and I urge that this explanation be added to the article, since understanding it makes the cloud computer implementations less mystifying. Pat Palmer 21:05, 19 August 2010 (UTC)

references

I noted the following reference (whose link is broken) on the Wikipedia NOSQL article. I believe it would provide useful background for this article: Agrawal, R., et al. (2009). The Claremont Report on Database Research. Association for Computing Machinery. Communications of the ACM, 52(6), 56. Retrieved August 19, 2010, from ABI/INFORM Global. (Document ID: 1753261191). The abstract for it is:

"A group of database researchers, architects, users, and pundits met in May 2008 at the Claremont Resort in Berkeley, CA, to discuss the state of database research and its effects on practice. This was the seventh meeting of this sort over the past 20 years and was distinguished by a broad consensus that the database community is at a turning point in its history, due to both an explosion of data and usage scenarios and major shifts in computing hardware and platforms. This article explores the conclusions of this self-assessment. The theme of the Claremont meeting was that database research and the data-management industry are at a turning point, with unusually rich opportunities for technical advances, intellectual achievement, entrepreneurship, and benefits for science and society. Given the large number of opportunities, it is important for the database research community to address issues that maximize relevance within the field, across computing, and in external fields as well." Pat Palmer 21:16, 19 August 2010 (UTC)

history of NOSQL movement (a rant)

I noted with interest (and disappointment, though not surprise) that Lotus Notes is not currently shown in the list of "popular document-based databases". While the recent surge of NOSQL development is a product of the emergence of cloud computing platforms by Google, LinkedIn, Facebook, and the open source movement's interest in low-cost software, the technology has its roots in an earlier, highly commercially successful product: Lotus Notes, first released in 1989 by a small group of developers (led by Ray Ozzie) in Boston. The product was eventually bought up by IBM and still enjoys significant sales today. Currently, much of the writing in Wikipedia and elsewhere in the NOSQL movement misses that Lotus Notes DID NOSQL FIRST. It was a first in many other respects, such as role-based security, use of an optimistic non-locking strategy, provision for replication and off-line use of databases, automatic indexing and rendering of all database contents as HTML, and it has fantastic administration and support tools.

The article cited for the history buries Lotus Notes in the 1980's with a 2-line blurb. This is a glaring historical inaccuracy. The open source community is possibly overlooking Lotus Notes because it is not free (in fact, just like Oracle or SQL Server, one has to pay a fair amount for it). Lotus Notes achieves its "distributed" claim to fame not with cloud computing, but with fast replication to local disks, along with fast re-merging of local replicates later. The product deserves prominent mention as a non-relational, distributed database architecture THAT ACTUALLY WORKS (but which is not "open source"). The advent of today's cloud technologies such as Google's MapReduce platform has, of course, brought things to a new level in terms of scalability. How could any younger person who hasn't had a chance to work with Lotus Notes possibly know its technology, given that it is proprietary? I'm mentioning this, though, because I do know about LN and feel that this article, to be really accurate and fair, should consider addressing Lotus Notes more thoroughly in its written history of NOSQL.

IMO, there are other NOSQL movement "history" issues as well. Wikipedia charmingly describes Berkeley DB as created by "old school" (hello?). DaaS (Database as a Service) is not particularly new--only the use of the phrase is; web hosting services were offering hosted relational databases and Lotus Notes databases "for hire" well before the phrase came into common use. I fear I am privileged to notice this kind of thing simply by being very experienced, which really means just getting old (big sigh here). Pat Palmer 20:25, 19 August 2010 (UTC)

More progress

Tom, I appreciate your additions. While I made some edits, they were nonsubstantive in a way that would not preclude my nominating this for Approval. Of course, if Pat and one other Computers editor were to co-nominate, we could make substantive edits.

After making the ACID properties link non-red, I commented out the definition to which one can link. (An aside -- reservation systems are a better example than surgery. Maybe not a facelift, but there are quite a few situations where there are multiple simultaneous surgical procedures, variously one to support another (e.g., harvesting graft material), or because there are simultaneous traumas that all have to be fixed or the patient will die).

We should coordinate (including graphics) with the existing linked list article. Your graphic of a singly-linked list doesn't suggest the horrible overhead of having to go back to the head and do a linear search, although inserting into a doubly-linked list isn't the cleanest thing in the world. --Howard C. Berkowitz 19:05, 17 September 2010 (UTC)
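The overhead mentioned above can be sketched in a few lines. This is only an illustration of the general point, not code from the article or from the linked list article; the class and method names (`Node`, `SinglyLinkedList`, `predecessor`) are hypothetical. With only forward pointers, finding the node before a given node means starting again at the head and walking the list.

```python
class Node:
    """A node holding a value and a forward pointer only."""
    def __init__(self, value):
        self.value = value
        self.next = None

class SinglyLinkedList:
    def __init__(self):
        self.head = None

    def append(self, value):
        """Add a node at the tail and return it (itself a linear walk)."""
        node = Node(value)
        if self.head is None:
            self.head = node
            return node
        cur = self.head
        while cur.next is not None:
            cur = cur.next
        cur.next = node
        return node

    def predecessor(self, target):
        """Return (previous node, steps taken) by searching from the head."""
        steps = 0
        prev, cur = None, self.head
        while cur is not None and cur is not target:
            prev, cur = cur, cur.next
            steps += 1
        if cur is None:
            raise ValueError("target node is not in this list")
        return prev, steps

lst = SinglyLinkedList()
nodes = [lst.append(v) for v in range(1, 6)]   # values 1..5
prev, steps = lst.predecessor(nodes[-1])       # walk from head to reach node 5
```

A doubly-linked list avoids this walk by storing a back pointer in each node, at the cost of updating more pointers on every insertion, which is the trade-off noted in the comment.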

Howard, I know very little about everything, especially NoSQL, and I'm only trying to learn stuff, please fix things to your heart's content, and chop or amplify or change as you see fit. Even though this article is getting longer, it can get much better and I trust that persons such as yourself and Pat are highly competent to take it to infinity and beyond. What's cool is how when you get on to something, it improves and improves and improves and this article needs heavy hitters such as yourself. I'll only stand by at this point and perhaps make suggestions on the talk pages from now on. --Thomas Wright Sulcer 22:00, 17 September 2010 (UTC)

Howard, I will not have time to work on this for approval anytime soon, so from a strictly personal standpoint, I'd prefer to leave it open for improvement for the time being. I am still exploring this topic myself. Pat Palmer 05:43, 22 September 2010 (UTC)