Distributed computing

{{subpages}}
{{TOC|right}}
In [[computer science]], '''distributed computing''' is a strategy for improving the resource utilization of large numbers of [[computer]]s working on a common task. Such resource utilization may focus on providing raw processing speed for computationally hard problems, as with parallel processing, or on handling surges in workload efficiently, as in the newer deployment model of [[cloud computing]].

For computationally intense [[parallel computation|parallelizable]] tasks, pieces of the problem are distributed across many [[computers]] that together form a distributed computing system. Unlike [[cluster (Computer Science)|clusters]], the computers in a distributed computer may be connected over large [[computer network|networks]] and may be owned by many people or institutions. Several distributed computing systems even consist of home computers that communicate with a central control via the [[internet]] and whose owners have chosen to donate computing time to a given project.
Yet another variant is [[federated computing]], where the federation involves different administrations and distributed resources. All federated networks are distributed, but not all distributed networks are federated.


One might think that a federated network is synonymous with an extranet, but this need not be the case. Federations can involve distributed authority within the same enterprise.
The Holy Grail of federation tends to be federated identity management, manifested as [[single sign-on]] (SSO). Unless there is a need to access especially sensitive information, federated identity management frees the user from remembering many hard-to-remember passwords. SSO requires [[strong authentication]].
 
== Network topology ==
 
A distributed computing system generally employs one or more ''master'' computers, and very many ''worker'' computers.  The roles, however, are very different in parallel processing and in cloud computing.
 
In parallel computation, the master computer's role is to break the problem into a series of smaller problems (work units) and to send these to participating workers.  The workers then perform the work and send the results back to the master computer.
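
As a minimal illustration of this pattern, the following Python sketch uses the standard <code>multiprocessing</code> module to stand in for networked workers; the squaring task, chunk size, and worker count are purely illustrative:

<pre>
from multiprocessing import Pool

def work_unit(chunk):
    # Each worker computes its piece independently and returns the result.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Master: split the problem into discrete work units...
    chunks = [data[i:i + 100_000] for i in range(0, len(data), 100_000)]
    with Pool(processes=4) as workers:
        # ...distribute them to workers, then combine the returned results.
        partials = workers.map(work_unit, chunks)
    print(sum(partials))
</pre>

In a genuinely distributed system, the <code>Pool</code> would be replaced by network communication with remote machines, but the split/distribute/combine structure is the same.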
 
In cloud computing, the greatest number of computers are usually customer desktops, which act as clients to a much smaller number of physical application servers. It is common practice, however, for the customers' processing to be done by [[virtual machine]]s on those servers.


== Performance ==

The performance of a distributed computing system is dependent on three important variables: the time necessary for the master computer to perform administrative or "bookkeeping" tasks, the time necessary for the master and workers to communicate over the [[computer network|network]], and the time necessary for the workers to perform their tasks.
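
Assuming <math>N</math> equal work units, <math>w</math> workers, and <math>m</math> messages exchanged, one simple way to model the total time is

:<math>T_{\text{total}} \approx T_{\text{admin}} + m \, t_{\text{comm}} + \frac{N \, t_{\text{unit}}}{w},</math>

so distribution pays off only while the last term shrinks faster than the first two grow.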


The overhead involved in administrative tasks in a distributed computing system should not be underestimated. The master computer must maintain a list of all available worker computers along with their state (busy, available, etc.), must be able to split the entire task into discrete work units, and must be able to combine all finished work units into the final result. Additionally, to prevent result-forging [[computer security|attacks]], the master must attempt to identify incorrect results and reduce their effect on the overall computation.
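
The sketch below illustrates this bookkeeping in Python; the class and method names are hypothetical, and summation stands in for whatever combining step a real system would perform:

<pre>
from collections import deque

class Master:
    def __init__(self, task, unit_size):
        # Split the entire task into discrete work units up front.
        self.pending = deque(task[i:i + unit_size]
                             for i in range(0, len(task), unit_size))
        self.worker_state = {}   # worker id -> "available" or "busy"
        self.results = []

    def register(self, worker_id):
        self.worker_state[worker_id] = "available"

    def assign(self, worker_id):
        # Hand an available worker the next pending work unit, if any.
        if self.pending and self.worker_state.get(worker_id) == "available":
            self.worker_state[worker_id] = "busy"
            return self.pending.popleft()
        return None

    def receive(self, worker_id, result):
        # Record a finished unit and mark the worker available again.
        self.worker_state[worker_id] = "available"
        self.results.append(result)

    def combine(self):
        # Combining step; summation stands in for the real reduction.
        return sum(self.results)
</pre>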


Communication time is also critical to system performance. If the round-trip communication time between the master and a worker node rivals the computation time, then no benefit is gained from distribution. Thus, distributed systems generally attempt to batch multiple work units into a single communication.
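
The effect of batching can be seen with a toy cost model; the timings below are made up for illustration, and a single worker processing units back to back is assumed:

<pre>
def total_time(n_units, t_unit, t_rtt, batch_size):
    # Every message costs a fixed round-trip time t_rtt, so batching
    # several work units per message amortizes that cost.
    n_messages = -(-n_units // batch_size)   # ceiling division
    return n_messages * t_rtt + n_units * t_unit

# 10,000 one-millisecond units over a 50 ms round trip:
print(total_time(10_000, 0.001, 0.05, batch_size=1))    # 510.0 s: round trips dominate
print(total_time(10_000, 0.001, 0.05, batch_size=100))  # 15.0 s: communication amortized
</pre>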


== Parallelizable tasks ==

''Main article:'' [[parallel computation]]


A parallelizable task is one in which discrete fractions of the task can be computed without information from the other fractions.


Here are some easily parallelized tasks, for which distributed computing works efficiently:


* [[Embarrassingly parallel]] problems, such as computing the [[mandelbrot set|Mandelbrot set]] (see the sketch after this list).
* Any [[tree search]] problem, such as:
** Many board games, including [[chess]] and [[checkers]],
** Path finding algorithms, and
** Minimization over many structured spaces (for instance, optimal [[Golomb ruler]]s or [[protein folding]]).
* Any problem with [[optimal substructure]], such as sorting a list of numbers.
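
As an example of the first item, each row of a [[mandelbrot set|Mandelbrot set]] image can be computed with no information from any other row, so rows make natural work units. A Python sketch, with illustrative image size and iteration cap:

<pre>
from multiprocessing import Pool

def mandelbrot_row(args):
    # Escape-time iteration for one row of the image.
    y, width, height, max_iter = args
    row = []
    for x in range(width):
        c = complex(3.5 * x / width - 2.5, 2.0 * y / height - 1.0)
        z, n = 0j, 0
        while abs(z) <= 2.0 and n < max_iter:
            z = z * z + c
            n += 1
        row.append(n)
    return row

if __name__ == "__main__":
    width, height, max_iter = 800, 600, 100
    with Pool() as workers:
        image = workers.map(mandelbrot_row,
                            [(y, width, height, max_iter) for y in range(height)])
    print(len(image), "rows computed independently")
</pre>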


== Goals and advantages ==

There are many different types of distributed computing systems and many challenges to overcome in successfully designing one. The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault-tolerant and more powerful than many combinations of stand-alone computer systems. Because projects are divided into smaller units, the need for huge supercomputers is bypassed, and the load can be executed either on a [[grid computing|grid]] network or on participating home computers during their idle time. This leads to a drastic reduction in costs and depends only on a sufficient number of participating nodes.

=== Openness ===

Openness is the property of distributed systems such that each subsystem is continually open to interaction with other systems (see references). [[Web services]] protocols are standards which enable distributed systems to be extended and scaled. In general, an open system that scales has an advantage over a perfectly closed and self-contained system.

Consequently, open distributed systems are required to meet the following challenges:

; Monotonicity
: Once something is published in an open system, it cannot be taken back.
; Pluralism
: Different subsystems of an open distributed system include heterogeneous, overlapping and possibly conflicting information. There is no central arbiter of truth in open distributed systems.
; Unbounded nondeterminism
: Asynchronously, different subsystems can come up and go down, and communication links can come in and go out between subsystems of an open distributed system. Therefore the time that it will take to complete an operation cannot be bounded in advance (see [[unbounded nondeterminism]]).
=== Scalability ===
{{main|Scalability}}
A scalable system is one that can easily be altered to accommodate changes in the number of users, resources, and computing entities assigned to it. On-demand scalability is the most important characteristic of [[cloud computing]]. Scalability can be measured in three different dimensions:


; Load scalability
: A distributed system should make it easy for us to expand and contract its resource pool to accommodate heavier or lighter loads.
; Geographic scalability
: A geographically scalable system is one that maintains its usefulness and usability regardless of how far apart its users or resources are.
; Administrative scalability
: No matter how many different organizations need to share a single distributed system, it should still be easy to use and manage.

Some loss of performance may occur in a system that allows itself to scale in one or more of these dimensions. There is a limit up to which processors can usefully be added to a system; beyond that limit, performance degrades.
== Drawbacks and disadvantages ==
{{see also|Fallacies of Distributed Computing}}
If not planned properly, a distributed system can decrease the overall reliability of computations if the unavailability of a node can cause a disruption of the other nodes.  [[Leslie Lamport]] describes this type of distributed system fragility like this: "You know you have one when the crash of a computer you've never heard of stops you from getting any work done."<!-- This is widely attributed to him. I can't find a primary source - though absence of proof is not proof of absence. -->


Troubleshooting and diagnosing problems in a distributed system can also become more difficult, because the analysis may now require connecting to remote nodes or inspecting communications between nodes.

Not many types of computation are well-suited to distributed environments, typically because of the amount of network communication or synchronization that would be required between nodes. If bandwidth, latency, or communication requirements are too significant, then the benefits of distributed computing may be negated and the performance may be worse than in a non-distributed environment.


Additionally, the performance of a distributed computer does not grow linearly with the number of computational units supplied; a distributed computer does not become twice as fast when the number of computers doubles. This less-than-linear speed-up is due to the increased communication and administrative work that the master computer must perform.
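
This behaviour is often summarized by [[Amdahl's law]]: if a fraction <math>p</math> of the work parallelizes perfectly and the remainder (the master's bookkeeping, for instance) stays sequential, then the speedup on <math>n</math> computers is at best

:<math>S(n) = \frac{1}{(1 - p) + p/n},</math>

which approaches the finite limit <math>1/(1-p)</math> as <math>n</math> grows, rather than increasing without bound.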
Distributed computing systems are also open to potential attack by hostile nodes.  A node returning fraudulent data would be difficult to [[error detection|detect]] without [[redundancy]].  Moreover, many distributed systems only allow one or two nodes to work on the same work unit simultaneously; a node could download a work unit and then fail to complete it, denying that work unit to other nodes.  Many distributed computing systems therefore impose [[deadline]]s on work units to minimize this risk; both defenses are sketched below.
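
A few lines of Python illustrate both defenses; the redundancy factor and deadline here are made-up values:

<pre>
import time
from collections import Counter

REDUNDANCY = 3      # copies of each work unit handed out
DEADLINE = 3600.0   # seconds before an unreturned unit is re-issued

def accept_result(copies):
    # Keep a unit's result only when a majority of returned copies agree.
    value, votes = Counter(copies).most_common(1)[0]
    return value if votes > REDUNDANCY // 2 else None

def overdue(issued_at, now=None):
    # Units not returned by the deadline go back into the pending queue.
    now = time.time() if now is None else now
    return now - issued_at > DEADLINE

print(accept_result([42, 42, 7]))   # 42 -- the forged 7 is outvoted
print(accept_result([1, 2, 3]))     # None -- no majority; reassign the unit
</pre>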


== Famous examples ==
 
=== Computation-intense ===
* [[Berkeley Open Infrastructure for Network Computing|BOINC]] is a project of the [[University of California, Berkeley]] that uses a common set of client software to host many different distributed computing [http://boinc.berkeley.edu/projects.php projects].
* [[SETI@Home]] is perhaps the most famous example of distributed computation. It comprises more than one million computers, of varying [[computer architecture|architectures]] and [[operating systems|platforms]], dedicated to computing [[fourier transform|Fourier transforms]] on data received from [[radio telescope|radio telescopes]]. SETI@Home is sponsored by [[University of California, Berkeley|Berkeley's]] Space Sciences Laboratory. It is the predecessor to, and is currently a project on, BOINC.
* [[FOLDING@HOME]] is the second best-known example of distributed collaborative computing. Its objective is to simulate the real-time folding of proteins and enzymes into their biologically active shapes. Studies have indicated that defects in this folding mechanism lead to diseases such as Alzheimer's and BSE (mad cow disease).
* [[Beowulf cluster|Beowulf clusters]], also known as "cheap supercomputers," perform distributed computation over a [[Computer_network#Spanned_area|local area network]] using commercial off-the-shelf (COTS) hardware.
=== Cloud computing ===
* [[Google Apps]]
* [[SalesForce.com]]
