Hierarchical clustering
{{subpages}}
'''Hierarchical clustering''' (also known as [[numerical taxonomy]]) is a branch of [[cluster analysis]] which treats clusters hierarchically, i.e. as a set of levels. The construction of the hierarchy can be performed using two major approaches, or combinations thereof: in agglomerative hierarchical clustering (a [[bottom-up]] approach), existing clusters are merged [[iteration|iteratively]], while divisive hierarchical clustering (a [[top-down]] approach) starts out with all data in one cluster that is then split iteratively. At each step of the process, a mathematical measure of [[distance]] or [[similarity]] between clusters (agglomerative) or within clusters (divisive) is computed to determine how to merge or split. Several different distance and similarity measures can be used, and they generally result in different hierarchies (especially for agglomerative clusterings, which start out from local information only), which complicates their interpretation. Nonetheless, hierarchical clustering is more intuitively understandable than [[flat clustering]], and so it enjoys considerable popularity for multivariate analysis of data, e.g. of [[gene]] or [[protein]] [[sequence]]s.
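The agglomerative (bottom-up) procedure can be illustrated with a minimal sketch. The example below is an illustration only, not part of the subject matter above: it assumes the [[SciPy]] library as tooling, a toy set of six two-dimensional points, Euclidean distance, and average linkage, all chosen purely for demonstration.

<pre>
# Minimal sketch of agglomerative (bottom-up) hierarchical clustering.
# Assumptions for illustration only: SciPy as the tool, toy 2-D points,
# Euclidean distance, and average linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: six points forming two loose groups.
points = np.array([
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1],   # group near the origin
    [5.0, 5.0], [5.1, 5.2], [4.9, 5.1],   # group near (5, 5)
])

# Each row of Z records one merge step: the two clusters joined, the
# distance at which they merged, and the size of the resulting cluster.
Z = linkage(points, method="average", metric="euclidean")
print(Z)

# Cutting the hierarchy at a chosen distance yields a flat partition;
# here a cut at distance 2.0 recovers the two groups.
labels = fcluster(Z, t=2.0, criterion="distance")
print(labels)   # e.g. [1 1 1 2 2 2]
</pre>

Each row of the merge table corresponds to one level of the hierarchy, so the same result can also be drawn as a [[dendrogram]]; choosing a different linkage or distance measure would, in general, yield a different hierarchy, as noted above.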