Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. / Grotkjær, Thomas; Winther, Ole; Regenberg, Birgitte; Nielsen, Jens; Hansen, Lars Kai.

In: Bioinformatics, Vol. 22, No. 1, 2006, p. 58-67.

Harvard

Grotkjær, T, Winther, O, Regenberg, B, Nielsen, J & Hansen, LK 2006, 'Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm', Bioinformatics, vol. 22, no. 1, pp. 58-67. https://doi.org/10.1093/bioinformatics/bti746

APA

Grotkjær, T., Winther, O., Regenberg, B., Nielsen, J., & Hansen, L. K. (2006). Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. Bioinformatics, 22(1), 58-67. https://doi.org/10.1093/bioinformatics/bti746

Vancouver

Grotkjær T, Winther O, Regenberg B, Nielsen J, Hansen LK. Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. Bioinformatics. 2006;22(1):58-67. https://doi.org/10.1093/bioinformatics/bti746

Author

Grotkjær, Thomas ; Winther, Ole ; Regenberg, Birgitte ; Nielsen, Jens ; Hansen, Lars Kai. / Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm. In: Bioinformatics. 2006 ; Vol. 22, No. 1. pp. 58-67.

Bibtex

@article{c59a050231dc43b6a4fb1565ef0ef1dd,
title = "Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm",
abstract = "Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialization of the algorithm. Therefore, it is difficult to assess the significance of the results. We have developed a consensus clustering algorithm, where the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering, capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualization and interpretation of the results. Results: We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods. It is shown that the method is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated for real datasets. It is shown that more biological meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.",
author = "Grotkj{\ae}r, Thomas and Winther, Ole and Regenberg, Birgitte and Nielsen, Jens and Hansen, Lars Kai",
year = "2006",
doi = "10.1093/bioinformatics/bti746",
language = "English",
volume = "22",
pages = "58--67",
journal = "Bioinformatics (Online)",
issn = "1367-4811",
publisher = "Oxford University Press",
number = "1"
}

RIS

TY  - JOUR
T1  - Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm
AU  - Grotkjær, Thomas
AU  - Winther, Ole
AU  - Regenberg, Birgitte
AU  - Nielsen, Jens
AU  - Hansen, Lars Kai
PY  - 2006
Y1  - 2006
N2  - Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialization of the algorithm. Therefore, it is difficult to assess the significance of the results. We have developed a consensus clustering algorithm, where the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering, capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualization and interpretation of the results. Results: We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods. It is shown that the method is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated for real datasets. It is shown that more biological meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.
AB  - Motivation: Hierarchical and relocation clustering (e.g. K-means and self-organizing maps) have been successful tools in the display and analysis of whole genome DNA microarray expression data. However, the results of hierarchical clustering are sensitive to outliers, and most relocation methods give results which are dependent on the initialization of the algorithm. Therefore, it is difficult to assess the significance of the results. We have developed a consensus clustering algorithm, where the final result is averaged over multiple clustering runs, giving a robust and reproducible clustering, capable of capturing small signal variations. The algorithm preserves valuable properties of hierarchical clustering, which is useful for visualization and interpretation of the results. Results: We show for the first time that one can take advantage of multiple clustering runs in DNA microarray analysis by collecting re-occurring clustering patterns in a co-occurrence matrix. The results show that consensus clustering obtained from clustering multiple times with Variational Bayes Mixtures of Gaussians or K-means significantly reduces the classification error rate for a simulated dataset. The method is flexible and it is possible to find consensus clusters from different clustering algorithms. Thus, the algorithm can be used as a framework to test in a quantitative manner the homogeneity of different clustering algorithms. We compare the method with a number of state-of-the-art clustering methods. It is shown that the method is robust and gives low classification error rates for a realistic, simulated dataset. The algorithm is also demonstrated for real datasets. It is shown that more biological meaningful transcriptional patterns can be found without conservative statistical or fold-change exclusion of data.
U2  - 10.1093/bioinformatics/bti746
DO  - 10.1093/bioinformatics/bti746
M3  - Journal article
C2  - 16257984
AN  - SCOPUS:30344442460
VL  - 22
SP  - 58
EP  - 67
JO  - Bioinformatics (Online)
JF  - Bioinformatics (Online)
SN  - 1367-4811
IS  - 1
ER  -

ID: 239905037
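
Illustration

The abstract describes collecting re-occurring clustering patterns from multiple runs in a co-occurrence matrix and averaging over the runs. The following is a minimal sketch of that idea, not the authors' implementation. It assumes plain K-means as the base clusterer (the paper also uses Variational Bayes Mixtures of Gaussians), average-linkage merging on the averaged co-occurrence, and a cut-off of 0.5. The function names, the linkage rule, and the cut-off are all hypothetical choices made for illustration.

```python
import random


def kmeans_labels(points, k, rng, iters=20):
    """One K-means run from a random initialization; returns a label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it has none.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def cooccurrence(points, k, runs, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(points)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans_labels(points, k, rng)
        for i in range(n):
            for j in range(n):
                counts[i][j] += labels[i] == labels[j]
    return [[c / runs for c in row] for row in counts]


def consensus_clusters(cooc, threshold=0.5):
    """Assumed rule: average-linkage merging until similarity drops below threshold."""
    clusters = [[i] for i in range(len(cooc))]

    def similarity(a, b):
        return sum(cooc[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best, x, y = max(
            (similarity(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        if best < threshold:
            break
        merged = clusters.pop(y)  # y > x, so index x is unaffected
        clusters[x].extend(merged)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Toy data (made up for this sketch): two well-separated groups.
    pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
    print(consensus_clusters(cooccurrence(pts, k=2, runs=10)))
```

Because the merge order is recorded pair by pair, the same co-occurrence matrix could also be used to build a full dendrogram rather than a single cut, which is closer in spirit to the hierarchical visualization the abstract mentions.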