BayesMD: Flexible Biological Modeling for Motif Discovery

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

BayesMD: Flexible Biological Modeling for Motif Discovery. / Tang, Man-Hung Eric; Krogh, Anders; Winther, Ole.

In: Journal of Computational Biology, Vol. 15, No. 10, 2008, p. 1347-63.

Harvard

Tang, M-HE, Krogh, A & Winther, O 2008, 'BayesMD: Flexible Biological Modeling for Motif Discovery', Journal of Computational Biology, vol. 15, no. 10, pp. 1347-63. https://doi.org/10.1089/cmb.2007.0176

APA

Tang, M-H. E., Krogh, A., & Winther, O. (2008). BayesMD: Flexible Biological Modeling for Motif Discovery. Journal of Computational Biology, 15(10), 1347-63. https://doi.org/10.1089/cmb.2007.0176

Vancouver

Tang M-HE, Krogh A, Winther O. BayesMD: Flexible Biological Modeling for Motif Discovery. Journal of Computational Biology. 2008;15(10):1347-63. https://doi.org/10.1089/cmb.2007.0176

Author

Tang, Man-Hung Eric ; Krogh, Anders ; Winther, Ole. / BayesMD: Flexible Biological Modeling for Motif Discovery. In: Journal of Computational Biology. 2008 ; Vol. 15, No. 10. pp. 1347-63.

Bibtex

@article{535b4000c54411dd9473000ea68e967b,
title = "BayesMD: Flexible Biological Modeling for Motif Discovery",
abstract = "We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on transcription factor (TF) databases in order to extract the typical properties of TF binding sites. In a similar fashion we train organism-specific priors for the background sequences. Lastly, we use a prior over the position of binding sites. This prior represents information complementary to the motif and background priors coming from conservation, local sequence complexity, nucleosome occupancy, etc. and assumptions about the number of occurrences. The Bayesian inference is carried out using a combination of exact marginalization (multinomial parameters) and sampling (over the position of sites). Robust sampling results are achieved using the advanced sampling method parallel tempering. In a post-analysis step candidate motifs with high marginal probability are found by searching among those motifs that contain sites that occur frequently. Thereby, maximum a posteriori inference for the motifs is avoided and the marginal probabilities can be used directly to assess the significance of the findings. The framework is benchmarked against other methods on a number of real and artificial data sets. The accompanying prediction server, documentation, software, models and data are available from http://bayesmd.binf.ku.dk/.",
author = "Tang, Man-Hung Eric and Krogh, Anders and Winther, Ole",
note = "KEYWORDS: computational molecular biology, gene expression, machine learning, Markov chains, Monte Carlo likelihood, recognition of genes and regulatory elements, sequence analysis, stochastic processes",
year = "2008",
doi = "10.1089/cmb.2007.0176",
language = "English",
volume = "15",
pages = "1347--1363",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert, Inc. Publishers",
number = "10",
}

RIS

TY - JOUR

T1 - BayesMD: Flexible Biological Modeling for Motif Discovery

AU - Tang, Man-Hung Eric

AU - Krogh, Anders

AU - Winther, Ole

N1 - KEYWORDS: computational molecular biology, gene expression, machine learning, Markov chains, Monte Carlo likelihood, recognition of genes and regulatory elements, sequence analysis, stochastic processes

PY - 2008

Y1 - 2008

AB - We present BayesMD, a Bayesian Motif Discovery model with several new features. Three different types of biological a priori knowledge are built into the framework in a modular fashion. A mixture of Dirichlets is used as prior over nucleotide probabilities in binding sites. It is trained on transcription factor (TF) databases in order to extract the typical properties of TF binding sites. In a similar fashion we train organism-specific priors for the background sequences. Lastly, we use a prior over the position of binding sites. This prior represents information complementary to the motif and background priors coming from conservation, local sequence complexity, nucleosome occupancy, etc. and assumptions about the number of occurrences. The Bayesian inference is carried out using a combination of exact marginalization (multinomial parameters) and sampling (over the position of sites). Robust sampling results are achieved using the advanced sampling method parallel tempering. In a post-analysis step candidate motifs with high marginal probability are found by searching among those motifs that contain sites that occur frequently. Thereby, maximum a posteriori inference for the motifs is avoided and the marginal probabilities can be used directly to assess the significance of the findings. The framework is benchmarked against other methods on a number of real and artificial data sets. The accompanying prediction server, documentation, software, models and data are available from http://bayesmd.binf.ku.dk/.

U2 - 10.1089/cmb.2007.0176

DO - 10.1089/cmb.2007.0176

M3 - Journal article

C2 - 19040368

VL - 15

SP - 1347

EP - 1363

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 10

ER -

ID: 8947723