Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

Research output: Contribution to journal; Journal article; Research; peer-review

Standard

Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. / Grapotte, Mathys; Saraswat, Manu; Bessière, Chloé; Menichelli, Christophe; Ramilowski, Jordan A.; Severin, Jessica; Hayashizaki, Yoshihide; Itoh, Masayoshi; Tagami, Michihira; Murata, Mitsuyoshi; Kojima-Ishiyama, Miki; Noma, Shohei; Noguchi, Shuhei; Kasukawa, Takeya; Hasegawa, Akira; Suzuki, Harukazu; Nishiyori-Sueki, Hiromi; Frith, Martin C.; Abugessaisa, Imad; Aitken, Stuart; Aken, Bronwen L.; Alam, Intikhab; Alam, Tanvir; Alasiri, Rami; Alhendi, Ahmad M. N.; Alinejad-Rokny, Hamid; Alvarez, Mariano J.; Andersson, Robin; Arakawa, Takahiro; Araki, Marito; Arbel, Taly; Bornholdt, Jette; Boyd, Mette; Chen, Yun; Coskun, Mehmet; Dalby, Maria; Ienasescu, Hans; Jørgensen, Mette; Kaczkowski, Bogumil; Kere, Juha; Li, Kang; Lilje, Berit; Nepal, Chirag; Nguyen, Quan Hoang; Nielsen, Lars K.; Rennie, Sarah; Sandelin, Albin; Valen, Eivind; Vitezic, Morana; Vitting-Seerup, Kristoffer; FANTOM Consortium.

In: Nature Communications, Vol. 12, 3297, 2021.

Harvard

Grapotte, M, Saraswat, M, Bessière, C, Menichelli, C, Ramilowski, JA, Severin, J, Hayashizaki, Y, Itoh, M, Tagami, M, Murata, M, Kojima-Ishiyama, M, Noma, S, Noguchi, S, Kasukawa, T, Hasegawa, A, Suzuki, H, Nishiyori-Sueki, H, Frith, MC, Abugessaisa, I, Aitken, S, Aken, BL, Alam, I, Alam, T, Alasiri, R, Alhendi, AMN, Alinejad-Rokny, H, Alvarez, MJ, Andersson, R, Arakawa, T, Araki, M, Arbel, T, Bornholdt, J, Boyd, M, Chen, Y, Coskun, M, Dalby, M, Ienasescu, H, Jørgensen, M, Kaczkowski, B, Kere, J, Li, K, Lilje, B, Nepal, C, Nguyen, QH, Nielsen, LK, Rennie, S, Sandelin, A, Valen, E, Vitezic, M, Vitting-Seerup, K & FANTOM Consortium 2021, 'Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network', Nature Communications, vol. 12, 3297. https://doi.org/10.1038/s41467-021-23143-7

APA

Grapotte, M., Saraswat, M., Bessière, C., Menichelli, C., Ramilowski, J. A., Severin, J., Hayashizaki, Y., Itoh, M., Tagami, M., Murata, M., Kojima-Ishiyama, M., Noma, S., Noguchi, S., Kasukawa, T., Hasegawa, A., Suzuki, H., Nishiyori-Sueki, H., Frith, M. C., Abugessaisa, I., ... FANTOM Consortium (2021). Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. Nature Communications, 12, [3297]. https://doi.org/10.1038/s41467-021-23143-7

Vancouver

Grapotte M, Saraswat M, Bessière C, Menichelli C, Ramilowski JA, Severin J et al. Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. Nature Communications. 2021;12:3297. https://doi.org/10.1038/s41467-021-23143-7

Author

Grapotte, Mathys ; Saraswat, Manu ; Bessière, Chloé ; Menichelli, Christophe ; Ramilowski, Jordan A. ; Severin, Jessica ; Hayashizaki, Yoshihide ; Itoh, Masayoshi ; Tagami, Michihira ; Murata, Mitsuyoshi ; Kojima-Ishiyama, Miki ; Noma, Shohei ; Noguchi, Shuhei ; Kasukawa, Takeya ; Hasegawa, Akira ; Suzuki, Harukazu ; Nishiyori-Sueki, Hiromi ; Frith, Martin C. ; Abugessaisa, Imad ; Aitken, Stuart ; Aken, Bronwen L. ; Alam, Intikhab ; Alam, Tanvir ; Alasiri, Rami ; Alhendi, Ahmad M. N. ; Alinejad-Rokny, Hamid ; Alvarez, Mariano J. ; Andersson, Robin ; Arakawa, Takahiro ; Araki, Marito ; Arbel, Taly ; Bornholdt, Jette ; Boyd, Mette ; Chen, Yun ; Coskun, Mehmet ; Dalby, Maria ; Ienasescu, Hans ; Jørgensen, Mette ; Kaczkowski, Bogumil ; Kere, Juha ; Li, Kang ; Lilje, Berit ; Nepal, Chirag ; Nguyen, Quan Hoang ; Nielsen, Lars K. ; Rennie, Sarah ; Sandelin, Albin ; Valen, Eivind ; Vitezic, Morana ; Vitting-Seerup, Kristoffer ; FANTOM Consortium. / Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network. In: Nature Communications. 2021 ; Vol. 12.

Bibtex

@article{d4bf9ac0cb7e4ffc81e817b237ace431,
title = "Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network",
abstract = "Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.",
author = "Mathys Grapotte and Manu Saraswat and Chlo{\'e} Bessi{\`e}re and Christophe Menichelli and Ramilowski, {Jordan A.} and Jessica Severin and Yoshihide Hayashizaki and Masayoshi Itoh and Michihira Tagami and Mitsuyoshi Murata and Miki Kojima-Ishiyama and Shohei Noma and Shuhei Noguchi and Takeya Kasukawa and Akira Hasegawa and Harukazu Suzuki and Hiromi Nishiyori-Sueki and Frith, {Martin C.} and Imad Abugessaisa and Stuart Aitken and Aken, {Bronwen L.} and Intikhab Alam and Tanvir Alam and Rami Alasiri and Alhendi, {Ahmad M. N.} and Hamid Alinejad-Rokny and Alvarez, {Mariano J.} and Robin Andersson and Takahiro Arakawa and Marito Araki and Taly Arbel and Jette Bornholdt and Mette Boyd and Yun Chen and Mehmet Coskun and Maria Dalby and Hans Ienasescu and Mette J{\o}rgensen and Bogumil Kaczkowski and Juha Kere and Kang Li and Berit Lilje and Chirag Nepal and Nguyen, {Quan Hoang} and Nielsen, {Lars K.} and Sarah Rennie and Albin Sandelin and Eivind Valen and Morana Vitezic and Kristoffer Vitting-Seerup and {FANTOM Consortium}",
note = "Author correction: https://www.nature.com/articles/s41467-022-28758-y",
year = "2021",
doi = "10.1038/s41467-021-23143-7",
language = "English",
volume = "12",
journal = "Nature Communications",
issn = "2041-1723",
publisher = "Nature Publishing Group",
}

RIS

TY - JOUR

T1 - Discovery of widespread transcription initiation at microsatellites predictable by sequence-based deep neural network

AU - Grapotte, Mathys

AU - Saraswat, Manu

AU - Bessière, Chloé

AU - Menichelli, Christophe

AU - Ramilowski, Jordan A.

AU - Severin, Jessica

AU - Hayashizaki, Yoshihide

AU - Itoh, Masayoshi

AU - Tagami, Michihira

AU - Murata, Mitsuyoshi

AU - Kojima-Ishiyama, Miki

AU - Noma, Shohei

AU - Noguchi, Shuhei

AU - Kasukawa, Takeya

AU - Hasegawa, Akira

AU - Suzuki, Harukazu

AU - Nishiyori-Sueki, Hiromi

AU - Frith, Martin C.

AU - Abugessaisa, Imad

AU - Aitken, Stuart

AU - Aken, Bronwen L.

AU - Alam, Intikhab

AU - Alam, Tanvir

AU - Alasiri, Rami

AU - Alhendi, Ahmad M. N.

AU - Alinejad-Rokny, Hamid

AU - Alvarez, Mariano J.

AU - Andersson, Robin

AU - Arakawa, Takahiro

AU - Araki, Marito

AU - Arbel, Taly

AU - Bornholdt, Jette

AU - Boyd, Mette

AU - Chen, Yun

AU - Coskun, Mehmet

AU - Dalby, Maria

AU - Ienasescu, Hans

AU - Jørgensen, Mette

AU - Kaczkowski, Bogumil

AU - Kere, Juha

AU - Li, Kang

AU - Lilje, Berit

AU - Nepal, Chirag

AU - Nguyen, Quan Hoang

AU - Nielsen, Lars K.

AU - Rennie, Sarah

AU - Sandelin, Albin

AU - Valen, Eivind

AU - Vitezic, Morana

AU - Vitting-Seerup, Kristoffer

AU - FANTOM Consortium

N1 - Author correction: https://www.nature.com/articles/s41467-022-28758-y

PY - 2021

Y1 - 2021

AB - Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.

U2 - 10.1038/s41467-021-23143-7

DO - 10.1038/s41467-021-23143-7

M3 - Journal article

C2 - 34078885

AN - SCOPUS:85107388625

VL - 12

JO - Nature Communications

JF - Nature Communications

SN - 2041-1723

M1 - 3297

ER -

ID: 276159194