ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data. / Wang, Yucheng; Korneliussen, Thorfinn Sand; Holman, Luke E.; Manica, Andrea; Pedersen, Mikkel Winther.

In: Methods in Ecology and Evolution, Vol. 13, No. 12, 2022, p. 2699-2708.

Harvard

Wang, Y, Korneliussen, TS, Holman, LE, Manica, A & Pedersen, MW 2022, 'ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data', Methods in Ecology and Evolution, vol. 13, no. 12, pp. 2699-2708. https://doi.org/10.1111/2041-210X.14006

APA

Wang, Y., Korneliussen, T. S., Holman, L. E., Manica, A., & Pedersen, M. W. (2022). ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data. Methods in Ecology and Evolution, 13(12), 2699-2708. https://doi.org/10.1111/2041-210X.14006

Vancouver

Wang Y, Korneliussen TS, Holman LE, Manica A, Pedersen MW. ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data. Methods in Ecology and Evolution. 2022;13(12):2699-2708. https://doi.org/10.1111/2041-210X.14006

Author

Wang, Yucheng ; Korneliussen, Thorfinn Sand ; Holman, Luke E. ; Manica, Andrea ; Pedersen, Mikkel Winther. / ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data. In: Methods in Ecology and Evolution. 2022 ; Vol. 13, No. 12. pp. 2699-2708.

Bibtex

@article{8ef8e9009a2141a5b4904bd50972c103,
title = "ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data",
abstract = "Metagenomic data generated from environmental samples is increasingly common in the analysis of modern and ancient biological communities. To obtain taxonomic profiles from this type of data, DNA sequences are aligned against large genomic reference databases and the lowest common ancestor (LCA) needs to be inferred for each sequence with multiple alignments. To date, efforts have mainly focused on improving the speed, sensitivity and specificity of alignment tools, and little effort has been applied to the LCA algorithm that generates the taxonomic profiles from alignments. We present ngsLCA, a command-line toolkit with two separate modules: the main program (in C/C++) performing LCA inference, and an R package for generating tables and visualisations of the taxonomic profiles. ngsLCA processed large datasets in BAM/SAM alignment format 4–11 times faster and used less memory compared to other available programs. It is compatible with the NCBI taxonomy and has flexible parameter settings. Furthermore, the toolkit offers functions for filtering, contamination removal, taxonomic clustering, and multiple ways of visualising the generated taxonomic profiles. ngsLCA bridges a gap in current metagenomic analyses by supplying a computationally light, easy-to-use, accurate, fast and flexible LCA algorithm with R functions for processing and illustrating the taxonomic profiles.",
keywords = "environmental DNA (eDNA), lowest common ancestor (LCA), metagenomics, next-generation sequencing, sedimentary ancient DNA (sedaDNA), shotgun sequencing, taxonomic profiling, toolkit",
author = "Yucheng Wang and Korneliussen, {Thorfinn Sand} and Holman, {Luke E.} and Andrea Manica and Pedersen, {Mikkel Winther}",
note = "Publisher Copyright: {\textcopyright} 2022 The Authors. Methods in Ecology and Evolution published by John Wiley \& Sons Ltd on behalf of British Ecological Society.",
year = "2022",
doi = "10.1111/2041-210X.14006",
language = "English",
volume = "13",
pages = "2699--2708",
journal = "Methods in Ecology and Evolution",
issn = "2041-210X",
publisher = "Wiley-Blackwell",
number = "12",

}
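
A minimal sketch of how the entry above could be cited from a LaTeX document. The file name `references.bib` and the `plain` bibliography style are assumptions made for illustration; only the citation key comes from the entry itself. Styles that print the `note` field, `plain` among them, need any ampersand in that field written as `\&`, since a bare `&` stops compilation.

```latex
% Assumes the BibTeX entry has been saved to references.bib (hypothetical file name).
\documentclass{article}
\begin{document}

Taxonomic profiles were generated with
ngsLCA~\cite{8ef8e9009a2141a5b4904bd50972c103}.

\bibliographystyle{plain}   % assumed style; any standard .bst should work
\bibliography{references}   % loads references.bib

\end{document}
```

Running `pdflatex`, then `bibtex`, then `pdflatex` twice more is the usual sequence for resolving the citation.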

RIS

TY - JOUR

T1 - ngsLCA — A toolkit for fast and flexible lowest common ancestor inference and taxonomic profiling of metagenomic data

AU - Wang, Yucheng

AU - Korneliussen, Thorfinn Sand

AU - Holman, Luke E.

AU - Manica, Andrea

AU - Pedersen, Mikkel Winther

N1 - Publisher Copyright: © 2022 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society.

PY - 2022

Y1 - 2022

AB - Metagenomic data generated from environmental samples is increasingly common in the analysis of modern and ancient biological communities. To obtain taxonomic profiles from this type of data, DNA sequences are aligned against large genomic reference databases and the lowest common ancestor (LCA) needs to be inferred for each sequence with multiple alignments. To date, efforts have mainly focused on improving the speed, sensitivity and specificity of alignment tools, and little effort has been applied to the LCA algorithm that generates the taxonomic profiles from alignments. We present ngsLCA, a command-line toolkit with two separate modules: the main program (in C/C++) performing LCA inference, and an R package for generating tables and visualisations of the taxonomic profiles. ngsLCA processed large datasets in BAM/SAM alignment format 4–11 times faster and used less memory compared to other available programs. It is compatible with the NCBI taxonomy and has flexible parameter settings. Furthermore, the toolkit offers functions for filtering, contamination removal, taxonomic clustering, and multiple ways of visualising the generated taxonomic profiles. ngsLCA bridges a gap in current metagenomic analyses by supplying a computationally light, easy-to-use, accurate, fast and flexible LCA algorithm with R functions for processing and illustrating the taxonomic profiles.

KW - environmental DNA (eDNA)

KW - lowest common ancestor (LCA)

KW - metagenomics

KW - next-generation sequencing

KW - sedimentary ancient DNA (sedaDNA)

KW - shotgun sequencing

KW - taxonomic profiling

KW - toolkit

U2 - 10.1111/2041-210X.14006

DO - 10.1111/2041-210X.14006

M3 - Journal article

AN - SCOPUS:85139980564

VL - 13

SP - 2699

EP - 2708

JO - Methods in Ecology and Evolution

JF - Methods in Ecology and Evolution

SN - 2041-210X

IS - 12

ER -

ID: 323855167