Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes. / Have, Christian Theil; Appel, Emil Vincent Rosenbaum; Grarup, Niels; Hansen, Torben; Bork-Jensen, Jette.

In: International Journal of Bioscience, Biochemistry and Bioinformatics, Vol. 4, No. 5, 370, 2014, p. 355-360.

Harvard

Have, CT, Appel, EVR, Grarup, N, Hansen, T & Bork-Jensen, J 2014, 'Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes', International Journal of Bioscience, Biochemistry and Bioinformatics, vol. 4, no. 5, 370, pp. 355-360. https://doi.org/10.7763/IJBBB.2014.V4.370

APA

Have, C. T., Appel, E. V. R., Grarup, N., Hansen, T., & Bork-Jensen, J. (2014). Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes. International Journal of Bioscience, Biochemistry and Bioinformatics, 4(5), 355-360. [370]. https://doi.org/10.7763/IJBBB.2014.V4.370

Vancouver

Have CT, Appel EVR, Grarup N, Hansen T, Bork-Jensen J. Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes. International Journal of Bioscience, Biochemistry and Bioinformatics. 2014;4(5):355-360. 370. https://doi.org/10.7763/IJBBB.2014.V4.370

Author

Have, Christian Theil ; Appel, Emil Vincent Rosenbaum ; Grarup, Niels ; Hansen, Torben ; Bork-Jensen, Jette. / Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes. In: International Journal of Bioscience, Biochemistry and Bioinformatics. 2014 ; Vol. 4, No. 5. pp. 355-360.

Bibtex

@article{a4415ffc01e34b529875eab08079a354,
title = "Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes",
abstract = "Undetected mislabeled samples may affect the results of genotype studies, particularly when rare genetic variants are investigated. Mislabeled samples are often not detected during quality control and if they are detected, they are normally discarded due to a lack of a reliable method to recover the correct labels. Here we describe a statistical method which, given a few extra independent genotypes (barcode genotypes), detects mislabeled samples and recovers the correct labels for sample mix-ups. We have implemented the method in a program (named Wunderbar) and we evaluate the reliability of the method on simulated data. We find that even with only a small number of barcode genotypes, Wunderbar is capable of identifying mislabeled samples and sample mix-ups with high sensitivity and specificity, even with a high genotyping error rate and even in the presence of dependency between the individual barcode genotypes. To detect mislabeled samples we calculate the probability that the discordance between genotypes in the data and in the independent genotypes can be attributed to random (non-mislabeling) genotyping errors. To identify mix-ups we calculate the probability of identifying the set of identical genotypes between sample x and sample y by chance. Based on this we calculate a mix-up confidence score with penalization for introducing mismatches in the proposed new label and adjustment for independency among the genotypes. This confidence score is used to identify probable mix-ups.",
author = "Have, {Christian Theil} and Appel, {Emil Vincent Rosenbaum} and Niels Grarup and Torben Hansen and Jette Bork-Jensen",
year = "2014",
doi = "10.7763/IJBBB.2014.V4.370",
language = "English",
volume = "4",
pages = "355--360",
journal = "International Journal of Bioscience, Biochemistry and Bioinformatics",
issn = "2010-3638",
publisher = "International Academy Publishing",
number = "5",

}

RIS

TY - JOUR

T1 - Identification of Mislabeled Samples and Sample Mix-ups in Genotype Data using Barcode Genotypes

AU - Have, Christian Theil

AU - Appel, Emil Vincent Rosenbaum

AU - Grarup, Niels

AU - Hansen, Torben

AU - Bork-Jensen, Jette

PY - 2014

Y1 - 2014

N2 - Undetected mislabeled samples may affect the results of genotype studies, particularly when rare genetic variants are investigated. Mislabeled samples are often not detected during quality control and if they are detected, they are normally discarded due to a lack of a reliable method to recover the correct labels. Here we describe a statistical method which, given a few extra independent genotypes (barcode genotypes), detects mislabeled samples and recovers the correct labels for sample mix-ups. We have implemented the method in a program (named Wunderbar) and we evaluate the reliability of the method on simulated data. We find that even with only a small number of barcode genotypes, Wunderbar is capable of identifying mislabeled samples and sample mix-ups with high sensitivity and specificity, even with a high genotyping error rate and even in the presence of dependency between the individual barcode genotypes. To detect mislabeled samples we calculate the probability that the discordance between genotypes in the data and in the independent genotypes can be attributed to random (non-mislabeling) genotyping errors. To identify mix-ups we calculate the probability of identifying the set of identical genotypes between sample x and sample y by chance. Based on this we calculate a mix-up confidence score with penalization for introducing mismatches in the proposed new label and adjustment for independency among the genotypes. This confidence score is used to identify probable mix-ups.

AB - Undetected mislabeled samples may affect the results of genotype studies, particularly when rare genetic variants are investigated. Mislabeled samples are often not detected during quality control and if they are detected, they are normally discarded due to a lack of a reliable method to recover the correct labels. Here we describe a statistical method which, given a few extra independent genotypes (barcode genotypes), detects mislabeled samples and recovers the correct labels for sample mix-ups. We have implemented the method in a program (named Wunderbar) and we evaluate the reliability of the method on simulated data. We find that even with only a small number of barcode genotypes, Wunderbar is capable of identifying mislabeled samples and sample mix-ups with high sensitivity and specificity, even with a high genotyping error rate and even in the presence of dependency between the individual barcode genotypes. To detect mislabeled samples we calculate the probability that the discordance between genotypes in the data and in the independent genotypes can be attributed to random (non-mislabeling) genotyping errors. To identify mix-ups we calculate the probability of identifying the set of identical genotypes between sample x and sample y by chance. Based on this we calculate a mix-up confidence score with penalization for introducing mismatches in the proposed new label and adjustment for independency among the genotypes. This confidence score is used to identify probable mix-ups.

U2 - 10.7763/IJBBB.2014.V4.370

DO - 10.7763/IJBBB.2014.V4.370

M3 - Journal article

VL - 4

SP - 355

EP - 360

JO - International Journal of Bioscience, Biochemistry and Bioinformatics

JF - International Journal of Bioscience, Biochemistry and Bioinformatics

SN - 2010-3638

IS - 5

M1 - 370

ER -

ID: 120736068