Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables

Research output: Contribution to journal › Journal article › Research › peer-review

Standard

Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables. / Lamarine, Marc; Hager, Jörg; Saris, Wim H M; Astrup, Arne; Valsesia, Armand.

In: Frontiers in Nutrition, Vol. 5, 38, 2018.

Harvard

Lamarine, M, Hager, J, Saris, WHM, Astrup, A & Valsesia, A 2018, 'Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables', Frontiers in Nutrition, vol. 5, 38. https://doi.org/10.3389/fnut.2018.00038

APA

Lamarine, M., Hager, J., Saris, W. H. M., Astrup, A., & Valsesia, A. (2018). Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables. Frontiers in Nutrition, 5, Article 38. https://doi.org/10.3389/fnut.2018.00038

Vancouver

Lamarine M, Hager J, Saris WHM, Astrup A, Valsesia A. Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables. Frontiers in Nutrition. 2018;5:38. https://doi.org/10.3389/fnut.2018.00038

Author

Lamarine, Marc ; Hager, Jörg ; Saris, Wim H M ; Astrup, Arne ; Valsesia, Armand. / Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables. In: Frontiers in Nutrition. 2018 ; Vol. 5.

Bibtex

@article{b56c4462db5c4ab589568872eef3e15c,
title = "Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables",
abstract = "Aim of Study: The use of weighed food diaries in nutritional studies provides a powerful method to quantify food and nutrient intakes. Yet, mapping these records onto food composition tables (FCTs) is a challenging, time-consuming and error-prone process. Experts make this effort manually and no automation has been previously proposed. Our study aimed to assess automated approaches to map food items onto FCTs. Methods: We used food diaries (~170,000 records pertaining to 4,200 unique food items) from the DiOGenes randomized clinical trial. We attempted to map these items onto six FCTs available from the EuroFIR resource. Two approaches were tested: the first was based solely on food name similarity (fuzzy matching). The second used a machine learning approach (C5.0 classifier) combining both fuzzy matching and food energy. We tested mapping food items using their original names and also an English-translation. Top matching pairs were reviewed manually to derive performance metrics: precision (the percentage of correctly mapped items) and recall (percentage of mapped items). Results: The simpler approach: fuzzy matching, provided very good performance. Under a relaxed threshold (score > 50%), this approach enabled to remap 99.49% of the items with a precision of 88.75%. With a slightly more stringent threshold (score > 63%), the precision could be significantly improved to 96.81% while keeping a recall rate > 95% (i.e., only 5% of the queried items would not be mapped). The machine learning approach did not lead to any improvements compared to the fuzzy matching. However, it could increase substantially the recall rate for food items without any clear equivalent in the FCTs (+7 and +20% when mapping items using their original or English-translated names). Our approaches have been implemented as R packages and are freely available from GitHub. Conclusion: This study is the first to provide automated approaches for large-scale food item mapping onto FCTs. We demonstrate that both high precision and recall can be achieved. Our solutions can be used with any FCT and do not require any programming background. These methodologies and findings are useful to any small or large nutritional study (observational as well as interventional).",
keywords = "Faculty of Science, Fuzzy matching, Food composition tables, Food diaries, Macronutrient, Food mapping, Dietary studies",
author = "Marc Lamarine and J{\"o}rg Hager and Saris, {Wim H M} and Arne Astrup and Armand Valsesia",
note = "CURIS 2018 NEXS 158",
year = "2018",
doi = "10.3389/fnut.2018.00038",
language = "English",
volume = "5",
journal = "Frontiers in Nutrition",
issn = "2296-861X",
publisher = "Frontiers",

}

RIS

TY - JOUR

T1 - Fast and accurate approaches for large-scale, automated mapping of food diaries on food composition tables

AU - Lamarine, Marc

AU - Hager, Jörg

AU - Saris, Wim H M

AU - Astrup, Arne

AU - Valsesia, Armand

N1 - CURIS 2018 NEXS 158

PY - 2018

Y1 - 2018

N2 - Aim of Study: The use of weighed food diaries in nutritional studies provides a powerful method to quantify food and nutrient intakes. Yet, mapping these records onto food composition tables (FCTs) is a challenging, time-consuming and error-prone process. Experts make this effort manually and no automation has been previously proposed. Our study aimed to assess automated approaches to map food items onto FCTs. Methods: We used food diaries (~170,000 records pertaining to 4,200 unique food items) from the DiOGenes randomized clinical trial. We attempted to map these items onto six FCTs available from the EuroFIR resource. Two approaches were tested: the first was based solely on food name similarity (fuzzy matching). The second used a machine learning approach (C5.0 classifier) combining both fuzzy matching and food energy. We tested mapping food items using their original names and also an English-translation. Top matching pairs were reviewed manually to derive performance metrics: precision (the percentage of correctly mapped items) and recall (percentage of mapped items). Results: The simpler approach: fuzzy matching, provided very good performance. Under a relaxed threshold (score > 50%), this approach enabled to remap 99.49% of the items with a precision of 88.75%. With a slightly more stringent threshold (score > 63%), the precision could be significantly improved to 96.81% while keeping a recall rate > 95% (i.e., only 5% of the queried items would not be mapped). The machine learning approach did not lead to any improvements compared to the fuzzy matching. However, it could increase substantially the recall rate for food items without any clear equivalent in the FCTs (+7 and +20% when mapping items using their original or English-translated names). Our approaches have been implemented as R packages and are freely available from GitHub. Conclusion: This study is the first to provide automated approaches for large-scale food item mapping onto FCTs. We demonstrate that both high precision and recall can be achieved. Our solutions can be used with any FCT and do not require any programming background. These methodologies and findings are useful to any small or large nutritional study (observational as well as interventional).

AB - Aim of Study: The use of weighed food diaries in nutritional studies provides a powerful method to quantify food and nutrient intakes. Yet, mapping these records onto food composition tables (FCTs) is a challenging, time-consuming and error-prone process. Experts make this effort manually and no automation has been previously proposed. Our study aimed to assess automated approaches to map food items onto FCTs. Methods: We used food diaries (~170,000 records pertaining to 4,200 unique food items) from the DiOGenes randomized clinical trial. We attempted to map these items onto six FCTs available from the EuroFIR resource. Two approaches were tested: the first was based solely on food name similarity (fuzzy matching). The second used a machine learning approach (C5.0 classifier) combining both fuzzy matching and food energy. We tested mapping food items using their original names and also an English-translation. Top matching pairs were reviewed manually to derive performance metrics: precision (the percentage of correctly mapped items) and recall (percentage of mapped items). Results: The simpler approach: fuzzy matching, provided very good performance. Under a relaxed threshold (score > 50%), this approach enabled to remap 99.49% of the items with a precision of 88.75%. With a slightly more stringent threshold (score > 63%), the precision could be significantly improved to 96.81% while keeping a recall rate > 95% (i.e., only 5% of the queried items would not be mapped). The machine learning approach did not lead to any improvements compared to the fuzzy matching. However, it could increase substantially the recall rate for food items without any clear equivalent in the FCTs (+7 and +20% when mapping items using their original or English-translated names). Our approaches have been implemented as R packages and are freely available from GitHub. Conclusion: This study is the first to provide automated approaches for large-scale food item mapping onto FCTs. We demonstrate that both high precision and recall can be achieved. Our solutions can be used with any FCT and do not require any programming background. These methodologies and findings are useful to any small or large nutritional study (observational as well as interventional).

KW - Faculty of Science

KW - Fuzzy matching

KW - Food composition tables

KW - Food diaries

KW - Macronutrient

KW - Food mapping

KW - Dietary studies

U2 - 10.3389/fnut.2018.00038

DO - 10.3389/fnut.2018.00038

M3 - Journal article

C2 - 29868600

VL - 5

JO - Frontiers in Nutrition

JF - Frontiers in Nutrition

SN - 2296-861X

M1 - 38

ER -

ID: 196202243