A global land cover training dataset from 1984 to 2020
Research output: Contribution to journal › Journal article › Research › peer-review
Standard
A global land cover training dataset from 1984 to 2020. / Stanimirova, Radost; Tarrio, Katelyn; Turlej, Konrad; McAvoy, Kristina; Stonebrook, Sophia; Hu, Kai Ting; Arévalo, Paulo; Bullock, Eric L.; Zhang, Yingtong; Woodcock, Curtis E.; Olofsson, Pontus; Zhu, Zhe; Barber, Christopher P.; Souza, Carlos M.; Chen, Shijuan; Wang, Jonathan A.; Mensah, Foster; Calderón-Loor, Marco; Hadjikakou, Michalis; Bryan, Brett A.; Graesser, Jordan; Beyene, Dereje L.; Mutasha, Brian; Siame, Sylvester; Siampale, Abel; Friedl, Mark A.
In: Scientific Data, Vol. 10, 879, 2023.
RIS
TY - JOUR
T1 - A global land cover training dataset from 1984 to 2020
AU - Stanimirova, Radost
AU - Tarrio, Katelyn
AU - Turlej, Konrad
AU - McAvoy, Kristina
AU - Stonebrook, Sophia
AU - Hu, Kai Ting
AU - Arévalo, Paulo
AU - Bullock, Eric L.
AU - Zhang, Yingtong
AU - Woodcock, Curtis E.
AU - Olofsson, Pontus
AU - Zhu, Zhe
AU - Barber, Christopher P.
AU - Souza, Carlos M.
AU - Chen, Shijuan
AU - Wang, Jonathan A.
AU - Mensah, Foster
AU - Calderón-Loor, Marco
AU - Hadjikakou, Michalis
AU - Bryan, Brett A.
AU - Graesser, Jordan
AU - Beyene, Dereje L.
AU - Mutasha, Brian
AU - Siame, Sylvester
AU - Siampale, Abel
AU - Friedl, Mark A.
N1 - Publisher Copyright: © 2023, The Author(s).
PY - 2023
Y1 - 2023
N2 - State-of-the-art cloud computing platforms such as Google Earth Engine (GEE) enable regional-to-global land cover and land cover change mapping with machine learning algorithms. However, collection of high-quality training data, which is necessary for accurate land cover mapping, remains costly and labor-intensive. To address this need, we created a global database of nearly 2 million training units spanning the period from 1984 to 2020 for seven primary and nine secondary land cover classes. Our training data collection approach leveraged GEE and machine learning algorithms to ensure data quality and biogeographic representation. We sampled the spectral-temporal feature space from Landsat imagery to efficiently allocate training data across global ecoregions and incorporated publicly available and collaborator-provided datasets into our database. To reflect the underlying regional class distribution and post-disturbance landscapes, we strategically augmented the database. We used a machine learning-based cross-validation procedure to remove potentially mislabeled training units. Our training database is relevant to a wide array of studies, including land cover change, agriculture, forestry, hydrology, and urban development, among many others.
U2 - 10.1038/s41597-023-02798-5
DO - 10.1038/s41597-023-02798-5
M3 - Journal article
C2 - 38062043
AN - SCOPUS:85178953466
VL - 10
JO - Scientific Data
JF - Scientific Data
SN - 2052-4463
M1 - 879
ER -