Authors
Rey Koki (CIRES, NOAA/GSL)
Abstract
The global rise in wildfire frequency and intensity over the past decade underscores
the need for improved fire monitoring techniques. To advance deep learning research
on wildfire detection and its associated human health impacts, we introduce
SmokeViz, a large-scale machine learning dataset of smoke plumes in satellite
imagery. The dataset is derived from expert annotations created by smoke analysts
at the National Oceanic and Atmospheric Administration, which provide coarse
temporal and spatial approximations of smoke presence. To enhance annotation
precision, we propose pseudo-label dimension reduction (PLDR), a generalizable
method that applies pseudo-labeling to refine datasets with mismatching temporal
and/or spatial resolutions. Unlike typical pseudo-labeling applications that aim to
increase the number of labeled samples, PLDR maintains the original labels but
increases the dataset quality by solving for intermediary pseudo-labels (IPLs) that
align each annotation to the most representative input data. For SmokeViz, a parent
model produces IPLs to identify the single satellite image within each annotation's
time window that best corresponds to the smoke plume. This refinement process
produces a succinct and relevant deep learning dataset consisting of over 160,000
manual annotations. The SmokeViz dataset is expected to be a valuable resource
for developing further wildfire-related machine learning models and is publicly
available at
https://noaa-gsl-experimental-pds.s3.amazonaws.com/index.html#SmokeViz/.
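
The selection step that PLDR performs for SmokeViz can be illustrated with a minimal sketch. The abstract does not state how an intermediary pseudo-label is compared with an annotation, so the intersection-over-union (IoU) score, the 0.5 probability threshold, and the names `iou`, `select_best_frame`, and `toy_parent_model` below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def iou(pred_mask, label_mask):
    """Intersection over union of two boolean masks (assumed matching score)."""
    pred = np.asarray(pred_mask, dtype=bool)
    label = np.asarray(label_mask, dtype=bool)
    union = np.logical_or(pred, label).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(pred, label).sum() / union)


def select_best_frame(frames, annotation_mask, parent_model, threshold=0.5):
    """Pick the frame whose intermediary pseudo-label best matches the annotation.

    frames          : images that fall inside one annotation's time window
    annotation_mask : boolean mask rasterized from the analyst annotation
    parent_model    : callable mapping an image to per-pixel smoke probabilities
    threshold       : hypothetical cutoff for turning probabilities into a mask
    """
    scores = []
    for frame in frames:
        ipl = parent_model(frame) >= threshold  # intermediary pseudo-label
        scores.append(iou(ipl, annotation_mask))
    best = int(np.argmax(scores))
    return best, scores[best]


# Toy stand-in for a trained parent model: treats the frame itself as the
# per-pixel smoke probability map.
def toy_parent_model(frame):
    return frame


annotation = np.zeros((4, 4), dtype=bool)
annotation[1:3, 1:3] = True

frames = [
    np.zeros((4, 4)),          # no smoke predicted
    annotation.astype(float),  # prediction matches the annotation
    np.ones((4, 4)),           # smoke predicted everywhere
]

best_index, best_score = select_best_frame(frames, annotation, toy_parent_model)
print(best_index, best_score)
```

In this toy case the second frame is selected because its pseudo-label coincides with the annotation. The original annotation is kept as the label; only the choice of input image changes.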