Authors
Peter Z. Vaillancourt (CIRES, NOAA/PSL), Timothy A. Smith (NOAA/PSL), Baiding Liu (NOAA/PSL, RIVA Solutions), Sergey Frolov (NOAA/PSL)

Abstract

The scientific community has rapidly adopted machine learning (ML) methods for weather prediction, as recent research across the field has demonstrated that they are becoming competitive with the forecasts produced by operational forecasting centers. ML weather emulators benefit from being implicitly trained on observations through the ERA5 reanalysis dataset, but do not yet benefit from the expansive observational datasets that are used to initialize the traditional numerical weather prediction pipeline at the National Oceanic and Atmospheric Administration (NOAA). The Data Assimilation and Reanalysis Team in the Modeling Development and Data Assimilation division (MDAD) of NOAA’s Physical Sciences Laboratory (PSL) aims to bridge this gap, ensuring the next generation of NOAA’s forecasting system benefits from advances in ML for weather and climate. Accomplishing this requires addressing several challenges, both scientific and technical. Our team has taken on one of the challenges MDAD stated at the 2023 PSL Rendezvous for addressing our role in NOAA’s mission: the need for an agile framework for developing, training, and integrating these models into the various applications where they are meant to be used, such as data assimilation and forecasting across scales. Our framework aims to address the main barriers to using ML that scientists, researchers, and software developers face when interacting with NOAA observational datasets and weather prediction workflows, such as data preprocessing and cloud deployment. Flexible cross-platform deployment, especially for scalable ML training, is one of our central goals for this framework, as it is for most scientists, who are constrained by the computational resources available to them, which can change over time.
Our work is currently exploratory and focused on the needs of PSL, with the goal of serving as a resource that helps teams across the Lab increase their agility in meeting the nation’s needs for accurate forecasting and the scientific development behind it. However, it will also be extensible to NOAA and the scientific community at large as a free, open-source, publicly available framework for agile ML in weather prediction research.