Authors
Jessica Knezha (CIRES, NOAA/PSL), Sergey Frolov (NOAA/PSL), Amal El Akkraoui (NASA), Jeffery Whitaker (NOAA/PSL), Sherrie Fredrick (CIRES, NOAA/PSL), Adam Schneider (CIRES, NOAA/PSL), Joao Souza (CIRES, NOAA/PSL), Jack Woollen (Lynker, NOAA/EMC), Seth Cohen (NASA), Daniel Rothenberg (Brightband), Johannes Mohrmann (Brightband)

Abstract

The NOAA-NASA Joint Archive (NNJA, pronounced like the word "ninja") is a publicly accessible, curated collection of Earth system observations spanning 1979 to the present, created to accelerate reanalysis production, support development of artificial intelligence (AI) applications, and enable access to the observational record underlying modern reanalyses. The archive encompasses atmospheric satellite radiances and conventional measurements, in-situ ocean profiles, sea ice concentrations and freeboard retrievals, and land surface observations from over 60 sensors provided by NOAA, NASA, and their partners. The dataset extends into near-real time through continuous archival of NOAA operational data. NNJA observations are distributed through Amazon Web Services in formats compatible with established data assimilation systems and are accompanied by channel-level black and white lists that capture the quality history of the satellite record. Data integrity is verified through a three-tier process: file-level metadata checks, cross-validation against existing black and white lists from prior NOAA and NASA reanalyses, and a comparison of assimilation statistics with reference reanalysis and operational products. To lower barriers for the machine learning community, a companion dataset (NNJA-AI) transforms NNJA observations from legacy formats to column-oriented Apache Parquet, enabling efficient large-scale queries and direct integration with AI training workflows. Beyond its use in the next generation of reanalyses, we expect NNJA to be useful to the broader community as a comprehensive source of observational data.