Scaling weather data analysis with cloud-ready architecture

Abstract
Weather model and satellite data are growing exponentially, and forecast systems must evolve to support larger data volumes and velocities. We are researching effective, computationally efficient ways to process and deliver insights from large weather datasets, specifically using microservice architecture, containerization, distributed computing, cloud deployment, and cloud object storage. Simply repeating old methods "in the cloud" is an ineffective and costly mistake, so it is pertinent to evaluate modern system patterns and practices that add value to scientific cloud solutions. Using cloud resources effectively requires several paradigm shifts in software design. Isolating processing code in containerized workers enables scalability: processing can occur concurrently across many machines. Because these processes may run on any number of servers, data location should be decoupled from code; it should not matter whether data is stored on local disk, in cloud object storage, or in a cloud archive. Scaling computation alone does not necessarily improve performance, however: legacy file formats such as GRIB and NetCDF are ill-suited to concurrent data access and burden system CPU, memory, and disk when read by many asynchronous processes. We tested computations on the High-Resolution Rapid Refresh (HRRR) ensemble using a Python cluster managed by the Dask distributed processing library and compared the file-loading performance of GRIB against Zarr, a chunked, cloud-friendly array format. In sequential operations, loading times were equivalent, but Zarr used 90% less memory and 50% less CPU. In asynchronous operations (loading on many nodes at once), Zarr performed twice as fast as GRIB while still using 90% less memory and 50% less CPU in aggregate. Reformatting weather data into a distributed-friendly format such as Zarr is a worthwhile consideration when building data science platforms, for better performance and cost-effectiveness.
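
As a brief illustration of decoupling data location from code: a Zarr store can be addressed by URL through xarray, so the same analysis code runs against local disk or cloud object storage. This is a minimal sketch; the paths and bucket name below are hypothetical, and reading from S3 additionally requires the s3fs package.

    import xarray as xr

    # The same open call works for any addressable location; only the
    # URL changes. Both locations below are hypothetical examples.
    ds_local = xr.open_zarr("/data/hrrr/20200704.zarr")        # local disk
    ds_cloud = xr.open_zarr("s3://weather-archive/hrrr.zarr")  # cloud object storage

    # Downstream analysis is identical regardless of where the bytes live.
    print(ds_cloud)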
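The GRIB-versus-Zarr comparison described above can be sketched as follows. This is an outline of the style of test, not the exact benchmark code: the scheduler address, file paths, and variable name ("t2m", a common cfgrib name for 2 m temperature) are assumptions, and opening GRIB2 through xarray requires the cfgrib engine.

    import xarray as xr
    from dask.distributed import Client

    # Attach to a Dask cluster whose containerized workers may run on
    # any number of machines (scheduler address is hypothetical).
    client = Client("tcp://dask-scheduler:8786")

    # GRIB2: each task must locate and decode packed binary messages,
    # which taxes CPU and memory when many workers read concurrently.
    ds_grib = xr.open_dataset("hrrr.t00z.wrfsfcf01.grib2",
                              engine="cfgrib", chunks={})

    # Zarr: chunks map one-to-one to stored objects, so workers fetch
    # only the bytes they need, in parallel.
    ds_zarr = xr.open_zarr("s3://weather-archive/hrrr.zarr")

    # The same reduction runs against either dataset; timing the two
    # .compute() calls reproduces the style of comparison reported above.
    mean_grib = ds_grib["t2m"].mean().compute()
    mean_zarr = ds_zarr["t2m"].mean().compute()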