EO-06. Inclusive Earth Data Science enabled by open cloud computing

Abstract
Advances in computing, statistical analysis, remote sensing, and artificial intelligence are ushering in waves of innovation across all of ecology. These tools are now an essential component of scientific work, which gives them the power to exacerbate social and institutional inequalities among educational and research institutions and among state, tribal, and federal governments. When the quality of a user’s experience is limited by the hardware or software they are working with, people and institutions with less money are at a computational disadvantage that reduces their efficiency and output. These inequities limit the scientific community’s ability to build an inclusive global environmental data science community, and they limit our capacity to understand and plan for future environmental stewardship. Addressing this problem requires a leveling mechanism that gives more people high-quality access to data and analytics from any web-capable device. Using only public cloud resources, we deployed a cloud-based workbench in four recent Environmental Data Science workshops (385 participants in total). We show why open science reduces barriers for both new and advanced researchers, and how these platforms instantly provide a top-tier resource to all participants, regardless of geographic location, institutional affiliation, or type of personal hardware. Between August 2022 and June 2023, we deployed virtual machines to participants in four geospatial analysis workshops hosting 15, 20, 50, and 300 people, respectively. These deployments were a collaboration among three NSF-funded entities: the Environmental Data Science Innovation and Inclusion Lab (ESIIL), CyVerse, and Jetstream2. The curriculum for these workshops greatly exceeded the limits of free-tier commercial cloud resources: each virtual machine needed a minimum of 30 GB of RAM and 60 GB of storage. Participants also needed to quickly transfer large datasets to and from a central repository that we managed.
We built Docker containers with fully functional Python, R, and RStudio environments, pre-loaded with all of the libraries needed for the lesson modules. Containers were organized and deployed on Jetstream2. Resources are coordinated so that data and software containers are stored in CyVerse and the cluster is deployed on Jetstream2 with a single command from an instructor’s CyVerse user account. Preliminary results from these workshops demonstrate the capacity of open science cloud computing to accelerate ecological discoveries led by any scientist, in keeping with the principles of diversity, equity, and inclusion in science. Training the next generation of ecological data scientists to use cyberinfrastructure is a pressing task in support of planetary resilience.
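
The per-machine minimums stated above (30 GB of RAM and 60 GB of storage) can be turned into a rough aggregate footprint for each workshop. The sketch below is illustrative arithmetic only, not the deployment tooling: it assumes one virtual machine per participant, all running at the same time, which the abstract does not state. The function name workshop_footprint is hypothetical.

```python
# Per-VM minimums and workshop sizes as stated in the abstract.
RAM_GB_PER_VM = 30
STORAGE_GB_PER_VM = 60
WORKSHOP_SIZES = [15, 20, 50, 300]


def workshop_footprint(participants, ram_gb=RAM_GB_PER_VM, storage_gb=STORAGE_GB_PER_VM):
    """Return the minimum aggregate RAM and storage for one workshop.

    ASSUMPTION (not from the abstract): one VM per participant,
    with every VM running concurrently.
    """
    return {
        "participants": participants,
        "ram_gb": participants * ram_gb,
        "storage_gb": participants * storage_gb,
    }


footprints = [workshop_footprint(n) for n in WORKSHOP_SIZES]
total_participants = sum(fp["participants"] for fp in footprints)

for fp in footprints:
    print(
        f"{fp['participants']:>3} participants: "
        f"{fp['ram_gb']:>5} GB RAM, {fp['storage_gb']:>6} GB storage"
    )
print(f"Total participants across workshops: {total_participants}")
```

Under these assumptions, the largest workshop (300 people) alone would need at least 9,000 GB of RAM and 18,000 GB of storage, which illustrates why free-tier commercial cloud allocations were insufficient.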