Machine learning and underwater biomass characterization

Abstract
Measurements of the sea surface and acoustic measurements of the water column can be analyzed together to provide insight into processes within the marine water column. We explored the potential for using remotely sensed observations and derived satellite products to predict marine acoustic (sonar) data that describe the biomass of trophic levels within the vertical structure of the ocean. To determine whether satellite data can serve as a proxy or predictor for the subsurface distribution of marine organisms, we leveraged water-column sonar data collected in the California Current System (CCS) between May 2013 and August 2013 by the NOAA Northwest Fisheries Science Center, together with satellite data measured by the MODIS instrument onboard the Terra satellite. These data are archived and made available by the NOAA National Centers for Environmental Information and NASA OceanColor, respectively. Using this large-scale dataset, we are evaluating the variability of marine biogeography in the CCS. The returned energy integrated between two depths in the water column (the nautical area scattering coefficient, NASC) enables us to identify patterns in the acoustic reflectance of marine organisms in both the horizontal and the vertical. An analysis of the interannual variability of surface chlorophyll concentration from satellite ocean color measurements and of variations in the subsurface NASC measurements illustrates a correlation between the two parameters in upwelling regions of the CCS. We will discuss the extent to which surface chlorophyll concentrations measured by satellite can serve as an indicator, proxy, or predictor for the subsurface distribution of zooplankton and/or fish. We are also analyzing the influence of other satellite measurements and derived products, such as sea surface temperature and distance from shore.
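As an illustration of the depth integration behind NASC, the sketch below applies the standard definition (4π × 1852² times the depth integral of the linear volume backscattering coefficient) to a layer of evenly spaced samples. The function name `nasc`, the sample values, and the rectangle-rule integration are assumptions made for this example; they are not taken from the survey's processing chain.

```python
import math

METRES_PER_NAUTICAL_MILE = 1852.0


def nasc(sv_db, dz_m):
    """Return NASC (m^2 nmi^-2) for one depth layer.

    sv_db : volume backscattering strengths Sv (dB re 1 m^-1), one per sample
    dz_m  : vertical thickness of each sample in metres (assumed uniform)
    """
    # Convert each Sv sample from decibels to the linear coefficient s_v (m^-1).
    sv_linear = [10.0 ** (sv / 10.0) for sv in sv_db]
    # Rectangle-rule integral of s_v over the layer: area backscattering coefficient.
    s_a = sum(sv_linear) * dz_m
    # Scale to nautical units.
    return 4.0 * math.pi * METRES_PER_NAUTICAL_MILE ** 2 * s_a


# Hypothetical layer: ten 1 m samples, all at -70 dB.
layer_nasc = nasc([-70.0] * 10, dz_m=1.0)
print(f"NASC for the example layer: {layer_nasc:.2f} m^2 nmi^-2")
```

Repeating the calculation for each depth bin and each sonar frequency would give one NASC value per bin, which is the kind of quantity the abstract treats as a biomass proxy.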
We used several machine learning techniques to analyze potential linkages between the satellite and sonar data sets. We tested different neural network frameworks and compared how well each performed on withheld validation data. We then predicted the proxy biomass, expressed as NASC values, for each sonar frequency and depth bin. To benchmark the neural networks, we also analyzed the data with random forests, a decision-tree-based machine learning method, and used the mean squared error of a random forest regression model to compare the performance of the two approaches. Our results indicate that remotely sensed ocean observations can predict subsurface structural properties derived from sonar data, particularly at shallow depths and for particular sonar frequencies. We will discuss this comparison between neural network and random forest model performance, along with next steps.
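The comparison described above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the study's pipeline: the predictor names, the synthetic relationship between predictors and target, the network size, and the use of `MLPRegressor` as the neural network are all assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 600

# Synthetic stand-ins for satellite predictors (NOT real MODIS data).
chlorophyll = rng.lognormal(mean=0.0, sigma=0.5, size=n_samples)
sea_surface_temp = rng.normal(15.0, 2.0, size=n_samples)
distance_from_shore = rng.uniform(0.0, 200.0, size=n_samples)
X = np.column_stack([chlorophyll, sea_surface_temp, distance_from_shore])

# Synthetic log-NASC target for one hypothetical frequency and depth bin.
y = (
    2.0 * np.log(chlorophyll)
    - 0.1 * (sea_surface_temp - 15.0)
    - 0.005 * distance_from_shore
    + rng.normal(0.0, 0.2, size=n_samples)
)

# Withhold a validation set that neither model sees during training.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    # Neural networks are sensitive to feature scale, so standardize first.
    "neural_network": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0),
    ),
}

# Fit each model and score it with mean squared error on the withheld data.
val_mse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_mse[name] = mean_squared_error(y_val, model.predict(X_val))

# Reference point: always predicting the training mean.
baseline_mse = mean_squared_error(y_val, np.full_like(y_val, y_train.mean()))

for name, mse in val_mse.items():
    print(f"{name}: validation MSE = {mse:.3f}")
print(f"mean baseline: validation MSE = {baseline_mse:.3f}")
```

Scoring both models on the same withheld split keeps the mean squared errors directly comparable, and the mean-prediction baseline shows whether either model has learned anything from the predictors. In practice the same loop would run once per sonar frequency and depth bin.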