Armstrong, E. M., Bourassa, M. A., Cram, T., Elya, J. L., Greguska, F. R., III, Huang, T., et al. (2018). An information technology foundation for fostering interdisciplinary oceanographic research and analysis. In American Geophysical Union (Vol. Fall Meeting).
Abstract: Before complex analysis of oceanographic or any other Earth science data can occur, the data must be placed in the proper domain of computing and software resources. In the past this was nearly always the scientist's personal computer or institutional computer servers. The problem with this approach is that the data products must be brought directly to these compute resources, leading to large data transfers and storage requirements, especially for high-volume satellite or model datasets. In this presentation we describe a new technological solution under development and implementation at the NASA Jet Propulsion Laboratory for conducting oceanographic and related research based on satellite data and other sources. Fundamentally, our approach for satellite resources is to tile (partition) the data inputs into cloud-optimized and computation-friendly databases that allow distributed computing resources to perform on-demand, server-side computation and data analytics. This technology, known as NEXUS, has already been implemented in several existing NASA data portals to support oceanographic, sea-level, and gravity data time series analysis, with capabilities to output time-averaged maps, correlation maps, Hovmöller plots, climatological averages and more. A further extension of this technology will integrate ocean in situ observations, event-based data discovery (e.g., natural disasters), data quality screening and additional capabilities. This particular activity is an open source project known as the Apache Science Data Analytics Platform (SDAP) (https://sdap.apache.org), and colloquially as OceanWorks, and is funded by the NASA AIST program. It harmonizes data, tools and computational resources for researchers, allowing them to focus on research results and hypothesis testing rather than on security, data preparation and management.
We will present a few oceanographic and interdisciplinary use cases demonstrating the capabilities for characterizing regional sea-level rise, sea surface temperature anomalies, and ocean hurricane responses.
Huang, T., Armstrong, E. M., Bourassa, M. A., Cram, T. A., Elya, J., Greguska, F., et al. (2019). An Integrated Data Analytics Platform. Mar. Sci., 6.
Abstract: An Integrated Science Data Analytics Platform is an environment that enables the confluence of resources for scientific investigation. It harmonizes data, tools and computational resources to enable the research community to focus on the investigation rather than spending time on security, data preparation, management, etc. OceanWorks is a NASA technology integration project to establish a cloud-based Integrated Ocean Science Data Analytics Platform for big ocean science at NASA's Physical Oceanography Distributed Active Archive Center (PO.DAAC). It focuses on advancement and maturity by bringing together several NASA open-source, big data projects for parallel analytics, anomaly detection, in situ to satellite data matchup, quality-screened data subsetting, search relevancy, and data discovery.
Our communities rely on data available through distributed data centers to conduct their research. In a typical investigation, scientists (1) search for data, (2) evaluate the relevance of that data, (3) download it, and (4) then apply algorithms to identify trends, anomalies, or other attributes of the data. Such a workflow cannot scale if the research involves a massive amount of data or multivariate measurements. With the upcoming NASA Surface Water and Ocean Topography (SWOT) mission expected to produce over 20 PB of observational data during its 3-year nominal mission, the volume of data will challenge all existing Earth Science data archival, distribution and analysis paradigms. This paper discusses how OceanWorks enhances the analysis of physical ocean data by performing the computation on an elastic cloud platform next to the archive, delivering fast, web-accessible services for working with oceanographic measurements.
Jacob, J. C., Armstrong, E. M., Bourassa, M. A., Cram, T., Elya, J. L., Greguska, F. R., III, et al. (2018). OceanWorks: Enabling Interactive Oceanographic Analysis in the Cloud with Multivariate Data. In American Geophysical Union (Vol. Fall Meeting).
Abstract: NASA's Advanced Information System Technology (AIST) Program sponsors the OceanWorks project to establish an integrated data analytics center at the Physical Oceanography Distributed Active Archive Center (PO.DAAC). OceanWorks provides a series of interoperable capabilities that are essential for cloud-scale oceanographic research. These include big data analytics, data search with subsecond response, intelligent ranking of search results, subsetting based on data quality metrics, and rapid spatiotemporal matchup of satellite measurements with distributed in situ data. The software behind OceanWorks is being developed as an open source project in the Apache Incubator Science Data Analytics Platform (SDAP – http://sdap.apache.org). In this presentation we describe how OceanWorks enables efficient, scalable, interactive and interdisciplinary oceanographic analysis with multivariate data.
Interactivity is enabled by a number of SDAP features. First, SDAP provides Representational State Transfer (REST) interfaces to a number of built-in cloud analytics to compute time series, time-averaged maps, correlation maps, climatological maps, Hovmöller maps, and more. To access these, users simply navigate to a properly constructed parameterized URL in their web browser or issue web services calls in a variety of programming languages or in a Jupyter notebook. Alternatively, Python clients can make function calls via the NEXUS Command Line Interface (CLI). Authenticated users can even inject their own custom code via REST calls or the CLI.
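As an illustration of what a "properly constructed parameterized URL" might look like, the following is a minimal Python sketch that assembles one from keyword arguments. The host, the endpoint name (timeSeries), the dataset identifier and every parameter name are hypothetical placeholders chosen for this example; they are not taken from the abstract and should not be read as the documented SDAP interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical base URL: a placeholder, not a real SDAP deployment.
BASE_URL = "https://example.org/analytics"


def build_request_url(endpoint, **params):
    """Return a GET URL for an analytics endpoint with URL-encoded parameters."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


# All names and values below are assumptions made for illustration only.
url = build_request_url(
    "timeSeries",
    dataset="HYPOTHETICAL_SST_DATASET",
    bbox="-150,40,-120,55",          # west,south,east,north (assumed order)
    start="2015-01-01T00:00:00Z",
    end="2015-12-31T23:59:59Z",
)
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded["bbox"])
```

The same URL could be pasted into a browser or passed to any HTTP client; building it with urlencode avoids hand-escaping commas and colons in the bounding box and timestamps.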
To enable interdisciplinary science, OceanWorks provides access to a rich collection of multivariate satellite and in situ measurements of the oceans (e.g., sea surface temperature, height and salinity, chlorophyll and circulation) and other Earth science data (e.g., aerosol optical depth and wind speed), coupled with on-demand processing capabilities close to the data. We partition the data across space or time into tiles and store them in cloud-aware databases that are collocated with the computations. We will provide examples of scientific studies directly enabled by OceanWorks' multivariate data and cloud analytics.
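The tiling idea described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the grid is a small in-memory list of lists, the tile sizes are arbitrary, and the function names (tile_bounds, tile_grid, tile_partial_sum) are hypothetical. It does not reproduce how NEXUS or SDAP store or process tiles; it only shows why partitioned data lends itself to independent per-tile computation whose partial results are combined afterwards.

```python
from itertools import product


def tile_bounds(n_lat, n_lon, tile_lat, tile_lon):
    """Yield (lat_start, lat_stop, lon_start, lon_stop) index ranges covering the grid."""
    for i, j in product(range(0, n_lat, tile_lat), range(0, n_lon, tile_lon)):
        yield (i, min(i + tile_lat, n_lat), j, min(j + tile_lon, n_lon))


def tile_grid(grid, tile_lat, tile_lon):
    """Split a 2-D grid (list of rows) into a dict of tiles keyed by index bounds."""
    n_lat, n_lon = len(grid), len(grid[0])
    return {
        (i0, i1, j0, j1): [row[j0:j1] for row in grid[i0:i1]]
        for (i0, i1, j0, j1) in tile_bounds(n_lat, n_lon, tile_lat, tile_lon)
    }


def tile_partial_sum(tile):
    """Return (sum, count) for one tile; each tile can be processed independently."""
    values = [v for row in tile for v in row]
    return sum(values), len(values)


# Toy 4 x 6 grid standing in for one time step of a gridded variable.
grid = [[float(r * 6 + c) for c in range(6)] for r in range(4)]
tiles = tile_grid(grid, tile_lat=2, tile_lon=3)

# Per-tile partial results (computed sequentially here) are combined at the end.
partials = [tile_partial_sum(t) for t in tiles.values()]
global_mean = sum(s for s, _ in partials) / sum(n for _, n in partials)
print(len(tiles), global_mean)
```

In a distributed setting the per-tile step would run on separate workers next to the tile store, and only the small (sum, count) pairs would travel over the network.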