STScI Webcast

Engineering and Technology Colloquia Series

Big Data Lessons from the Climate Science Community

Presented by: Dr. Seth McGinnis (National Center for Atmospheric Research)
Category: Engineering Colloquia   Duration: 1 hour and 15 minutes   Broadcast date: January 14, 2016

What do space telescopes and climate models have in common? They both generate Big Data. "Big Data" is any data set that's too big in some fashion to be handled by conventional tools and techniques. Usually that's taken to mean data with large storage volume (on the terabyte to petabyte scale or larger), but other characteristics such as variety and velocity can also make a data set Big. An array of different approaches is needed to wrestle such data sets into tractability. Climate scientists have been struggling with Big Data in the volume and variety dimensions for many years. Global and regional climate models run for weeks at a time on state-of-the-art supercomputers and generate huge data archives that require custom storage solutions, while observational data sets for climate impacts span a multitude of instruments, data types, and scientific disciplines. To deal with these collections of Big Data, the climate science community has adapted by developing a panoply of methods and mechanisms, including specialized data formats, metadata standards, archive specifications, smart tools, and automation support. In the near future, we expect a shift to server-side computation using data services and workflow collaboratories. This talk will present these approaches and the mental models that motivate them, which should prove useful to those in other fields who must also grapple with the challenges of Big Data.

Related Documents

Seth's Slides PowerPoint (.ppt)