A new catalog of geospatial datasets can reduce the time to science for people working with large datasets across many disciplines. The awesome-gee-community-catalog curates datasets produced by other researchers and groups and makes them more readily available on Google Earth Engine, a large-scale geospatial platform. IU alum and geospatial researcher Samapriya Roy created the open-source catalog using Jetstream’s cloud infrastructure.
“Most research articles nowadays employ some sort of storage mechanisms to help replicate the research to some degree,” said Roy. “While these services are great, it often requires an enormous amount of effort to make these datasets ready to use.”
IU Alum, Samapriya Roy, PhD, Creator of Google Earth Engine Community Catalog and Google Dev Expert for GEE
The resource began as a mapping project with Meta’s High Resolution Settlement Layer dataset. Using resources at Jetstream2, it then scaled from a single dataset request to the hundreds and thousands of datasets available today in Google Earth Engine. The catalog now includes categories such as population and socioeconomic conditions across the globe, regional and global land use analysis, hydrology, agriculture, and weather and climate. The catalog also includes datasets and collections that are already housed in GEE but need to be curated and cataloged effectively.
“It’s great to see our virtual cloud infrastructure used by people across the world, particularly how they can be linked with popular tools from the commercial cloud,” said David Y. Hancock, principal investigator of Jetstream2 and director for Advanced Cyberinfrastructure at Indiana University. “A low barrier to entry, powerful computation resources, and minimal downtime mean that we are seeing new and innovative ways to visualize and process key datasets for both policy and research.”
Hosted on the GEE Community Catalog, the Brazilian Annual Land Use and Land Cover Mapping Project is an initiative involving a collaborative network of experts in biomes, land use, remote sensing, GIS, and computer science. The network relies on the Google Earth Engine platform and its cloud processing and automated classifier capabilities to generate Brazil's annual land use and land cover time series.
Current users can submit data requests, which are then selected and sorted by Roy or other community members on GitHub. The data is then processed on Jetstream, where virtual machines are launched and large volumes attached to download, preprocess, and clean the data. The processing power allows researchers to ingest data and metadata seamlessly, consistently, and quickly. Roy and his community of dedicated data scientists have processed more than 100 TB of data using Jetstream2.
“The virtual interface allows users to scale this project as it continues to grow with users across the world,” said Roy. The number of datasets has continued to grow, and with over 500 stars on GitHub, the catalog is picking up steam.
Catalog statistics, accessed August 1, 2022
Catalog statistics, accessed July 5, 2023
The images above show major increases in catalog size (104 TB to 227 TB), total images (501,602 to 847,909), total image collections (254 to 398), total feature collections (414 to 556), and total features in catalog (518,696,071 to 1,015,149,426). This increase happened in less than a year!
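For readers who want to put the growth in relative terms, the figures above can be turned into percentages with a few lines of Python. This is only an illustrative sketch: the numbers and the two access dates are copied from the catalog statistics cited here, while the names `percent_growth`, `stats`, and `growth` are hypothetical and do not come from the catalog itself.

```python
from datetime import date

def percent_growth(before, after):
    """Return the relative increase from `before` to `after`, in percent."""
    return (after - before) / before * 100

# (before, after) pairs copied from the two catalog snapshots cited above.
stats = {
    "catalog size (TB)": (104, 227),
    "total images": (501_602, 847_909),
    "image collections": (254, 398),
    "feature collections": (414, 556),
    "total features": (518_696_071, 1_015_149_426),
}

# Relative growth for each statistic.
growth = {name: percent_growth(b, a) for name, (b, a) in stats.items()}

# Days between the two access dates given in the captions.
days_elapsed = (date(2023, 7, 5) - date(2022, 8, 1)).days

for name, pct in growth.items():
    print(f"{name}: +{pct:.1f}%")
print(f"elapsed: {days_elapsed} days")
```

By this calculation, both the catalog size and the total number of features roughly doubled within the 338 days between the two snapshots.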
Learn about Jetstream2 hardware resources, interfaces, and advanced features in the IU Knowledge Base article at https://kb.iu.edu/d/bfde.