A new catalog of geospatial datasets can reduce the time to science for people working with large datasets across many disciplines. The awesome-gee-community-catalog curates datasets that other researchers and groups produce and makes them more available for others to use on Google Earth Engine, a large-scale geospatial platform. IU alum and geospatial researcher Samapriya Roy created the open-source catalog using Jetstream’s cloud infrastructure.
“Most research articles nowadays employ some sort of storage mechanisms to help replicate the research to some degree,” said Roy. “While these services are great, it often requires an enormous amount of effort to make these datasets ready to use.”
The resource began as a mapping project with Meta’s High Resolution Settlement Layer dataset and then used Jetstream2 resources to scale from a single dataset request to hundreds and thousands today in Google Earth Engine (GEE). The catalog now includes categories such as population and socioeconomic conditions across the globe; regional and global land use analysis; and hydrology, agriculture, and weather and climate. The catalog also houses datasets and collections that are already in GEE but still need to be curated and cataloged effectively.
“It’s great to see our virtual cloud infrastructure used by people across the world, particularly how they can be linked with popular tools from the commercial cloud,” said David Y. Hancock, principal investigator of Jetstream2 and director for Advanced Cyberinfrastructure at Indiana University. “A low barrier to entry, powerful computation resources, and minimal downtime mean that we are seeing new and innovative ways to visualize and process key datasets for both policy and research.”
Current users can submit data requests, which are then selected and sorted by Roy or other community members on GitHub. To process the data, virtual machines are launched on Jetstream2 and large volumes are attached to download, preprocess, and clean it. The processing power allows researchers to ingest data and metadata seamlessly, consistently, and quickly. Roy and his community of dedicated data scientists have processed more than 100 TB of data using Jetstream2.
“The virtual interface allows users to scale this project as it continues to grow with users across the world,” said Roy. The datasets have continued to grow, and with over 500 stars on GitHub, the catalog is only picking up steam.
The images above show major increases in catalog size (104 TB to 227 TB), total images (501,602 to 847,909), total image collections (254 to 398), total feature collections (414 to 556), and total features in catalog (518,696,071 to 1,015,149,426). This increase happened in less than a year!
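For readers who want to put those figures in relative terms, the short Python sketch below computes the percentage growth for each metric. It uses only the before and after numbers quoted above; the dictionary labels and the helper name `percent_growth` are illustrative and not part of the catalog or any of its tools.

```python
# Before/after figures as quoted in the article.
# The labels are illustrative names chosen for this sketch.
catalog_stats = {
    "catalog size (TB)": (104, 227),
    "total images": (501_602, 847_909),
    "image collections": (254, 398),
    "feature collections": (414, 556),
    "total features": (518_696_071, 1_015_149_426),
}


def percent_growth(before, after):
    """Return the relative increase from before to after, in percent."""
    return (after - before) / before * 100


# Compute growth for every metric in one pass.
growth = {
    name: percent_growth(before, after)
    for name, (before, after) in catalog_stats.items()
}

for name, pct in growth.items():
    print(f"{name}: +{pct:.1f}%")
```

By this calculation, the catalog size and the total feature count each roughly doubled, while the number of collections grew more modestly.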
Learn about Jetstream2 hardware resources, interfaces, and advanced features in the IU Knowledge Base article at https://kb.iu.edu/d/bfde.
Learn more about awesome-gee-community-catalog: Samapriya Roy, Valerie Pasquarella, Erin Trochim, & Tyson Swetnam. (2023). samapriya/awesome-gee-community-datasets: Community Catalog (1.0.9). Zenodo.