This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.
You are currently viewing a subset of data tagged with parquet.
If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.
Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.
If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.
geospatial, global, mapping, osm, parquet, transportation
Overture is a collaboratively built, global, open map data project for developers who build map services or use geospatial data. Overture Open Map Data contains data that are standardized under the themes of Admins, Base, Buildings, Places, and Transportation. Overture also includes a Global Entity Reference System (GERS) which encodes map data to a shared universal reference. Beginning with the Overture 2023-11-14-alpha.0 release, the data is available as cloud-native GeoParquet files.
analytics, broadband, cities, civic, disaster response, geospatial, global, government spending, infrastructure, internet, mapping, network traffic, parquet, regulatory, telecommunications, tiles
Global fixed broadband and mobile (cellular) network performance, allocated to zoom level 16 Web Mercator tiles (approximately 610.8 meters by 610.8 meters at the equator). Data is provided in both Shapefile and Apache Parquet formats, with geometries represented in Well-Known Text (WKT) projected in EPSG:4326. Download speed, upload speed, and latency are collected via the Speedtest by Ookla applications for Android and iOS and averaged for each tile. Measurements are filtered to results containing GPS-quality location accuracy.
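As a rough illustration of what "allocated to zoom level 16 tiles and averaged for each tile" can mean, the sketch below maps latitude/longitude points to Web Mercator tile indices using the standard slippy-map formula and averages the three metrics per tile. This is not Ookla's actual pipeline: the function names, the tuple layout of the measurements, and the output field names are all assumptions made for the example.

```python
import math
from collections import defaultdict

ZOOM = 16  # zoom level named in the dataset description


def latlon_to_tile(lat, lon, zoom=ZOOM):
    """Return the (x, y) Web Mercator tile index containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still fall inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def average_by_tile(measurements, zoom=ZOOM):
    """Average download, upload, and latency per tile.

    `measurements` is a hypothetical iterable of
    (lat, lon, download, upload, latency) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for lat, lon, download, upload, latency in measurements:
        acc = sums[latlon_to_tile(lat, lon, zoom)]
        acc[0] += download
        acc[1] += upload
        acc[2] += latency
        acc[3] += 1
    return {
        tile: {
            "download": acc[0] / acc[3],
            "upload": acc[1] / acc[3],
            "latency": acc[2] / acc[3],
            "tests": acc[3],
        }
        for tile, acc in sums.items()
    }


# Made-up sample points: two near (0, 0) share a tile, one is elsewhere.
samples = [
    (0.001, 0.001, 100.0, 20.0, 10.0),
    (0.002, 0.002, 200.0, 40.0, 30.0),
    (47.6, -122.3, 50.0, 10.0, 20.0),
]
tiles = average_by_tile(samples)
```

Keying the aggregation on the integer tile index keeps the grouping exact; the published files may identify tiles differently, so check the dataset documentation before joining against them.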
astronomy, imaging, object detection, parquet, satellite imagery, survey
The Wide-field Infrared Survey Explorer (WISE) was a NASA Medium Explorer satellite in low-Earth orbit that conducted an all-sky astronomical imaging survey over four infrared bands from 2010 to 2011. The AllWISE Data Release combines data from all cryogenic and post-cryogenic survey phases and provides a comprehensive view of the mid-infrared sky. The Images Atlas includes 18,240 FITS image sets at 3.4, 4.6, 12 and 22 microns. The Source Catalog contains position, apparent motion, and flux information for over 747 million objects detected on the Atlas Images.
astronomy, imaging, object detection, parquet, satellite imagery, survey
The Near-Earth Object Wide-field Infrared Survey Explorer (NEOWISE) is a NASA Medium-class Explorer satellite in low-Earth orbit conducting an all-sky astronomical imaging survey over two infrared bands. The NEOWISE Reactivation mission began in 2013 when the original WISE satellite was brought out of hibernation to learn more about the population of near-Earth objects and comets that could pose an impact hazard to the Earth. The data is also used to study a wide range of astrophysical phenomena in the time domain including brown dwarfs, supernovae and active galactic nuclei.
astronomy, imaging, object detection, parquet, satellite imagery, simulations, survey
This release consists of simulated data products designed to mimic observations of the same region of the sky as seen by two astronomical facilities: the Nancy Grace Roman Space Telescope and the Vera C. Rubin Observatory.
astronomy, object detection, parquet, survey
unWISE is a reprocessing of Wide-field Infrared Survey Explorer (WISE) data which preserves the native angular resolution and is optimized for forced photometry. WISE was a NASA satellite producing all-sky imaging in four infrared bands centered at 3.4, 4.6, 12 and 22 microns (W1, W2, W3, and W4) starting in 2010 until the coolant was exhausted in 2011. It was reactivated in 2013 as NEOWISE and continued imaging in W1 and W2 until 2024.
amazon.science, computer vision, labeled, machine learning, parquet, video
This dataset contains both the original .tfrecords and a Parquet representation of the YouTube 8 Million dataset. YouTube-8M is a large-scale labeled video dataset that consists of millions of YouTube video IDs, with high-quality machine-generated annotations from a diverse vocabulary of 3,800+ visual entities. It comes with precomputed audio-visual features from billions of frames and audio segments, designed to fit on a single hard disk. This dataset also includes the YouTube-8M Segments data from June 2019. This dataset is 'Lakehouse Ready', meaning you can query this data in place straight out of...