Registry of Open Data on AWS

About

This registry exists to help people discover and share datasets that are available via AWS resources. See recent additions and learn more about sharing data on AWS.

See all usage examples for datasets listed in this registry tagged with transportation.


You are currently viewing a subset of data tagged with transportation.

Add to this registry

If you want to add a dataset or example of how to use a dataset to this registry, please follow the instructions on the Registry of Open Data on AWS GitHub repository.

Unless specifically stated in the applicable dataset documentation, datasets available through the Registry of Open Data on AWS are not provided and maintained by AWS. Datasets are provided and maintained by a variety of third parties under a variety of licenses. Please check dataset licenses and related documentation to determine if a dataset may be used for your application.

Tell us about your project

If you have a project using a listed dataset, please tell us about it. We may work with you to feature your project in a blog post.

2021 Amazon Last Mile Routing Research Challenge Dataset

amazon.science, analytics, deep learning, geospatial, last mile, logistics, machine learning, optimization, routing, transportation, urban

The 2021 Amazon Last Mile Routing Research Challenge was an innovative research initiative led by Amazon.com and supported by the Massachusetts Institute of Technology’s Center for Transportation and Logistics. Over a period of 4 months, participants were challenged to develop innovative machine learning-based methods to enhance classic optimization-based approaches to solve the travelling salesperson problem, by learning from historical routes executed by Amazon delivery drivers. The primary goal of the Amazon Last Mile Routing Research Challenge was to foster innovative applied research in r...
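For readers new to the "classic optimization-based approaches" mentioned above, the sketch below shows one textbook baseline for the travelling salesperson problem: build a tour with the nearest-neighbour heuristic, then improve it with 2-opt local search. It is a generic illustration on hypothetical stop coordinates, not the method of the challenge organizers or any participant, and it does not read the challenge data files. The function names are hypothetical.

```python
import math

def tour_length(points, tour):
    """Total Euclidean length of a closed tour that returns to its start."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )

def nearest_neighbour_tour(points, start=0):
    """Greedy construction: always drive to the closest unvisited stop."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        here = points[tour[-1]]
        nxt = min(unvisited, key=lambda j: math.dist(here, points[j]))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def two_opt(points, tour):
    """Local search: reverse a segment whenever that shortens the tour.

    The start stop (position 0) is kept fixed.
    """
    best = list(tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for k in range(i + 1, len(best)):
                candidate = best[:i] + best[i:k + 1][::-1] + best[k + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best

# Hypothetical stops at the corners of a unit square; the optimal closed
# tour follows the perimeter and has length 4.
stops = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
route = two_opt(stops, nearest_neighbour_tour(stops))
print(route, tour_length(stops, route))
```

This version recomputes the full tour length for every candidate, which keeps it short but is far too slow for realistic route sizes; practical solvers evaluate 2-opt moves incrementally from the four affected edges.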

Usage examples

Constrained Local Search for Last-Mile Routing by William Cook, Stephan Held, Keld Helsgaun
Learn global and optimize local A data-driven methodology for last-mile routing by Mayukh Ghosh, Alex Kuiper, Roshan Mahes, Donato Maragno
Code repository used for the 2021 Amazon Routing Research Challenge (this repository is included for reference and documentation purposes only; you do not need to install it to access the data) by CAVE Lab, MIT Center for Transportation and Logistics
Integrating driver behavior into last-mile delivery routing - Combining machine learning and optimization in a hybrid decision support framework by Peter Dieter, Matthew Caron, Guido Schryen
Inverse Optimization for Routing Problems by Pedro Zattoni Scroccaro, Piet van Beek, Peyman Mohajerin Esfahani, Bilge Atasoy

See 17 usage examples →

Low Altitude Disaster Imagery (LADI) Dataset

aerial imagery, coastal, computer vision, disaster response, earth observation, earthquakes, geospatial, image processing, imaging, infrastructure, land, machine learning, mapping, natural resource, seismology, transportation, urban, water

The Low Altitude Disaster Imagery (LADI) Dataset consists of human- and machine-annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015 to 2023. Two key distinctions are the low-altitude, oblique perspective of the imagery and its disaster-related features, both of which are rarely found in computer vision benchmarks and datasets.

Usage examples

LADI v1 Tutorials by Andrew Weinert, Jianyu Mao, Kiana Harris, Nae-Rong Chang, Caleb Pennell, Yiming Ren, Ryan Earley, Nadia Dimitrova
Video Testing at the FirstNet Innovation and Test Lab Using a Public Safety Dataset by Chris Budny, Jeffrey Liu, Andrew Weinert
Train and Deploy an Image Classifier for Disaster Response by Jianyu Mao, Kiana Harris, Nae-Rong Chang, Caleb Pennell, Yiming Ren
Large Scale Organization and Inference of an Imagery Dataset for Public Safety by Jeffrey Liu, David Strohschein, Siddharth Samsi, Andrew Weinert
Accelerate disaster response with computer vision for satellite imagery using Amazon SageMaker and Amazon Augmented AI by Vamshi Krishna Enabothala, Morgan Dutton, and Sandeep Verma

See 11 usage examples →

nuScenes

autonomous vehicles, computer vision, lidar, robotics, transportation, urban

A public, large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.

Usage examples

nuScenes: A multimodal dataset for autonomous driving by Holger Caesar, Varun Bankiti, Alex H. Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, Oscar Beijbom
nuImages devkit tutorial by Motional
nuScenes devkit tutorial by Motional
nuScenes lidarseg and panoptic tutorial by Motional
nuScenes devkit by Motional

See 9 usage examples →

NOAA National Water Model CONUS Retrospective Dataset

agriculture, climate, disaster response, environmental, transportation, weather

The NOAA National Water Model Retrospective dataset contains input and output from multi-decade CONUS retrospective simulations. These simulations used meteorological input fields from retrospective meteorological datasets. The output frequency and fields available in this historical NWM dataset differ from those in the real-time operational NWM forecast model. Note also that no streamflow or other data assimilation is performed in any of the NWM retrospective simulations.

One application of this dataset is to provide historical context to current near real-time streamflow, soil moisture and snowpack conditions. The retrospective data can be used to infer flow frequencies and perform temporal analyses with hourly streamflow output and 3-hourly land surface output. This dataset can also be used in the development of end user applications which require a long baseline of data for system training or verification purposes.
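One common way to infer flow frequencies from a long hourly streamflow record is a flow-duration curve: rank the flows and assign each an exceedance probability. The minimal sketch below uses the Weibull plotting position, P = m / (n + 1), on a short hypothetical series. It is a generic hydrology technique, not a routine distributed with the NWM retrospective data, and the function name is hypothetical; extracting the actual streamflow series for a reach is out of scope here.

```python
def exceedance_probabilities(flows):
    """Pair each flow with its estimated probability of being equalled or exceeded.

    Flows are ranked in descending order (m = 1 for the largest) and assigned
    the Weibull plotting position P = m / (n + 1).
    """
    n = len(flows)
    ranked = sorted(flows, reverse=True)
    return [(q, m / (n + 1)) for m, q in enumerate(ranked, start=1)]

# Hypothetical hourly streamflow values (m3 s-1) for a single reach.
hourly_flow = [12.1, 8.4, 30.2, 9.9, 15.0, 7.3]
for q, p in exceedance_probabilities(hourly_flow):
    print(f"{q:6.1f} m3/s  equalled or exceeded with probability {p:.2f}")
```

With a multi-decade hourly record, n runs into the hundreds of thousands, so the same function yields a smooth curve from which values such as the flow exceeded 10 percent of the time can be read off.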

...

Usage examples

Explore the National Water Model V2.0 Retrospective in Zarr by Rich Signell
Simulating storm surge and compound flooding events with a creek-to-ocean model: Importance of baroclinic effects by Fei Ye, et al.
Processing the 250 TB NWM dataset with Coiled, Dask, and Xarray by Sarah Johnson (Coiled)
On Strictly Enforced Mass Conservation Constraints for Modeling the Rainfall-Runoff Process by Jonathan M. Frame, Frederik Kratzert, Hoshin V. Gupta, Paul Ullrich and Grey S. Nearing
NOAA's National Water Model: Advancing Operational Hydrology Through Continental-scale Modeling by Brian Cosgrove, David Gochis, Trey Flowers, Aubrey Dugger, Fred Ogden, Tom Graziano, Ed Clark, et al., 2024

See 7 usage examples →

nuPlan

autonomous vehicles, lidar, robotics, transportation, urban

nuPlan is the world's first large-scale planning benchmark for autonomous driving.

Usage examples

nuPlan devkit by Motional
nuPlan Scenario Visualization by Motional
nuPlan Planner Tutorial by Motional
nuPlan Advanced Model Training by Motional
nuPlan Sensor Data Tutorial by Motional

See 7 usage examples →

New York City Taxi and Limousine Commission (TLC) Trip Record Data

cities, transportation, urban

Data on trips taken by taxis and for-hire vehicles in New York City. Note: access to this dataset is free; however, direct S3 access does require an AWS account. Anonymous downloads are available from the dataset's documentation webpage listed below.

Usage examples

Optimizing data for analysis with Amazon Athena and AWS Glue by Manav Sehgal
Exploring data with Python and Amazon S3 Select by Manav Sehgal
Build and run streaming applications with Apache Flink and Amazon Kinesis Data Analytics for Java Applications by Steffen Hausmann
Build a Real-time Stream Processing Pipeline with Apache Flink on AWS by Steffen Hausmann
Deep Dive on Flink & Spark on Amazon EMR by Keith Steward

See 6 usage examples →

Overture Maps Foundation Open Map Data

geospatial, global, mapping, osm, parquet, transportation

Overture is a collaboratively built, global, open map data project for developers who build map services or use geospatial data. Overture Open Map Data contains data that are standardized under the themes of Admins, Base, Buildings, Places, and Transportation. Overture also includes a Global Entity Reference System (GERS) which encodes map data to a shared universal reference. Beginning with the Overture 2023-11-14-alpha.0 release, the data is available as cloud-native GeoParquet files.

Usage examples

Accessing Overture Maps Data by Overture Maps Foundation
Global Entity Reference System by Overture Maps Foundation
Building Heights: From open USGS lidar to open Overture maps by Overture Maps Foundation
Working With Overture Data: A Step-by-Step Guide by Jennings Anderson
Overture Data Schema by Overture Maps Foundation

See 5 usage examples →

Demand-Side Grid (dsgrid) Toolkit

data assimilation, electricity, energy, energy modeling, industrial, meteorological, solar, transportation

Projects that use the dsgrid toolkit assemble bottom-up descriptions of electricity demand and related data that are highly resolved geographically, temporally, and sectorally. Typically, modelers describe multiple scenarios of future energy use at hourly resolution, suitable for inclusion in long-term power system planning models, i.e., capacity expansion and production cost models.

Usage examples

Demand-Side Grid Toolkit by Elaine Hale
Python API for Accessing dsgrid Data for the Electrification Futures Study (EFS) by Elaine Hale
dsgrid Project Standard Scenarios for the TEMPO Project by Elaine Hale
GitHub Repository for Working with the dsgrid Projects by Elaine Hale
dsgrid Documentation by Elaine Hale

See 7 usage examples →

Aurora Multi-Sensor Dataset

autonomous vehicles, computer vision, deep learning, image processing, lidar, machine learning, mapping, robotics, traffic, transportation, urban, weather

The Aurora Multi-Sensor Dataset is an open, large-scale multi-sensor dataset with highly accurate localization ground truth, captured between January 2017 and February 2018 in the metropolitan area of Pittsburgh, PA, USA by Aurora (via Uber ATG) in collaboration with the University of Toronto. The de-identified dataset contains rich metadata, such as weather and semantic segmentation, and spans all four seasons, rain, snow, overcast and sunny days, different times of day, and a variety of traffic conditions.
The Aurora Multi-Sensor Dataset contains data from a 64-beam Velodyne HDL-64E LiDAR sensor and seven 1920x1200-pixel resolution cameras including a forward-facing stereo pair and five wide-angle lenses covering a 360-degree view around the vehicle.
This data can be used to develop and evaluate large-scale long-term approaches to autonomous vehicle localization. Its size and diversity make it suitable for a wide range of research areas such as 3D reconstruction, virtual tourism, HD map construction, and map compression, among others.
The data was first presented at the International Conference on Intelligent Robots an
...

Usage examples

"Pit30M: A benchmark for global localization in the age of self-driving cars", in 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 4477-4484) by Martinez, J., Doubov, S., Fan, J., Bârsan, I. A., Wang, S., Máttyus, G., Urtasun, R.
Introduction to Visualizing Sensor Types (Jupyter notebook) by Andrei Bârsan (note: Aurora makes no representations as to the accuracy or functionality of the tutorial)

See 2 usage examples →

Ford Multi-AV Seasonal Dataset

autonomous vehicles, computer vision, lidar, mapping, robotics, transportation, urban, weather

This research presents a challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles on different days and at different times during 2017-18. The vehicles were manually driven on an average route of 66 km in Michigan that included a mix of driving scenarios, such as the Detroit Airport, freeways, city centres, a university campus, and suburban neighbourhoods. Each vehicle used in this data collection is a Ford Fusion outfitted with an Applanix POS-LV inertial measurement unit (IMU), four HDL-32E Velodyne 3D-lidar scanners, 6 Point Grey 1.3 MP cameras arranged on the...

Usage examples

Ford AV Dataset Tutorial by Ford Motor Company
Autonomous Driving Data Service (ADDS) by Ajay Vohra, Amazon

See 2 usage examples →

NOAA Analysis of Record for Calibration (AORC) Dataset

agriculture, climate, disaster response, environmental, transportation, weather

The Analysis Of Record for Calibration (AORC) is a gridded record of near-surface weather conditions covering the continental United States and Alaska and their hydrologically contributing areas. It is defined on a latitude/longitude spatial grid with a mesh length of 30 arc seconds (~800 m) and a temporal resolution of one hour. Elements include hourly total precipitation, temperature, specific humidity, terrain-level pressure, downward longwave and shortwave radiation, and west-east and south-north wind components. The record spans 1979 to the near-present across the continental U.S. (CONUS) and 1981 to the near-present across Alaska. This suite of eight variables is sufficient to drive most land-surface and hydrologic models and is used as input to the National Water Model (NWM) retrospective simulation. While the native AORC process generates netCDF output, the data is post-processed into a cloud-optimized, Zarr-formatted equivalent for dissemination using cloud technology and infrastructure.

AORC Version 1.1 dataset creation
The AORC dataset was created after reviewing, identifying, and processing multiple large-scale observation and analysis datasets. There are two versions of the Analysis Of Record for Calibration (AORC) data.

The initial AORC Version 1.0 dataset was completed in November 2019 and consisted of a grid with 8 elements at a resolution of 30 arc seconds. The AORC version 1.1 dataset was created to address issues in the version 1.0 CONUS dataset (see Table 1 in Fall et al., 2023). Full documentation on version 1.1 of the AORC data and the related journal publication are provided below.

The native AORC version 1.1 process creates netCDF files with the following dimensions: 1 time step (one hour), 4201 latitude values (ranging from 25.0 to 53.0), and 8401 longitude values (ranging from -125.0 to -67.0).

The data creation runs with a 10-day lag to ensure the inclusion of any corrections to the input Stage IV and NLDAS data.

Note: The full extent of the AORC grid as defined in its data files exceeds the ranges cited above; the outermost rows and columns of the data grids are filled with missing values and are remnants of an earlier set of required AORC extents that has since been adjusted inward.
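The grid sizes and the note above can be cross-checked with a little arithmetic. At 30 arc second (1/120 degree) spacing, N grid values span (N - 1)/120 degrees, so 4201 latitude values cover 35 degrees and 8401 longitude values cover 70 degrees, wider than the 28 degree (25.0 to 53.0) and 58 degree (-125.0 to -67.0) ranges quoted above. That is consistent with padded outer rows and columns. The sketch assumes evenly spaced grid points that include both endpoints; the function names are hypothetical.

```python
from fractions import Fraction

SPACING_DEG = Fraction(1, 120)  # 30 arc seconds = 1/120 degree

def span_degrees(n_values, spacing=SPACING_DEG):
    """Extent covered by n evenly spaced grid values, both endpoints included."""
    return (n_values - 1) * spacing

def values_needed(start_deg, end_deg, spacing=SPACING_DEG):
    """Grid values needed to cover [start_deg, end_deg] at the given spacing."""
    return int((Fraction(end_deg) - Fraction(start_deg)) / spacing) + 1

# Full file dimensions versus the quoted valid-data ranges.
print(span_degrees(4201), span_degrees(8401))             # latitude, longitude extent (degrees)
print(values_needed(25, 53), values_needed(-125, -67))    # values inside the quoted ranges
```

Exact fractions are used instead of floats so that 1/120 degree steps do not accumulate rounding error over thousands of grid points.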

AORC Version 1.1 Zarr Conversion

The goal of converting the AORC data from netCDF to Zarr was to let users load and use the data quickly and efficiently. For example, one year of data takes 28 minutes to load via netCDF but only 3.2 seconds via Zarr, roughly a 525-fold speedup. For longer periods, the relative speedup of Zarr over netCDF is even larger. Using Zarr also reduces memory and CPU usage.

The optimal layout was determined to be one Zarr store per year of data, with a chunk size of 18 MB applied across all 8 variables. Each chunk spans 144 time steps, 128 latitude values, and 256 longitude values. To create the Zarr stores, the netCDF files were rechunked with the chunk() method in Xarray and written out as monthly Zarr stores. The monthly stores were then combined using "to_zarr" into a single Zarr store representing a full year.
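The quoted chunk size follows from the chunk shape if each value is stored as a 4-byte float. The float32 data type is an assumption, since it is not stated here: 144 × 128 × 256 values × 4 bytes = 18,874,368 bytes, exactly 18 MiB before compression. The sketch below repeats that calculation and, under the same assumption and using the 4201 × 8401 grid described earlier, counts how many chunks a single variable needs to tile one 365-day year. The names are hypothetical.

```python
import math

CHUNK_SHAPE = {"time": 144, "latitude": 128, "longitude": 256}
BYTES_PER_VALUE = 4  # assumption: values stored as float32

def chunk_bytes(shape, bytes_per_value=BYTES_PER_VALUE):
    """Uncompressed size of one chunk in bytes."""
    return math.prod(shape.values()) * bytes_per_value

def chunks_per_variable(dims, shape):
    """Chunks needed to tile an array, counting partial chunks at the edges."""
    return math.prod(math.ceil(dims[d] / shape[d]) for d in shape)

one_year = {"time": 365 * 24, "latitude": 4201, "longitude": 8401}
print(chunk_bytes(CHUNK_SHAPE) / 2**20)             # MiB per uncompressed chunk
print(chunks_per_variable(one_year, CHUNK_SHAPE))   # chunks per variable per year
```

Because 8760 hours, 4201 latitudes, and 8401 longitudes are not multiples of the chunk lengths, the last chunk along each dimension is only partly filled.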

Users who want more than 1 year of data can use Zarr utilities or libraries to combine multiple years, up to the full span of the dataset.

There are eight variables representing the meteorological conditions:
Total Precipitation (APCP_surface)

Hourly total precipitation (kg m-2, equivalent to mm of water)

Air Temperature (TMP_2maboveground)

Temperature (at 2 m above-ground-level (AGL)) (K)

Specific Humidity (SPFH_2maboveground)

Specific humidity (at 2 m AGL) (g g-1)

Downward Long-Wave Radiation Flux (DLWRF_surface)

Downward longwave (infrared) radiation flux (at the surface) (W m-2)

Downward Short-Wave Radiation Flux (DSWRF_surface)

Downward shortwave (solar) radiation flux (at the surface) (W m-2)

Pressure (PRES_surface)

Air pressure (at the surface) (Pa)

U-Component of Wind (UGRD_10maboveground)

U (west-east) component of the wind (at 10 m AGL) (m s-1)

V-Component of Wind (VGRD_10maboveground)

V (south-north) component of the wind (at 10 m AGL) (m s-1)
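Because wind is stored as two components, speed and direction have to be derived from them. A minimal sketch using the standard vector relations and the meteorological convention that direction is where the wind blows from, in degrees clockwise from north (0° = north, 90° = east). The function names are illustrative and are not part of any AORC tooling.

```python
import math

def wind_speed(u, v):
    """Horizontal wind speed (m s-1) from U (west-east) and V (south-north)."""
    return math.hypot(u, v)

def wind_direction_from(u, v):
    """Direction the wind blows FROM, in degrees clockwise from north."""
    # atan2(-u, -v) points back along the wind vector; % 360 maps to [0, 360).
    return math.degrees(math.atan2(-u, -v)) % 360.0

# Hypothetical 10 m components at one grid cell and hour.
u10, v10 = 3.0, 4.0
print(wind_speed(u10, v10), wind_direction_from(u10, v10))
```

A wind blowing toward the north and east (positive U and V) therefore comes from the southwest, between 180° and 270°.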

Precipitation and Temperature

The gridded AORC precipitation dataset contains one-hour Accumulated Surface Precipitation (APCP) ending at the “top” of each hour, in liquid water-equivalent units (kg m-2 to the nearest 0.1 kg m-2), while the gridded AORC temperature dataset consists of instantaneous 2 m above-ground-level (AGL) temperatures at the top of each hour (in Kelvin, to the nearest 0.1).
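Since each APCP value is a one-hour accumulation, a daily total is the sum of the day's 24 hourly values. Because 1 kg m-2 of liquid water corresponds to a 1 mm depth, no further unit conversion is needed. Temperature, by contrast, is instantaneous, so it is averaged rather than summed and converted from Kelvin by subtracting 273.15. The sketch below applies both rules to hypothetical hourly values at one grid cell. The helper names are hypothetical, and which hour-ending values count toward a given day is a choice the user has to make.

```python
KELVIN_OFFSET = 273.15

def daily_precip_mm(hourly_apcp):
    """Sum 24 one-hour APCP accumulations; 1 kg m-2 of water equals 1 mm depth."""
    if len(hourly_apcp) != 24:
        raise ValueError("expected 24 hourly accumulations for one day")
    return sum(hourly_apcp)

def kelvin_to_celsius(t_k):
    return t_k - KELVIN_OFFSET

def daily_mean_temp_c(hourly_tmp_k):
    """Average the instantaneous hourly 2 m temperatures, then convert to deg C."""
    return kelvin_to_celsius(sum(hourly_tmp_k) / len(hourly_tmp_k))

# Hypothetical hour-ending values for one grid cell and one day.
apcp = [0.0] * 20 + [0.5, 1.2, 0.3, 0.0]      # kg m-2 per hour
tmp = [280.0 + 0.25 * h for h in range(24)]   # K
print(round(daily_precip_mm(apcp), 1), "mm")
print(round(daily_mean_temp_c(tmp), 2), "deg C")
```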

Specific Humidity, Pressure, Downward Radiation, Wind

...

Usage examples

The Office of Water Prediction's Analysis of Record for Calibration, version 1.1: Dataset description and precipitation evaluation (09 July 2023). J. Am. Water Resour. Assoc., 59 (6). 1246-1272. by Greg Fall, David Kitzmiller, Sandra Pavlovic, Ziya Zhang, Nathan Patrick, Michael St. Laurent, Carl Trypaluk, Wanru Wu, and Dennis Miller
Explore the AORC 1.1 dataset in Zarr by Michael AuCoin

See 2 usage examples →

MAN TruckScenes

autonomous vehicles, computer vision, deep learning, GPS, IMU, lidar, logistics, machine learning, object detection, object tracking, perception, radar, robotics, transportation

A large-scale multimodal dataset for autonomous trucking. Sensor data was recorded with a heavy truck from MAN equipped with 6 lidars, 6 radars, 4 cameras, and a high-precision GNSS. MAN TruckScenes gives the research community its first opportunity to study truck-specific challenges such as trailer occlusions, novel sensor perspectives, and terminal environments. It comprises more than 740 scenes of 20 s each, recorded under a wide variety of environmental conditions. Bounding boxes are available for 27 object classes and 15 attributes and cover a range of more than 230 m. The scenes are t...

Usage examples

PyPI package by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin
MAN TruckScenes: A multimodal dataset for autonomous trucking in diverse conditions by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin, et al.
TruckScenes devkit by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin
TruckScenes devkit tutorial by Felix Fent, Fabian Kuttenreich, Florian Ruch, Farija Rizwin

See 4 usage examples →

NOAA National Water Model Short-Range Forecast

agriculture, climate, disaster response, environmental, transportation, weather

The National Water Model (NWM) is a water resources model that simulates and forecasts water budget variables, including snowpack, evapotranspiration, soil moisture and streamflow, over the entire continental United States (CONUS). The model, launched in August 2016, is designed to improve the ability of NOAA to meet the needs of its stakeholders (forecasters, emergency managers, reservoir operators, first responders, recreationists, farmers, barge operators, and ecosystem and floodplain managers) by providing expanded accuracy, detail, and frequency of water information. It is operated by NOA...

Usage examples

Harmonic Oscillator Seasonal Trend (HOST) Model for Hydrological Drought Pattern Identification and Analysis by K. Raczyński, J. Dyer

See 1 usage example →

NOAA's Coastal Ocean Reanalysis (CORA) Dataset

agriculture, climate, disaster response, environmental, oceans, transportation, weather

NOAA's Coastal Ocean Reanalysis (CORA) for the Gulf of Mexico and East Coast (GEC) is produced from verified hourly water levels from the Center for Operational Oceanographic Products and Services (CO-OPS), using the Advanced Circulation (ADCIRC) and Simulating WAves Nearshore (SWAN) hydrodynamic models. Data are assimilated, processed, corrected, and processed again before quality assurance and skill assessment against additional verified tide station-based observations.

Details for CORA Dataset

Time series - 1979 to 2022
Size - Approx. 20.5 TB
Domain - Lat 5.8 to 45.8; Long -98.0 to -53.8
Nodes - 1,813,443 centroids, 3,564,104 elements
Grid cells - Currently approximately 505
Spatial Resolution ...

Usage examples

Notebooks for working with CORA Data by John Ratcliff

See 1 usage example →

Swiss Public Transport Stops

cities, geospatial, infrastructure, mapping, traffic, transportation

The basic geo-data set for public transport stops comprises public transport stops in Switzerland and additional selected geo-referenced public transport locations that are of operational or structural importance (operating points).

Usage examples

Map Viewer by Swiss Geoportal

See 1 usage example →

NOAA / NGA Satellite Computed Bathymetry Assessment-SCuBA

agriculture, bathymetry, climate, disaster response, environmental, oceans, transportation, weather

One of the missions of the National Geospatial-Intelligence Agency (NGA) and the National Oceanic and Atmospheric Administration (NOAA) is to ensure the safety of navigation on the seas by maintaining the most current information and the highest-quality services for U.S. and global transport networks. To achieve this mission, we need accurate coastal bathymetry across diverse environmental conditions. The SCuBA program focused on providing critical information to improve existing bathymetry resources and techniques, with two specific objectives. The first objective was to validate the National Aeronautics and Space Administration's (NASA) Ice, Cloud and land Elevation SATellite-2 (ICESat-2), an Earth-observing, space-based light detection and ranging (LiDAR) capability, as a useful tool for nearshore bathymetry in differing environmental conditions. After validating the ICESat-2 bathymetry retrievals relative to sea floor type, water clarity, and water surface dynamics, the second objective was to use ICESat-2 as a calibration tool to improve existing Satellite Derived Bathymetry (SDB) coastal products, which have poor coastal depth information but superior spatial coverage. Current resources that monitor coastal bathymetry can have large vertical depth errors (up to 50 percent) in the nearshore region; however, results derived from ICESat-2 show promise for improving the accuracy of bathymetry information in that region.

Project Overview
One of NGA’s and NOAA’s primary missions is to provide safety-of-navigation information. However, coastal depth information is still lacking in some regions, specifically remote ones. In fact, it has been reported that 80 percent of the entire seafloor has not been mapped. Traditionally, airborne LiDARs and survey boats are used to map the seafloor, but in remote areas we have to rely on satellite capabilities, which currently lack the vertical accuracy needed to support safety of navigation in shallow water. In 2018, NASA launched a space-based LiDAR system called ICESat-2, which has global coverage and a polar orbit and was originally designed to monitor ice elevation in polar regions. Remarkably, because it uses a green laser beam, ICESat-2 also happens to collect bathymetry information. With algorithm development provided by the University of Texas (UT) at Austin, NGA Research and Development (R&D) leveraged the ICESat-2 platform to generate SCuBA, an automated depth retrieval algorithm for accurate, global, refraction-corrected underwater depths from 0 m to 30 m, detailed in Figure 1 of the documentation. The key benefit of this product is the vertical accuracy of its depth retrievals, which makes it ideal as a calibration tool. NGA and the NOAA National Geodetic Survey (NGS) partnered to make this product available to the public for all U.S. territories. ...
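As a rough illustration of why refraction correction matters: a laser altimeter converts photon travel time to range using the speed of light in air, but light travels more slowly in water, so uncorrected depths below the water surface come out too deep. For a near-nadir beam, a common first-order correction scales the apparent depth by the ratio of refractive indices, n_air / n_water. The sketch below treats n_air = 1.00029 and n_water = 1.34116 (typical values for green, 532 nm light) as assumptions and ignores the small off-nadir geometric term. It is not the SCuBA algorithm, whose details are in the documentation, and the function name is hypothetical.

```python
N_AIR = 1.00029    # assumed refractive index of air at 532 nm
N_WATER = 1.34116  # assumed refractive index of seawater at 532 nm

def refraction_corrected_depth(apparent_depth_m, n_air=N_AIR, n_water=N_WATER):
    """First-order, near-nadir speed-of-light correction for a depth below the surface.

    Light covers less distance in water than in air for the same travel time,
    so the true depth is the apparent depth scaled by n_air / n_water.
    """
    return apparent_depth_m * n_air / n_water

# Hypothetical apparent depths (m) across the 0 m to 30 m retrieval range.
for d in (5.0, 10.0, 30.0):
    print(d, round(refraction_corrected_depth(d), 2))
```

Under these assumptions the correction shrinks depths by roughly a quarter, which is large compared with the vertical accuracy needed for safety of navigation in shallow water.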

NASA SOTERIA Simulation Testbed Data

life sciences, neuroimaging, transportation, workload analysis

Commercial pilot simulation data during safety-of-flight scenarios.

Usage examples

Python Processing Code by Tyler Fettrow
SOTERIA Simulation - Experimental Methods, Data Processing, and Data Quality by Tyler Fettrow, Chad Stephens, Lance Prinzel, Jon Holbrook, Sepher Bastami, Michael Stewart, Kathryn Ballard, Daniel Kiggins

See 2 usage examples →

MWIS VR Instances

amazon.science, graph, traffic, transportation

Large-scale node-weighted conflict graphs for maximum weight independent set solvers