Raw Data

METR_LA

Place: Los Angeles County, USA

Duration: Mar. 1, 2012 ~ Jun. 27, 2012

Link: https://github.com/liyaguang/DCRNN

Description: The METR-LA dataset, collected by loop detectors on the highways of Los Angeles County, contains traffic speed data from 207 sensors.

LOS_LOOP

Place: Los Angeles County, USA

Duration: Mar. 1, 2012 ~ Jun. 27, 2012

Link: https://github.com/lehaifeng/T-GCN/tree/master/data

Description: LOS_LOOP is slightly different from METR_LA: its missing values are filled in by linear interpolation.
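A minimal sketch of linear interpolation over gaps in one sensor's speed series. It assumes missing readings are marked as None. The actual LOS_LOOP preprocessing code is not shown here, and copying the nearest valid reading into gaps at either end of the series is our own choice.

```python
from typing import List, Optional


def fill_missing_linear(series: List[Optional[float]]) -> List[float]:
    """Fill None gaps in a 1-D speed series by linear interpolation.

    Interior gaps are interpolated between the nearest valid readings on
    each side; gaps at the start or end copy the nearest valid reading
    (an assumption made for this sketch).
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no valid readings")
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            out[i] = series[right]
        elif right is None:
            out[i] = series[left]
        else:
            # Weighted by distance to the two neighbouring valid readings.
            out[i] = series[left] + (series[right] - series[left]) * (i - left) / (right - left)
    return out


print(fill_missing_linear([60.0, None, None, 66.0, None]))
# [60.0, 62.0, 64.0, 66.0, 66.0]
```

For long series, a vectorised routine such as `numpy.interp` does the same job in a single call.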

SZ_TAXI

Place: Shenzhen, China

Duration: Jan. 1, 2015 ~ Jan. 31, 2015

Link: https://github.com/lehaifeng/T-GCN/tree/master/data

Description: The SZ-Taxi dataset contains taxi trajectories in Shenzhen, including a road adjacency matrix and road traffic speed information.

LOOP_SEATTLE

Place: Greater Seattle Area, USA

Duration: the entirety of 2015

Link: https://github.com/zhiyongc/Seattle-Loop-Data

Description: The Loop Seattle dataset was collected by inductive loop detectors deployed on freeways (I-5, I-405, I-90, and SR-520) in the Seattle area and contains traffic state data from 323 sensor stations.

Q_TRAFFIC

Place: Beijing, China

Duration: Apr. 1, 2017 ~ May 31, 2017

Link: https://github.com/JingqingZ/BaiduTraffic

Description: The Q-Traffic dataset contains three sub-datasets: query sub-dataset, traffic speed sub-dataset and road network sub-dataset.

PEMS

Place: California, USA

Duration: 2001 ~ present

Link: http://pems.dot.ca.gov

Description: PEMS records speed data on California highways, including the fields time_hour, average_time and lane_points.

PEMSD3

Place: District 3 of California, USA

Duration: Sept. 1, 2018 ~ Nov. 30, 2018

Link: https://github.com/Davidham3/STSGCN

Description: The PEMSD3 dataset contains traffic flow data from 358 sensors.

PEMSD4

Place: San Francisco Bay Area, USA

Duration: Jan. 1, 2018 ~ Feb. 28, 2018

Link: https://github.com/Davidham3/ASTGCN/tree/master/data/PEMS04

Description: The PEMSD4 dataset describes the traffic flow, speed and occupancy of California freeways and contains 3848 sensors on 29 roads.

PEMSD7

Place: District 7 of California, USA

Duration: Jul. 1, 2016 ~ Aug. 31, 2016

Link: https://github.com/Davidham3/STSGCN

Description: The PEMSD7 dataset contains traffic flow information from 883 sensor stations.

PEMSD8

Place: San Bernardino Area, USA

Duration: Jul. 1, 2016 ~ Aug. 31, 2016

Link: https://github.com/Davidham3/ASTGCN/tree/master/data/PEMS08

Description: The PEMSD8 dataset describes the traffic flow, speed and occupancy of California freeways, with data from 1979 sensors on 8 roads.

PEMSD7(M)

Place: District 7 of California, USA

Duration: the weekdays of May and June of 2012

Link: https://github.com/Davidham3/STGCN/tree/master/datasets

Description: The PEMSD7(M) dataset describes highway speed information at 228 stations in the 7th District of California.
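The phrase "the weekdays of May and June of 2012" can be turned into a concrete day count. This short sketch counts Monday to Friday dates without excluding public holidays. The 5-minute aggregation used to get a per-day slot count is an assumption (a common choice for PeMS data), not something stated above.

```python
from datetime import date, timedelta


def weekdays_between(start: date, end: date) -> list:
    """All Monday-Friday dates from start to end inclusive (holidays are not excluded)."""
    days = []
    d = start
    while d <= end:
        if d.weekday() < 5:  # 0 = Monday ... 4 = Friday
            days.append(d)
        d += timedelta(days=1)
    return days


days = weekdays_between(date(2012, 5, 1), date(2012, 6, 30))
print(len(days))  # 44
SLOTS_PER_DAY = 24 * 60 // 5  # assumption: 5-minute aggregation -> 288 slots per day
print(len(days) * SLOTS_PER_DAY)  # 12672
```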

PEMS_BAY

Place: San Francisco Bay Area, USA

Duration: Jan. 1, 2017 ~ Jun. 30, 2017

Link: https://github.com/liyaguang/DCRNN

Description: The PEMS-BAY dataset contains 6 months of traffic speed statistics from 325 sensors.

BEIJING_SUBWAY

Place: Beijing, China

Duration: Feb. 29, 2016 - Apr. 3, 2016

Link: https://github.com/JinleiZhangBJTU/ResNet-LSTM-GCN

Description: This dataset was collected from the Beijing subway between 05:00 and 23:00 over five consecutive weeks, from February 29 to April 3, 2016. In March 2016, the Beijing subway had 17 lines and 276 stations (excluding the Airport Express line and its stations).
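A quick check that Feb. 29 to Apr. 3, 2016 covers exactly five whole weeks (Monday through Sunday), together with a helper that counts time slots in the 05:00 to 23:00 operating window. The slot length is not given above; the 15-minute default is a hypothetical choice for illustration.

```python
from datetime import date, datetime

START, END = date(2016, 2, 29), date(2016, 4, 3)
n_days = (END - START).days + 1  # inclusive of both endpoints
print(n_days, n_days // 7)  # 35 5
print(START.weekday(), END.weekday())  # 0 6 (Monday, Sunday)


def slots_per_day(open_hhmm: str = "05:00", close_hhmm: str = "23:00", minutes: int = 15) -> int:
    """Count fixed-length time slots in the daily operating window.

    The 15-minute default is a hypothetical choice, not taken from the dataset.
    """
    fmt = "%H:%M"
    window = datetime.strptime(close_hhmm, fmt) - datetime.strptime(open_hhmm, fmt)
    return int(window.total_seconds() // (minutes * 60))


print(slots_per_day())  # 72
print(slots_per_day(minutes=30))  # 36
```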

M_DENSE

Place: Madrid, Spain

Duration: Jan. 1, 2018 - Dec. 21, 2019

Link: https://github.com/rdemedrano/crann_traffic

Description: This dataset contains historical data of traffic measurements in the city of Madrid. The measurements are taken every 15 minutes at each point, including traffic intensity in number of cars per hour.
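Each reading is an intensity in cars per hour, taken every 15 minutes. Assuming the intensity is an hourly-equivalent rate measured over the 15-minute window, the expected number of cars in that window is intensity × 15/60, and four consecutive readings can be averaged into one hourly value. The readings below are made-up numbers.

```python
from typing import List


def vehicles_in_interval(intensity_veh_per_hour: float, interval_min: int = 15) -> float:
    """Approximate vehicle count in one interval from an hourly-rate intensity."""
    return intensity_veh_per_hour * interval_min / 60


def hourly_means(readings: List[float], per_hour: int = 4) -> List[float]:
    """Average consecutive groups of 15-minute intensity readings into hourly values."""
    if len(readings) % per_hour:
        raise ValueError("number of readings must be a multiple of per_hour")
    return [sum(readings[i:i + per_hour]) / per_hour
            for i in range(0, len(readings), per_hour)]


readings = [480, 520, 600, 400, 300, 340, 360, 320]  # made-up intensities (cars/hour)
print([vehicles_in_interval(x) for x in readings[:4]])  # [120.0, 130.0, 150.0, 100.0]
print(hourly_means(readings))  # [500.0, 330.0]
```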

ROTTERDAM

Place: Rotterdam, Netherlands

Duration: 135 days of 2018

Link: https://github.com/RomainLITUD/DGCN_traffic_forecasting

Description: The ROTTERDAM dataset contains traffic state information for 208 links.

SHMETRO

Place: Shanghai, China

Duration: Jul. 1, 2016 - Sept. 30, 2016

Link: https://github.com/ivechan/PVCGN

Description: This dataset was built on the metro system of Shanghai, China. A total of 811.8 million transaction records were collected from Jul. 1, 2016 to Sept. 30, 2016, an average daily ridership of 8.82 million.
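As a consistency check, the two SHMETRO figures agree: dividing the 811.8 million records by the number of days from Jul. 1 to Sept. 30, 2016 (inclusive) gives the stated daily average.

```python
from datetime import date

records_millions = 811.8
n_days = (date(2016, 9, 30) - date(2016, 7, 1)).days + 1  # Jul. 1 to Sept. 30 inclusive
per_day = records_millions / n_days
print(n_days, round(per_day, 2))  # 92 8.82
```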

HZMETRO

Place: Hangzhou, China

Duration: Jan. 1, 2019 - Jan. 25, 2019

Link: https://github.com/ivechan/PVCGN

Description: This dataset was created from the transaction records of the Hangzhou metro system collected in January 2019. The system has 80 operational stations and 248 physical edges, and an average daily ridership of 2.35 million.

TaxiBJ

Place: Beijing, China

Duration: Jul. 1, 2013 ~ Oct. 30, 2013, Mar. 1, 2014 ~ Jun. 30, 2014, Mar. 1, 2015 ~ Jun. 30, 2015 and Nov. 1, 2015 ~ Apr. 10, 2016

Link: https://github.com/TolicWang/DeepST/issues/3

Description: The TaxiBJ dataset is built from taxicab GPS data and includes crowd flow, meteorology and holiday information.

T_DRIVE

Place: Beijing, China

Duration: Feb. 2, 2008 ~ Feb. 8, 2008

Link: https://www.microsoft.com/en-us/research/publication/t-drive-trajectory-data-sample/

Description: The T-Drive trajectory data sample contains one week of trajectories from 10,357 Beijing taxis. It has about 15 million points, and the total distance of the trajectories reaches 9 million kilometers.
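A total trajectory distance is a sum over consecutive GPS points. Below is a minimal sketch using the haversine formula, assuming points given as (latitude, longitude) in degrees and a spherical Earth. It is not necessarily how the 9 million km figure was computed. The T-Drive text files list longitude before latitude, so swap the two columns when reading them.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def trajectory_length_km(points: List[Tuple[float, float]]) -> float:
    """Sum of distances between consecutive GPS points of one trajectory."""
    return sum(haversine_km(a, b) for a, b in zip(points, points[1:]))


# Two half-degree steps north along a meridian: about 111.195 km in total.
print(round(trajectory_length_km([(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)]), 3))
```

Summing per-taxi lengths over all trajectories gives a fleet-wide total of the kind quoted above.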

PORTO

Place: Porto, Portugal

Duration: Jul. 1, 2013 ~ Jun. 30, 2014

Link: https://archive.ics.uci.edu/ml/datasets/Taxi+Service+Trajectory+-+Prediction+Challenge%2C+ECML+PKDD+2015

Description: The Porto dataset describes the trajectories of all 442 taxis operating in the city of Porto, Portugal.

NYCTAXI

Place: New York, USA

Duration: 2009 ~ present

Link: https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

Description: The NYC-Taxi dataset contains GPS-collected trajectories of different types of taxis in New York City from 2009 to 2020.

NYCTAXI_DYNA

  • NYCTAXI_DYNA is a dataset that counts the inflow and outflow of each region, where the regions come from an irregular area division.

NYCTAXI_OD

  • NYCTAXI_OD is a dataset that counts the origin-destination (OD) flow between regions, where the regions come from an irregular area division.
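A minimal sketch of OD counting, assuming each trip has already been mapped to an origin and a destination region ID, and using a hypothetical 30-minute time slot. The region IDs and trips below are made up; the actual NYCTAXI_OD pipeline may bucket time and regions differently.

```python
from typing import Dict, Iterable, List, Tuple

Trip = Tuple[float, str, str]  # (minutes since start, origin region, destination region)


def od_counts(trips: Iterable[Trip], region_index: Dict[str, int],
              slot_minutes: int = 30) -> Dict[int, List[List[int]]]:
    """Count trips per time slot as an n x n origin-destination matrix.

    Row = origin region, column = destination region. Trips whose origin or
    destination is not in region_index are dropped.
    """
    n = len(region_index)
    out: Dict[int, List[List[int]]] = {}
    for t, o, d in trips:
        if o not in region_index or d not in region_index:
            continue
        slot = int(t // slot_minutes)
        mat = out.setdefault(slot, [[0] * n for _ in range(n)])
        mat[region_index[o]][region_index[d]] += 1
    return out


regions = {"A": 0, "B": 1, "C": 2}
trips = [(5, "A", "B"), (12, "A", "B"), (40, "B", "C"), (41, "C", "A"), (50, "X", "A")]
od = od_counts(trips, regions)
print(od[0])  # [[0, 2, 0], [0, 0, 0], [0, 0, 0]]
print(od[1])  # [[0, 0, 0], [0, 0, 1], [1, 0, 0]]
```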

NYCTAXI_GRID

  • NYCTAXI_GRID is a dataset that counts the inflow and outflow of each region, where the regions come from a grid-based division.
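A minimal sketch of grid-based inflow and outflow counting for one time slot. It assumes a rectangular latitude/longitude bounding box split into equal cells, and it uses one common convention: a trip adds to the outflow of its pickup cell and to the inflow of its drop-off cell. This is not necessarily the exact NYCTAXI_GRID pipeline, and the coordinates below are toy values.

```python
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_lat, min_lon, max_lat, max_lon)


def cell_of(lat: float, lon: float, bbox: BBox, rows: int, cols: int) -> Optional[Tuple[int, int]]:
    """Map a coordinate to its (row, col) grid cell, or None if it lies outside bbox."""
    min_lat, min_lon, max_lat, max_lon = bbox
    if not (min_lat <= lat < max_lat and min_lon <= lon < max_lon):
        return None
    # min() guards against floating-point rounding right at the upper edge.
    r = min(rows - 1, int((lat - min_lat) / (max_lat - min_lat) * rows))
    c = min(cols - 1, int((lon - min_lon) / (max_lon - min_lon) * cols))
    return r, c


def grid_flows(trips, bbox: BBox, rows: int, cols: int):
    """Return (inflow, outflow) count grids for one time slot.

    trips: iterable of (pickup_lat, pickup_lon, dropoff_lat, dropoff_lon).
    Endpoints outside the bounding box are ignored.
    """
    inflow: List[List[int]] = [[0] * cols for _ in range(rows)]
    outflow: List[List[int]] = [[0] * cols for _ in range(rows)]
    for p_lat, p_lon, d_lat, d_lon in trips:
        src = cell_of(p_lat, p_lon, bbox, rows, cols)
        dst = cell_of(d_lat, d_lon, bbox, rows, cols)
        if src is not None:
            outflow[src[0]][src[1]] += 1
        if dst is not None:
            inflow[dst[0]][dst[1]] += 1
    return inflow, outflow


trips = [(0.5, 0.5, 1.5, 1.5), (1.5, 0.5, 0.5, 0.5), (3.0, 3.0, 0.5, 1.5)]
inflow, outflow = grid_flows(trips, (0.0, 0.0, 2.0, 2.0), 2, 2)
print(inflow)  # [[1, 1], [0, 1]]
print(outflow)  # [[1, 0], [1, 0]]
```

Repeating this per time slot yields the inflow/outflow tensor used by grid-based flow prediction models.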

NYC_TOD

Place: New York, USA

Duration: 2014

Link: https://github.com/liulingbo918/CSTN#:~:text=download%20NYC-TOD.tar.gz%20with%20following%20links%20and%20put%20it%20into%20folder%20NYC-TOD/.

Description: NYC_TOD counts the inflow and outflow of each region using a grid-based division method. It was generated by the authors of CSTN.

NYCBIKE

Place: New York, USA

Duration: Jun. 2013 ~ present

Link: https://www.citibikenyc.com/system-data

Description: The NYC-Bike dataset contains bike trajectories collected from the NYC Citi Bike system.

AUSTINRIDE

Place: Austin, USA

Duration: Jun. 4, 2016 ~ Apr. 13, 2017

Link: https://data.world/ride-austin/ride-austin-june-6-april-13

Description: The AustinRide dataset contains Austin ride trajectories spanning August 1, 2016 to April 13, 2017, with over 1.4 million trips.

BIKEDC

Place: Washington, D.C., USA

Duration: Sept. 20, 2010 ~ Oct. 2020

Link: https://www.capitalbikeshare.com/system-data

Description: The BikeDC dataset describes bike trajectories from the Capital Bikeshare system in Washington, D.C., which includes 472 stations.

BIKECHI

Place: Chicago, USA

Duration: Jun. 27, 2013 ~ 2018

Link: https://www.divvybikes.com/system-data

Description: The BikeCHI dataset shows the development of bike-sharing in Chicago from 2013 to 2018.

Foursquare

Place: New York, USA; Tokyo, Japan

Duration: Apr. 12, 2012 ~ Feb. 16, 2013

Link: https://sites.google.com/site/yangdingqi/home/foursquare-dataset#h.p_ID_46

Description: Foursquare is a location-based social networking website where users share their locations by checking in. We use the second dataset in the link, the NYC and Tokyo Check-in Dataset. We preprocessed the raw data provided by the link and split it into Foursquare-TKY and Foursquare-NYC.

Gowalla

Place: Global

Duration: Feb. 2009 ~ Oct. 2010

Link: https://snap.stanford.edu/data/loc-gowalla.html

Description: Gowalla is a location-based social networking website where users share their locations by checking in. The dataset contains the user ID, check-in time, latitude, longitude and location ID of each check-in.

Brightkite

Place: Global

Duration: Apr. 2008 ~ Oct. 2010

Link: http://snap.stanford.edu/data/loc-brightkite.html

Description: Brightkite is a location-based social networking website where users share their locations by checking in. The dataset contains the user ID, check-in time, latitude, longitude and location ID of each check-in.
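The Gowalla and Brightkite descriptions list the same five fields per check-in. A minimal parsing sketch, assuming tab-separated lines in the order user, check-in time, latitude, longitude, location ID, with UTC timestamps such as 2010-05-01T12:00:00Z. Verify the layout against the downloaded files before relying on it. The location ID is kept as a string so that non-numeric IDs also parse, and the sample lines are made up.

```python
import io
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Iterator


@dataclass
class CheckIn:
    user_id: int
    time: datetime
    latitude: float
    longitude: float
    location_id: str  # kept as a string so non-numeric IDs also parse


def parse_checkins(lines: Iterable[str]) -> Iterator[CheckIn]:
    """Parse tab-separated check-in lines: user, time, latitude, longitude, location id."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(fields)}")
        user, ts, lat, lon, loc = fields
        t = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        yield CheckIn(int(user), t, float(lat), float(lon), loc)


# Made-up sample lines in the assumed layout.
sample = io.StringIO(
    "42\t2010-05-01T12:00:00Z\t40.7\t-74.0\t12345\n"
    "42\t2010-05-01T18:30:00Z\t40.8\t-73.9\t67890\n"
)
records = list(parse_checkins(sample))
print(len(records), records[0].time.isoformat())  # 2 2010-05-01T12:00:00+00:00
```

Because `parse_checkins` is a generator, it can stream a large file opened with `open(path)` without loading every record into memory.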

Instagram

Place: New York, USA

Duration: Jun. 15, 2011 - Nov. 8, 2016

Link: https://dmis.korea.ac.kr/cape

Description: The main feature of this dataset is that each check-in record contains not only POI information but also the text the user wrote when creating the check-in. This makes the dataset particularly valuable for research that incorporates semantic features of trajectories into trajectory prediction.

Seattle

Place: Seattle, WA, USA

Duration: Jan. 17, 2009, 20:27:37 ~ 22:34:28 UTC

Link: https://www.microsoft.com/en-us/research/publication/hidden-markov-map-matching-noise-sparseness/

Description: This dataset (for the map matching task) contains test GPS data recorded on a drive in Seattle, WA, USA and its eastern suburbs. The trip starts in the upper right corner of the route map, near Marymoor Park. The data was sampled at 1 Hz using a RoyalTek RBT-2300 GPS logger. The drive took place on Saturday, January 17, 2009, starting at 20:27:37 UTC (12:27:37 local time) and ending at 22:34:28 UTC (14:34:28 local time), for an elapsed time of 02:06:51.
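The time figures above can be reproduced with the standard library. Seattle observes Pacific Standard Time (UTC-8) in January, so a fixed offset is used here instead of a time-zone database. At 1 Hz, the elapsed time corresponds to about 7,611 one-second intervals. The actual number of fixes in the file may differ if the logger dropped samples.

```python
from datetime import datetime, timedelta, timezone

PST = timezone(timedelta(hours=-8), "PST")  # Seattle standard time in January (no DST)

start_utc = datetime(2009, 1, 17, 20, 27, 37, tzinfo=timezone.utc)
end_utc = datetime(2009, 1, 17, 22, 34, 28, tzinfo=timezone.utc)

print(start_utc.astimezone(PST).strftime("%H:%M:%S"))  # 12:27:37
print(end_utc.astimezone(PST).strftime("%H:%M:%S"))  # 14:34:28
elapsed = end_utc - start_utc
print(elapsed)  # 2:06:51
print(int(elapsed.total_seconds()))  # 7611
print(start_utc.astimezone(PST).weekday())  # 5 (Saturday)
```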

Global

Place: 100 places all over the world

Duration:

Link: https://zenodo.org/record/57731#.YVwZ7WJBxnK

Description: This dataset (for the map matching task) is large enough to prove or disprove map-matching hypotheses on a worldwide scale. Because of its global coverage, learning does not have to be biased toward the part of the world where the algorithm was tested.

BJ_ROADMAP

Place: Beijing, China

Duration: -

Link: -

Description: The original dataset contains properties of nodes and edges in OpenStreetMap format. Two graph constructions are implemented: “bj_roadmap_node”, whose nodes are intersections and whose edges are road sections, and “bj_roadmap_edge”, whose nodes are road sections and whose edges are intersections.
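The two constructions relate as a graph and its line graph. A minimal sketch of going from the intersection-based form to the road-section-based form, assuming directed road sections identified by hypothetical IDs. The actual bj_roadmap_edge construction may differ in details, for example whether U-turns onto the reverse section are allowed.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def to_edge_graph(sections: Dict[str, Tuple[str, str]]) -> Dict[str, List[str]]:
    """Convert an intersection-based road graph into a road-section-based one.

    sections maps a directed road-section ID to (from_intersection, to_intersection).
    In the result, section a links to section b when a ends at the intersection
    where b starts (a directed line graph). U-turns onto the reverse section are kept.
    """
    starting_at: Dict[str, List[str]] = defaultdict(list)
    for sid, (u, _v) in sections.items():
        starting_at[u].append(sid)
    return {sid: sorted(starting_at[v]) for sid, (_u, v) in sections.items()}


sections = {"s1": ("A", "B"), "s2": ("B", "C"), "s3": ("B", "A"), "s4": ("C", "A")}
print(to_edge_graph(sections))
# {'s1': ['s2', 's3'], 's2': ['s4'], 's3': ['s1'], 's4': ['s1']}
```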

Chengdu_Taxi_Sample1

Place: Chengdu, China

Duration: Aug. 03, 2014 - Aug. 30, 2014

Link: https://github.com/UrbComp/DeepTTE/tree/master/data

Description: The Chengdu_Taxi_Sample1 dataset is part of the Chengdu_Taxi dataset. It contains 1800 taxi trajectories in Chengdu.

Beijing_Taxi_Sample

Place: Beijing, China

Duration: Oct. 01, 2013 - Oct. 31, 2013

Link: https://github.com/YibinShen/TTPNet/tree/master/data

Description: The Beijing_Taxi_Sample dataset is part of the Beijing_Taxi dataset. It contains 1000 taxi trajectories per day in October 2013.

NYC_RISK

Place: New York, USA

Duration: Jan. 1, 2013 ~ Dec. 31, 2013

Link: https://github.com/Echohhhhhh/GSNet

Description: The NYC accident dataset contains road, risk and POI data of New York City in 2013.

CHICAGO_RISK

Place: Chicago, USA

Duration: Feb. 1, 2016 ~ Sep. 30, 2016

Link: https://github.com/Echohhhhhh/GSNet

Description: The CHICAGO accident dataset contains road and risk data of Chicago in 2016.