Overview
Measurement data from many experiments hosted on M-Lab are processed via the ETL pipeline and published in two forms:
- Archival Data
- M-Lab publishes raw output from many measurement tests on Google Cloud Storage as file archives.
- See M-Lab Archival Data documentation for more information.
- Google BigQuery
- M-Lab parses data for a subset of tests and publishes the data on BigQuery so that users can run SQL queries on the data.
- See M-Lab BigQuery QuickStart for more information.
Some M-Lab hosted tests do not use our ETL pipeline. Data for these tests are published independently by the test developers.
There is typically at least a 24-hour delay between data collection and data publication. Below we provide links to data for our Current Tests and archival data from Inactive or Retired Tests. Additionally, we list data from Current M-Lab Core Services as well as Retired M-Lab Core Services.
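Because publication typically lags collection by at least 24 hours, a query for very recent dates may return no rows. The Python sketch below computes the most recent date that is likely to be published and builds a SQL string against the BigQuery table named in the citation example on this page (`measurement-lab.ndt.download`). The helper names, the one-day default, and the `date` column used for filtering are illustrative assumptions rather than documented parts of the M-Lab schema; consult the BigQuery QuickStart for the actual table and column names before running a query.

```python
from datetime import date, timedelta

# Assumption: table name copied from the citation example on this page.
NDT_TABLE = "measurement-lab.ndt.download"


def latest_published_date(today: date, delay_days: int = 1) -> date:
    """Return the newest date whose data is likely to be published already."""
    return today - timedelta(days=delay_days)


def build_count_query(table: str, day: date, date_column: str = "date") -> str:
    """Build a SQL string that counts rows for a single day.

    `date_column` is a hypothetical column name; adjust it to the real schema.
    """
    return (
        f"SELECT COUNT(*) AS tests "
        f"FROM `{table}` "
        f"WHERE {date_column} = DATE '{day.isoformat()}'"
    )


if __name__ == "__main__":
    day = latest_published_date(date.today())
    print(build_count_query(NDT_TABLE, day))
```

The sketch only builds the query text, so it can be pasted into the BigQuery console or passed to whichever client library you prefer.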
Measurement Data (Active Tests)
- NDT
- Network Diagnostic Tool (NDT) measures characteristics of a TCP connection under heavy load.
- NDT data is processed by the M-Lab ETL Pipeline.
- More technical information is available on GitHub.
- Protocols: ndt7, ndt5, web100
- NDT Raw Data
- NDT Data in BigQuery
- Neubot DASH
- Neubot measures the Internet in order to gather data useful for studying broadband performance, network neutrality, and Internet censorship.
- More information is available at Nexa Center and GitHub.
- Neubot Raw Data
- Reverse Traceroute
- Reverse traceroute measures the network path back to a user from selected network endpoints, and provides a rich source of information on network routing and topology.
- Reverse Traceroute data is not processed by the M-Lab ETL Pipeline.
- More information is available at Reverse Traceroute.
- Reverse Traceroute Raw Data
- WeHe
- Wehe uses your device to exchange Internet traffic recorded from real, popular apps like YouTube and Spotify, and attempts to tell you whether your ISP is giving different performance to an app’s network traffic.
- More information is available from the WeHe website and GitHub.
- WeHe Raw Data
Current M-Lab Core Services and Platform Data
- Packet Headers
- Collects packet headers for all incoming TCP flows and saves each stream of packet header captures into a per-stream .pcap file.
- More information is available on GitHub.
- Packet Headers Raw Data
- TCP INFO
- Collects statistics about the TCP connections running on the M-Lab platform using tcp-info.
- More information is available on GitHub.
- TCP INFO Raw Data
- Traceroute
- M-Lab uses the Scamper traceroute tool from CAIDA to collect network path information for connections to the M-Lab platform.
- More information is available from CAIDA.
- Traceroute Raw Data
- Traceroute Data in BigQuery
- M-Lab Utilization Telemetry Data
- Since June 2016, M-Lab has collected high-resolution switch telemetry for each M-Lab server and site uplink and published it in the utilization dataset.
- The blog post announcing this dataset provides more information about the utilization dataset.
- M-Lab utilization Raw Data
- M-Lab utilization Data in BigQuery
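The Packet Headers service listed above saves each stream as a .pcap file. As a rough illustration of what such a file starts with, the Python sketch below parses the 24-byte global header of a classic libpcap capture. It assumes the file has already been downloaded and decompressed and that it uses the classic pcap format rather than pcapng; `PcapHeader` and `parse_pcap_header` are hypothetical names introduced for this example and are not part of any M-Lab tooling.

```python
import struct
from typing import NamedTuple

# Magic numbers defined by the classic libpcap file format.
_MAGIC_USEC = 0xA1B2C3D4  # timestamps in microseconds
_MAGIC_NSEC = 0xA1B23C4D  # timestamps in nanoseconds


class PcapHeader(NamedTuple):
    byte_order: str      # "<" for little-endian, ">" for big-endian
    nanosecond: bool     # True if packet timestamps use nanoseconds
    version_major: int
    version_minor: int
    snaplen: int         # maximum number of bytes captured per packet
    linktype: int        # link-layer header type (1 is Ethernet)


def parse_pcap_header(data: bytes) -> PcapHeader:
    """Parse the 24-byte global header at the start of a classic pcap file."""
    if len(data) < 24:
        raise ValueError("a pcap global header is 24 bytes long")
    # The writer's byte order is detected by reading the magic both ways.
    for order in ("<", ">"):
        (magic,) = struct.unpack(order + "I", data[:4])
        if magic in (_MAGIC_USEC, _MAGIC_NSEC):
            major, minor, _zone, _sigfigs, snaplen, linktype = struct.unpack(
                order + "HHiIII", data[4:24]
            )
            return PcapHeader(
                order, magic == _MAGIC_NSEC, major, minor, snaplen, linktype
            )
    raise ValueError("not a classic pcap file")


if __name__ == "__main__":
    # Synthetic header for demonstration; a real file would be read with
    # open(path, "rb").read(24).
    sample = struct.pack("<IHHiIII", _MAGIC_USEC, 2, 4, 0, 0, 65535, 1)
    print(parse_pcap_header(sample))
```

For anything beyond inspecting the header, an established packet-analysis tool is a better choice than hand-written parsing.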
Historical Data Sets (Inactive/Retired Tests)
- BISmark
- BISmark measures Internet service provider (ISP) performance and traffic inside home networks.
- BISmark data is not processed by the M-Lab ETL Pipeline.
- More information is available on the Project BISmark website and on the Project BISmark Open Development Portal.
- BISmark Raw Data
- Glasnost
- Glasnost detected prioritization or censorship of network traffic.
- More information is available at MPI SWS and GitHub.
- Glasnost Raw Data (archived)
- MobiPerf
- MobiPerf is an open source application for measuring network performance on mobile platforms.
- MobiPerf data is not processed by the M-Lab ETL Pipeline.
- More information is available on the MobiPerf website.
- MobiPerf Raw Data
- NPAD
- Network Path and Application Diagnosis (NPAD) diagnoses issues in a network path that can degrade network performance.
- NPAD data is processed by the M-Lab ETL Pipeline.
- More information is available from archived UCAR pages and GitHub.
- NPAD Raw Data
- OONI
- OONI measures censorship, surveillance, and traffic manipulation on the Internet.
- OONI data is not processed by the M-Lab ETL Pipeline.
- More information is available at OONI.
- OONI Raw Data
- Pathload2
- Pathload2 measured the available bandwidth of an Internet connection.
- More information is available at https://code.google.com/p/pathload2-gatech/.
- Pathload2 Raw Data (archived)
- SamKnows
- The SamKnows performance testing platform is used by the USA’s Federal Communications Commission (FCC), European Commission, UK government (Ofcom), Brazilian government (Anatel), Singapore’s IDA and other government-backed studies worldwide.
- SamKnows infrastructure includes off-net test servers hosted by M-Lab, and the M-Lab and SamKnows teams coordinate regularly to support the various regulatory reporting periods of data collection conducted by SamKnows.
- SamKnows data is not processed by the M-Lab ETL Pipeline.
- More information is available at the SamKnows website.
- ShaperProbe
- ShaperProbe detected prioritization of network traffic.
- ShaperProbe Raw Data (archived)
- WindRider
- WindRider attempted to detect whether your mobile provider was performing application- or service-specific differentiation.
Retired M-Lab Core Services
- Paris Traceroute
- Paris Traceroute maps network topology between two points on the Internet.
- Paris Traceroute data is processed by the M-Lab ETL Pipeline.
- More information is available at Paris Traceroute.
- Paris Traceroute Raw Data
- Paris Traceroute BigQuery Dataset
- SideStream
- SideStream collects TCP state information about completed TCP connections on a system.
- SideStream data is processed by the M-Lab ETL Pipeline.
- More information is available on GitHub.
- SideStream Raw Data
- SideStream BigQuery Dataset
Data License and Citing M-Lab Data
All data collected by M-Lab tests are available to the public without restriction under a No Rights Reserved Creative Commons Zero Waiver.
Please cite M-Lab data sets as follows:
The M-Lab <test name> Data Set, <date range used>. <M-Lab test URL>
For example:
The M-Lab NDT Data Set, 2009-02-11–2015-12-21. https://measurementlab.net/tests/ndt
or, in BibTeX format:
@misc{mlab,
author="{Measurement Lab}",
title="The {M}-{L}ab {NDT} Data Set",
year="(2009-02-11 -- 2015-12-21)",
howpublished="\url{https://measurementlab.net/tests/ndt}",
comment="Depending on whether you used viz.measurementlab.net, BigQuery, or the raw data, please use one of the following notes:",
note="BigQuery table {\tt measurement-lab.ndt.download}",
note1="Google Cloud Storage {\tt gs://archive-measurement-lab/ndt}",
note2="Data visualization system \url{https://viz.measurementlab.net}",
}
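For scripts that generate citations automatically, the plain-text form can be produced with a small helper. The Python sketch below follows the citation template given above (test name, then a comma, then the date range and URL) and joins the two ISO dates with an en dash as in the NDT example. The function name `format_citation` and its parameters are hypothetical and are not part of any M-Lab tool.

```python
from datetime import date


def format_citation(test_name: str, start: date, end: date, url: str) -> str:
    """Format a plain-text M-Lab data citation for the given date range."""
    # \u2013 is an en dash, used between the start and end dates.
    date_range = f"{start.isoformat()}\u2013{end.isoformat()}"
    return f"The M-Lab {test_name} Data Set, {date_range}. {url}"


if __name__ == "__main__":
    print(
        format_citation(
            "NDT",
            date(2009, 2, 11),
            date(2015, 12, 21),
            "https://measurementlab.net/tests/ndt",
        )
    )
```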