New Traceroute Table and Schema Now Available
To make our traceroute data in BigQuery more useful, researchers have sought an easy way to reconstruct the path of hops for the same test. This task was particularly hard because the schema, which was designed many years ago, put the hops of the same test in different rows.
To address this need from many of our partners and researchers, M-Lab is delighted to announce that the traceroute
BigQuery table in the aggregate
dataset is now available to the public. The new traceroute
schema has one test per row, and all hops for a single test are inside the same row.
Here is the new schema, which also adds ASN annotation for traceroute source, destination, and hops:
The new BigQuery table preserves all information that was in the previous table, measurement-lab.base_tables.traceroute
, with better geo-location annotation coverage using time-based Maxmind databases. The geo-location annotation rate for traceroute source and destination are 100%. For traceroute tests since 2017, the geo-location annotation rate for hops almost doubled from ~20% in the BigQuery table, to >40% in current BigQuery table. The ASN annotation rate is about 97%.
Sample Queries
Here are some sample queries to access the traceroute
table with the new schema:
Count how many traceroute tests per day given a time range
SELECT
ts,
COUNT(*) AS num
FROM (
SELECT
DATE(TestTime) as ts
FROM `measurement-lab.aggregate.traceroute`
WHERE
DATE(TestTime) BETWEEN DATE("2016-01-01") AND DATE("2018-06-30")
)
GROUP BY ts
ORDER BY ts DESC
Count rate of source and destination IP being annotated with Geolocation information
SELECT
ts,
COUNTIF(s_long IS NOT NULL AND s_lant IS NOT NULL) / COUNT(*) AS s_geo_success_rate,
COUNTIF(d_long IS NOT NULL AND d_lant IS NOT NULL) / COUNT(*) AS d_geo_success_rate
FROM (
SELECT
DATE(TestTime) as ts,
Source.Geo.longitude AS s_long,
Source.Geo.latitude AS s_lant,
Destination.Geo.longitude AS d_long,
Destination.Geo.latitude AS d_lant
FROM `measurement-lab.aggregate.traceroute`
WHERE
DATE(TestTime) BETWEEN DATE("2016-11-01") AND DATE("2016-11-30")
)
GROUP BY ts
ORDER BY ts DESC
Count how many hops were in “New York” per day.
SELECT
ts,
COUNTIF(hop_ip IS NOT NULL) AS total_hops,
COUNTIF(hop_city = 'New York' ) AS num_ny_hops
FROM (
SELECT
DATE(TestTime) as ts,
Parseinfo.TaskFileName,
hops.Source.IP AS hop_ip,
hops.Source.City AS hop_city,
hops.Source.CountryCode AS hop_country
FROM
`measurement-lab.aggregate.traceroute` AS traceroute,
UNNEST(traceroute.Hop) as hops
WHERE
DATE(TestTime) BETWEEN DATE("2017-08-01") AND DATE("2017-08-31")
)
GROUP BY ts
ORDER BY ts DESC
Count how many hops were in a specific ASN
SELECT
ts,
COUNTIF(hop_ip IS NOT NULL) AS total_hops,
COUNTIF(hop_asn = 15169 ) AS num_asn_hops
FROM (
SELECT
DATE(TestTime) as ts,
hops.Source.IP AS hop_ip,
hops.Source.ASN AS hop_asn
FROM
`measurement-lab.aggregate.traceroute` AS traceroute,
UNNEST(traceroute.Hop) as hops
WHERE
DATE(TestTime) BETWEEN DATE("2017-08-01") AND DATE("2017-08-31")
)
GROUP BY ts
ORDER BY ts DESC
Count the annotation rate of ASN for hops
SELECT
ts,
COUNTIF(hop_ip IS NOT NULL) AS total_hops,
COUNTIF(hop_asn <> 0 )/COUNTIF(hop_ip IS NOT NULL) AS rate_asn_hops
FROM (
SELECT
DATE(TestTime) as ts,
hops.Source.IP AS hop_ip,
hops.Source.ASN AS hop_asn
FROM
`measurement-lab.aggregate.traceroute` AS traceroute,
UNNEST(traceroute.Hop) as hops
WHERE
DATE(TestTime) BETWEEN DATE("2017-08-01") AND DATE("2017-08-31")
)
GROUP BY ts
ORDER BY ts DESC