This sample notebook demonstrates working with Google BigQuery datasets.
OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.
Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).
Dataset Source: openaq.org
Category: Science
Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://openaq.org/#/about?_k=s3aspo — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
Update Frequency: Hourly
This kernel shows how easy it can be to run a SQL query against a BigQuery table and get a pandas dataframe as the result. If you're interested in digging deeper, check out these references:
from google.colab import auth
auth.authenticate_user()print('Authenticated')
Authenticated
It's helpful to inspect schema and a sample of the data we're working with
%%bigquery --project torbjornzetterlund-1023 df
* FROM
SELECT -public-data.openaq.global_air_quality`
`bigquery>= '2019-01-01'
where timestamp 10 LIMIT
The OpenAQ dataset is updated hourly to show a nearly live look at government-reported air quality around the world. With this dataset, you can answer questions like:
Where are the european hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?
from IPython.display import IFrame
'https://datastudio.google.com/embed/reporting/1LFVmdo4GWsyWI9k55FFWdkkRo3HQwJzM/page/J4uE', width=700, height=500) IFrame(
Where are the global hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?
%%bigquery --project torbjornzetterlund-1023
#standardSQL
SELECT
location, city, country, value, ', ', CAST(longitude AS STRING)) AS latlong
CONCAT(CAST(latitude AS STRING),
FROM -public-data.openaq.global_air_quality`
`bigquery
WHERE = "pm10" AND timestamp >= '2019-01-01'
pollutant
ORDER BY value DESC
location | city | country | value | latlong | |
---|---|---|---|---|---|
0 | Gartnerløkka | Kristiansand | NO | 1516.6917 | 58.14703, 7.98662 |
1 | ES1610A | Illes Balears | ES | 971.0000 | 39.57008, 2.65583 |
2 | Арвайхээр | Арвайхээр хот | MN | 834.0000 | 46.258915, 102.78922 |
3 | Vindhyachal STPS, Singrauli - MPPCB | Singrauli | IN | 726.0000 | 24.10897, 82.64558 |
4 | Cerrillos | Cerrillos | CL | 664.0000 | -33.492847, -70.71944 |
... | ... | ... | ... | ... | ... |
3556 | Groningen-Europaweg | Groningen | NL | -999.0000 | 53.2178, 6.5789 |
3557 | Kollumerwaard-Hooge Zuidwal | Kollumerwaard | NL | -999.0000 | 53.3304, 6.27681 |
3558 | Environment Park | Townsville | AU | -1111.0000 | -19.2629, 146.831 |
3559 | Cannon Hill | South East Queensland | AU | -1111.0000 | -27.4654, 153.087 |
3560 | CAL-JOAR - Caldas - E U Joaquín Aristizabal | Caldas | CO | -9999.0000 | 6.0930777, -75.637764 |
3561 rows × 5 columns
Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.
Panda Definition
df.describe()
value | latitude | longitude | |
---|---|---|---|
count | 10.000000 | 10.000000 | 10.000000 |
mean | 100.920027 | 49.360041 | 20.569345 |
std | 246.712867 | 0.079072 | 1.031945 |
min | 0.416660 | 49.293564 | 19.960083 |
25% | 8.940828 | 49.293564 | 19.960083 |
50% | 21.772400 | 49.333886 | 19.994537 |
75% | 40.811650 | 49.406284 | 20.724357 |
max | 801.230000 | 49.483597 | 22.439453 |
This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.
10) df.head(
location | city | country | pollutant | value | timestamp | unit | source_name | latitude | longitude | averaged_over_in_hours | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | Zakopane, ul. Sienkiewicza | Zakopane | PL | co | 801.23000 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.293564 | 19.960083 | None |
1 | Zakopane, ul. Sienkiewicza | Zakopane | PL | no2 | 26.27990 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.293564 | 19.960083 | None |
2 | Zakopane, ul. Sienkiewicza | Zakopane | PL | o3 | 56.78670 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.293564 | 19.960083 | None |
3 | Zakopane, ul. Sienkiewicza | Zakopane | PL | pm10 | 41.50630 | 2020-01-15 12:00:00+00:00 | µg/m³ | GIOS | 49.293564 | 19.960083 | None |
4 | Zakopane, ul. Sienkiewicza | Zakopane | PL | so2 | 10.82920 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.293564 | 19.960083 | None |
5 | Polanczyk-mobilna | Polańczyk | PL | pm10 | 8.31137 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.374207 | 22.439453 | None |
6 | Polanczyk-mobilna | Polańczyk | PL | pm25 | 7.84754 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.374207 | 22.439453 | None |
7 | Krynica, ul. Bulwary Dietla | Krynica-Zdrój | PL | pm10 | 17.26490 | 2020-01-15 13:00:00+00:00 | µg/m³ | GIOS | 49.416977 | 20.956146 | None |
8 | Nowy Targ, Plac Słowackiego | Nowy Targ | PL | bc | 0.41666 | 2019-12-31 22:00:00+00:00 | µg/m³ | GIOS | 49.483597 | 20.028992 | None |
9 | Nowy Targ, Plac Słowackiego | Nowy Targ | PL | pm10 | 38.72770 | 2020-01-15 12:00:00+00:00 | µg/m³ | GIOS | 49.483597 | 20.028992 | None |
from IPython.display import IFrame
'https://datastudio.google.com/embed/reporting/1BsclZNxZ8Uvc-7gtUxBlQHN9uXJNOhWE/page/J4uE', width=700, height=500) IFrame(