Air Quality Data

This sample notebook demonstrates working with Google BigQuery datasets.

Dataset Details: bigquery-public-data:openaq

Description

OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.

Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).

Dataset Source: openaq.org

Category: Science

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://openaq.org/#/about?_k=s3aspo — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Hourly

This kernel shows how easy it can be to run a SQL query against a BigQuery table and get a pandas dataframe as the result. If you're interested in digging deeper, check out these references:

First we have to define the use of bigquery

In [0]:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated

Understand the data structure

It's helpful to inspect schema and a sample of the data we're working with

Understanding the EPA Data

In [0]:
%%bigquery --project <your project id> df
SELECT * FROM
  `bigquery-public-data.openaq.global_air_quality`
  where timestamp >= '2019-01-01'
  LIMIT 10
Out[0]:
location city country pollutant value timestamp unit source_name latitude longitude averaged_over_in_hours
0 Near Road ADA US pm10 15.000 2019-05-03 08:00:00+00:00 µg/m³ AirNow 43.593930 -116.381250 1.0
1 Near Road ADA US pm25 7.200 2019-05-03 08:00:00+00:00 µg/m³ AirNow 43.593930 -116.381250 1.0
2 Near Road ADA US bc 0.210 2019-05-03 08:00:00+00:00 µg/m³ AirNow 43.593930 -116.381250 1.0
3 FR33305 Ain FR pm10 15.200 2019-05-03 08:00:00+00:00 µg/m³ EEA France 46.211643 5.226389 1.0
4 FR33305 Ain FR no2 12.200 2019-05-03 08:00:00+00:00 µg/m³ EEA France 46.211643 5.226389 1.0
5 Ponca City KAY US pm25 4.900 2019-05-03 09:00:00+00:00 µg/m³ AirNow 36.697190 -97.081276 1.0
6 Ponca City KAY US so2 0.000 2019-01-17 15:00:00+00:00 ppm AirNow 36.697190 -97.081276 1.0
7 Ponca City KAY US pm10 9.000 2019-05-03 09:00:00+00:00 µg/m³ AirNow 36.697190 -97.081276 1.0
8 Lincoln MT1 US o3 0.023 2019-05-03 08:00:00+00:00 ppm AirNow 46.955997 -112.655396 1.0
9 Kamiah N/A US pm25 3.000 2019-05-03 05:00:00+00:00 µg/m³ AirNow 46.209400 -116.027500 1.0

OpenAQ dataset

The OpenAQ dataset is updated hourly to show a nearly live look at government-reported air quality around the world. With this dataset, you can answer questions like:

  • Where are the european hotspots for poor air quality right now?
  • How does one city compare to others?

Where are the european hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?

In [0]:
from IPython.display import IFrame
IFrame('your url', width=700, height=500)
Out[0]:

Global Hotspots

  • Where are the global hotspots for poor air quality right now?
  • How does one city compare to others?

Where are the global hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?

In [0]:
%%bigquery --project <your project id>
#standardSQL
SELECT 
 location, city, country, value, 
 CONCAT(CAST(latitude AS STRING), ', ', CAST(longitude AS STRING)) AS latlong  
FROM 
  `bigquery-public-data.openaq.global_air_quality` 
WHERE 
  pollutant = "pm10" AND timestamp >= '2019-01-01'
ORDER BY 
  value DESC
Out[0]:
location city country value latlong
0 Арвайхээр Арвайхээр хот MN 834.000000 46.258915, 102.78922
1 Sanjay Nagar, Ghaziabad - UPPCB Ghaziabad IN 792.000000 28.685383, 77.45384
2 Collectorate, Jodhpur - RSPCB Jodhpur IN 556.300000 26.26825, 73.01939
3 Nathu Colony, Ballabgarh - HSPCB Ballabgarh IN 449.290000 28.341925, 77.3197
4 Alipur, Delhi - DPCC Delhi IN 407.000000 28.815329, 77.15301
5 地区站 和田地区 CN 400.000000 37.1152, 79.9485
6 Pimpleshwar Mandir, Thane - MPCB Thane IN 361.210000 19.192057, 72.95852
7 古江巴格乡院内 和田地区 CN 352.000000 37.1013, 79.9117
8 睢阳区环保局 商丘市 CN 334.000000 34.407, 115.6386
9 Vasundhara, Ghaziabad - UPPCB Ghaziabad IN 294.000000 28.660336, 77.357254
10 Chandrapur, Chandrapur - MPCB Chandrapur IN 291.000000 19.645325, 77.63452
11 Loni, Loni Dehat - UPPCB Loni Dehat IN 279.000000 28.757294, 77.27879
12 环境监测站 商丘市 CN 279.000000 34.4286, 115.6697
13 Sector-D Industrial Area, Mandideep - MPPCB Mandideep IN 252.000000 23.10844, 77.51143
14 Indirapuram, Ghaziabad - UPPCB Ghaziabad IN 246.000000 28.646233, 77.35808
15 City Railway Station, Bengaluru - KSPCB Bengaluru IN 243.750000 12.975684, 77.56608
16 Mongol gazar Ulaanbaatar MN 242.000000 47.903538, 106.85071
17 西梁 朝阳市 CN 233.000000 41.5931, 120.4439
18 Ardhali Bazar, Varanasi - UPPCB Varanasi IN 233.000000 25.3506, 82.90831
19 Civil Lines, Ajmer - RSPCB Ajmer IN 216.250000 26.47086, 74.64659
20 RIICO Ind. Area III, Bhiwadi - RSPCB Bhiwadi IN 211.700000 28.194908, 76.8623
21 Vivek Vihar, Delhi - DPCC Delhi IN 211.000000 28.672342, 77.31526
22 Shastri Nagar, Narnaul - HSPCB Narnaul IN 205.580000 28.06025, 76.11311
23 Secretariat, Amaravati - APPCB Amaravati IN 204.000000 16.515083, 80.518166
24 North Campus, DU, Delhi - IMD Delhi IN 203.500000 28.657381, 77.15855
25 河医大 郑州市 CN 202.000000 34.7538, 113.6356
26 1-r khoroolol Ulaanbaatar MN 199.000000 47.91798, 106.84806
27 太极体育中心 焦作市 CN 195.000000 35.2067, 113.2725
28 Airoli, Navi Mumbai - MPCB Navi Mumbai IN 192.260000 19.1494, 72.9986
29 市一中 邵阳市 CN 191.000000 27.2317, 111.4733
... ... ... ... ... ...
3014 Heerlen-Looierstraat Heerlen NL -3.250000 50.888, 5.9705
3015 Vigernes Lillestrøm NO -3.401335 59.95548, 11.073983
3016 Amsterdam-A10 west Amsterdam NL -995.000000 52.3395, 4.84102
3017 De Rijp-Oostdijkje De Rijp NL -999.000000 52.5636, 4.8617
3018 Rotterdam-Oost Sidelinge A13 Rotterdam NL -999.000000 51.9385, 4.43069
3019 Maassluis-Kwartellaan Maassluis NL -999.000000 51.932, 4.22802
3020 Rotterdam Noord -Statenweg Rotterdam Noord NL -999.000000 51.9271, 4.46134
3021 Hoogvliet-Leemkuil Hoogvliet NL -999.000000 51.8674, 4.35524
3022 Badhoevedorp-Sloterweg Badhoevedorp NL -999.000000 52.334, 4.77401
3023 Amsterdam-Jan van Galenstraat Amsterdam NL -999.000000 52.3748, 4.86032
3024 Amsterdam-Spaarnwoude Amsterdam NL -999.000000 52.3984, 4.72858
3025 Ridderkerk-Hogeweg Ridderkerk NL -999.000000 51.8694, 4.58006
3026 Rotterdam Zuid-Zwartewaalstraat Rotterdam Zuid NL -999.000000 51.8936, 4.48753
3027 Hoek v. Holland-Berghaven Hoek v. Holland NL -999.000000 51.9778, 4.12194
3028 Maastricht-A2 Nassaulaan Maastricht NL -999.000000 50.8459, 5.71475
3029 Rotterdam Zuid-Pleinweg Rotterdam Zuid NL -999.000000 51.8911, 4.48069
3030 Amsterdam-Hoogtij Amsterdam NL -999.000000 52.428, 4.77348
3031 Amsterdam-Vondelpark Amsterdam NL -999.000000 52.3597, 4.86621
3032 Amsterdam-Westerpark Amsterdam NL -999.000000 52.394, 4.87016
3033 Amsterdam-Einsteinweg Amsterdam NL -999.000000 52.3813, 4.84523
3034 Velsen-Staalstraat Velsen NL -999.000000 52.4744, 4.6288
3035 Amsterdam-Stadhouderskade Amsterdam NL -999.000000 52.358, 4.8997
3036 Amsterdam-Van Diemenstraat Amsterdam NL -999.000000 52.39, 4.88781
3037 Geleen-Asterstraat Geleen NL -999.000000 50.9844, 5.82223
3038 Zaandam-Wagenschotpad Zaandam NL -999.000000 52.448, 4.81671
3039 Horst a/d Maas-Hoogheide Horst a/d Maas NL -999.000000 51.4541, 6.1082
3040 Beverwijk West-Creutzberglaan Beverwijk West NL -999.000000 52.4893, 4.64053
3041 Auckland Point Gladstone AU -1111.000000 -23.8351, 151.254
3042 CAL-JOAR - Caldas - E U Joaquín Aristizabal Caldas CO -9999.000000 6.0930777, -75.637764
3043 GIR-IECO - Institucion Educativa Colombia. Gir... Girardota CO -9999.000000 6.3785167, -75.443985

3044 rows × 5 columns

Panda DataFrame

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Panda Definition

In [0]:
df.describe()
Out[0]:
value latitude longitude averaged_over_in_hours
count 10.000000 10.000000 10.000000 10.0
mean 6.673300 42.646204 -85.861770 1.0
std 6.013368 4.289033 48.755539 0.0
min 0.000000 36.697190 -116.381250 1.0
25% 0.907500 38.421375 -116.292812 1.0
50% 6.050000 43.593930 -104.868336 1.0
75% 11.400000 46.211082 -97.081276 1.0
max 15.200000 46.955997 5.226389 1.0

pandas.DataFrame.head

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

In [0]:
df.head(10)
Out[0]:
location city country pollutant value timestamp unit source_name latitude longitude averaged_over_in_hours
0 Near Road ADA US pm10 15.000 2019-05-03 08:00:00+00:00 µg/m³ AirNow 43.593930 -116.381250 1.0
1 Near Road ADA US pm25 7.200 2019-05-03 08:00:00+00:00 µg/m³ AirNow 43.593930 -116.381250 1.0
2 Near Road ADA US bc 0.210 2019-05-03 08:00:00+00:00 µg/m³ AirNow 43.593930 -116.381250 1.0
3 FR33305 Ain FR pm10 15.200 2019-05-03 08:00:00+00:00 µg/m³ EEA France 46.211643 5.226389 1.0
4 FR33305 Ain FR no2 12.200 2019-05-03 08:00:00+00:00 µg/m³ EEA France 46.211643 5.226389 1.0
5 Ponca City KAY US pm25 4.900 2019-05-03 09:00:00+00:00 µg/m³ AirNow 36.697190 -97.081276 1.0
6 Ponca City KAY US so2 0.000 2019-01-17 15:00:00+00:00 ppm AirNow 36.697190 -97.081276 1.0
7 Ponca City KAY US pm10 9.000 2019-05-03 09:00:00+00:00 µg/m³ AirNow 36.697190 -97.081276 1.0
8 Lincoln MT1 US o3 0.023 2019-05-03 08:00:00+00:00 ppm AirNow 46.955997 -112.655396 1.0
9 Kamiah N/A US pm25 3.000 2019-05-03 05:00:00+00:00 µg/m³ AirNow 46.209400 -116.027500 1.0
In [0]:
from IPython.display import IFrame
IFrame('your url', width=700, height=500)
Out[0]: