Air Quality Data¶

This sample notebook demonstrates working with Google BigQuery datasets.

Dataset Details: bigquery-public-data:openaq¶

Description¶

OpenAQ is an open-source project to surface live, real-time air quality data from around the world. Their “mission is to enable previously impossible science, impact policy and empower the public to fight air pollution.” The data includes air quality measurements from 5490 locations in 47 countries.

Scientists, researchers, developers, and citizens can use this data to understand the quality of air near them currently. The dataset only includes the most current measurement available for the location (no historical data).

Dataset Source: openaq.org

Category: Science

Use: This dataset is publicly available for anyone to use under the following terms provided by the Dataset Source — https://openaq.org/#/about?_k=s3aspo — and is provided "AS IS" without any warranty, express or implied, from Google. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.

Update Frequency: Hourly

This kernel shows how easy it can be to run a SQL query against a BigQuery table and get a pandas dataframe as the result. If you're interested in digging deeper, check out these references:

First we have to define the use of bigquery¶

In [0]:

from google.colab import auth
auth.authenticate_user()
print('Authenticated')

Authenticated

Understand the data structure¶

It's helpful to inspect schema and a sample of the data we're working with

Understanding the EPA Data

In [0]:

%%bigquery --project <your project id> df
SELECT * FROM
  `bigquery-public-data.openaq.global_air_quality`
  where timestamp >= '2019-01-01'
  LIMIT 10

Out[0]:

	location	city	country	pollutant	value	timestamp	unit	source_name	latitude	longitude	averaged_over_in_hours
0	Near Road	ADA	US	pm10	15.000	2019-05-03 08:00:00+00:00	µg/m³	AirNow	43.593930	-116.381250	1.0
1	Near Road	ADA	US	pm25	7.200	2019-05-03 08:00:00+00:00	µg/m³	AirNow	43.593930	-116.381250	1.0
2	Near Road	ADA	US	bc	0.210	2019-05-03 08:00:00+00:00	µg/m³	AirNow	43.593930	-116.381250	1.0
3	FR33305	Ain	FR	pm10	15.200	2019-05-03 08:00:00+00:00	µg/m³	EEA France	46.211643	5.226389	1.0
4	FR33305	Ain	FR	no2	12.200	2019-05-03 08:00:00+00:00	µg/m³	EEA France	46.211643	5.226389	1.0
5	Ponca City	KAY	US	pm25	4.900	2019-05-03 09:00:00+00:00	µg/m³	AirNow	36.697190	-97.081276	1.0
6	Ponca City	KAY	US	so2	0.000	2019-01-17 15:00:00+00:00	ppm	AirNow	36.697190	-97.081276	1.0
7	Ponca City	KAY	US	pm10	9.000	2019-05-03 09:00:00+00:00	µg/m³	AirNow	36.697190	-97.081276	1.0
8	Lincoln	MT1	US	o3	0.023	2019-05-03 08:00:00+00:00	ppm	AirNow	46.955997	-112.655396	1.0
9	Kamiah	N/A	US	pm25	3.000	2019-05-03 05:00:00+00:00	µg/m³	AirNow	46.209400	-116.027500	1.0

OpenAQ dataset¶

The OpenAQ dataset is updated hourly to show a nearly live look at government-reported air quality around the world. With this dataset, you can answer questions like:

Where are the european hotspots for poor air quality right now?
How does one city compare to others?

Where are the european hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?

In [0]:

from IPython.display import IFrame
IFrame('your url', width=700, height=500)

Out[0]:

Global Hotspots¶

Where are the global hotspots for poor air quality right now?
How does one city compare to others?

Where are the global hotspots for poor air quality right now (using concentrations of PM10: Particulate Matter with a size of 10 micrometers or less)?

In [0]:

%%bigquery --project <your project id>
#standardSQL
SELECT 
 location, city, country, value, 
 CONCAT(CAST(latitude AS STRING), ', ', CAST(longitude AS STRING)) AS latlong  
FROM 
  `bigquery-public-data.openaq.global_air_quality` 
WHERE 
  pollutant = "pm10" AND timestamp >= '2019-01-01'
ORDER BY 
  value DESC

Out[0]:

	location	city	country	value	latlong
0	Арвайхээр	Арвайхээр хот	MN	834.000000	46.258915, 102.78922
1	Sanjay Nagar, Ghaziabad - UPPCB	Ghaziabad	IN	792.000000	28.685383, 77.45384
2	Collectorate, Jodhpur - RSPCB	Jodhpur	IN	556.300000	26.26825, 73.01939
3	Nathu Colony, Ballabgarh - HSPCB	Ballabgarh	IN	449.290000	28.341925, 77.3197
4	Alipur, Delhi - DPCC	Delhi	IN	407.000000	28.815329, 77.15301
5	地区站	和田地区	CN	400.000000	37.1152, 79.9485
6	Pimpleshwar Mandir, Thane - MPCB	Thane	IN	361.210000	19.192057, 72.95852
7	古江巴格乡院内	和田地区	CN	352.000000	37.1013, 79.9117
8	睢阳区环保局	商丘市	CN	334.000000	34.407, 115.6386
9	Vasundhara, Ghaziabad - UPPCB	Ghaziabad	IN	294.000000	28.660336, 77.357254
10	Chandrapur, Chandrapur - MPCB	Chandrapur	IN	291.000000	19.645325, 77.63452
11	Loni, Loni Dehat - UPPCB	Loni Dehat	IN	279.000000	28.757294, 77.27879
12	环境监测站	商丘市	CN	279.000000	34.4286, 115.6697
13	Sector-D Industrial Area, Mandideep - MPPCB	Mandideep	IN	252.000000	23.10844, 77.51143
14	Indirapuram, Ghaziabad - UPPCB	Ghaziabad	IN	246.000000	28.646233, 77.35808
15	City Railway Station, Bengaluru - KSPCB	Bengaluru	IN	243.750000	12.975684, 77.56608
16	Mongol gazar	Ulaanbaatar	MN	242.000000	47.903538, 106.85071
17	西梁	朝阳市	CN	233.000000	41.5931, 120.4439
18	Ardhali Bazar, Varanasi - UPPCB	Varanasi	IN	233.000000	25.3506, 82.90831
19	Civil Lines, Ajmer - RSPCB	Ajmer	IN	216.250000	26.47086, 74.64659
20	RIICO Ind. Area III, Bhiwadi - RSPCB	Bhiwadi	IN	211.700000	28.194908, 76.8623
21	Vivek Vihar, Delhi - DPCC	Delhi	IN	211.000000	28.672342, 77.31526
22	Shastri Nagar, Narnaul - HSPCB	Narnaul	IN	205.580000	28.06025, 76.11311
23	Secretariat, Amaravati - APPCB	Amaravati	IN	204.000000	16.515083, 80.518166
24	North Campus, DU, Delhi - IMD	Delhi	IN	203.500000	28.657381, 77.15855
25	河医大	郑州市	CN	202.000000	34.7538, 113.6356
26	1-r khoroolol	Ulaanbaatar	MN	199.000000	47.91798, 106.84806
27	太极体育中心	焦作市	CN	195.000000	35.2067, 113.2725
28	Airoli, Navi Mumbai - MPCB	Navi Mumbai	IN	192.260000	19.1494, 72.9986
29	市一中	邵阳市	CN	191.000000	27.2317, 111.4733
...	...	...	...	...	...
3014	Heerlen-Looierstraat	Heerlen	NL	-3.250000	50.888, 5.9705
3015	Vigernes	Lillestrøm	NO	-3.401335	59.95548, 11.073983
3016	Amsterdam-A10 west	Amsterdam	NL	-995.000000	52.3395, 4.84102
3017	De Rijp-Oostdijkje	De Rijp	NL	-999.000000	52.5636, 4.8617
3018	Rotterdam-Oost Sidelinge A13	Rotterdam	NL	-999.000000	51.9385, 4.43069
3019	Maassluis-Kwartellaan	Maassluis	NL	-999.000000	51.932, 4.22802
3020	Rotterdam Noord -Statenweg	Rotterdam Noord	NL	-999.000000	51.9271, 4.46134
3021	Hoogvliet-Leemkuil	Hoogvliet	NL	-999.000000	51.8674, 4.35524
3022	Badhoevedorp-Sloterweg	Badhoevedorp	NL	-999.000000	52.334, 4.77401
3023	Amsterdam-Jan van Galenstraat	Amsterdam	NL	-999.000000	52.3748, 4.86032
3024	Amsterdam-Spaarnwoude	Amsterdam	NL	-999.000000	52.3984, 4.72858
3025	Ridderkerk-Hogeweg	Ridderkerk	NL	-999.000000	51.8694, 4.58006
3026	Rotterdam Zuid-Zwartewaalstraat	Rotterdam Zuid	NL	-999.000000	51.8936, 4.48753
3027	Hoek v. Holland-Berghaven	Hoek v. Holland	NL	-999.000000	51.9778, 4.12194
3028	Maastricht-A2 Nassaulaan	Maastricht	NL	-999.000000	50.8459, 5.71475
3029	Rotterdam Zuid-Pleinweg	Rotterdam Zuid	NL	-999.000000	51.8911, 4.48069
3030	Amsterdam-Hoogtij	Amsterdam	NL	-999.000000	52.428, 4.77348
3031	Amsterdam-Vondelpark	Amsterdam	NL	-999.000000	52.3597, 4.86621
3032	Amsterdam-Westerpark	Amsterdam	NL	-999.000000	52.394, 4.87016
3033	Amsterdam-Einsteinweg	Amsterdam	NL	-999.000000	52.3813, 4.84523
3034	Velsen-Staalstraat	Velsen	NL	-999.000000	52.4744, 4.6288
3035	Amsterdam-Stadhouderskade	Amsterdam	NL	-999.000000	52.358, 4.8997
3036	Amsterdam-Van Diemenstraat	Amsterdam	NL	-999.000000	52.39, 4.88781
3037	Geleen-Asterstraat	Geleen	NL	-999.000000	50.9844, 5.82223
3038	Zaandam-Wagenschotpad	Zaandam	NL	-999.000000	52.448, 4.81671
3039	Horst a/d Maas-Hoogheide	Horst a/d Maas	NL	-999.000000	51.4541, 6.1082
3040	Beverwijk West-Creutzberglaan	Beverwijk West	NL	-999.000000	52.4893, 4.64053
3041	Auckland Point	Gladstone	AU	-1111.000000	-23.8351, 151.254
3042	CAL-JOAR - Caldas - E U Joaquín Aristizabal	Caldas	CO	-9999.000000	6.0930777, -75.637764
3043	GIR-IECO - Institucion Educativa Colombia. Gir...	Girardota	CO	-9999.000000	6.3785167, -75.443985

3044 rows × 5 columns

Panda DataFrame¶

Generate descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding NaN values.

Panda Definition

In [0]:

df.describe()

Out[0]:

	value	latitude	longitude	averaged_over_in_hours
count	10.000000	10.000000	10.000000	10.0
mean	6.673300	42.646204	-85.861770	1.0
std	6.013368	4.289033	48.755539	0.0
min	0.000000	36.697190	-116.381250	1.0
25%	0.907500	38.421375	-116.292812	1.0
50%	6.050000	43.593930	-104.868336	1.0
75%	11.400000	46.211082	-97.081276	1.0
max	15.200000	46.955997	5.226389	1.0

pandas.DataFrame.head¶

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

In [0]:

df.head(10)

Out[0]:

	location	city	country	pollutant	value	timestamp	unit	source_name	latitude	longitude	averaged_over_in_hours
0	Near Road	ADA	US	pm10	15.000	2019-05-03 08:00:00+00:00	µg/m³	AirNow	43.593930	-116.381250	1.0
1	Near Road	ADA	US	pm25	7.200	2019-05-03 08:00:00+00:00	µg/m³	AirNow	43.593930	-116.381250	1.0
2	Near Road	ADA	US	bc	0.210	2019-05-03 08:00:00+00:00	µg/m³	AirNow	43.593930	-116.381250	1.0
3	FR33305	Ain	FR	pm10	15.200	2019-05-03 08:00:00+00:00	µg/m³	EEA France	46.211643	5.226389	1.0
4	FR33305	Ain	FR	no2	12.200	2019-05-03 08:00:00+00:00	µg/m³	EEA France	46.211643	5.226389	1.0
5	Ponca City	KAY	US	pm25	4.900	2019-05-03 09:00:00+00:00	µg/m³	AirNow	36.697190	-97.081276	1.0
6	Ponca City	KAY	US	so2	0.000	2019-01-17 15:00:00+00:00	ppm	AirNow	36.697190	-97.081276	1.0
7	Ponca City	KAY	US	pm10	9.000	2019-05-03 09:00:00+00:00	µg/m³	AirNow	36.697190	-97.081276	1.0
8	Lincoln	MT1	US	o3	0.023	2019-05-03 08:00:00+00:00	ppm	AirNow	46.955997	-112.655396	1.0
9	Kamiah	N/A	US	pm25	3.000	2019-05-03 05:00:00+00:00	µg/m³	AirNow	46.209400	-116.027500	1.0

In [0]:

from IPython.display import IFrame
IFrame('your url', width=700, height=500)

Out[0]: