Data science is a hot topic right now: more and more people are talking about it and planning to bring it into their organizations. The practice of data science is to extract knowledge and insights from data generated by various tools and applications.
If you’re new to data science, or trying to build a more robust portfolio, a good way to solidify your skills is to work on projects and assignments covering data visualization, data cleaning, and machine learning.
Practicing regularly on such projects can help you master these skills and excel in your career.
To help you find open public datasets to work with, we have compiled a list of sources for your next data science project, whether it involves machine learning, image datasets, NLP datasets, or self-driving datasets.
This list of public dataset sources is continuously updated; the entries are collected from blogs, websites, and user responses. Most of the datasets listed below are free, but some are not.
List of Datasets
Datasets Provider
URL
Category
Description
1000 Genomes
Biology
Supporting open human variation data
10k US Adult Faces Database
Images
The datasets here have been assembled and made publicly available during Wilma's research career. Please cite the corresponding publication when you use any of these datasets.
2019 Novel Coronavirus COVID-19 Data Repository by Johns Hopkins CSSE
Healthcare
Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
2021 Portuguese Elections Twitter Dataset
Social Networks
This dataset contains tweets and users mostly from the Portuguese Twittersphere. The watched users stem from a seed of political accounts (usernames) and news sources (usernames_news).
2GB of Photos of Cats
Images
Over 9,000 images of cats with annotated facial features
38-Cloud (Cloud Detection)
Science
This data set includes Landsat 8 images and their manually extracted pixel-level ground truths for cloud detection.
3D Human beings dataset
Images
The first dataset for computer vision research of dressed humans with specific geometry representation for clothes.
3W dataset
Time Series
The first realistic and public dataset with rare undesirable real events in oil wells.
43k+ Donald Trump Twitter Screenshots
Social Networks
This archive contains screenshots of 43,475 Donald Trump tweets from May 2009 to May 2020.
53.5B Web clicks of 100K users in Indiana Univ.
Computer Networks
To foster the study of the structure and dynamics of Web traffic networks, we make available a large dataset (‘Click Dataset’) of about 53.5 billion HTTP requests made by users at Indiana University.
A Twitter Dataset of 40+ million tweets related to COVID-19
Social Networks
A Twitter Dataset of 40+ million tweets related to COVID-19
Ably Open Realtime Data
Public Domains
Academic Torrents
Science
It hosts data used in published scientific research papers. The variety of datasets is massive, and they are free to download.
Academic Torrents of data sharing from UMB
Search Engines
Distributed system for sharing enormous datasets - for researchers, by researchers.
ACLED (Armed Conflict Location & Event Data Project)
Sciences
ACLED collects real-time data on the locations, dates, actors, fatalities, and types of all reported political violence and protest events across Africa, the Middle East, Latin America & the Caribbean, East Asia, South Asia, Southeast Asia, Central Asia & the Caucasus, Europe, and the United States of America.
Actuaries Climate Index
Climate/Weather
Actuaries Climate Index data is available for download. Data is currently available through May 2021 (Spring 2021). Download monthly and seasonal data by region and component.
Affective Image Classification
Images
AIcrowd Competitions
Data Challenges
Airbnb dataset
Public Domains
This site hosts all the listing data from Airbnb.
Airborne Object Detection and Tracking
Images
Airlines OD Data 1987-2008
Transportation
Alabama Real-Time Coastal Observing System
Science
Alberta, Province of Canada
Government
All-Age-Faces Dataset
Machine Learning
Allen Institute Datasets
Neuroscience
Amazon
Public Domains
Amazon Reviews
Public Domains
The dataset contains about 35 million Amazon reviews.
American Economic Association
Economics
Macroeconomic datasets.
American Economic Association (AEA)
Economics
American Gut (Microbiome Project)
Biology
American Gut open-access data and IPython notebooks (GitHub: biocore/American-Gut)
American Ninja Warrior Obstacles
Sports
AMiner Citation Network Dataset
Complex Networks
AMPds
Energy
The Almanac of Minutely Power dataset
Analytics Vidhya
Public Domains
The datasets can be downloaded from the hundreds of data-hack competitions they organize
Animals with attributes
Images
Antwerp, Belgium
Government
AQUASTAT
Science
Global water resources and uses
ArcGIS Open Data portal
GIS
Archive-it from Internet Archive
Public Domains
Archive.org
Archive.org Datasets
Public Domains
Argentina (non official)
Government
Audi Autonomous Driving Dataset
Machine Learning
Audience Unfiltered faces for gender and age classification
Images
Austin, TX, US
Government
Australia (abs.gov.au)
Government
Australia (data.gov.au)
Government
Australian Weather
Climate/Weather
Austria (data.gv.at)
Government
Authoritarian Ruling Elites Database
Sciences
Automatic Keyphrase Extraction
Museums
Aviation Weather Center
Climate/Weather
Awesome 3D Semantic City Models
GIS
Collection of open 3D semantic city
AWS COVID-19 Datasets
Healthcare
AWS datasets
Data Catalogues
AWS hosts hundreds of datasets, and it would be no surprise if it comes to host the largest ones in the near future.
Azure
Data Catalogues
Base dos Dados - Data Basis: Open Data Repository for Brazil
Search Engines
Baton Rouge, LA, US
Government
Beersheba, Israel
Government
Open Data Portal (Smart7 OpenData)
Belgium
Government
UC Berkeley’s Autonomous Driving Dataset
Machine Learning
Betfair Historical Exchange Data
Sports
Bike Share Systems (BSS) collection
Transportation
BIS Statistics
Finance
BIS statistics, compiled in cooperation with central
Blizzard Challenge Speech - The speech + text data comes from [...]
Museums
Blockmodo Coin Registry - A registry of JSON formatted information files [...]
Brazilian Weather - Historical data (In Portuguese)
Climate/Weather
Broad Bioimage Benchmark Collection (BBBC)
Biology
The Broad Bioimage Benchmark Collection (BBBC) is a collection of freely
downloadable microscopy image sets. In addition to the images themselves,
each set includes a description of the biological application and some type
of "ground truth" (expected results).
Broad Cancer Cell Line Encyclopedia (CCLE)
Biology
Bruteforce Database
Data Challenges
Buenos Aires, Argentina
Government
Bureau of Economic Analysis dataset
Government
Bureau of Labor Statistics
Government
CADDY Underwater Stereo-Vision Dataset of divers' hand gestures
Images
CAIDA Internet Datasets
Computer Networks
Calgary, AB, Canada
Government
Caltech Pedestrian Detection Benchmark
Images
Character Recognition in Natural Images
Cambridge, MA, US
Government
Cambridge, MA, US, GIS data on GitHub
GIS
Canada
Government
Canada Parliament dataset
Government
Text dataset for NLP tasks from Canadian Parliament.
Canada Science and Technology Museums Corporation's Open Data
Museums
Canadian Legal Information Institute
Sciences
Canadian Meteorological Centre
Climate/Weather
Cancer related dataset
Science
Carnegie Mellon University dataset
Education
5+ hours of Highway autonomous driving dataset.
CBOE Futures Exchange
Finance
CDC
Government
This is the dataset offered from the Centers for Disease Control and Prevention.
Cell Image Library
Biology
Center for Systemic Peace Datasets
Sciences
CERN Open Data Portal
Physics
Challenges in Machine Learning
Data Challenges
Chars74K dataset
Images
Charting The Global Climate Change News Narrative 2009-2020
Climate/Weather
Cheng-Caverlee-Lee September 2009
Social Networks
Chicago
Government
Chile
Government
China
Government
China Biographical Database
Social Networks
CIFAR-10
Images
Image Classification dataset.
City of Berkeley Open Data
Government
Climate Data from UEA
Climate/Weather
Climate Data Store
Climate/Weather
Sea surface temperature daily data from 1981 to present derived from satellite observations
CLiPS Stylometry Investigation Corpus
Museums
ClueWeb09 - 1B web pages
Computer Networks
ClueWeb09 FACC
Museums
ClueWeb12 - 733M web pages
Computer Networks
ClueWeb12 FACC
Museums
CMU datasets
Education
CMU Enron Email of 150 users
Social Networks
CMU JASA data archive
Public Domains
CMU StatLab collections
Public Domains
Code duplicates
Software
CodeNeuro Datasets
Neuroscience
COIL 100
Software
Image dataset of objects photographed through a full 360-degree rotation, from Columbia University.
Collaborative Research in Computational Neuroscience (CRCNS)
Neuroscience
COMBED
Energy
Comma.ai
Machine Learning
5+ hours of Highway autonomous driving dataset.
Commit messages
Software
CommonCrawl Web Data over 7 years
Computer Networks
Community Resource for Archiving Wireless Data At Dartmouth
Complex Networks
Complete FAANG Stock data
Finance
Complete Genomics Public Data
Biology
Composition of Foods Raw, Processed, Prepared USDA National Nutrient Database for Standard
Healthcare
Computer Vision dataset
Machine Learning
Massive set of computer vision dataset organized by categories
Context-aware data sets from five domains
Machine Learning
Cooper-Hewitt's Collection Database
Museums
Corona Virus dataset
Healthcare
Coronavirus (Covid-19) Data in the United States
Healthcare
Correlates of War Project
Sciences
COVID-19 Case Surveillance Public Use Data
Healthcare
COVID-19 Reported Patient Impact and Hospital Capacity by Facility
Healthcare
CRAWDAD Wireless datasets from Dartmouth Univ.
Computer Networks
Cricsheet Matches (cricket)
Sports
Criteo click-through data
Computer Networks
CrossRef DOI URLs
Complex Networks
All the journal article DOIs from CrossRef's OAI-PMH server; URLs of just
under 50 million journal articles.
CrowdANALYTIX dataX
Data Challenges
Cryptome Conspiracy Theory Items
Sciences
Crystallography Open Database
Physics
CS:GO Competitive Matchmaking Data
eSports
Cube++
Images
4890 raw 18-megapixel images, each containing a SpyderCube color
Cytology Dataset
Images
CCAgT: Images of Cervical Cells with AgNOR Stain
D4D Challenge of Orange
Data Challenges
Dallas Open Data
Government
Danbooru Tagged Anime Illustration Dataset
Images
Data Driven
Data Challenges
Data Driven hosts competitions in which data scientists tackle real-world problems with social impact. Datasets from these competitions are available.
Data Packaged Core Datasets
Public Domains
Data.gov
Government
It consists of a variety of datasets from US Government agencies. Domains include education, climate, food, chronic disease, and more.
Data.gov.uk
Government
Thousands of datasets from the UK Govt.
Data.World
Data Catalogues
Data360
Public Domains
Databanks International Cross National Time Series Data Archive
Time Series
Database of all continents, countries, States/Subdivisions/Provinces
GIS
Database of Scientific Code Contributions
Sciences
DataBC
Government
Data from the Province of British Columbia
Datacards
Sciences
Datahub.io
Search Engines
Datasets on Github
It hosts tons of awesome datasets. This GitHub repository offers a variety of datasets such as climate data, time series data, plane crash data, etc. Feel free to dig in.
Datos Argentina
Government
Open data portal of the Argentine Republic.
DBFC
Energy
Direct Borohydride Fuel Cell (DBFC) Dataset
DBLP Citation dataset
Complex Networks
DBnomics – the world's economic database
Economics
DBpedia
Structured dataset from Wikipedia
DBpedia - Structured data from Wikipedia
Museums
Debt to the Penny
Government
The Debt to the Penny dataset provides information
DEL
Energy
Domestic Electrical Load study datasets for South Africa (1994 - 2014)
Delve Datasets for classification and regression
Machine Learning
Densely Annotated Video Driving Data Set
Images
Denver Open Data
Government
DIMACS Road Networks Collection
Complex Networks
Dirty Words
Museums
Discogs Monthly Data
Machine Learning
Dog Image dataset from Stanford
Domains Project - Sorted list of Internet domains
Search Engines
DrivenData Competitions for Social Good
Data Challenges
DukeMTMC Data Set
Images
Durham, NC Open Data
Government
Dutch Traffic Information
Transportation
Dutch Weather
Climate/Weather
Earth Models
Science
eBay Online Auctions (2012)
Machine Learning
EBI ArrayExpress
Biology
EBI Protein Data Bank in Europe
Biology
ECO
Energy
EconData from UMD
Economics
Economic Freedom of the World Data
Economics
Edmonton, AB, Canada
Government
EDRM Enron EMail of 151 users, hosted on S3
Social Networks
EHDP Large Health Data Sets
Healthcare
EIA
Energy
Electron Microscopy Public Image Archive (EMPIAR)
Biology
EMPIAR, the Electron Microscopy Public Image Archive centered at EMBL-EBI,
is a public resource for raw electron microscopy images related to EMDB,
contains micrographs, particle sets and tilt-series.
Email data from Enron
ENCODE project
Biology
England LGInform
Government
Enigma Public
World’s broadest collection of open source datasets.
Enigma Public
Public Domains
Enigma provides accurate, timely business data about the identity and
financial health of small and medium businesses in the US.
Ensembl Genomes
Biology
Providing genome data for non-vertebrate species, with tools for the
manipulation, analysis and visualisation of that data
EOPC-DE-Early-Onset-Prostate-Cancer-Germany
Healthcare
The ICGC Data Portal provides tools for visualizing, querying and
downloading the data released quarterly by the consortium's member projects.
EOSDIS - NASA's earth observing system data
Science
Equity in Athletics
Sports
Ergast Formula 1, from 1950 up to date (API)
Sports
ETH Entomological Collection (ETHEC) Fine Grained Butterfly (Lepidoptera) Images
Images
European Climate Assessment & Dataset
Climate/Weather
European Social Survey
Sciences
European Union dataset
EuroStat
Government
EveryPolitician
Government
Ongoing project collating and sharing data on every
Face Recognition Benchmark
Images
Face Recognition - Databases
Facebook Data Scrape (2005)
Social Networks
Facebook data scrape related to paper The Social Structure of Facebook
Networks, by Amanda L. Traud, Peter J. Mucha, Mason A. Porter. We study the
social...
Facebook Social Connectedness Index
Social Networks
Facebook Social Networks from LAW (since 2007)
Social Networks
Factual Global Location Data
GIS
FBI
This site hosts crime data in the US
FBI Hate Crime 2013 - aggregated data
Sciences
Hosted on GitHub at emorisse/FBI-Hate-Crime-Statistics.
FCP-INDI
Neuroscience
Federal Committee on Statistical Methodology (FCSM) (formerly FedStats)
Government
FIFA-2021 Complete Player Dataset
eSports
This Data set contains data of the players in FIFA-2021
Financial Times dataset
Detailed public dataset about Financial market
Finland
Government
Suomi.fi Open Data is Finland’s open data catalogue. The portal gathers all
open data published in Finland in a single service. Come, use it, or
publish via us!
FiveThirtyEight
They have a wide variety of datasets on their GitHub. The specialty of this site is a detailed data dictionary explaining each dataset, which is very beneficial. I absolutely love their FIFA dataset. (*Proud Gunner).
Flickr Personal Taxonomies
Museums
FLOSSmole data about free, libre, and open source software development
Software
Football/Soccer resources (data and APIs)
Sports
Foursquare from UMN/Sarwat (2013)
Social Networks
This data set contains 2153471 users, 1143092 venues, 1021970 check-ins,
27098490 social connections, and 2809581 ratings that users assigned to
venues; all...
Fragile States Index
Sciences
France
Government
data.gouv.fr dataset search
Fredericton, NB, Canada
Government
Free Music Archive
Machine Learning
FMA: A Dataset For Music Analysis (GitHub: mdeff/fma).
Freebase of people, places, and things
Museums
Gapminder World demographic databases
Healthcare
Gatineau, QC, Canada
Government
GDC
Healthcare
GDC supports several cancer genome programs for CCG, TCGA, TARGET etc.
GDELT Global Events Database
Sciences
The GDELT Project
GDXray
Images
X-ray images for X-ray testing and Computer Vision
High Quality GeoJSON maps programmatically generated
Geo Spatial Data from ASU
GIS
Geo Wiki Project
GIS
Citizen-driven Environmental Monitoring
GeoFabrik
GIS
OSM data extracted to a variety of formats and areas
GeoLife GPS Trajectory from Microsoft Research
Transportation
GeoNames Worldwide
GIS
German Climate Data Center
Climate/Weather
German Federal Office for Radiation Protection (Bundesamt für Strahlenschutz)
Government
The German Federal Office for Radiation Protection (Bundesamt für Strahlenschutz) has a comprehensive database of smartphones - new and old - and the level of radiation they emit
German Political Speeches Corpus
Museums
German Social Survey
Sciences
German train system by Deutsche Bahn
Transportation
Germany
Government
Ghent, Belgium
Government
GHTorrent
Software
GitHub Collaboration Archive
Social Networks
Glasgow, Scotland, UK
Government
Global Administrative Areas Database (GADM)
GIS
Global Biotic Interactions (GloBI)
Biology
Global Climate Data Since 1929
Climate/Weather
Global Economic Complexity data
Global Power Plant Database
Energy
Global Religious Futures Project
Sciences
Global Wind Atlas
Science
Google
Public Domains
Google Books
Google Books Ngrams
Museums
Google dataset
This dataset is specifically for accurate landmark recognition
Google Finance
Finance
Google MC-AFP
Museums
Google Open Images
Google Public Datasets
Google hosts tons of datasets on Google Public Datasets, which is part of its Cloud Platform. You can browse the dataset collection using BigQuery. The first 1 terabyte of queries you make is free.
Google Scholar citation relations
Social Networks
Google Trends
Finance
Google Trends dataset
Google Web 5gram
Museums
Grand Comics Database
Public Domains
Graviti Open Datasets
Data Catalogues
Open Datasets
Greece
Government
Guardian world governments
Government
Gun Violence Data
Sciences
Gutenberg eBooks List
Museums
Halifax, NS, Canada
Government
Hansards text chunks of Canadian Parliament
Museums
Hard Drive Failure Rates
Time Series
Harvard Dataverse Network of scientific data
Search Engines
Harvard Medical School (HMS) LINCS Project
Biology
Harvard University dataset
Education
Heart Rate Time Series from MIT
Time Series
Helsinki Region, Finland
Government
HES - Household Electricity Study, UK
Energy
HFED
Energy
High-Resolution Contact Networks from Wearable Sensors
Social Networks
Historical MacroEconomic Statistics
Economics
Homeland Infrastructure Foundation-Level Data
GIS
Hong Kong, China
Government
Houston, TX, US
Government
Hubway Million Rides in MA
Transportation
Human Connectome Project
Neuroscience
Human Genome Diversity Project
Biology
Human Microbiome Project (HMP)
Biology
HumanEva Dataset
Images
Humanitarian Data Exchange
Sciences
Hyperspectral benchmark dataset on soil moisture
Agriculture
Hyperspectral benchmark dataset on soil moisture
iAWE
Energy
IceCube - South Pole Neutrino Observatory
Physics
ICOS PSP Benchmark
Biology
ICPSR (UMICH)
Search Engines
ICWSM Data Challenge (since 2009)
Data Challenges
IEEE Geoscience and Remote Sensing Society DASE Website
GIS
Image dataset of Human Face
ImageNet
ImageNet is an image database consisting of images organized according to the WordNet hierarchy.
ImageNet (in WordNet hierarchy)
Images
IMDb Database
Machine Learning
Indian Government Data
Government
Indian Government datasets
Indie Map: social graph and crawl of top IndieWeb sites
Social Networks
Indonesian Data Portal
Government
Indoor Image dataset from MIT
Indoor Scene Recognition
Images
Infochimps
Public Domains
INFORM Index for Risk Management
Sciences
Informatics for Integrating Biology & the Bedside
Healthcare
INFORUM
Economics
Interindustry Forecasting at the University of Maryland
Instagram Graph API
Institute for Demographic Studies
Sciences
Institute of Education Sciences
Search Engines
Integrated Marine Observing System (IMOS)
Science
International Affective Picture System, UFL
Images
International HapMap Project
Biology
International Monetary Fund Public data
Economics
International Networks Archive
Sciences
International Social Survey Program ISSP
Sciences
International Studies Compendium Project
Sciences
International Trade Statistics
Economics
Internet Product Code Database
Economics
Internet-Wide Scan Data Repository
Computer Networks
Iowa - Welcome to the State of Iowa's data portal. Please explore data [...]
Government
Iranis
Machine Learning
A Large-scale Dataset of Farsi/Arabic License Plate Characters
Ireland's Open Data Portal
Government
Israel's Open Data Portal
Government
Istanbul Municipality Open Data Portal
Government
Italy
Government
The dati.gov.it portal is the national catalogue of metadata
Jail deaths in America
Government
The U.S. government does not release jail
James McGuire Cross National Data
Sciences
Japan
Government
Jeopardy Quiz show dataset
Joint External Debt Data Hub
Economics
Jon Haveman International Trade Data Links
Economics
Kaggle
Data Catalogues
Kaggle is the world's largest data science community with powerful tools and resources to help you achieve your data science goals.
Kaggle Competition Data
Data Challenges
KDD Cup by Tencent 2012
Data Challenges
KDD Cups
KDD is a competition held by ACM on Knowledge Discovery and Data Mining hosting datasets with detailed data dictionaries & instructions.
KDNuggets Data Collections
Public Domains
KDNuggets dataset
Keel Repository for classification, regression and time series
Machine Learning
KEGG
Biology
KITTI Vision Benchmark Suite
Images
Labeled Faces in the Wild (LFW)
Machine Learning
Labeled Information Library of Alexandria
Images
Labelme
Image dataset from MIT
Lahman's Baseball Database
Sports
Landsat 8 on AWS
GIS
Large Movie Review dataset from Stanford
Laval, QC, Canada
Government
Lemons quality control dataset
Agriculture
Lemons quality control dataset
Lending Club Loan Data
Machine Learning
LendingClub
The site hosts massive datasets about loan related data. You have to create an account to access the data.
Lexington, KY
Government
Libraries.io Open Source Repository and Dependency Metadata
Software
Ligo Open Science Center (LOSC)
Physics
List of all countries in all languages
GIS
LJ Speech
Museums
Localytics Data Visualization Challenge
Data Challenges
LODUM
Datasets from the University of Münster.
London Datastore, UK
Government
London, ON, Canada
Government
Long-Term Productivity Database - The Long-Term Productivity database was [...]
Economics
Los Angeles Open Data
Government
Luxembourg - Luxembourgish Open Data Portal
Government
M-AILabs Speech
Museums
Machine Comprehension Test (MCTest) of text from Microsoft Research
Museums
Machine Learning Data Set Repository
Machine Learning
Machine Translation of European languages
Museums
MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste
Sciences
Making Sense of Microposts 2013
Museums
Making Sense of Microposts 2016
Museums
Marinexplore - Open Oceanographic Data
Science
Mass Mobilization Data Project
Sciences
MassGIS, Massachusetts, U.S.
Government
Massive Visual Memory Stimuli, MIT
Images
MeDAL
Healthcare
A large medical text dataset curated for abbreviation
Medical Insurance dataset
Medicare Coverage Database (MCD), U.S.
Healthcare
Medicare Data Engine of medicare.gov Data
Healthcare
Medicare Data File
Healthcare
Medicare related dataset
Healthcare
MeSH, the vocabulary thesaurus used for indexing articles for PubMed
Healthcare
Metastatic-Prostate-Adenocarcinoma-MCTP
Healthcare
Metastatic-Prostate-Cancer-SU2CPCF-Dream-Team
Healthcare
Metropolitan Museum of Art Collection API
Museums
Metropolitan Transportation Commission (MTC), California, US
Government
Mexico
Government
Microsoft Academic Knowledge Graph
Sciences
Microsoft Academic Research data
Sciences
Microsoft Azure Data Market Free DataSets
Public Domains
Microsoft Data Science for Research
Public Domains
Microsoft MAchine Reading COmprehension Dataset (or MS MARCO)
Museums
Microsoft Research Open Data
Public Domains
Million Song Dataset
Machine Learning
Minneapolis Institute of Arts metadata
Museums
Minnesota Population Center
Sciences
MIRAGE-2019
Computer Networks
MIRAGE-2019 is a human-generated dataset for mobile traffic
Mississauga, ON, Canada
Government
View current and historical planning data, such as information on
population, demographics, census, development, growth forecasts, housing,
employment, office and land use.
MIT dataset
Education
MIT Reality Mining Dataset
Sciences
MNIST database of handwritten digits, 70,000 examples
Images
MNIST dataset
This is a repository containing a handwritten digit dataset (about 60,000 training samples).
Mobile Social Networks from UMASS
Social Networks
Moldova
Government
Moncton, NB, Canada
Government
City of Moncton - Open Data
Montreal BIXI Bike Share
Transportation
Looking for an economical, eco-friendly, and practical way to get around?
Explore Montréal by renting a BIXI bike!
Multi-View Region of Interest Prediction Dataset for Autonomous Driving
Images
NaF-Prostate
Healthcare
NASA dataset
Physics
Massive set of datasets pertaining to Space
NASA Exoplanet Archive
Physics
NASA Global Imagery Browse Services
Climate/Weather
NASDAQ
Finance
National Estuarine Research Reserves System-Wide Monitoring Program
Science
National Technical Reports Library
Search Engines
National Technical Report Library
National Weather Service GIS Data Portal
GIS
Natural Earth - vectors and rasters of the world
GIS
Natural History Museum (London) Data Portal
Museums
NaturalLanguage
Museums
A topic-centric list of HQ open datasets (GitHub: awesomedata/awesome-public-datasets).
NBER Patent Citations
Complex Networks
NCBI Proteins
Biology
NCBI Taxonomy
Biology
NCI Genomic Data Commons
Biology
NDAR
Neuroscience
Netflix Prize
Data Challenges
Netherlands
Government
Network Repository with Interactive Exploratory Analysis Tools
Complex Networks
The first interactive network dataset repository with real-time interactive
graph visualization and analytics
Network Twitter Data
Social Networks
NeuroData
Neuroscience
Neuroelectro
Neuroscience
Neuroendocrine-Prostate-Cancer
Healthcare
NeuroMorpho
Neuroscience
New York Department of Sanitation Monthly Tonnage
Government
New York State Education Department Data
Education
New Yorker caption contest ratings
Machine Learning
Data from the caption contest (GitHub: nextml/caption-contest-data).
New Zealand
Government
Find Stats NZ's information releases, news stories, and reports grouped by
topic.
Newspaper Navigator
Images
Explore the visual and textual content within the Chronicling America
digitized newspaper collection in new ways using machine learning!
NFL play-by-play data
Sports
NIMH Data Archive
Neuroscience
NIST complex networks data collection
Complex Networks
NOAA Bering Sea Climate
Climate/Weather
NOAA Climate Datasets
Climate/Weather
These links provide quick access to many of NCEI's climate and weather
datasets, products, and various web pages and resources. Related Content
NOAA Realtime Weather Models
Climate/Weather
NOAA SURFRAD Meteorology and Radiation Datasets
Climate/Weather
The Global Monitoring Laboratory conducts research on greenhouse gas and
carbon cycle feedbacks, changes in clouds, aerosols, and surface radiation,
and recovery of stratospheric ozone.
Noisy speech database for training speech enhancement algorithms and TTS
Museums
Notre Dame Global Adaptation Index (ND-GAIN)
Sciences
NPCR-2001-2015
Healthcare
De-identified cancer incidence data are available to researchers for free
in these databases.
NPCR-2005-2015
Healthcare
De-identified cancer incidence data are available to researchers for free
in these databases.
NSSDC (NASA) data of 550 spacecraft
Physics
Number of Ebola Cases and Deaths in Affected Countries (2014)
Healthcare
NYC betanyc
Government
NYC Open Data
Government
NYC Open Data helps New Yorkers use and learn about City data
NYC Taxi Trip Data 2009-
Transportation
NYC Taxi Trip Data 2013
Transportation
FOIA/FOILed Taxi Trip Data from the NYC Taxi and Limousine Commission 2013.
Released by http://chriswhong.com/open-data/foil_nyc_taxi/ trip_data.7z
and...
NYC Uber trip data April 2014 to September 2014
Transportation
Uber trip data from a freedom of information request to NYC's Taxi &
Limousine Commission (GitHub: fivethirtyeight/uber-tlc-foil-response)
Oakland, California, US
Government
OANDA
Finance
OASIS
Neuroscience
OECD
Government
Find, compare and share the latest OECD data: charts, maps, tables and
related publications
Oil and Gas Authority Open Data
Science
Data hosted on ArcGIS Hub. Download in CSV, KML, Zip, GeoJSON, GeoTIFF or
PNG. API links are available for GeoServices, WMS, and WFS.
Oklahoma
Government
OONI: Open Observatory of Network Interference
Computer Networks
The Open Observatory of Network Interference (OONI) is a global community
measuring internet censorship around the world. Run OONI Probe to detect
internet censorship. Use OONI Explorer to track internet censorship
worldwide in near real-time.
Open Crime and Policing Data in England, Wales and Northern Ireland
Sciences
Open Data Certificates (beta)
Search Engines
Open Data for Africa
Government
Open Government Data (OGD) Platform India
Government
Open Image dataset
Images
Open Images From Google
Images
Open Library Data Dumps
Public Domains
Open Library is an open, editable library catalog, building towards a web
page for every book ever published. Read, borrow, and discover more than 3M
books for free.
Open Mobile Data by MobiPerf
Computer Networks
Open Multilingual Wordnet
Museums
Open Traffic collection
Transportation
Collection of open data resources for traffic information
(GitHub: graphhopper/open-traffic-collection)
Open-ODS (structure of the UK NHS)
Healthcare
OpenAddresses
GIS
OpenCorporates Database of Companies in the World
Economics
OpenDataNetwork - A search engine of all Socrata powered data portals
Search Engines
Find the data you need to power your business, app, or analysis from across
the open data ecosystem.
OpenDataPhilly
Government
OpenDataPhilly is a catalog of open data in the
OpenDataSoft's list of 1,600 open data
Government
OpenDota data dump
eSports
OpenFlights - airport, airline and route data
Transportation
OpenfMRI
Neuroscience
OpenNEURO
Neuroscience
OpenPaymentsData, Healthcare financial relationship data
Healthcare
OpenSanctions
Sciences
OpenSanctions helps investigators find leads, allows companies to manage
risk and enables technologists to build data-driven products.
OpenStreetMap (OSM)
GIS
Optimized Soil Adjusted Vegetation Index
Agriculture
Optimized Soil Adjusted Vegetation Index
Oregon
Government
OSU Cognitive Modeling Repository Datasets
Psychology+Cognition
OSU Financial data
Finance
Ottawa, ON, Canada
Government
City of Ottawa Open Data (Open Ottawa)
Our World in Data
Economics
Oxford Autonomous Driving dataset
Machine Learning
Palmer Penguins
Biology
Palo Alto, California, US
Government
Pathguide
Biology
Paul Hensel General International Data Page
Sciences
Proton Exchange Membrane (PEM) Fuel Cell Dataset (GitHub: ECSIM/pem-dataset1).
Personae Corpus
Museums
PewResearch Internet Survey Project
Sciences
PewResearch Society Data Collection
Sciences
Philadelphia Bike Share Stations (JSON)
Transportation
PhysioBank Databases
Healthcare
A large and growing archive of physiological data.
Pinhooker: Thoroughbred Bloodstock Sale Data
Sports
An R package to compile data sets of historic results from thoroughbred
sales (GitHub: phillc73/pinhooker).
PLAID
Energy
The Plug Load Appliance Identification Dataset
Plane Crash Database, since 1920
Transportation
PLCO-Prostate
Healthcare
PLCO-Prostate-Diagnostic-Procedures
Healthcare
PLCO-Prostate-Medical-Complications
Healthcare
PLCO-Prostate-Screening
Healthcare
PLCO-Prostate-Screening-Abnormalities
Healthcare
PLCO-Prostate-Treatments
Healthcare
Pleiades - Gazetteer and graph of ancient places
GIS
Pleiades gives scholars, students, and enthusiasts worldwide the ability to
use, create, and share historical geographic information about the ancient
world in digital form.
Portland, Oregon
Government
Welcome to the City of Portland Corporate GIS (CGIS) team page. CGIS
provides corporate spatial data, systems, applications and services to the
organization and Portland citizens. Here you can stay current on projects,
find maps, applications, access enterprise data, and offer any feedback.
Portugal - Pordata organization
Government
Pordata is a certified statistical database about Portugal, its
Municipalities and Europe. With free access it addresses the various themes
in society.
POS/NER/Chunk annotated data
Museums
Twitter NLP Tools (GitHub: aritter/twitter_nlp).
PRAD-CA-Prostate-Adenocarcinoma-Canada
Healthcare
The ICGC Data Portal provides tools for visualizing, querying and
downloading the data released quarterly by the consortium's member projects.
PRAD-FR-Prostate-Adenocarcinoma-France
Healthcare
The ICGC Data Portal provides tools for visualizing, querying and
downloading the data released quarterly by the consortium's member projects.
PRAD-UK-Prostate-Adenocarcinoma-United-Kingdom
Healthcare
The ICGC Data Portal provides tools for visualizing, querying and
downloading the data released quarterly by the consortium's member projects.
Pro Kabadi season 1 to 7
Sports
This repo contains both the Python code (unorganized) and the data used for
downloading stats data from Pro Kabadi (GitHub:
ranganadhkodali/Pro-Kabadi-season-1-7-Stats).
Programme for International Student Assessment (PISA)
Education
Prostate Adenocarcinoma (MSKCC/DFCI)
Healthcare
Prostate-3T
Healthcare
Prostate-Adenocarcinoma-Broad-Cornell-2012
Healthcare
Prostate-Adenocarcinoma-Broad-Cornell-2013
Healthcare
Prostate-Adenocarcinoma-CNA-study-MSKCC
Healthcare
Prostate-Adenocarcinoma-Fred-Hutchinson-CRC
Healthcare
Prostate-Adenocarcinoma-MSKCC
Healthcare
Prostate-Adenocarcinoma-Organoids-MSKCC
Healthcare
Prostate-Adenocarcinoma-Sun-Lab
Healthcare
Prostate-Adenocarcinoma-TCGA
Healthcare
Prostate-Adenocarcinoma-TCGA-PanCancer-Atlas
Healthcare
Prostate-Diagnosis
Healthcare
Prostate-Fused-MRI-Pathology
Healthcare
Prostate-MRI
Healthcare
Prostate-R
Healthcare
PROSTATEx-Challenge
Healthcare
Protein Data Bank
Biology
As a member of the wwPDB, the RCSB PDB curates and annotates PDB data
according to agreed upon standards. The RCSB PDB also provides a variety of
tools and resources. Users can perform simple and advanced searches based
on annotations relating to sequence, structure and function. These
molecules are visualized, downloaded, and analyzed by users who range from
students to specialized scientists.
Protein-protein interaction network
Complex Networks
Psychiatric Genomics Consortium
Biology
PubChem Project
Biology
Search and explore chemical information in the world's largest free
chemistry database. Search chemicals by name, molecular formula, structure,
and other identifiers. Find chemical and physical properties, biological
activities, safety and toxicity information, patents, literature citations
and more.
PubGene (now Coremine Medical)
Biology
Public Git Archive
Software
source{d} datasets ("big code") for source code analysis and machine
learning on source code (GitHub: src-d/datasets, PublicGitArchive directory).
Puerto Rico Government
Government
Pull Request review comments
Software
PyPI and Maven Dependency Network
Complex Networks
As time is always running out, I don't think I'll have the time in a while
to work again on the data I collected for the last three articles: Going
offline with Maven, State of the Maven/Java dependency graph, and State of
the PyPi/Python dependency graph. So, as it took me a long time to build…
QIN-PROSTATE
Healthcare
QIN-PROSTATE-Repeatability
Healthcare
Quandl
Finance
A massive repository of economic and financial data. Most of the datasets are free, but some are available only for purchase.
Quebec City, QC, Canada
Government
Quebec Province of Canada
Government
Question Answering dataset
Public Domains
Rapid7 Sonar Internet Scans
Computer Networks
A security research project that conducts internet-wide surveys across
different services and protocols to gain insights into global exposure to
common vulnerabilities.
Reddit Comments
Social Networks
This site hosts all the comments that millions of users made on Reddit from 2005 to 2017.
Reddit Datasets
Public Domains
r/datasets: A place to share, find, and discuss Datasets.
Regina SK, Canada
Government
Registered Meteorites on Earth
Machine Learning
Renfe (Spanish National Railway Network) dataset
Transportation
Reserve Bank of India
RBI provides a number of datasets related to money market operations, banking products, etc.
Restaurants Health Score Data in San Francisco
Machine Learning
Retrosheet Baseball Statistics
Sports
Reverse Geocoder using OSM data
GIS
Simple but fast reverse geocoding up to city granularity level (GitHub:
kno10/reversegeocode).
RevolutionAnalytics Collection
Public Domains
Rfam
Biology
The Rfam database is a collection of RNA families
Rijksmuseum Historical Art Collection
Museums
The Rijksmuseum links individuals with art and history. Our data services
are important building blocks for this. That is why we offer them in
different ways, guided by the Rijksmuseum’s progressive Open Data policy
and inspired by the FAIR data principles.
Rio de Janeiro, Brazil
Government
RITA Airline On-Time Performance data
Transportation
RITA/BTS transport data collection (TranStat)
Transportation
Robin Wilson - Free GIS Datasets
GIS
Romania
Government
Rotten Tomatoes Reviews
Public Domains
About 400,000 reviews from Rotten Tomatoes
RuFa
Images
Contains images of text written in one of two Arabic fonts
Russia
Government
Sample R data sets
Public Domains
San Antonio, TX
Government
Community Information Now - CI:Now is a nonprofit
San Diego, CA
Government
San Francisco Data sets
Government
San Jose, California, US
Government
San Mateo County, California, US
Government
Sanger Catalogue of Somatic Mutations in Cancer (COSMIC)
Biology
COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world's
largest and most comprehensive resource for exploring the impact of somatic
mutations in human cancer.
Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC)
Biology
Saskatchewan, Province of Canada
Government
SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles)
Museums
SciencesPo World Trade Gravity Datasets
Economics
Scopus Citation Database
Complex Networks
Seattle
Government
SEC EDGAR
Finance
EDGAR, the Electronic Data Gathering, Analysis, and Retrieval
SEER-YR1973_2015.SEER9
Healthcare
SEER-YR1992_2015.SJ_LA_RG_AK
Healthcare
SEER-YR2000_2015.CA_KY_LO_NJ_GA
Healthcare
Sentiment Analysis dataset - Sentiment140
Public Domains
Sequence Read Archive (SRA)
Biology
Several Shape-from-Silhouette Datasets
Images
Singapore Government Data
Government
Skin Cancer dataset
Healthcare
Skytrax' Air Travel Reviews Dataset
Social Networks
Sloan Digital Sky Survey (SDSS)
Physics
Small Network Data
Complex Networks
Smart Meter Data Portal
Energy
Smithsonian Institution Global Volcano and Eruption Database
Sciences
SMS Spam Collection in English
Museums
Social Twitter Data
Social Networks
Socrata
Data Catalogue
Socrata hosts cleaned datasets across domains such as government data, radiation data, and workplace-related data.
Source Code Identifiers
Software
SourceForge.net Research Data
Social Networks
South Africa
Government
South Africa Trade Statistics
Government
Space Apps Challenge
Data Challenges
Spam Messages dataset
Complex Networks
Spoken digit dataset
Complex Networks
St Louis Federal
Finance
StackExchange Data Explorer
Sciences
Standard Sentiment dataset from Stanford
Public Domains
Stanford Dogs Dataset
Images
Stanford GraphBase
Complex Networks
Stanford Large Network Dataset Collection
Complex Networks
Stanford Longitudinal Network Data Sources
Complex Networks
Stanford Microarray Data
Biology
Stanford Question Answering Dataset (SQuAD)
Museums
State of Utah, US
Government
Statista.com - statistics and Studies
Search Engines
Statistics from the General Statistics Office of Vietnam
Government
Stats4Stem R data sets (archived)
Public Domains
StatSci.org
Public Domains
Stock Market dataset - DataHub
Finance
Stowers Institute Original Data Repository
Biology
The Stowers Institute for Medical Research focuses on basic biomedical
research in genetic model organisms as a way to understand the molecular
mechanisms…
A synthetic energy dataset for non-intrusive load monitoring
Systems Science of Biological Dynamics (SSBD) Database
Biology
Taiwan
Government
Taiwan gov
Government
Tate Collection metadata
Museums
TCGA-PRAD-US
Healthcare
Tel-Aviv Open Data
Government
Telecom Italia Big Data Challenge
Data Challenges
Tennis database of rankings, results, and stats for ATP
Sports
Tennis database of rankings, results, and stats for WTA
Sports
Terrorism Research and Analysis Consortium
Sciences
Texas Inmates Executed Since 1984
Sciences
Texas Open Data
Government
Text data from eBook dataset
Public Domains
The Action Similarity Labeling (ASLAN) Challenge
Images
The Atlas of Economic Complexity
Economics
The Big Bad NLP Database
Museums
The Cancer Genome Atlas (TCGA), available via Broad GDAC
Biology
The Cancer Genome Atlas project (TCGA)
Healthcare
The Cancer Imaging Archive (TCIA)
Healthcare
The Catalogue of Life
Biology
The Center for International Data
Economics
The COVID Tracking Project
Healthcare
The COVID Tracking Project collects
The Getty vocabularies
Museums
The global dataset of historical yields for major crops 1981–2016
Agriculture
Iizumi, Toshichika (2019): Global dataset of historical yields v1.2 and
v1.3 aligned version. PANGAEA, https://doi.org/10.1594/PANGAEA.909132,
Supplement to: Iizumi, Toshichika; Sakai, T (2020): The global dataset of
historical yields for major crops 1981–2016. Scientific Data, 7(1), 97,
https://doi.org/10.1038/s41597-020-0433-7
The Koblenz Network Collection
Complex Networks
The Laboratory for Web Algorithmics (UNIMI)
Complex Networks
The Observatory of Economic Complexity
Economics
The Oxford-IIIT Pet Dataset
Images
The Peer-to-Peer Trace Archive
Computer Networks
The Personal Genome Project
Biology
The Public Utility Data Liberation Project (PUDL)
Energy
The Washington Post List
Public Domains
The World Bank
Government
The World Bank Open Data Resources for Climate Change
Climate/Weather
TIGER/Line - U.S. boundaries and roads
GIS
TikTok Dataset
Machine Learning
Time Series Data Library (TSDL) from MU
Time Series
Titanic Survival Data Set
Sciences
Top Streamers on Twitch
Entertainment
Contains data on the top 1,000 streamers.
Toronto Bike Share Stations (JSON and GBFS files)
Transportation
Toronto, ON, Canada
Government
Tracebase
Energy
Traffic and Log Data Captured During a Cyber Defense Exercise
CyberSecurity
Transport for London (TFL)
Transportation
TravisTorrent Dataset
Data Challenges
MSR'2017 Mining Challenge
TunedIT
Data Challenges
Data mining & machine learning data sets, algorithms, challenges
Tunisia
Government
Turing Change Point Dataset
Time Series
Twenty News Groups dataset
Public Domains
Twitch Top Streamer's Data
Social Networks
Twitter Data for Online Reputation Management
Social Networks
Twitter Data for Sentiment Analysis
Social Networks
Twitter data on US Airline sentiment
Social Networks
Twitter Graph of entire Twitter site
Social Networks
Twitter Scrape Calufa May 2011
Social Networks
TwoFishes - Foursquare's coarse geocoder
GIS
TZ Timezones shapefile
GIS
U.K. Government Data
Government
U.S. American Community Survey
Government
U.S. Bureau of Transportation Statistics (BTS)
Transportation
U.S. CDC Public Health datasets
Government
U.S. Census Bureau
Government
U.S. Congressional Research Service (CRS) Reports
Government
U.S. Department of Agriculture's Nutrient Database
Agriculture
U.S. Department of Housing and Urban Development (HUD)
Government
U.S. Domestic Flights 1990 to 2009
Transportation
U.S. Federal Government Agencies
Government
U.S. Federal Government Data Catalog
Government
U.S. Food and Drug Administration (FDA)
Government
U.S. Freight Analysis Framework since 2007
Transportation
U.S. National Center for Education Statistics (NCES)
Government
U.S. Open Government
Government
U.S. Patent and Trademark Office (USPTO) Bulk Data Products
Government
UC Riverside Time Series Dataset
Time Series
UCB's Archive of Social Science Data (D-Lab)
Sciences
UCI Machine Learning Repository
Machine Learning
This site consists of datasets hosted by the University of California, Irvine. It has a collection of 400+ datasets aimed at the machine learning community.
UCI Network Data Repository
Complex Networks
UCI Spam Email dataset
Complex Networks
UCLA Social Sciences Data Archive
Sciences
UCLA SOCR data collection
Public Domains
UCSC Public Data
Biology
UCSD Network Telescope, IPv4 /8 net
Computer Networks
UEA Climatic Research Unit
Climate/Weather
The Climatic Research Unit (CRU) is based in the School of Environmental
Sciences, UEA, and is considered one of the world's leading institutions
concerned with the study of natural and anthropogenic climate change.
UFL sparse matrix collection
Complex Networks
UFO Reports
Public Domains
Uganda Bureau of Statistics
Government
UK 2011 Census Open Atlas Project
Government
UK-DALE - UK Domestic Appliance-Level Electricity
Energy
Ukraine
Government
Ukraine Energy Centre Datasets
Energy
UN Civil Society Database
Sciences
UN Commodity Trade Statistics
Economics
UN Environmental Data
GIS
UN Human Development Reports
Economics
UNICEF
Data Catalogue
This site hosts data about the lives of children, with details on topics such as nutrition and education.
UniGene
Biology
UNIMI/LAW Social Network Datasets
Social Networks
United Nations
Government
United States Congress Twitter Data - Daily datasets with tweets of 1100+ [...]
Social Networks
Universal Dependencies
Museums
Universal Protein Resource (UniProt)
Biology
Universities Worldwide
Sciences
University of North Carolina dataset
Education
Health related datasets.
UPJOHN for Labor Employment Research
Sciences
Uppsala Conflict Data Program
Sciences
Uruguay
Government
US Census data
Government
Detailed US census data
US Counties
Government
This is a repository of various data, broken down by US
US visualization public data
Government
USA Soccer Teams and Locations - USA soccer teams and locations. MLS, [...]
Sports
USENET postings corpus of 2005~2011
Museums
USGS Earthquake Archives
Science
Valley Transportation Authority (VTA), California, US
Government
Vancouver, BC Open Data Catalog
Government
Victoria, BC, Canada
Government
Vienna, Austria
Government
Violent-Flows - Crowd Violence / Non-violence Database and benchmark
Images
Visual genome
Images
Visual QA dataset
Images
A dataset of open-ended questions about images.
Washington Post Climate Change
Climate/Weather
The Washington Post's analysis of NOAA climate change data for the
contiguous United States (GitHub:
washingtonpost/data-2C-beyond-the-limit-usa).
Walmart dataset
eCommerce
This dataset contains details of sales transactions from about 45 Walmart stores in the US.
Weather Forecasting dataset
Climate/Weather
Webhose - News/Blogs in multiple languages
Museums
WHITED
Energy
Wikidata - Wikipedia databases
Museums
Wikileaks 911 pager intercepts
Public Domains
Wikipedia Database
Public Domains
This dataset is perfect for Natural Language Processing Tasks.
Wikipedia Links data - 40 Million Entities in Context
Museums
WordNet databases and tools
Museums
World Bank
Finance
These datasets are offered by the World Bank, which also provides several tools such as Education Indices and the Open Data Catalog.
World Bank dataset
Finance
World Bank Open Data
Sciences
World boundaries from the U.S. Department of State
GIS
World countries in multiple formats
GIS
World Health Organization Global Health Observatory
Healthcare
World Inequality Database
Sciences
WorldClim - Global Climate Data
Climate/Weather
WorldPop project
Sciences
WorldTree Corpus of Explanation Graphs for Elementary Science Questions
Museums
WSU Graph Database
Complex Networks
WU Historical Weather Worldwide
Climate/Weather
xView
Images
One of the largest overhead image datasets.
Yahoo Finance
Finance
Yahoo Knowledge Graph COVID-19 Datasets
Healthcare
Yahoo Webscope
Public Domains
Yahoo! Graph and Social Data
Social Networks
Yahoo! Ratings and Classification Data
Machine Learning
Yelp dataset
Public Domains
This dataset contains over 8 million Yelp reviews and is well suited to text classification use cases.
Yelp Dataset Challenge
Public Domains
YouTube-8M
Machine Learning
YouTube Faces Database
Images
YouTube Video dataset
Public Domains
This is a YouTube labelled video dataset. It consists of 8 million video IDs with related data.
YouTube Video Social Graph in 2007, 2008
Social Networks
YouTube-BoundingBoxes
Machine Learning
Zalando - Fashion MNIST dataset
eCommerce
Zenodo - An open dependable home for the long-tail of science
Search Engines
Svenska kraftnät
Energy
Electricity statistics - We produce statistics for Sweden on production, consumption, imbalance index, import and export, and more.
Mimer
Energy
Production statistics
Nord Pool
Energy
Nord Pool runs the leading power market in Europe offering both day-ahead and intraday markets to its customers.
OpenAQ
Air Quality
OpenAQ is a non-profit organization empowering communities around the globe to clean their air by harmonizing, sharing, and using open air quality data.