List of datasets for all your data science work

Data science is a hot topic right now: more and more people are talking about it and planning to bring it into their organizations. Data science is the practice of extracting knowledge and insights from data and information generated by various tools and applications.

If you’re new to data science, or just trying to build a more robust data science portfolio, a good way to solidify your skills is to work on projects and assignments in data visualization, data cleaning, and machine learning.

Practicing regularly on such projects and assignments can help you master these skills and excel in your career.

To help you find open datasets to work with, we have compiled a list of public datasets for your next data science project, whether it involves machine learning, image datasets, NLP datasets, or self-driving datasets.

This list of public dataset sources is continuously updated; the entries are collected from blogs, websites, and user responses. Most of the datasets listed below are free, but some are not.

List of Datasets

Datasets Provider URL Category Description
1000 Genomes Biology Supporting open human variation data
10k US Adult Faces Database Images The datasets here have been assembled and made publicly available during Wilma’s research career. Please cite the corresponding publication when you use any of these datasets.
2019 Novel Coronavirus COVID-19 Data Repository by Johns Hopkins CSSE Healthcare Novel Coronavirus (COVID-19) Cases, provided by JHU CSSE
2021 Portuguese Elections Twitter Dataset Social Networks This dataset contains tweets and users mostly from the Portuguese Twittersphere. The watched users stem from a seed of political accounts (usernames) and news sources (usernames_news).
2GB of Photos of Cats Images Over 9,000 images of cats with annotated facial features
38-Cloud (Cloud Detection) Science This data set includes Landsat 8 images and their manually extracted pixel-level ground truths for cloud detection.
3W dataset Time Series The first realistic and public dataset with rare undesirable real events in oil wells.
43k+ Donald Trump Twitter Screenshots Social Networks This archive contains screenshots of 43,475 Donald Trump tweets from May 2009 to May 2020.
53.5B Web clicks of 100K users in Indiana Univ. Computer Networks To foster the study of the structure and dynamics of Web traffic networks, we make available a large dataset (‘Click Dataset’) of about 53.5 billion HTTP requests made by users at Indiana University.
A Twitter Dataset of 40+ million tweets related to COVID-19 Social Networks A Twitter Dataset of 40+ million tweets related to COVID-19
Ably Open Realtime Data Public Domains
Academic Torrents Science Hosts data used in published scientific research papers. The variety of datasets is large, and they are free to download.
Academic Torrents of data sharing from UMB Search Engines Distributed system for sharing enormous datasets – for researchers, by researchers.
ACLED (Armed Conflict Location & Event Data Project) Sciences ACLED collects real-time data on the locations, dates, actors, fatalities, and types of all reported political violence and protest events across Africa, the Middle East, Latin America & the Caribbean, East Asia, South Asia, Southeast Asia, Central Asia & the Caucasus, Europe, and the United States of America.
Actuaries Climate Index Climate/Weather Actuaries Climate Index data is available for download. Data is currently available through May 2021 (Spring 2021). Download monthly and seasonal data by region and component.
Affective Image Classification Images
AIcrowd Competitions Data Challenges
Airbnb dataset Public Domains This site hosts all the listing data from Airbnb.
Airborne Object Detection and Tracking Images
Airlines OD Data 1987-2008 Transportation
Alabama Real-Time Coastal Observing System Science
Alberta, Province of Canada Government
All-Age-Faces Dataset Machine Learning
Allen Institute Datasets Neuroscience
Amazon Public Domains
Amazon Reviews Public Domains The dataset contains about 35 million Amazon reviews.
American Economic Association Economics Macroeconomic datasets.
American Economic Association (AEA) Economics
American Gut (Microbiome Project) Biology American Gut open-access data and IPython notebooks – biocore/American-Gut
American Ninja Warrior Obstacles Sports
AMiner Citation Network Dataset Complex Networks
AMPds Energy The Almanac of Minutely Power dataset
Analytics Vidhya Public Domains The datasets can be downloaded from the hundreds of data-hack competitions they organize
Animals with attributes Images
Antwerp, Belgium Government
AQUASTAT Science Global water resources and uses
ArcGIS Open Data portal GIS
Archive-it from Internet Archive Public Domains
Archive.org Datasets Public Domains
Argentina (non official) Government
Audi Autonomous Driving Dataset Machine Learning
Audience Unfiltered faces for gender and age classification Images
Austin, TX, US Government
Australia (abs.gov.au) Government
Australia (data.gov.au) Government
Australian Weather Climate/Weather
Austria (data.gv.at) Government
Authoritarian Ruling Elites Database Sciences
Automatic Keyphrase Extraction Museums
Aviation Weather Center Climate/Weather
Awesome 3D Semantic City Models GIS Collection of open 3D semantic city
AWS COVID-19 Datasets Healthcare
AWS datasets Data Catalogues AWS hosts hundreds of public datasets and may soon host some of the largest collections available.
Azure Data Catalogues
Base dos Dados – Data Basis: Open Data Repository for Brazil Search Engines
Baton Rouge, LA, US Government
Beersheba, Israel Government Open Data Portal (Smart7 OpenData)
Belgium Government
Berkeley Education’s Autonomous driving dataset Machine Learning
Betfair Historical Exchange Data Sports
Bike Share Systems (BSS) collection Transportation
BIS Statistics Finance BIS statistics, compiled in cooperation with central
Blizzard Challenge Speech – The speech + text data comes from […] Museums
Blockmodo Coin Registry – A registry of JSON formatted information files […] Finance
Blogger Corpus Museums
BLUEd Energy Building-Level fUlly labeled Electricity Disaggregation dataset
BODC – marine data of ~22K vars Science
Boston Housing dataset Education
Brain Catalogue Neuroscience
Brainomics Neuroscience
Brazil Government
Brazilian Weather – Historical data (In Portuguese) Climate/Weather
Broad Bioimage Benchmark Collection (BBBC) Biology The Broad Bioimage Benchmark Collection (BBBC) is a collection of freely downloadable microscopy image sets. In addition to the images themselves, each set includes a description of the biological application and some type of “ground truth” (expected results).
Broad Cancer Cell Line Encyclopedia (CCLE) Biology
Bruteforce Database Data Challenges
Buenos Aires, Argentina Government
Bureau of Economic Analysis dataset Government
Bureau of Labor Statistics Government
CADDY Underwater Stereo-Vision Dataset of divers’ hand gestures Images
CAIDA Internet Datasets Computer Networks
Calgary, AB, Canada Government
Caltech Pedestrian Detection Benchmark Images Character Recognition in Natural Images
Cambridge, MA, US Government
Cambridge, MA, US, GIS data on GitHub GIS
Canada Government
Canada Parliament dataset Government Text dataset for NLP tasks from Canadian Parliament.
Canada Science and Technology Museums Corporation’s Open Data Museums
Canadian Legal Information Institute Sciences
Canadian Meteorological Centre Climate/Weather
Cancer related dataset Science
Carnegie Mellon Education dataset Education 5+ hours of Highway autonomous driving dataset.
CBOE Futures Exchange Finance
CDC Government Datasets offered by the Centers for Disease Control and Prevention.
Cell Image Library Biology
Center for Systemic Peace Datasets Sciences
CERN Open Data Portal Physics
Challenges in Machine Learning Data Challenges
Chars74K dataset Images
Charting The Global Climate Change News Narrative 2009-2020 Climate/Weather
Cheng-Caverlee-Lee September 2009 Social Networks
Chicago Government
Chile Government
China Government
China Biographical Database Social Networks
CIFAR-10 Images Image Classification dataset.
City of Berkeley Open Data Government
Climate Data from UEA Climate/Weather
Climate Data Store Climate/Weather Sea surface temperature daily data from 1981 to present derived from satellite observations
CLiPS Stylometry Investigation Corpus Museums
ClueWeb09 – 1B web pages Computer Networks
ClueWeb09 FACC Museums
ClueWeb12 – 733M web pages Computer Networks
ClueWeb12 FACC Museums
CMU datasets Education
CMU Enron Email of 150 users Social Networks
CMU JASA data archive Public Domains
CMU StatLab collections Public Domains
Code duplicates Software
CodeNeuro Datasets Neuroscience
COIL 100 Software Image dataset of objects photographed at every angle through a 360-degree rotation, from Columbia University.
Collaborative Research in Computational Neuroscience (CRCNS) Neuroscience
Comma.ai Machine Learning 5+ hours of Highway autonomous driving dataset.
Commit messages Software
CommonCrawl Web Data over 7 years Computer Networks
Community Resource for Archiving Wireless Data At Dartmouth Complex Networks
Complete FAANG Stock data Finance
Complete Genomics Public Data Biology
Composition of Foods Raw, Processed, Prepared USDA National Nutrient Database for Standard Healthcare
Computer Vision dataset Machine Learning Massive set of computer vision dataset organized by categories
Context-aware data sets from five domains Machine Learning
Cooper-Hewitt’s Collection Database Museums
Corona Virus dataset Healthcare
Coronavirus (Covid-19) Data in the United States Healthcare
Correlates of War Project Sciences
COVID-19 Case Surveillance Public Use Data Healthcare
COVID-19 Reported Patient Impact and Hospital Capacity by Facility Healthcare
CRAWDAD Wireless datasets from Dartmouth Univ. Computer Networks
Cricsheet Matches (cricket) Sports
Criteo click-through data Computer Networks
CrossRef DOI URLs Complex Networks All the journal article DOIs from CrossRef’s OAI-PMH server; URLs of just under 50 million journal articles.
CrowdANALYTIX dataX Data Challenges
Cryptome Conspiracy Theory Items Sciences
Crystallography Open Database Physics
CS:GO Competitive Matchmaking Data eSports
Cube++ Images 4890 raw 18-megapixel images, each containing a SpyderCube color
Cytology Dataset Images CCAgT: Images of Cervical Cells with AgNOR Stain
D4D Challenge of Orange Data Challenges
Dallas Open Data Government
Danbooru Tagged Anime Illustration Dataset Images
Data Driven Data Challenges Data Driven hosts competitions in which data scientists tackle real-world problems with social impact. Datasets from their competitions are available.
Data Packaged Core Datasets Public Domains
Data.gov Government A variety of datasets from US Government agencies. Domains include education, climate, food, chronic disease, and more.
Data.gov.uk Government Thousands of datasets from the UK Govt.
Data.World Data Catalogues
Data360 Public Domains
Databanks International Cross National Time Series Data Archive Time Series
Database of all continents, countries, States/Subdivisions/Provinces GIS
Database of Scientific Code Contributions Sciences
DataBC Government data from the Province of British Columbia
Datacards Sciences
Datahub.io Search Engines
Datasets on Github A GitHub collection hosting a wide variety of datasets, such as climate data, time series data, and plane crash data.
Datos Argentina Government Open data portal of the Argentine Republic.
DBFC Energy Direct Borohydride Fuel Cell (DBFC) Dataset
DBLP Citation dataset Complex Networks
DBnomics – the world’s economic database Economics
DBpedia Structured dataset from Wikipedia
DBpedia – Structured data from Wikipedia Museums
Debt to the Penny Government The Debt to the Penny dataset provides information
DEL Energy Domestic Electrical Load study datasets for South Africa (1994 – 2014)
Delve Datasets for classification and regression Machine Learning
Densely Annotated Video Driving Data Set Images
Denver Open Data Government
DIMACS Road Networks Collection Complex Networks
Dirty Words Museums
Discogs Monthly Data Machine Learning
Dog Image dataset from Stanford
Domains Project – Sorted list of Internet domains Search Engines
DrivenData Competitions for Social Good Data Challenges
DukeMTMC Data Set Images
Durham, NC Open Data Government
Dutch Traffic Information Transportation
Dutch Weather Climate/Weather
Earth Models Science
eBay Online Auctions (2012) Machine Learning
EBI ArrayExpress Biology
EBI Protein Data Bank in Europe Biology
ECO Energy
EconData from UMD Economics
Economic Freedom of the World Data Economics
Edmonton, AB, Canada Government
EDRM Enron EMail of 151 users, hosted on S3 Social Networks
EHDP Large Health Data Sets Healthcare
EIA Energy
Electron Microscopy Pilot Image Archive (EMPIAR) Biology
Email data from Enron
ENCODE project Biology
England LGInform Government
Enigma Public World’s broadest collection of open source datasets.
Enigma Public Public Domains Enigma provides accurate, timely business data about the identity and financial health of small and medium businesses in the US.
Ensembl Genomes Biology Providing genome data for non-vertebrate species, with tools for the manipulation, analysis and visualisation of that data
EOPC-DE-Early-Onset-Prostate-Cancer-Germany Healthcare The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium’s member projects.
EOSDIS – NASA’s earth observing system data Science
Equity in Athletics Sports
Ergast Formula 1, from 1950 up to date (API) Sports
ETH Entomological Collection (ETHEC) Fine Grained Butterfly (Lepidoptra) Images Images
European Climate Assessment & Dataset Climate/Weather
European Social Survey Sciences
European Union dataset
EuroStat Government
EveryPolitician Government Ongoing project collating and sharing data on every
Face Recognition Benchmark Images Face Recognition – Databases
Facebook Data Scrape (2005) Social Networks Facebook data scrape related to paper The Social Structure of Facebook Networks, by Amanda L. Traud, Peter J. Mucha, Mason A. Porter. We study the social…
Facebook Social Connectedness Index Social Networks
Facebook Social Networks from LAW (since 2007) Social Networks
Factual Global Location Data GIS
FBI This site hosts crime data in the US
FBI Hate Crime 2013 – aggregated data Sciences Aggregated FBI hate crime statistics, hosted on GitHub at emorisse/FBI-Hate-Crime-Statistics.
FCP-INDI Neuroscience
Federal Committee on Statistical Methodology (FCSM) (formerly FedStats) Government
FIFA-2021 Complete Player Dataset eSports
Financial Times dataset Detailed public dataset about Financial market
Finland Government Suomi.fi Open Data is Finland’s open data catalogue. The portal gathers all open data published in Finland in a single service. Come, use it, or publish via us!
FiveThirtyEight They publish a wide variety of datasets on their GitHub. The site stands out for its detailed data dictionaries, which explain each dataset. I absolutely love their FIFA dataset. (*Proud Gunner).
Flickr Personal Taxonomies Museums
FLOSSmole data about free, libre, and open source software development Software
Football/Soccer resources (data and APIs) Sports
Foursquare from UMN/Sarwat (2013) Social Networks This data set contains 2153471 users, 1143092 venues, 1021970 check-ins, 27098490 social connections, and 2809581 ratings that users assigned to venues; all…
Fragile States Index Sciences
France Government data.gouv.fr dataset search
Fredericton, NB, Canada Government
Free Music Archive Machine Learning
Freebase of people, places, and things Museums
Gapminder World demographic databases Healthcare
Gatineau, QC, Canada Government
GDC Healthcare GDC supports several cancer genome programs for CCG, TCGA, TARGET etc.
GDELT Global Events Database Sciences The GDELT Project
GDXray Images X-ray images for X-ray testing and Computer Vision
Gene Expression Omnibus (GEO) Biology
Gene Ontology (GO) Biology Download annotations
General Social Survey (GSS) since 1972 Sciences
GENIE Healthcare
Genomic-Hallmarks-Prostate-Adenocarcinoma-CPC-GENE Healthcare
Geo Maps GIS High Quality GeoJSON maps programmatically generated
Geo Spatial Data from ASU GIS
Geo Wiki Project GIS Citizen-driven Environmental Monitoring
GeoFabrik GIS OSM data extracted to a variety of formats and areas
GeoLife GPS Trajectory from Microsoft Research Transportation
GeoNames Worldwide GIS
German Climate Data Center Climate/Weather
German Federal Office for Radiation Protection (Bundesamt für Strahlenschutz) Government The German Federal Office for Radiation Protection (Bundesamt für Strahlenschutz) has a comprehensive database of smartphones – new and old – and the level of radiation they emit
German Political Speeches Corpus Museums
German Social Survey Sciences
German train system by Deutsche Bahn Transportation
Germany Government
Ghent, Belgium Government
GHTorrent Software
GitHub Collaboration Archive Social Networks
Glasgow, Scotland, UK Government
Global Administrative Areas Database (GADM) GIS
Global Biotic Interactions (GloBI) Biology
Global Climate Data Since 1929 Climate/Weather
Global Economic Complexity data
Global Power Plant Database Energy
Global Religious Futures Project Sciences
Global Wind Atlas Science
Google Public Domains
Google Books
Google Books Ngrams Museums
Google dataset This dataset is specifically for accurate landmark recognition
Google Finance Finance
Google MC-AFP Museums
Google Open Images
Google Public Datasets Google hosts many datasets through Google Public Datasets on its Cloud Platform. You can browse the collection using BigQuery. The first 1 terabyte of queries you make is free.
Google Scholar citation relations Social Networks
Google Trends Finance
Google Trends dataset
Google Web 5gram Museums
Grand Comics Database Public Domains
Graviti Open Datasets Data Catalogues Open Datasets
Greece Government
Guardian world governments Government
Gun Violence Data Sciences
Gutenberg eBooks List Museums
Halifax, NS, Canada Government
Hansards text chunks of Canadian Parliament Museums
Hard Drive Failure Rates Time Series
Harvard Dataverse Network of scientific data Search Engines
Harvard Medical School (HMS) LINCS Project Biology
Harvard Education dataset Education
Heart Rate Time Series from MIT Time Series
Helsinki Region, Finland Government
HES – Household Electricity Study, UK Energy
HFED Energy
High-Resolution Contact Networks from Wearable Sensors Social Networks
Historical MacroEconomic Statistics Economics
Homeland Infrastructure Foundation-Level Data GIS
Hong Kong, China Government
Houston, TX, US Government
Hubway Million Rides in MA Transportation
Human Connectome Project Neuroscience
Human Genome Diversity Project Biology
Human Microbiome Project (HMP) Biology
HumanEva Dataset Images
Humanitarian Data Exchange Sciences
Hyperspectral benchmark dataset on soil moisture Agriculture Hyperspectral benchmark dataset on soil moisture
iAWE Energy
IceCube – South Pole Neutrino Observatory Physics
ICOS PSP Benchmark Biology
ICPSR (UMICH) Search Engines
ICWSM Data Challenge (since 2009) Data Challenges
IEEE Geoscience and Remote Sensing Society DASE Website GIS
Image dataset of Human Face
ImageNet ImageNet is an image database consisting of images organized according to the WordNet hierarchy.
ImageNet (in WordNet hierarchy) Images
IMDb Database Machine Learning
Indian Government Data Government
Indian Government datasets
Indie Map: social graph and crawl of top IndieWeb sites Social Networks
Indonesian Data Portal Government
Indoor Image dataset from MIT
Indoor Scene Recognition Images
Infochimps Public Domains
INFORM Index for Risk Management Sciences
Informatics for Integrating Biology & the Bedside Healthcare
INFORUM Economics Interindustry Forecasting at the University of Maryland
Instagram Graph API
Institute for Demographic Studies Sciences
Institute of Education Sciences Search Engines
Integrated Marine Observing System (IMOS) Science
International Affective Picture System, UFL Images
International HapMap Project Biology
International Monetary Fund Public data Economics
International Networks Archive Sciences
International Social Survey Program ISSP Sciences
International Studies Compendium Project Sciences
International Trade Statistics Economics
Internet Product Code Database Economics
Internet-Wide Scan Data Repository Computer Networks
Iowa – Welcome to the State of Iowa’s data portal. Please explore data […] Government
Iranis Machine Learning A Large-scale Dataset of Farsi/Arabic License Plate Characters
Ireland’s Open Data Portal Government
Israel’s Open Data Portal Government
Istanbul Municipality Open Data Portal Government
Italy Government The dati.gov.it portal is the national metadata catalogue
Jail deaths in America Government The U.S. government does not release jail
James McGuire Cross National Data Sciences
Japan Government
Jeopardy Quiz show dataset
Joint External Debt Data Hub Economics
Jon Haveman International Trade Data Links Economics
Kaggle Data Catalogues Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.
Kaggle Competition Data Data Challenges
KDD Cup by Tencent 2012 Data Challenges
KDD Cups The KDD Cup is a Knowledge Discovery and Data Mining competition held by ACM; its datasets come with detailed data dictionaries and instructions.
KDNuggets Data Collections Public Domains
KDNuggets dataset
Keel Repository for classification, regression and time series Machine Learning
KEGG Biology
KITTI Vision Benchmark Suite Images
Labeled Faces in the Wild (LFW) Machine Learning
Labeled Information Library of Alexandria Images
Labelme Image dataset from MIT
Lahman’s Baseball Database Sports
Landsat 8 on AWS GIS
Large Movie Review dataset from Stanford
Laval, QC, Canada Government
Lemons quality control dataset Agriculture Lemons quality control dataset
Lending Club Loan Data Machine Learning
LendingClub The site hosts large loan-related datasets. You have to create an account to access the data.
Lexington, KY Government
Libraries.io Open Source Repository and Dependency Metadata Software
Ligo Open Science Center (LOSC) Physics
List of all countries in all languages GIS
LJ Speech Museums
Localytics Data Visualization Challenge Data Challenges
LODUM Datasets from the University of Münster.
London Datastore, UK Government
London, ON, Canada Government
Long-Term Productivity Database – The Long-Term Productivity database was […] Economics
Los Angeles Open Data Government
Luxembourg – Luxembourgish Open Data Portal Government
M-AILabs Speech Museums
Machine Comprehension Test (MCTest) of text from Microsoft Research Museums
Machine Learning Data Set Repository Machine Learning
Machine Translation of European languages Museums
MacroData Guide by Norsk samfunnsvitenskapelig datatjeneste Sciences
Making Sense of Microposts 2013 Museums
Making Sense of Microposts 2016 Museums
Marinexplore – Open Oceanographic Data Science
Mass Mobilization Data Project Sciences
MassGIS, Massachusetts, U.S. Government
Massive Visual Memory Stimuli, MIT Images
MeDAL Healthcare A large medical text dataset curated for abbreviation
Medical Insurance dataset
Medicare Coverage Database (MCD), U.S. Healthcare
Medicare Data Engine of medicare.gov Data Healthcare
Medicare Data File Healthcare
Medicare related dataset Healthcare
MeSH, the vocabulary thesaurus used for indexing articles for PubMed Healthcare
Metastatic-Prostate-Adenocarcinoma-MCTP Healthcare
Metastatic-Prostate-Cancer-SU2CPCF-Dream-Team Healthcare
Metropolitan Museum of Art Collection API Museums
Metropolitan Transportation Commission (MTC), California, US Government
Mexico Government
Microsoft Academic Knowledge Graph Sciences
Microsoft Academic Research data Sciences
Microsoft Azure Data Market Free DataSets Public Domains
Microsoft Data Science for Research Public Domains
Microsoft MAchine Reading COmprehension Dataset (or MS MARCO) Museums
Microsoft Research Open Data Public Domains
Million Song Dataset Machine Learning
Minneapolis Institute of Arts metadata Museums
Minnesota Population Center Sciences
MIRAGE-2019 Computer Networks MIRAGE-2019 is a human-generated dataset for mobile traffic
Mississauga, ON, Canada Government View current and historical planning data, such as information on population, demographics, census, development, growth forecasts, housing, employment, office and land use.
MIT dataset Education
MIT Reality Mining Dataset Sciences
MNIST database of handwritten digits, 70,000 examples Images
MNIST dataset This is a repository containing a hand-written digit dataset (About 60,000 samples).
Mobile Social Networks from UMASS Social Networks
Moldova Government
Moncton, NB, Canada Government City of Moncton – Open Data
Montreal BIXI Bike Share Transportation Looking for an economical, eco-friendly, and convenient way to get around? Explore Montréal by renting a BIXI bike!
Montreal, QC, Canada Government
More Song Datasets Machine Learning
Mountain View, California, US (GIS) Government City of Mountain View
MovieLens Data Sets Machine Learning
MSK-IMPACT-Clinical-Sequencing-Cohort-MSKCC-Prostate-Cancer Healthcare
Multi-Domain Sentiment Dataset (version 2.0) Museums
Multi-View Region of Interest Prediction Dataset for Autonomous Driving Images
NaF-Prostate Healthcare
NASA dataset Physics Massive set of datasets pertaining to Space
NASA Exoplanet Archive Physics
NASA Global Imagery Browse Services Climate/Weather
NASDAQ Finance
National Estuarine Research Reserves System-Wide Monitoring Program Science
National Technical Reports Library Search Engines National Technical Report Library
National Weather Service GIS Data Portal GIS
Natural Earth – vectors and rasters of the world GIS
Natural History Museum (London) Data Portal Museums
NaturalLanguage Museums A topic-centric list of HQ open datasets, hosted on GitHub at awesomedata/awesome-public-datasets.
NBER Patent Citations Complex Networks
NCBI Proteins Biology
NCBI Taxonomy Biology
NCI Genomic Data Commons Biology
NDAR Neuroscience
Netflix Prize Data Challenges
Netherlands Government
Network Repository with Interactive Exploratory Analysis Tools Complex Networks
Network Twitter Data Social Networks
NeuroData Neuroscience
Neuroelectro Neuroscience
Neuroendocrine-Prostate-Cancer Healthcare
NeuroMorpho Neuroscience
New York Department of Sanitation Monthly Tonnage Government
New York State Education Department Data Education
New Yorker caption contest ratings Machine Learning
New Zealand Government Find Stats NZ’s information releases, news stories, and reports grouped by topic.
Newspaper Navigator Images Explore the visual and textual content within the Chronicling America digitized newspaper collection in new ways using machine learning!
NFL play-by-play data Sports
NIMH Data Archive Neuroscience
NIST complex networks data collection Complex Networks
NOAA Bering Sea Climate Climate/Weather
NOAA Climate Datasets Climate/Weather These links provide quick access to many of NCEI’s climate and weather datasets, products, and various web pages and resources.
NOAA Realtime Weather Models Climate/Weather
NOAA SURFRAD Meteorology and Radiation Datasets Climate/Weather The Global Monitoring Laboratory conducts research on greenhouse gas and carbon cycle feedbacks, changes in clouds, aerosols, and surface radiation, and recovery of stratospheric ozone.
Noisy speech database for training speech enhancement algorithms and TTS Museums
Notre Dame Global Adaptation Index (ND-GAIN) Sciences
NPCR-2001-2015 Healthcare De-identified cancer incidence data are available to researchers for free in these databases.
NPCR-2005-2015 Healthcare De-identified cancer incidence data are available to researchers for free in these databases.
NSSDC (NASA) data of 550 spacecraft Physics How to obtain data from the NASA Space Science Data Coordinated Archive (NSSDCA)
Number of Ebola Cases and Deaths in Affected Countries (2014) Healthcare
NYC betanyc Government
NYC Open Data Government NYC Open Data helps New Yorkers use and learn about City data
NYC Taxi Trip Data 2009- Transportation
NYC Taxi Trip Data 2013 Transportation FOIA/FOILed Taxi Trip Data from the NYC Taxi and Limousine Commission 2013. Released by http://chriswhong.com/open-data/foil_nyc_taxi/ trip_data.7z and…
NYC Uber trip data April 2014 to September 2014 Transportation
Oakland, California, US Government
OANDA Finance
OASIS Neuroscience
OECD Government Find, compare and share the latest OECD data: charts, maps, tables and related publications
Oil and Gas Authority Open Data Science
Oklahoma Government
OONI: Open Observatory of Network Interference Computer Networks The Open Observatory of Network Interference (OONI) is a global community measuring internet censorship around the world. Run OONI Probe to detect internet censorship. Use OONI Explorer to track internet censorship worldwide in near real-time.
Open Crime and Policing Data in England, Wales and Northern Ireland Sciences
Open Data Certificates (beta) Search Engines
Open Data for Africa Government
Open Government Data (OGD) Platform India Government Open Government Data Platform (OGD) India is a single-point of access to Datasets/Apps in open format published by Ministries/Departments. Details of Events, Visualizations, Blogs, infographs.
Open Image dataset Images The Open Images dataset, hosted on GitHub at openimages/dataset.
Open Images From Google Images
Open Library Data Dumps Public Domains Open Library is an open, editable library catalog, building towards a web page for every book ever published. Read, borrow, and discover more than 3M books for free.
Open Mobile Data by MobiPerf Computer Networks
Open Multilingual Wordnet Museums
Open Traffic collection Transportation Collection of open data resources for traffic information – graphhopper/open-traffic-collection
Open-ODS (structure of the UK NHS) Healthcare
OpenAddresses GIS
OpenCorporates Database of Companies in the World Economics
OpenDataNetwork – A search engine of all Socrata powered data portals Search Engines Find the data you need to power your business, app, or analysis from across the open data ecosystem.
OpenDataPhilly Government OpenDataPhilly is a catalog of open data in the
OpenDataSoft’s list of 1,600 open data Government
OpenDota data dump eSports
OpenFlights – airport, airline and route data Transportation
OpenfMRI Neuroscience
OpenNEURO Neuroscience
OpenPaymentsData, Healthcare financial relationship data Healthcare
OpenSanctions Sciences OpenSanctions helps investigators find leads, allows companies to manage risk and enables technologists to build data-driven products.
OpenStreetMap (OSM) GIS
Optimized Soil Adjusted Vegetation Index Agriculture Optimized Soil Adjusted Vegetation Index
Oregon Government
OSU Cognitive Modeling Repository Datasets Psychology+Cognition
OSU Financial data Finance
Ottawa, ON, Canada Government City of Ottawa Open Data (Open Ottawa)
Our World in Data Economics
Oxford Autonomous Driving dataset Machine Learning
Palmer Penguins Biology
Palo Alto, California, US Government
Pathguide Biology
Paul Hensel General International Data Page Sciences
PEM1 – Proton Exchange Membrane (PEM) Fuel Cell Dataset Energy
Personae Corpus Museums
PewResearch Internet Survey Project Sciences
PewResearch Society Data Collection Sciences
Philadelphia Bike Share Stations (JSON) Transportation
PhysioBank Databases Healthcare A large and growing archive of physiological data.
Pinhooker: Thoroughbred Bloodstock Sale Data Sports An R Package to compile data sets of historic results from thoroughbred sales – phillc73/pinhooker
PLAID Energy The Plug Load Appliance Identification Dataset
Plane Crash Database, since 1920 Transportation
PLCO-Prostate Healthcare
PLCO-Prostate-Diagnostic-Procedures Healthcare
PLCO-Prostate-Medical-Complications Healthcare
PLCO-Prostate-Screening Healthcare
PLCO-Prostate-Screening-Abnormalities Healthcare
PLCO-Prostate-Treatments Healthcare
Pleiades – Gazetteer and graph of ancient places GIS Pleiades gives scholars, students, and enthusiasts worldwide the ability to use, create, and share historical geographic information about the ancient world in digital form.
Portland, Oregon Government Welcome to the City of Portland Corporate GIS (CGIS) team page. CGIS provides corporate spatial data, systems, applications and services to the organization and Portland citizens. Here you can stay current on projects, find maps, applications, access enterprise data, and offer any feedback.
Portugal – Pordata organization Government
POS/NER/Chunk annotated data Museums Twitter NLP Tools – aritter/twitter_nlp
PRAD-CA-Prostate-Adenocarcinoma-Canada Healthcare The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium’s member projects.
PRAD-FR-Prostate-Adenocarcinoma-France Healthcare The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium’s member projects.
PRAD-UK-Prostate-Adenocarcinoma-United-Kingdom Healthcare The ICGC Data Portal provides tools for visualizing, querying and downloading the data released quarterly by the consortium’s member projects.
Pro Kabaddi season 1 to 7 Sports This repo contains both the Python code (unorganized) and the data used for downloading stats data from Pro Kabaddi. – ranganadhkodali/Pro-Kabadi-season-1-7-Stats
Program for International Student Assessment (PISA) Education
Prostate Adenocarcinoma (MSKCC/DFCI) Healthcare
Prostate-3T Healthcare
Prostate-Adenocarcinoma-Broad-Cornell-2012 Healthcare
Prostate-Adenocarcinoma-Broad-Cornell-2013 Healthcare
Prostate-Adenocarcinoma-CNA-study-MSKCC Healthcare
Prostate-Adenocarcinoma-Fred-Hutchinson-CRC Healthcare
Prostate-Adenocarcinoma-MSKCC Healthcare
Prostate-Adenocarcinoma-Organoids-MSKCC Healthcare
Prostate-Adenocarcinoma-Sun-Lab Healthcare
Prostate-Adenocarcinoma-TCGA Healthcare
Prostate-Adenocarcinoma-TCGA-PanCancer-Atlas Healthcare
Prostate-Diagnosis Healthcare
Prostate-Fused-MRI-Pathology Healthcare
Prostate-MRI Healthcare
Prostate-R Healthcare
PROSTATEx-Challenge Healthcare
Protein Data Bank Biology As a member of the wwPDB, the RCSB PDB curates and annotates PDB data according to agreed upon standards. The RCSB PDB also provides a variety of tools and resources. Users can perform simple and advanced searches based on annotations relating to sequence, structure and function. These molecules are visualized, downloaded, and analyzed by users who range from students to specialized scientists.
Protein-protein interaction network Complex Networks
Psychiatric Genomics Consortium Biology
PubChem Project Biology Search and explore chemical information in the world’s largest free chemistry database. Search chemicals by name, molecular formula, structure, and other identifiers. Find chemical and physical properties, biological activities, safety and toxicity information, patents, literature citations and more.
PubGene (now Coremine Medical) Biology
Public Git Archive Software source{d} datasets (“big code”) for source code analysis and machine learning on source code – datasets/PublicGitArchive at master · src-d/datasets
Puerto Rico Government Government
Pull Request review comments Software
PyPI and Maven Dependency Network Complex Networks As time is always running out, I don’t think I’ll have the time in a while to work again on the data I collected for the last three articles, Going offline with Maven, State of the Maven/Java dependency graph and State of the PyPi/Python dependency graph. So, as it took me a long time to build…
QIN-PROSTATE-Repeatability Healthcare
Quandl Finance It is a massive repository for Economic and Financial data. Most of the datasets are free but some are available to purchase as well.
Quebec City, QC, Canada Government
Quebec Province of Canada Government In 2018, a structured approach!
Question Answering dataset Public Domains
Rapid7 Sonar Internet Scans Computer Networks A security research project that conducts internet-wide surveys across different services and protocols to gain insights into global exposure to common vulnerabilities.
RDataMining – “R and Data Mining” ebook data Machine Learning
Real Estate Price prediction dataset Finance Regression analysis, multiple regression, linear regression, prediction
REDD Energy
Reddit Social Networks This site hosts all the comments millions of users made on Reddit from 2005 to 2017!
Reddit Comments Social Networks
Reddit Datasets Public Domains r/datasets: A place to share, find, and discuss Datasets.
Regina SK, Canada Government
Registered Meteorites on Earth Machine Learning #N/A
Renfe (Spanish National Railway Network) dataset Transportation
Reserve Bank of India RBI provides a number of datasets related to money market operations, banking products, etc.
Restaurants Health Score Data in San Francisco Machine Learning
Retrosheet Baseball Statistics Sports
Reverse Geocoder using OSM data GIS Simple but fast reverse geocoding up to city granularity level – kno10/reversegeocode
RevolutionAnalytics Collection Public Domains
Rfam Biology The Rfam database is a collection of RNA families
Rijksmuseum Historical Art Collection Museums The Rijksmuseum links individuals with art and history. Our data services are important building blocks for this. That is why we offer them in different ways, guided by the Rijksmuseum’s progressive Open Data policy and inspired by the FAIR data principles.
Rio de Janeiro, Brazil Government
RITA Airline On-Time Performance data Transportation
RITA/BTS transport data collection (TranStat) Transportation
Robin Wilson – Free GIS Datasets GIS
Romania Government
Rotten Tomatoes Reviews Public Domains About 400,000 reviews from Rotten Tomatoes
RuFa Images Contains images of text written in one of two Arabic fonts
Russia Government
Sample R data sets Public Domains
San Antonio, TX Government Community Information Now – CI:Now is a nonprofit
San Diego, CA Government
San Francisco Data sets Government
San Jose, California, US Government
San Mateo County, California, US Government
Sanger Catalogue of Somatic Mutations in Cancer (COSMIC) Biology COSMIC, the Catalogue Of Somatic Mutations In Cancer, is the world’s largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer.
Sanger Genomics of Drug Sensitivity in Cancer Project (GDSC) Biology
Saskatchewan, Province of Canada Government
SaudiNewsNet Collection of Saudi Newspaper Articles (Arabic, 30K articles) Museums
SciencesPo World Trade Gravity Datasets Economics
Scopus Citation Database Complex Networks
Seattle Government
SEC EDGAR Finance EDGAR, the Electronic Data Gathering, Analysis, and Retrieval
SEER-YR1973_2015.SEER9 Healthcare
SEER-YR1992_2015.SJ_LA_RG_AK Healthcare
SEER-YR2000_2015.CA_KY_LO_NJ_GA Healthcare
Sentiment Analysis dataset -Sentiment140 Public Domains
Sequence Read Archive(SRA) Biology
Several Shape-from-Silhouette Datasets Images
Singapore Government Data Government
Skin Cancer dataset Healthcare
Skytrax’ Air Travel Reviews Dataset Social Networks
Sloan Digital Sky Survey (SDSS) Physics
Small Network Data Complex Networks
Smart Meter Data Portal Energy
Smithsonian Institution Global Volcano and Eruption Database Sciences
SMS Spam Collection in English Museums
Social Twitter Data Social Networks
Socrata Data Catalogue Socrata hosts cleaned datasets across domains such as government data, radiation data, and workplace-related data.
Source Code Identifiers Software
SourceForge.net Research Data Social Networks
South Africa Government
South Africa Trade Statistics Government
Space Apps Challenge Data Challenges
Spam Messages dataset Complex Networks
Spoken digit dataset Complex Networks
St Louis Federal Finance
StackExchange Data Explorer Sciences
Standard Sentiment dataset from Stanford Public Domains
Stanford Dogs Dataset Images
Stanford GraphBase Complex Networks
Stanford Large Network Dataset Collection Complex Networks
Stanford Longitudinal Network Data Sources Complex Networks
Stanford Microarray Data Biology
Stanford Question Answering Dataset (SQuAD) Museums
State of Utah, US Government
Statista.com – statistics and Studies Search Engines
Statistics from the General Statistics Office of Vietnam Government
Stats4Stem R data sets (archived) Public Domains
StatSci.org Public Domains
Stock Market dataset -DataHub Finance
Stowers Institute Original Data Repository Biology The Stowers Institute for Medical Research focuses on basic biomedical research in genetic model organisms as a way to understand the molecular mechanisms…
Street View House Numbers from Stanford Education
Student Data from Free Code Camp Education
Study Forrest Neuroscience
SUN database, MIT Images
SVIRO Synthetic Vehicle Interior Rear Seat Occupancy Images
Switzerland Government
SYND Energy A synthetic energy dataset for non-intrusive load monitoring
Systems Science of Biological Dynamics (SSBD) Database Biology
Taiwan Government
Taiwan gov Government
Tate Collection metadata Museums
TCGA-PRAD-US Healthcare
Tel-Aviv Open Data Government
Telecom Italia Big Data Challenge Data Challenges
Tennis database of rankings, results, and stats for ATP Sports
Tennis database of rankings, results, and stats for WTA Sports
Terrorism Research and Analysis Consortium Sciences
Texas Inmates Executed Since 1984 Sciences
Texas Open Data Government
Text data from eBook dataset Public Domains
The Action Similarity Labeling (ASLAN) Challenge Images
The Atlas of Economic Complexity Economics
The Big Bad NLP Database Museums
The Cancer Genome Atlas (TCGA), available via Broad GDAC Biology
The Cancer Genome Atlas project (TCGA) Healthcare
The Cancer Imaging Archive (TCIA) Healthcare
The Catalogue of Life Biology
The Center for International Data Economics
The COVID Tracking Project Healthcare The COVID Tracking Project collects
The Getty vocabularies Museums
The global dataset of historical yields for major crops 1981–2016 Agriculture
The Koblenz Network Collection Complex Networks
The Laboratory for Web Algorithmics (UNIMI) Complex Networks
The Observatory of Economic Complexity Economics
The Oxford-IIIT Pet Dataset Images
The Peer-to-Peer Trace Archive Computer Networks
The Personal Genome Project Biology
The Public Utility Data Liberation Project (PUDL) Energy
The Washington Post List Public Domains
The World Bank Government
The World Bank Open Data Resources for Climate Change Climate/Weather
TIGER/Line – U.S. boundaries and roads GIS
TikTok Dataset Machine Learning
Time Series Data Library (TSDL) from MU Time Series
Titanic Survival Data Set Sciences
Top Streamers on Twitch Entertainment Contains data on the top 1,000 streamers
Toronto Bike Share Stations (JSON and GBFS files) Transportation
Toronto, ON, Canada Government
Tracebase Energy
Traffic and Log Data Captured During a Cyber Defense Exercise CyberSecurity
Transport for London (TFL) Transportation
TravisTorrent Dataset Data Challenges MSR’2017 Mining Challenge
TunedIT Data Challenges Data mining & machine learning data sets, algorithms, challenges
Tunisia Government
Turing Change Point Dataset Time Series
Twenty News Groups dataset Public Domains
Twitch Top Streamer’s Data Social Networks
Twitter Data for Online Reputation Management Social Networks
Twitter Data for Sentiment Analysis Social Networks
Twitter data on US Airline sentiment Social Networks
Twitter Graph of entire Twitter site Social Networks
Twitter Scrape Calufa May 2011 Social Networks
TwoFishes – Foursquare’s coarse geocoder GIS
TZ Timezones shapefile GIS
U.K. Government Data Government
U.S. American Community Survey Government
U.S. Bureau of Transportation Statistics (BTS) Transportation
U.S. CDC Public Health datasets Government
U.S. Census Bureau Government
U.S. Congressional Research Service (CRS) Reports Government
U.S. Department of Agriculture’s Nutrient Database Agriculture
U.S. Department of Housing and Urban Development (HUD) Government
U.S. Domestic Flights 1990 to 2009 Transportation
U.S. Federal Government Agencies Government
U.S. Federal Government Data Catalog Government
U.S. Food and Drug Administration (FDA) Government
U.S. Freight Analysis Framework since 2007 Transportation
U.S. National Center for Education Statistics (NCES) Government
U.S. Open Government Government
U.S. Patent and Trademark Office (USPTO) Bulk Data Products Government
UC Riverside Time Series Dataset Time Series
UCB’s Archive of Social Science Data (D-Lab) Sciences
UCI Machine Learning Repository Machine Learning This site consists of datasets hosted by the University of California, Irvine. It has a collection of 400+ datasets aimed at the machine learning community.
UCI Network Data Repository Complex Networks
UCI Spam Email dataset Complex Networks
UCLA Social Sciences Data Archive Sciences
UCLA SOCR data collection Public Domains
UCSC Public Data Biology
UCSD Network Telescope, IPv4 /8 net Computer Networks
UEA Climatic Research Unit Climate/Weather The Climatic Research Unit (CRU) is based in the School of Environmental Sciences, UEA, and is considered one of the world’s leading institutions concerned with the study of natural and anthropogenic climate change.
UFL sparse matrix collection Complex Networks
UFO Reports Public Domains
Uganda Bureau of Statistics Government
UK 2011 Census Open Atlas Project Government
UK-DALE – UK Domestic Appliance-Level Electricity Energy
Ukraine Government
Ukraine Energy Centre Datasets Energy
UN Civil Society Database Sciences
UN Commodity Trade Statistics Economics
UN Environmental Data GIS
UN Human Development Reports Economics
UNICEF Data Catalogue This site hosts data about the lives of children, with details such as nutrition and education.
UniGene Biology
UNIMI/LAW Social Network Datasets Social Networks
United Nations Government
United States Congress Twitter Data – Daily datasets with tweets of 1100+ […] Social Networks
Universal Dependencies Museums
Universal Protein Resource (UniProt) Biology
Universities Worldwide Sciences
University of North Carolina dataset Education Health-related datasets.
UPJOHN for Labor Employment Research Sciences
Uppsala Conflict Data Program Sciences
Uruguay Government
US Census data Government Detailed US census data
US Counties Government This is a repository of various data, broken down by US
US visualization public data Government
USA Soccer Teams and Locations – USA soccer teams and locations. MLS, […] Sports
USENET postings corpus of 2005~2011 Museums
USGS Earthquake Archives Science
Valley Transportation Authority (VTA), California, US Government
Vancouver, BC Open Data Catalog Government
Victoria, BC, Canada Government
Vienna, Austria Government
Violent-Flows – Crowd Violence / Non-violence Database and benchmark Images
Visual genome Images
Visual QA dataset Images It is a dataset for Open ended questions about images.
Washington Post Climate Change Climate/Weather
Walmart dataset eCommerce This dataset has details about Sales transactions from about 45 Walmart stores in the US.
Weather Forecasting dataset Climate/Weather
Webhose – News/Blogs in multiple languages Museums
Wikidata – Wikipedia databases Museums
Wikileaks 911 pager intercepts Public Domains
Wikipedia Database Public Domains This dataset is perfect for Natural Language Processing Tasks.
Wikipedia Links data – 40 Million Entities in Context Museums
WordNet databases and tools Museums
World Bank Finance These datasets are offered by the World Bank. They also provide several tools such as Education Indices, Open Data Catalog etc.
World Bank dataset Finance
World Bank Open Data Sciences
World boundaries from the U.S. Department of State GIS
World countries in multiple formats GIS
World Health Organization Global Health Observatory Healthcare
World Inequality Database Sciences
WorldClim – Global Climate Data Climate/Weather
WorldPop project Sciences
WorldTree Corpus of Explanation Graphs for Elementary Science Questions Museums
WSU Graph Database Complex Networks
WU Historical Weather Worldwide Climate/Weather
xView Images One of the largest overhead image datasets.
Yahoo Finance Finance
Yahoo Knowledge Graph COVID-19 Datasets Healthcare
Yahoo Webscope Public Domains
Yahoo! Graph and Social Data Social Networks
Yahoo! Ratings and Classification Data Machine Learning
Yelp dataset Public Domains This dataset contains over 8 million Yelp reviews and is well suited to text classification use cases.
Yelp Dataset Challenge Public Domains
Youtube 8m Machine Learning
YouTube Faces Database Images
YouTube Video dataset Public Domains This is a YouTube labelled video dataset. It consists of 8 million video IDs with related data.
Youtube Video Social Graph in 2007,2008 Social Networks
YouTube-BoundingBoxes Machine Learning
Zalando -Fashion MNIST dataset eCommerce
Zenodo – An open dependable home for the long-tail of science Search Engines
Svenska kraftnät Energy Electricity statistics – statistics for Sweden on production, consumption, imbalance index, import and export, and more.
Mimer Energy Production statistics
Nord Pool Energy Nord Pool runs the leading power market in Europe, offering both day-ahead and intraday markets to its customers.
OpenAQ Air Quality OpenAQ is a non-profit organization empowering communities around the globe to clean their air by harmonizing, sharing, and using open air quality data.