The Best (FREE) Data Repositories for Aspiring Data Scientists
A Quick Reference guide to data for any and every industry imaginable
Earlier this week, Google announced that its Dataset Search engine is now out of beta. This is a great accomplishment for the world and an invaluable tool for any aspiring Data Scientist in 2020.
In honor of the news, I thought I’d put together a list of my favorite data repositories that I’ve used in the past to create a quick reference guide for any and all aspiring Data Scientists. No matter what industry you want to get into, there’s definitely a dataset for it here :)
Awesome Public Datasets is a GitHub repository of high-quality, topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. Almost all of them are free, with a few exceptions here and there.
Data Is Plural is a weekly newsletter of useful/curious datasets. You can find a huge archive of datasets on their Google Doc. Just hit Ctrl+F for a topic you’d like to look into and see the dozens of results that pop up.
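If you’d rather run that keyword search from a script than with Ctrl+F, here is a minimal sketch. It assumes you’ve exported the archive to a CSV file; the column names (`headline`, `text`) and the sample rows below are made-up placeholders for illustration, not real archive entries, so adjust them to whatever your export actually contains.

```python
import csv
import io

# Hypothetical stand-in for a CSV export of a dataset archive.
# The column names and rows are placeholders, not real archive entries.
SAMPLE_EXPORT = """headline,text
Hospital wait times,Placeholder description about health care facilities.
City bike trips,Placeholder description about transportation.
Vaccine schedules,Placeholder description about public health programs.
"""


def search_archive(csv_file, keyword):
    """Return rows where any field contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if any(needle in (value or "").lower() for value in row.values())
    ]


# Search the in-memory sample; swap in open("your_export.csv", newline="")
# to search a real file instead.
matches = search_archive(io.StringIO(SAMPLE_EXPORT), "health")
for row in matches:
    print(row["headline"])
```

The search is a plain substring match across every column, which is roughly what Ctrl+F does; it makes no attempt at stemming or ranking.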
Data World is an open data repository containing data contributed by thousands of users and organizations all across the world.
What I love about this site is that it contains data that is really hard to find elsewhere. In particular, healthcare is one of the more difficult industries to get publicly available data from (due to privacy concerns). But luckily, Data World has 3667 free health datasets you can use for your next project.
A data set search engine… powered by Google. No further explanation needed.
Kaggle enables data scientists and other developers to run machine learning contests, write and share code, and host datasets. The data science problems posted on Kaggle can be anything from predicting cancer occurrence by examining patient records to analyzing the sentiment evoked by movie reviews and how it affects audience reaction.
This repository is mostly for data visualizations, but I think what they do is a lot of fun.
Makeover Monday is an initiative started in the first week of 2016 by Andy Kriebel (Head Coach, the Information Lab UK, @vizwizbi) and Andy Cotgreave (Tableau Evangelist, @acotgreave).
Every week, usually on a Sunday, Andy K posts (via blog and Twitter) an original visualization to be “made over”. Some are awful; some are already great, in which case the challenge is to present a different angle on the original.
When complete, post a link to the visualization and/or a picture, using the hashtag #MakeoverMonday. All the individual screenshots are compiled into one big Pinterest collage of combined visualizations.
A place to share, find, and discuss datasets. You can request datasets from other subscribers as well as share and contribute your own.
UCI Machine Learning Repository
The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited “papers” in all of computer science.
Under the terms of the 2013 Federal Open Data Policy, newly generated government data is required to be made available in open, machine-readable formats, while continuing to ensure privacy and security.
That’s going to be all for now. Please feel free to bookmark this article and use it as a quick reference for your data pursuits.
Did I miss your favorite repository? Let me know below so I can add it to the guide. Until next time everyone, happy coding.
My name is Kishen Sharma and I am a Data Scientist based in the Bay Area. I create content to educate and motivate aspiring Data Scientists all across the world.
Links to my blog and social media: https://linktr.ee/keesh_codes