The Best (FREE) Data Repositories for Aspiring Data Scientists in 2020

Category: IT Technology · Published: 4 years ago

A Quick Reference guide to data for any and every industry imaginable

Earlier this week, Google announced that its Dataset Search engine is now out of beta. This is a great accomplishment for the world and an invaluable tool for any aspiring Data Scientist in 2020.

In honor of the news, I thought I’d put together a list of my favorite data repositories that I’ve used in the past to create a quick reference guide for any and all aspiring Data Scientists. No matter what industry you want to get into, there’s definitely a dataset for it here :)

Awesome Public Datasets

Awesome Public Datasets is a GitHub repository of high-quality, topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. Almost all of them are free, with a few exceptions here and there.

Data is Plural

Data is Plural is a weekly newsletter of useful and curious datasets. You can find a huge archive of past datasets in their Google Doc. Just hit Ctrl+F, type a topic you’d like to look into, and browse the dozens of results that pop up.
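The Ctrl+F search described above can also be scripted once the archive has been exported as CSV. Here is a minimal Python sketch of that idea. The column names (`headline`, `text`) and the two sample rows are made up for illustration, so adjust them to whatever the exported file actually contains.

```python
import csv
import io

def search_archive(csv_text, keyword, columns=("headline", "text")):
    """Return rows whose chosen columns contain the keyword (case-insensitive)."""
    needle = keyword.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if any(needle in (row.get(col) or "").lower() for col in columns)
    ]

# Made-up sample rows for illustration only; these are not real archive entries.
SAMPLE = """headline,text
Bridge inspections,Condition ratings for highway bridges
Hospital prices,Charges reported by hospitals
"""

matches = search_archive(SAMPLE, "hospital")
print([row["headline"] for row in matches])
```

To search a real export, read the file's contents into a string and pass that in place of `SAMPLE`.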

Data World

Data World is an open data repository containing data contributed by thousands of users and organizations all across the world.

What I love about this site is that it contains data that is really hard to find elsewhere. In particular, healthcare is one of the more difficult industries to get publicly available data from (due to privacy concerns). Luckily, Data World has 3667 free health datasets you can use for your next project.

Google Dataset Search

A dataset search engine… powered by Google. No further explanation needed.

Kaggle

Kaggle enables data scientists and other developers to run machine learning contests, write and share code, and host datasets. The data science problems posted on Kaggle range from predicting cancer occurrence by examining patient records to analyzing the sentiment evoked by movie reviews and how it affects audience reaction.

Makeover Monday

This repository is mostly for data visualizations, but I think what they do is a lot of fun.

Makeover Monday was an initiative started in the first week of 2016 by Andy Kriebel (Head Coach, the Information Lab UK, @vizwizbi) and Andy Cotgreave (Tableau Evangelist, @acotgreave).

Every week, usually on a Sunday, Andy K posts (via his blog and Twitter) an original visualization to be “made over”. Some are awful; others are already great, in which case the challenge is to present a different angle on the original.

When you are done, post a link to your visualization and/or a picture of it using the hashtag #MakeoverMonday. All the individual screenshots are compiled into one big Pinterest collage of combined visualizations.

r/datasets/

A place to share, find, and discuss datasets. You can request datasets from other subscribers as well as share and contribute your own.

UCI Machine Learning Repository

The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited “papers” in all of computer science.
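Many of the classic files in this repository are plain comma-separated text with no header row, so you have to supply the column names yourself. Here is a minimal Python sketch of that step. It uses two lines in the format of the Iris data file in place of a download, and the column names are my own labels rather than anything the file provides.

```python
import csv
import io

# Assumption: labels chosen for this example; the raw file has no header row.
IRIS_COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width", "species"]

def parse_headerless_csv(text, columns):
    """Parse header-less CSV text into a list of dicts keyed by `columns`."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines, which some data files end with
            continue
        rows.append(dict(zip(columns, record)))
    return rows

# Two lines in the format of the Iris data file, used here instead of a download.
SAMPLE = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n\n"

flowers = parse_headerless_csv(SAMPLE, IRIS_COLUMNS)
mean_sepal = sum(float(f["sepal_length"]) for f in flowers) / len(flowers)
print(len(flowers), flowers[0]["species"], round(mean_sepal, 2))
```

Every value comes back as a string, so convert numeric columns with `float` before doing any math, as the mean calculation does here.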

United States Government

Under the terms of the 2013 Federal Open Data Policy, newly-generated government data is required to be made available in open, machine-readable formats, while continuing to ensure privacy and security.
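“Machine-readable” means a script can find and filter this data without a person clicking through pages. Here is a rough Python sketch of what that looks like, assuming the government catalog exposes a CKAN-style search API at the address shown. The endpoint, the response layout, and the sample response are all assumptions for illustration, and nothing here makes a network call.

```python
import json
from urllib.parse import urlencode

# Assumption: the catalog offers a CKAN-style search endpoint at this address.
SEARCH_ENDPOINT = "https://catalog.data.gov/api/3/action/package_search"

def build_search_url(query, rows=5):
    """Build a search URL for the assumed endpoint; does not send a request."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "rows": rows})

def list_resources(response_text, wanted_formats=("CSV", "JSON")):
    """Pick (title, format, url) tuples for resources in machine-readable formats."""
    payload = json.loads(response_text)
    found = []
    for dataset in payload["result"]["results"]:
        for resource in dataset.get("resources", []):
            if (resource.get("format") or "").upper() in wanted_formats:
                found.append((dataset["title"], resource["format"], resource["url"]))
    return found

# Made-up response in the assumed CKAN layout; not real catalog data.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {"count": 1, "results": [{
        "title": "Example dataset",
        "resources": [
            {"format": "CSV", "url": "https://example.gov/data.csv"},
            {"format": "PDF", "url": "https://example.gov/report.pdf"},
        ],
    }]},
})

print(build_search_url("air quality"))
print(list_resources(SAMPLE_RESPONSE))
```

In practice you would fetch the URL returned by `build_search_url` and pass the response body to `list_resources`. Check the catalog's own API documentation first, since the endpoint and fields may differ from this sketch.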

That’s going to be all for now. Please feel free to bookmark this article and use it as a quick reference for your data pursuits.

Did I miss your favorite repository? Let me know below so I can add it to the guide. Until next time everyone, happy coding.

My name is Kishen Sharma and I am a Data Scientist based in the Bay Area. I create content to educate and motivate aspiring Data Scientists all across the world.

Links to my blog and social media: https://linktr.ee/keesh_codes

