Data Scientist’s Toolbelt: A List of Tools to Help You Grow and Increase Productivity

Category: IT Technology · Published: 5 years ago

Because the journey matters more than the destination, and having the right tools makes the journey more enjoyable.

⚠️ This article is constantly updated with new tools I discover and with those recommended in the comments! ⚠️

Indeed, I would go there for the holidays. (Photo by James Wheeler on pexels.com)

Since the success of my previous article, I decided to stick with this format, sharing the tools, libraries, and everything else I recently discovered and use in The Lab.

Every day there is a new framework, a new implementation, or a new tool that makes life easier. It is very hard to stay up to date, and doing so would mean giving up a lot of time we could invest in something else.

Of course, we are not meant to always keep up with the newest discoveries or the latest minor release of a specific library. But sometimes it is necessary or interesting, or we are simply curious to find something new!

Let’s start!

  • Texthero: text preprocessing, representation, and visualization from zero to hero. Apply tf-idf, tokenize, and run PCA in a pipeline-oriented way.
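
To give an idea of what "tokenize, then apply tf-idf" means in practice, here is a minimal sketch in plain Python. It deliberately does not use Texthero's API: the `tokenize` and `tfidf` helpers and the three sample sentences are made up for illustration, and the weighting (raw term count times log(N / document frequency)) is only one of several common tf-idf variants.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Weight = term count in the document * log(N / number of documents
    containing the term). Hypothetical helper, not a library function.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in counts.items()}
        )
    return weights


# Made-up sample documents, only for illustration.
docs = [
    "Data science is fun",
    "Science needs data",
    "Sailing is fun",
]
scores = tfidf(docs)
```

Terms that appear in only one document (such as "needs" or "sailing") end up with a higher weight than terms shared across documents, which is the whole point of the representation.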

  • Google Data Studio: your go-to frontend. Create dashboards, reports, and analyses the Google Docs way. Just plug in your DB or upload your CSVs, and get started for free.
  • Deepnote: Jupyter notebooks on steroids. Collaboration, code reviews, better plotting, support for AWS S3, MongoDB, and many more. All in your browser.
  • Streamlit: the fastest way to build data apps. An alternative to Google Data Studio. Create Python-based web apps, visualizations, and reports.

  • Coming from R and switching to Python? Try plotnine: an implementation of a grammar of graphics in Python, based on ggplot2.
  • pivottablejs: drag-and-drop pivot tables in Jupyter Notebooks.
  • RISE: turn your notebooks into a reveal.js-based slideshow.
  • gmaps: a Google Maps-based visualization library. Create beautiful, interactive maps and heatmaps.
  • flair: a very simple framework for state-of-the-art NLP. Backed by Zalando in Berlin.
  • LightFM: a Python implementation of popular recommendation algorithms.
  • ds-cheatsheets: a huge collection of cheatsheets, from Python to R, including SQL.
  • Scraper.AI: a web scraper that actually works.
  • AlwaysAI: deploy computer vision models to edge devices such as the Nvidia Jetson or Raspberry Pi in minutes. Their catalogue covers different pre-trained models, from object segmentation to pose estimation.
  • Notion: the unopinionated note-taking app. Use Markdown and create tables, lists, canvases, and even kanban boards.

Notion: the note-taking app you will actually use.

Ah, they also provide a Python API.

  • Weights & Biases: while training deep learning models, experiment results often get lost or overwritten, or become difficult to keep track of. Weights & Biases helps you track model training and experiments by adding just a few lines of code.

An example of W&B using TensorFlow.
  • Machine learning without code? Obviously AI: probably the next step in AutoML. Upload (or connect) your data, pick your target, and Obviously AI does the rest, making the ML process accessible to anyone. It also generates a decision tree for you, which helps provide an explainable model.

An example using the AT&T churn dataset.
  • ML Playground: play with different algorithms, add neurons, remove layers, draw your data, or upload your own!
  • Papers with Code: exactly what it says. Find papers together with their GitHub repos, ready to be forked!
  • Clever Grid: get a 1-core GPU along with 250 GB of training data for about 10€ per day.
  • AWS DeepRacer: train your self-driving (model) car and compete with other people on famous F1 tracks such as the Circuit de Barcelona-Catalunya. You can also buy a hardware version of the DeepRacer car on Amazon.

  • Music Time for Spotify: a VSCode plugin that discovers the most productive music for you to listen to while you code.
  • gspread_dataframe: ever needed to send a pandas DataFrame to Google Sheets?
  • Kite: when AI meets code autocompletion and suggestions. They provide plugins for major Python IDEs such as VSCode, PyCharm, and Spyder.

Judging by his facial expression, he definitely read a few chapters. Hey Daniel Bourke, what about yours? :-)
  • datatau: Hacker News for data science.
  • Deta: a cloud provider with a particularly generous free tier!
  • Looking for a side project? Find ones you are interested in and take part on Solodoers.
  • cookiecutter-data-science: a project bootstrapper for data science, because data science code quality is about correctness and reproducibility.
  • tqdm: because we always wanted a progress bar in our for loops.
  • ELI5: visualize and debug various machine learning models, from black boxes to explainable AI.
An example of ELI5 applied to text categorization.
  • Self-promotion: some time ago I wrote this tutorial on how to create a motion heatmap using OpenCV. Since it’s one of my most starred GitHub projects, here you are!
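
Going back to tqdm in the list above: the progress bar really is a one-line change, since you wrap the iterable of an existing for loop. A minimal sketch follows. The `slow_square` function and the `time.sleep` call are stand-ins I made up for real work, and the `try/except` fallback is only there so the snippet still runs if tqdm is not installed.

```python
import time

try:
    from tqdm import tqdm  # third-party: pip install tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm: no bar, same loop.
    def tqdm(iterable, **kwargs):
        return iterable


def slow_square(values):
    """Square each value, pretending every step is slow."""
    results = []
    # Wrapping the iterable is all it takes to get a progress bar.
    for value in tqdm(values, desc="Squaring"):
        time.sleep(0.01)  # stand-in for real work
        results.append(value * value)
    return results


squares = slow_square(range(5))
```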

  • gpxpy: did you know you can export your favorite running app’s data into a .gpx file? Those files can be parsed into pandas (maybe a data science project for your portfolio?). I once did something similar, exporting data from a sailboat trip.

Not that windy…
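
A .gpx file is plain XML, so the idea can be sketched even without gpxpy. The snippet below swaps in Python's standard-library `xml.etree.ElementTree` instead of gpxpy's own API, and it assumes a GPX 1.1 file with the usual `trkpt` track points. The `gpx_to_rows` helper and the two-point sample track are made up for illustration; this is not the sailboat data.

```python
import xml.etree.ElementTree as ET

# Namespace used by GPX 1.1 files.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Tiny made-up track, only for illustration.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="41.3900" lon="2.1900"><ele>0.0</ele><time>2020-07-01T10:00:00Z</time></trkpt>
    <trkpt lat="41.3950" lon="2.2000"><ele>0.5</ele><time>2020-07-01T10:05:00Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def gpx_to_rows(gpx_text):
    """Turn every track point into a dict with position, elevation, and time."""
    root = ET.fromstring(gpx_text)
    rows = []
    for point in root.findall(".//gpx:trkpt", GPX_NS):
        rows.append({
            "latitude": float(point.get("lat")),
            "longitude": float(point.get("lon")),
            # Elevation and time are optional in GPX, hence the defaults.
            "elevation": float(point.findtext("gpx:ele", default="nan", namespaces=GPX_NS)),
            "time": point.findtext("gpx:time", default=None, namespaces=GPX_NS),
        })
    return rows


rows = gpx_to_rows(SAMPLE_GPX)
```

Since `rows` is a list of dicts, it can be handed straight to `pandas.DataFrame(rows)` to continue the analysis in pandas.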

  • Lifelines: a Python library that implements common survival analysis models. Survival analysis is widely used to predict how likely an event is to occur at a specific time, for example, that a customer will unsubscribe from our service.
  • tensor-house: a collection of reference machine learning and optimization models for enterprise operations. It is really interesting to learn how different machine learning models can be combined to solve real-life problems.
  • Gradio: create easy-to-use UIs for your models. Very useful for showcasing model predictions, from NLP to images and regressions.
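
To make the survival analysis idea from the Lifelines entry a bit more concrete, here is a minimal pure-Python sketch of the Kaplan-Meier estimator, a standard survival model. It does not use Lifelines' API: the `kaplan_meier` function is a hypothetical helper and the customer data is made up. Customers who have not unsubscribed by the time we stop observing them are "censored": they count as at risk until their last observed month, but never as an event.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each time with an observed event.

    durations: time until the event, or until we stopped observing the customer
    observed:  True if the event (e.g. unsubscribing) happened, False if censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk customers who survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Made-up data: months until each customer unsubscribed (or was last seen).
durations = [2, 3, 3, 5, 8, 8]
observed = [True, True, False, True, False, True]
curve = kaplan_meier(durations, observed)
```

Each entry in `curve` is the estimated probability that a customer is still subscribed just after that month.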
