Full Stack Data Scientist: a Jack-of-All-Trades

Category: IT Technology · Published: 4 years ago


What is a Full Stack Data Scientist?

The scope of the role and skills required


A full stack data scientist is a jack-of-all-trades who engineers and works on each stage of the data science lifecycle, from beginning to end.

The scope of a full stack data scientist covers every component of a data science business initiative, from identifying the problem to training and deploying machine learning models that benefit stakeholders.

Basic stages in the data science lifecycle

The basic stages of the data science lifecycle that a full stack data scientist can own:

  1. Business problem. Unless research-oriented, every data science project should start with a problem whose solution adds value to the business through efficiency gains, automation, or new capabilities.
  2. Data collection/identification. Machine learning requires quality data to build a quality model.
  3. Data exploration and analysis. The data must be analyzed and understood before a model can be built.
  4. Machine learning. Train a model to solve the business problem given the data.
  5. Model analysis and acceptance. Analyze the model's results and behavior, and share them with stakeholders for approval.
  6. Model deployment. Make the model accessible to the end-user.
  7. Model monitoring. Ensure that the model continues to behave as expected over time.
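Stage 7, model monitoring, can start very simply. The sketch below (standard-library Python) checks whether the mean of one input feature in live traffic has drifted away from its training distribution. The function name `mean_shift_alert`, the sample values, and the threshold of 3 standard errors are illustrative assumptions, not a method from this article; production monitoring usually also tracks prediction quality and uses more robust drift tests.

```python
from statistics import mean, stdev

def mean_shift_alert(train_values, live_values, threshold=3.0):
    """Return (drifted, z), where z is how many standard errors the live
    mean sits from the training mean. A simple illustrative heuristic."""
    if len(train_values) < 2 or not live_values:
        raise ValueError("need >= 2 training values and >= 1 live value")
    spread = stdev(train_values)
    if spread == 0:
        raise ValueError("training feature is constant; z-score undefined")
    # Standard error of the live-sample mean, estimated from the training spread.
    std_err = spread / len(live_values) ** 0.5
    z = abs(mean(live_values) - mean(train_values)) / std_err
    return z > threshold, z

# Hypothetical feature values seen at training time.
train_feature = [10, 12, 11, 13, 9, 10, 12, 11]
print(mean_shift_alert(train_feature, [11, 10, 12, 11]))     # (False, 0.0)
print(mean_shift_alert(train_feature, [20, 21, 19, 20])[0])  # True
```

In practice a check like this would run on a schedule against recent production inputs and raise an alert when `drifted` is true.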

A Jack of all Trades: the Skillset

The high-level skills listed below are also the keys to successful data science initiatives. The soft skills deserve particular emphasis: without them, data science technology may not deliver value.

Business Acumen

A full stack data scientist must be able to identify and understand business problems (or opportunities) that can be solved using the data science toolkit.

To prioritize the projects and process flows that offer the most value, they must understand the needs and goals of their organization.

Ultimately, the business doesn’t care how cool or accurate a model is if it provides no value.

Collaboration

Full stack data scientists do not work in a vacuum. They must collaborate with stakeholders to identify existing problems or inefficiencies that data science can solve. Once problems are identified, collaboration is essential to ensure that the result is acceptable and meets the stakeholders’ needs. Further, collaboration with subject matter experts (SMEs) helps the data scientist work quickly, for example when locating data sources within the organization.

Communication

Effective oral and written communication with the business enables better collaboration and makes it easier to “sell” the model to end-users. This means explaining data science ideas, results, and value in plain language tailored to non-technical audiences. In some cases, end-users must understand and trust the model before they choose to use it.

Identifying Data Sources and ETL

Models cannot be trained without data. Often, data is not readily available: it must be found, extracted, transformed, and loaded into the right place.
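The extract, transform, and load steps can be sketched end to end with Python's standard library. This is a minimal illustration under assumed inputs: the CSV export, its column names, and the `sales` table are hypothetical, and SQLite stands in for whatever warehouse or database the organization actually uses.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent region casing and one blank amount.
RAW_CSV = """customer_id,region,amount
1,north,120.50
2,SOUTH,80.00
3,north,
4,East,42.25
"""

def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: normalize region names, cast types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["customer_id"]), row["region"].strip().lower(), float(row["amount"]))
        )
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall())  # [('east', 42.25), ('north', 120.5), ('south', 80.0)]
```

Keeping the three steps as separate functions makes each one easy to test and to swap out, for example replacing the CSV string with a database query or an API call.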

Programming

A full stack data scientist must be able to write clean, efficient object-oriented code that works reliably in production. Ideally, such code will be modular and each function or class validated by unit tests.
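As a small illustration of modular, tested code, the sketch below pairs a single-purpose class with unit tests from Python's built-in `unittest` module. `UnitRangeScaler` is a hypothetical, hand-rolled example written for this illustration (it is not scikit-learn's scaler); the point is the structure: one responsibility per class, explicit error handling, and a test for each behavior.

```python
import unittest

class UnitRangeScaler:
    """Rescale numbers to the [0, 1] range using bounds learned in fit()."""

    def __init__(self):
        self.low = None
        self.high = None

    def fit(self, values):
        self.low, self.high = min(values), max(values)
        return self  # allows chaining: UnitRangeScaler().fit(xs).transform(xs)

    def transform(self, values):
        if self.low is None:
            raise RuntimeError("call fit() before transform()")
        span = self.high - self.low
        # A constant column would divide by zero, so map everything to 0.0.
        return [0.0 if span == 0 else (v - self.low) / span for v in values]

class TestUnitRangeScaler(unittest.TestCase):
    def test_scales_to_unit_range(self):
        scaler = UnitRangeScaler().fit([2, 4, 6])
        self.assertEqual(scaler.transform([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(UnitRangeScaler().fit([3, 3]).transform([3]), [0.0])

    def test_requires_fit(self):
        with self.assertRaises(RuntimeError):
            UnitRangeScaler().transform([1])
```

Saved to a file, the tests can be run with `python -m unittest <filename>`.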

Data Analysis and Exploration

This skill is essential because useful machine learning models cannot be built without first understanding the data.
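A first exploratory pass often just counts missing values and looks at basic statistics per column, which is frequently enough to surface problems before any modeling starts. The sketch below uses Python's `statistics` module; the `profile` helper and the customer records are hypothetical examples.

```python
from statistics import mean, median

def profile(rows, column):
    """Summarize one numeric column: row count, missing values, and basic stats."""
    values = [r[column] for r in rows if r.get(column) is not None]
    summary = {"count": len(rows), "missing": len(rows) - len(values)}
    if values:
        summary.update(min=min(values), max=max(values),
                       mean=mean(values), median=median(values))
    return summary

# Hypothetical records: one age is missing and one looks like a data-entry error.
customers = [{"age": 34}, {"age": 29}, {"age": None}, {"age": 41}, {"age": 420}]
print(profile(customers, "age"))  # the max of 420 is an implausible age worth investigating
```

Note how the mean (131) is pulled far above the median (37.5) by the single outlier; a model trained on this column unexamined would learn from a bad value.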

Machine Learning and Statistics

Perhaps this is a given: without machine learning or statistics, the work is not “data science”. A full stack data scientist must be able to experiment with appropriate machine learning algorithms to solve the problem at hand.

It is worth highlighting that implementing simple logical or business rules instead of a machine learning solution sometimes delivers immediate value to the business. A machine learning model can take weeks or months to get right, whereas a business rule may be “good enough” for now.
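One practical way to treat a business rule is as a baseline "model" scored on the same labelled data any machine learning model would be judged on. The sketch below assumes a hypothetical churn rule (no login for 30+ days) and a made-up labelled history; neither comes from the article.

```python
def rule_flag_churn(customer):
    """Hypothetical business rule: flag customers with no login in 30+ days."""
    return customer["days_since_login"] >= 30

def accuracy(predict, labelled):
    """Share of labelled examples where the prediction matches the label."""
    hits = sum(predict(features) == label for features, label in labelled)
    return hits / len(labelled)

# Hypothetical labelled history: (features, actually_churned)
history = [
    ({"days_since_login": 45}, True),
    ({"days_since_login": 2}, False),
    ({"days_since_login": 31}, True),
    ({"days_since_login": 10}, False),
    ({"days_since_login": 60}, False),
    ({"days_since_login": 5}, True),
]

baseline = accuracy(rule_flag_churn, history)
print(f"rule accuracy: {baseline:.2f}")  # rule accuracy: 0.67
```

If a trained model cannot clearly beat the rule on the same data, the rule may be the better choice for now, since it is cheaper to build, explain, and maintain.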

Model Deployment / Data Engineering

Lastly, a full stack data scientist must have the skill to deploy model pipelines to production. Model pipelines allow the end-user to query a model with new data or to access pre-generated model results in the form they need. If no deployment mechanism exists, they must be able to design and set up this pipeline.
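The two access patterns described above, querying a model on demand versus reading pre-generated results, can be sketched without any framework. In this sketch `ThresholdModel` is a hypothetical stand-in for a trained model, and `handle_request` is the kind of function a web framework would wrap in a real deployment; the JSON field names are assumptions.

```python
import json

class ThresholdModel:
    """Stand-in for a trained model: scores 1.0 above a cut-off, else 0.0."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, x):
        return 1.0 if x > self.cutoff else 0.0

model = ThresholdModel(cutoff=50)

# Pattern 1: on-demand (online) scoring of a JSON request body.
def handle_request(body: str) -> str:
    payload = json.loads(body)
    return json.dumps({"id": payload["id"], "score": model.predict(payload["value"])})

# Pattern 2: batch scoring; results are pre-generated and stored for later lookup.
def score_batch(records):
    return {r["id"]: model.predict(r["value"]) for r in records}

precomputed = score_batch([{"id": "a", "value": 10}, {"id": "b", "value": 75}])
print(handle_request('{"id": "c", "value": 60}'))  # {"id": "c", "score": 1.0}
print(precomputed["b"])  # 1.0
```

Online scoring suits requests that need fresh inputs; batch scoring suits cases where predictions can be computed ahead of time and served from a table or cache.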

If a model is not deployed (or perhaps presented in a business analysis), it is not useful and does not provide business value.

Master of None: the Challenges

The skills listed in the previous section are diverse.

With such varied requirements, it is not possible for a full stack data scientist to master all of them, especially as technology, algorithms, and tools advance. Instead, this person must choose which elements are most useful for the projects at hand and most interesting to focus on.

Two key underlying skills of the full stack data scientist are the ability to design a system or process and the ability to quickly pick up new technologies.

The Benefit

On the flip side, a full stack data scientist is a data science team rolled up into one (or two).

For organizations new to data science, a full stack data scientist can create business value without the organization immediately building up a full team. To be most effective, the full stack data scientist should be given the freedom to select and apply the right tools.

Takeaways

A full stack data scientist goes above and beyond the typical data scientist role in two ways:

  1. Links business needs to machine learning (or non-machine-learning) solutions
  2. Deploys models to “production”

These two elements are the keys to any organization extracting value from data science: solving the right problems and making the solutions accessible to the end-user.


