Full Stack Data Scientist: a Jack-of-All-Trades

Category: IT Technology · Published: 5 years ago

What is a Full Stack Data Scientist?

The scope of the role and skills required

A full-stack data scientist is a jack-of-all-trades who engineers and works on each stage in the data science lifecycle, from beginning to end.

The scope of a full stack data scientist covers every component of a data science business initiative, from identifying to training to deploying machine learning models that provide benefit to stakeholders.

Basic stages in the data science lifecycle

Basic stages in the data science lifecycle that can be owned by a full stack data scientist:

  1. Business problem. Unless research-oriented, all data science projects should start with a problem that adds value to a business either through efficiency gains, automation, or new capabilities.
  2. Data collection/identification. Machine learning requires quality data to build a quality model for use.
  3. Data exploration and analysis. The data must be analyzed and understood before a model can be built.
  4. Machine learning. Train a model to solve the business problem given the data.
  5. Model analysis and acceptance. Analyze the model results and behavior. Share with stakeholders for approval.
  6. Model deployment. Make the model accessible to the end-user.
  7. Model monitoring. Ensure that the model behaves as expected in the future.

A Jack of all Trades: the Skillset

The high-level skills listed below are also key to successful data science initiatives. The soft skills are worth highlighting: without them, data science technology may not provide value.

Business Acumen

A full stack data scientist must be able to identify and understand business problems (or opportunities) that can be solved using the data science toolkit.

To prioritize the projects and process flows with the most value to their organization, they must understand the needs and goals of that organization.

Ultimately, the business doesn’t care how cool or accurate a model is if it provides no value.

Collaboration

Full stack data scientists do not work in a vacuum. They must collaborate with stakeholders to identify existing problems or inefficiencies that can be solved with data science. Once problems are identified, collaboration is essential to ensure that the result is acceptable and meets the stakeholders' needs. Further, collaboration with subject matter experts (SMEs) enables them to work quickly, for example by locating data sources within the organization.

Communication

Effective communication with the business via oral and written mediums allows for better collaboration and “selling” the model to the end-users. This means tailoring data science ideas, results, and value in plain language to non-technical audiences. In some cases, the end-user must understand and trust the model before they choose to use it.

Identifying Data Sources and ETL

Models cannot be trained if there is no data. Oftentimes data is not readily available; it needs to be found, extracted, transformed, and loaded to the right place.
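The extract, transform, and load steps mentioned above can be sketched with nothing but the Python standard library. This is a minimal illustration, not a production pipeline: the CSV content, the column names, and the `customers` table are all made up for the example, and a real source would more likely be a file, an API, or an operational database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export (made-up data for illustration only).
RAW_CSV = """customer_id,signup_date,monthly_spend
1,2020-01-15,120.50
2,2020-02-03,
3,2020-02-20,87.00
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: drop rows with a missing spend value and cast the fields."""
    cleaned = []
    for row in rows:
        if not row["monthly_spend"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["customer_id"]), row["signup_date"], float(row["monthly_spend"]))
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER PRIMARY KEY, signup_date TEXT, monthly_spend REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", records)
    conn.commit()


def run_etl(raw_text, conn):
    """Run all three steps and return the number of rows now in the table."""
    load(transform(extract(raw_text)), conn)
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
```

Calling `run_etl(RAW_CSV, sqlite3.connect(":memory:"))` loads the two complete rows and skips the one with a missing spend value. Keeping each step in its own function makes it easier to swap the source or the destination later.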

Programming

A full stack data scientist must be able to write clean, efficient object-oriented code that works reliably in production. Ideally, such code will be modular and each function or class validated by unit tests.
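As a rough illustration of what "modular and validated by unit tests" can look like, here is a small feature-scaling function with tests written against Python's built-in `unittest` module. The function name `min_max_scale` and the helper `run_tests` are hypothetical and chosen only for this sketch.

```python
import unittest


def min_max_scale(values):
    """Scale a list of numbers linearly into the [0, 1] range."""
    if not values:
        raise ValueError("values must not be empty")
    low, high = min(values), max(values)
    if low == high:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


class MinMaxScaleTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(min_max_scale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_maps_to_zero(self):
        self.assertEqual(min_max_scale([3, 3]), [0.0, 0.0])

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            min_max_scale([])


def run_tests():
    """Run the test case programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MinMaxScaleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test pins down one behavior (normal input, an edge case, and an error case), so a later refactor that breaks any of them is caught before the code reaches production.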

Data Analysis and Exploration

This skill is essential because useful machine learning models cannot be built without data understanding.

Machine Learning and Statistics

Perhaps this is a given — without machine learning or statistics, the work is not “data science”. A full stack data scientist must be able to experiment with appropriate machine learning algorithms to solve machine learning problems.

It is worth highlighting that sometimes implementing logical or business rules over a machine learning solution results in immediate value to the business, despite the simplicity. A machine learning model can take weeks or months to get right whereas a business rule may be “good enough” for now.
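To make the "good enough for now" idea concrete, the sketch below scores a single business rule against a handful of labelled examples. Everything here is an assumption for illustration: the churn scenario, the 30-day threshold, and the example data are invented, and accuracy is only one of several ways such a rule might be judged.

```python
def rule_flag_churn(days_since_last_login, threshold_days=30):
    """Hypothetical business rule: flag a customer as at risk after a period of inactivity."""
    return days_since_last_login > threshold_days


def accuracy(predictions, labels):
    """Fraction of predictions that match the known labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Made-up labelled examples: (days since last login, did the customer churn?)
EXAMPLES = [
    (2, False), (45, True), (10, False), (60, True),
    (35, False), (5, False), (90, True), (20, True),
]

predictions = [rule_flag_churn(days) for days, _ in EXAMPLES]
labels = [churned for _, churned in EXAMPLES]
baseline_accuracy = accuracy(predictions, labels)  # 6 of 8 correct on this toy data
```

A baseline like this gives stakeholders something usable right away, and it also sets the bar that any later machine learning model has to clear to justify its extra cost.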

Model Deployment / Data Engineering

Lastly, a full stack data scientist must have the skill to deploy model pipelines to production. Model pipelines allow the end-user to query a model with data or access pre-generated model results in a desired way. If no deployment mechanism exists, they must be able to design and set up this pipeline.

If a model is not deployed (or perhaps presented in a business analysis), it is not useful and does not provide business value.
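The "query a model with data" path described above can be sketched as a single request handler. This is a simplified stand-in under stated assumptions: `LinearModel`, its weights, and the JSON request shape are all hypothetical, and a real deployment would put a handler like this behind a web framework or a batch job rather than call it directly.

```python
import json


class LinearModel:
    """Stand-in for a trained model; the weights are made up for illustration."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def predict(self, features):
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


MODEL = LinearModel(weights=[0.5, 2.0], bias=1.0)


def handle_request(body):
    """Parse a JSON request body, score it with MODEL, and return a JSON string."""
    try:
        features = json.loads(body)["features"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return json.dumps({"error": "expected a JSON object with a 'features' list"})
    is_valid = (
        isinstance(features, list)
        and len(features) == len(MODEL.weights)
        and all(isinstance(x, (int, float)) for x in features)
    )
    if not is_valid:
        return json.dumps({"error": "features must be a list of 2 numbers"})
    return json.dumps({"prediction": MODEL.predict(features)})
```

For example, `handle_request('{"features": [2, 3]}')` returns a JSON string containing the prediction, while malformed input returns an error message instead of raising. Validating input at the boundary is one way to keep the model from failing in front of the end-user.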

Master of None: the Challenges

The skills listed in the previous section are diverse and varied.

With such varied requirements, it is not possible for a full stack data scientist to master all the skills, especially as technology, algorithms, and tools advance. Instead, this person must pick and choose which elements are most useful to the projects at hand and most interesting to focus on.

Two key underlying skills of the full stack data scientist are the ability to design a system or process and the ability to quickly pick up new technologies.

The Benefit

On the flip side, a full stack data scientist is a data science team rolled up into one (or two).

Organizations new to data science can use such a person to create business value without immediately building up a full team. To be most effective, the full stack data scientist should be given the ability to select and apply the right tools.

Takeaways

A full stack data scientist goes above and beyond the typical data scientist role in two ways:

  1. Links business needs to machine learning (or not machine learning) solutions
  2. Deploys models to “production”

These two elements are the keys to any organization extracting value from data science: solving the right problems and making the solutions accessible to the end-user.
