Transform your Data Science Projects with these 5 Steps of Design Thinking



Collect, Refine, Expand, Learn & Maintain

Probability does not result in Inference for all populations (Credit: Jeff Leek)

Skilled data scientists share something in common. They can build product solutions… with data.

It is no longer good enough to be a data scientist who can solve math and statistics problems in Python, R, or Julia.

Modern data scientists require a new mindset: design thinking.

The data science field is transforming in 2020 at the speed that software engineering changed in 2010.

Products, frameworks, and programming languages will fade out of popularity; design thinking is always relevant.

Data scientists and students know me for the Data Science Standards¹, a framework I created to launch data science products in businesses.

Here are the 5 Stages of Design Thinking with step-by-step actions and questions to guide you in your data science journey.

Step 1: Data Collection

Your ability to ask actionable questions to aggregate, browse, and collect data can mean the difference between a successful product and research that is never implemented.

Product success requires thorough data navigation skills and a checklist that focuses on a repeatable process.

Ask yourself these questions when collecting data (a short sketch follows the list):

  • Where is my data stored?
  • How large is the dataset?
  • What quantity and quality of data will I need to launch this product or service?
  • Who manages the data that I need to access?
  • When is the data updated?
  • Why is this data relevant for my product?
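
To make this checklist concrete, here is a minimal data-collection audit in Python. The file path and columns are hypothetical placeholders, not part of the original workflow; the point is to answer the size, quality, and freshness questions with a few lines of pandas.

```python
import os
from datetime import datetime, timezone

import pandas as pd

# Hypothetical extract pulled from the source system.
PATH = "data/customers.csv"

df = pd.read_csv(PATH)

print("Rows, columns:", df.shape)                                    # how large is the data?
print("In-memory size (MB):", df.memory_usage(deep=True).sum() / 1e6)
print("Missing values per column:")
print(df.isna().mean().round(3))                                     # a rough quality signal
print("File last updated:",
      datetime.fromtimestamp(os.path.getmtime(PATH), tz=timezone.utc))  # when is the data updated?
```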

Step 2: Data Refinement

Large quantities of data are good; high-quality data is better. World-class Kaggle Grandmasters win competitions, and data scientists earn promotions at work, when they invest their time in refining data.

Product managers and software engineers do not take responsibility for data refinement, so it falls to skilled data scientists to make the difficult decisions about what makes data reliable and responsible.

Start with these questions when refining data (see the sketch after the list):

  • Who has insight into data dictionaries for data features?
  • What data requires querying, feature engineering, and pre-processing? By what techniques?
  • When will the required data be ready in a high quality/high quantity state to move to the next stage of the Data Science Workflow?
  • Where will the refined data be stored?
  • Why will data need to be refined?
  • How will the refined data be tested and validated for consistent performance?
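
As one way to answer these questions in code, here is a minimal refinement sketch with pandas. The column names (customer_id, plan, signup_date) and file paths are assumptions for illustration, not taken from the article.

```python
import pandas as pd

# Hypothetical raw extract with customer_id, plan, and signup_date columns.
df = pd.read_csv("data/customers.csv", parse_dates=["signup_date"])

# Pre-processing: drop duplicate customers and normalize a categorical field.
df = df.drop_duplicates(subset="customer_id")
df["plan"] = df["plan"].str.strip().str.lower()

# Feature engineering: account age in days at refinement time.
df["account_age_days"] = (pd.Timestamp.now() - df["signup_date"]).dt.days

# Validation: fail fast if the refined data is not fit to move downstream.
assert df["customer_id"].is_unique, "duplicate customers survived refinement"
assert df["account_age_days"].ge(0).all(), "signup dates lie in the future"

# Where the refined data is stored for the next stage of the workflow.
df.to_csv("data/refined/customers.csv", index=False)
```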

Step 3: Data Expansion

Even with the best available data, a problem may not be solvable. Frequently, more data is the difference between a dead-end product and one that leads the market with unique insights.

Successful products in 2020 require both data refinement and data expansion. Integrations with APIs, similar datasets, and alternative data give data science teams the confidence to discover important insights. Data expansion enables feature enrichment and raises the success rate of the data science workflow for products.

Apply these questions when expanding data (a sketch follows the list):

  • Who controls data access?
  • What budget is available to acquire or generate more data?
  • When do you stop expanding data or continue to iterate with machine learning?
  • Where can you acquire high quality data sources?
  • Why are more data features needed to improve your product or solution?
  • How will you decide which data sources are most relevant for expansion?
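
Here is one possible expansion pass, assuming a hypothetical third-party demographics table keyed by postal code; the join key and the median_income field are illustrative. Measuring the enrichment rate before buying more data keeps the budget question honest.

```python
import pandas as pd

customers = pd.read_csv("data/refined/customers.csv")
# Hypothetical alternative dataset (purchased or open data) keyed by postal code.
demographics = pd.read_csv("data/external/postcode_demographics.csv")

expanded = customers.merge(
    demographics, on="postal_code", how="left",
    validate="many_to_one",  # each postal code should map to one demographic row
)

# Measure what the expansion actually added before spending budget on more data.
match_rate = expanded["median_income"].notna().mean()
print(f"Rows enriched with demographics: {match_rate:.1%}")

expanded.to_csv("data/expanded/customers.csv", index=False)
```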

Step 4: Data Learning

Analytics and business intelligence test which data variables may be important; data learning runs models on those features to predict insights for a product.

Data Learning considers how compute, storage, and machine learning frameworks can accelerate your workflow.

Ask yourself these questions during the Data Learning stage (a sketch follows the list):

  • Who determines what benchmarks are needed for a successful model?
  • What machine learning frameworks and algorithms will you choose for the outcome you want to predict?
  • When do you decide that your modeling results are significant or ready for production?
  • Where will you process data learning: locally, or on which cloud systems?
  • Why does your feature request or product need machine learning?
  • How much compute time and compute resources are available to model the data?
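
A minimal learning pass might look like the sketch below, using scikit-learn with an assumed churned label and an AUC benchmark of 0.75; both the target and the threshold are placeholders for whatever your product actually requires.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical expanded table with a churned label to predict.
df = pd.read_csv("data/expanded/customers.csv")
features = ["account_age_days", "median_income"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features].fillna(0), df["churned"], test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
BENCHMARK = 0.75  # agreed with the product team before modeling began
print(f"Test AUC {auc:.3f} vs benchmark {BENCHMARK}")
print("Ready for production review" if auc >= BENCHMARK else "Keep iterating")
```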

Design Thinking with the Tidyverse in R (Credit: ResearchGate)

Step 5: Data Maintenance

Your machine learning has exceeded benchmarks, and you have deployed your solution to production with your data engineers and software engineers.

But now what?

All machine learning models and data degrade in quality over time. Skilled data scientists monitor their models to verify results and maintain quality in production.

Apply these questions to better monitor your data (a sketch follows the list):

  • Who is responsible for making changes to data models when performance changes?
  • What triggers, pipelines or data jobs do you implement to monitor the quality of your data in production?
  • When performance falls below required benchmarks, which data governance processes do you put into action?
  • Where will you commit recurring time in your schedule to monitor your data pipeline for quality control?
  • Why are your data modeling results degrading in quality in production?
  • How do you communicate data modeling results to your product managers, data engineers, and software engineers and with what frequency?
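
One lightweight way to operationalize this checklist is a scheduled monitoring job. The sketch below assumes a hypothetical weekly export of production predictions joined with observed outcomes, and reuses the benchmark agreed during the Data Learning stage.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

BENCHMARK = 0.75          # same benchmark agreed during the Data Learning stage
TRAIN_MEAN_AGE = 182.0    # account_age_days mean captured at training time

# Hypothetical weekly export of production predictions with observed outcomes.
scored = pd.read_csv("data/monitoring/last_week_predictions.csv")

auc = roc_auc_score(scored["churned"], scored["predicted_proba"])

# A simple drift signal: has a key input feature shifted since training?
drift = abs(scored["account_age_days"].mean() - TRAIN_MEAN_AGE) / TRAIN_MEAN_AGE

if auc < BENCHMARK or drift > 0.25:
    print(f"ALERT: AUC={auc:.3f}, drift={drift:.1%}; trigger the retraining pipeline")
else:
    print(f"OK: AUC={auc:.3f}, drift={drift:.1%}")
```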

In Summary:

For your current and next data science product features, think about all 5 Steps of Design Thinking in your Data Science workflow: (1) Data Collection, (2) Data Refinement, (3) Data Expansion, (4) Data Learning, and (5) Data Maintenance.

With Design Thinking applied to your data science workflow, you will be a better data scientist starting today.

Works Cited:

¹ Data Science Standards

More from David Yakobovitch:

Listen to the HumAIn Podcast | Subscribe to my newsletter

