Transform your Data Science Projects with these 5 Steps of Design Thinking


Collect, Refine, Expand, Learn & Maintain

David Yakobovitch

Mar 5

Figure: Probability does not result in inference for all populations (Credit: Jeff Leek)

Skilled data scientists share something in common. They can build product solutions… with data.

It is no longer good enough to be a data scientist who can only solve math and statistics problems in Python, R, or Julia.

Modern data scientists require a new mindset: design thinking.

The data science field is transforming in 2020 at the same speed that software engineering did in 2010.

Products, frameworks, and programming languages will fade out of popularity; design thinking is always relevant.

Data scientists and students know me for the Data Science Standards¹, a framework I created to launch data science products in businesses.

Here are the 5 Steps of Design Thinking, with step-by-step actions and questions to guide you on your data science journey.

Step 1: Data Collection

Your ability to ask actionable questions to aggregate, browse, and collect data can mean the difference between a successful product and research that is never implemented.

Product success requires thorough data navigation skills and a checklist that focuses on a repeatable process.

Ask yourself these questions when collecting data:

Where is my data stored?
How large is the data?
What quantity and quality of data will I need to launch this product or service?
Who manages the data that I need to access?
When is the data updated?
Why is this data relevant for my product?
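A few of these questions (how large is the data, and when is it updated?) can be answered with a quick profiling script. The sketch below is a minimal example, assuming the data lives in a local CSV file; the function name `profile_csv` is hypothetical, and file modification time is only a rough proxy for when the data is actually refreshed.

```python
import csv
import os
from datetime import datetime, timezone


def profile_csv(path):
    """Hypothetical helper: report size, shape, and freshness of one CSV file."""
    stat = os.stat(path)
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, [])        # first row holds the column names
        n_rows = sum(1 for _ in reader)  # count the remaining data rows
    return {
        "path": path,
        "size_bytes": stat.st_size,
        "columns": header,
        "n_rows": n_rows,
        # Last modification time: a rough proxy for "when is the data updated?"
        "last_modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
    }
```

Running `profile_csv` over every candidate source gives you a small, repeatable record to bring to whoever manages the data.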

Step 2: Data Refinement

Large quantities of data are good; high-quality data is better. World-class Kaggle Grandmasters win competitions, and data scientists are promoted at work, when they invest their time in refining data.

Product managers and software engineers do not take responsibility for data refinement; it falls to skilled data scientists to make the difficult decisions about what makes data reliable and responsible.

Start with these questions when refining data:

Who has insight into data dictionaries for data features?
What data requires querying, feature engineering, and pre-processing? By what techniques?
When will the required data be ready in a high-quality, high-quantity state to move to the next stage of the Data Science Workflow?
Where will the refined data be stored?
Why does the data need to be refined?
How will the refined data be tested and validated for consistent performance?
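A data dictionary can double as a validation spec for the refinement questions above. The following is a minimal sketch, assuming rows arrive as dictionaries of strings (the way Python's `csv.DictReader` yields them); `DATA_DICTIONARY`, its columns, and `validate_rows` are hypothetical illustrations, not part of the Data Science Standards.

```python
# Hypothetical data dictionary: column name -> (expected Python type, required?)
DATA_DICTIONARY = {
    "customer_id": (int, True),
    "signup_date": (str, True),
    "monthly_spend": (float, False),
}


def validate_rows(rows, dictionary):
    """Cast each field per the dictionary; return (clean_rows, issues)."""
    clean, issues = [], []
    for i, row in enumerate(rows):
        out, ok = {}, True
        for col, (typ, required) in dictionary.items():
            raw = row.get(col)
            if raw is None or raw == "":
                if required:
                    issues.append((i, col, "missing required value"))
                    ok = False
                out[col] = None  # optional fields are allowed to be empty
                continue
            try:
                out[col] = typ(raw)
            except (ValueError, TypeError):
                issues.append((i, col, f"cannot cast {raw!r} to {typ.__name__}"))
                ok = False
        if ok:
            clean.append(out)  # only fully valid rows move to the next stage
    return clean, issues
```

The `issues` list doubles as a test report: tracking its length across refreshes is one way to check that refined data performs consistently.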

Step 3: Data Expansion

Even with the best data available to a data scientist, a problem may not be solvable. Frequently, more data can be the difference between a dead-end product and a product that leads the market with unique insights.

Successful products in 2020 require both data refinement and data expansion. Integrations with APIs, similar datasets, and alternative data give data science teams the confidence to potentially discover important insights. Data expansion enables feature enrichment and improves the success rate of the data science workflow for products.

Apply these questions when expanding data:

Who controls data access?
What budget is available to acquire or generate more data?
When do you stop expanding data and instead continue to iterate with machine learning?
Where can you acquire high quality data sources?
Why are more data features needed to improve your product or solution?
How will you decide which data is most relevant for expansion?
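Feature enrichment often comes down to joining an external source onto existing records and checking how many records actually matched. The sketch below assumes both datasets fit in memory as lists of dictionaries sharing a key; `enrich`, the `zip` key, and the `region_score` field are hypothetical. The coverage figure it returns is one input to deciding whether a new source is worth expanding into.

```python
def enrich(records, external, key, fields):
    """Left-join `fields` from an external dataset onto each record by `key`.

    Unmatched records get None for the new fields, so coverage can be measured.
    """
    lookup = {row[key]: row for row in external}
    enriched, matched = [], 0
    for rec in records:
        ext = lookup.get(rec[key])
        if ext is not None:
            matched += 1
        new = dict(rec)  # copy so the original records are left untouched
        for f in fields:
            new[f] = ext.get(f) if ext is not None else None
        enriched.append(new)
    coverage = matched / len(records) if records else 0.0
    return enriched, coverage


# Hypothetical toy data for illustration only
records = [{"zip": "10001", "spend": 50}, {"zip": "94105", "spend": 80},
           {"zip": "60601", "spend": 30}]
external = [{"zip": "10001", "region_score": 0.7},
            {"zip": "94105", "region_score": 0.4}]
enriched, coverage = enrich(records, external, "zip", ["region_score"])
```

Low coverage suggests the new source adds little for your population, however promising its features look.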

Step 4: Data Learning

Analytics and business intelligence test which data variables may be important; data learning runs models on those features to predict insights for a product.

Data Learning considers how compute, storage, and machine learning frameworks can accelerate your workflow.

Ask yourself these questions during the Data Learning stage:

Who determines what benchmarks are needed for a successful model?
What machine learning frameworks and algorithms will you choose for what you want to predict?
When do you decide that your modeling results are significant or ready for production?
Where will you run data learning: locally, or on which cloud systems?
Why does your feature request or product need machine learning?
How much compute time and compute resources are available to model the data?
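One way to make the benchmark question concrete is to require a model to beat a trivial baseline before it moves toward production. This is a minimal sketch assuming a classification task; the majority-class baseline, the function names, and the `min_lift` threshold of 0.05 are illustrative assumptions, and this simple comparison is not a statistical significance test.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train, n):
    """Predict the most common training label for each of n examples."""
    label = Counter(y_train).most_common(1)[0][0]
    return [label] * n


def ready_for_production(y_train, y_test, model_pred, min_lift=0.05):
    """Hypothetical go/no-go rule: model must beat the baseline by min_lift."""
    base = accuracy(y_test, majority_baseline(y_train, len(y_test)))
    model = accuracy(y_test, model_pred)
    return model - base >= min_lift, {"baseline": base, "model": model}
```

Whoever sets the benchmark can tune `min_lift`, or swap in a metric that matches the product's real cost of errors.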

Step 5: Data Maintenance

Your machine learning model has exceeded its benchmarks, and you have deployed your solution to production with your data engineers and software engineers.

But now what?

All machine learning models and data degrade in quality over time. Skilled data scientists monitor their models to verify results and maintain quality in production.

Apply these questions to better monitor your data:

Who is responsible for making changes to data models when performance changes?
What triggers, pipelines or data jobs do you implement to monitor the quality of your data in production?
When performance falls below required benchmarks, what data governance processes do you put into action?
Where will you commit time in your schedule on a recurring basis to monitor your data pipeline for quality control?
Why are your data modeling results degrading in quality in production?
How do you communicate data modeling results to your product managers, data engineers, and software engineers and with what frequency?
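The monitoring questions above can start as a simple rolling check against the benchmark your model passed before launch. The sketch below assumes ground-truth labels eventually arrive for production predictions; the `PerformanceMonitor` class, window size, and minimum sample count are hypothetical choices, and a real pipeline would route `needs_attention()` into whatever data governance process your team uses.

```python
from collections import deque


class PerformanceMonitor:
    """Hypothetical monitor: rolling accuracy over the last `window` labelled
    predictions, flagged when it drops below a benchmark."""

    def __init__(self, benchmark, window=100, min_samples=20):
        self.benchmark = benchmark
        self.min_samples = min_samples
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = incorrect

    def record(self, y_true, y_pred):
        self.outcomes.append(1 if y_true == y_pred else 0)

    def rolling_accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_attention(self):
        # Avoid alerting on too few samples to be meaningful
        if len(self.outcomes) < self.min_samples:
            return False
        return self.rolling_accuracy() < self.benchmark
```

Because the deque drops the oldest outcomes, the monitor reflects recent behavior rather than the model's lifetime average.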

In Summary:

For your current and next data science product features, think about all 5 Steps of Design Thinking in your data science workflow: (1) Data Collection, (2) Data Refinement, (3) Data Expansion, (4) Data Learning, and (5) Data Maintenance.

With Design Thinking applied to your data science workflow, you will be a better data scientist starting today.

Works Cited:

¹ Data Science Standards

