Collect, Refine, Expand, Learn & Maintain
Mar 5 · 5 min read
Skilled data scientists share something in common. They can build product solutions… with data.
It is no longer good enough to be a data scientist who can solve math and statistics problems applied to Python, R or Julia programming.
Modern data scientists require a new mindset: design thinking.
The data science field is transforming in 2020 at the speed that software engineering changed in 2010.
Products, frameworks, and programming languages will fade out of popularity; design thinking is always relevant.
Data scientists and students know me for the Data Science Standards¹, a framework I created to launch data science products in businesses.
Here are the 5 Stages of Design Thinking with step-by-step actions and questions to guide you in your data science journey.
Step 1: Data Collection
Your ability to ask actionable questions to aggregate, browse, and collect data can mean the difference between a successful product and research that is never implemented.
Product success requires thorough data navigation skills and a checklist that focuses on a repeatable process.
Ask yourself these questions when collecting data:
- Where is my data stored?
- How large is the data?
- What quantity and quality of data will I need to launch this product or service?
- Who manages the data that I need to access?
- When is the data updated?
- Why is this data relevant for my product?
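The "where", "how large", and "when" questions above can be answered programmatically for file-based sources. The following is a minimal sketch, assuming the data lives in a local file; `SourceSummary` and `summarize_source` are hypothetical names introduced here for illustration, not part of any framework mentioned in this article.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class SourceSummary:
    """Basic storage facts for one data file (hypothetical helper type)."""
    location: str            # where the data is stored
    size_bytes: int          # how large the data is
    last_updated: datetime   # when the data was last modified


def summarize_source(path: str) -> SourceSummary:
    """Collect location, size, and last-modified time for a local file."""
    p = Path(path)
    stat = p.stat()  # raises FileNotFoundError if the file does not exist
    return SourceSummary(
        location=str(p.resolve()),
        size_bytes=stat.st_size,
        last_updated=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
    )
```

Data held in a database or object store would need a different lookup, but the same three facts are worth recording for every source on your checklist.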
Step 2: Data Refinement
Large quantities of data are good; high-quality data is better. World-class Kaggle Grandmasters win competitions, and data scientists are promoted at work, when they invest their time in refining data.
Product managers and software engineers do not take responsibility for data refinement; it falls to skilled data scientists to make the difficult decisions about what counts as reliable and responsible data.
Start with these questions when refining data:
- Who has insight into data dictionaries for data features?
- What data requires querying, feature engineering, and pre-processing? By what techniques?
- When will the required data be ready in a high quality/high quantity state to move to the next stage of the Data Science Workflow?
- Where will the refined data be stored?
- Why will data need to be refined?
- How will the refined data be tested and validated for consistent performance?
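One way to approach the last question is to encode your expectations about the refined data as explicit checks that run every time the data is regenerated. The sketch below assumes rows are plain dictionaries; `validate_rows`, the column names, and the ranges are illustrative assumptions rather than a prescribed method.

```python
from typing import Any


def validate_rows(
    rows: list[dict[str, Any]],
    required: list[str],
    ranges: dict[str, tuple[float, float]],
) -> list[str]:
    """Return a list of human-readable problems found in the rows.

    required: columns that must be present and not None in every row.
    ranges:   column -> (low, high) inclusive bounds for numeric values.
    """
    problems = []
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing {col}")
        for col, (low, high) in ranges.items():
            value = row.get(col)
            # Missing values are already reported above, so skip them here.
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {col}={value} outside [{low}, {high}]")
    return problems
```

An empty result means the refined data passed every check; a non-empty one gives you a concrete list to discuss with whoever owns the data dictionary.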
Step 3: Data Expansion
Even with the best data available, a problem may not be solvable. Frequently, more data is the difference between a dead-end product and a product that leads the market with unique insights.
Successful products in 2020 require both data refinement and data expansion. Integrations with APIs, similar datasets, and alternative data give data science teams the confidence to discover important insights from data. Data expansion enables feature enrichment and raises the success rate of the data science workflow for products.
Apply these questions when expanding data:
- Who controls data access?
- What budget is available to acquire or generate more data?
- When do you stop expanding data or continue to iterate with machine learning?
- Where can you acquire high quality data sources?
- Why are more data features needed to improve your product or solution?
- How will you decide what data is most relevant to expand your data?
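Feature enrichment often comes down to joining your records with an external source on a shared key and then checking how many records actually gained new features. Here is a minimal sketch, assuming both datasets fit in memory as dictionaries; `enrich`, `coverage`, and the example field names are hypothetical.

```python
def enrich(records: list[dict], lookup: dict[str, dict], key: str) -> list[dict]:
    """Left-join external features onto each record by the given key."""
    enriched = []
    for record in records:
        extra = lookup.get(record[key], {})
        # Original fields win on name clashes, so enrichment never
        # overwrites the source data.
        enriched.append({**extra, **record})
    return enriched


def coverage(records: list[dict], lookup: dict[str, dict], key: str) -> float:
    """Fraction of records that have a match in the external source."""
    if not records:
        return 0.0
    matched = sum(1 for record in records if record[key] in lookup)
    return matched / len(records)


# Example: enrich sales rows with a hypothetical per-ZIP feature.
sales = [{"zip": "10001", "sales": 5}, {"zip": "99999", "sales": 2}]
external = {"10001": {"median_income": 70000}}
print(enrich(sales, external, key="zip"))
print(coverage(sales, external, key="zip"))
```

Low coverage is a useful signal when deciding whether a candidate data source is worth the budget: a source that matches only a small share of your records adds few usable features.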
Step 4: Data Learning
Analytics and business intelligence test which data variables may be important; data learning runs models on features to predict insights for a product.
Data Learning considers how compute, storage, and machine learning frameworks can accelerate your workflow.
Ask yourself these questions during the Data Learning stage:
- Who determines what benchmarks are needed for a successful model?
- What machine learning frameworks and algorithms will you choose for what you will predict?
- When do you decide that your modeling results are significant or ready for production?
- Where will you run data learning: locally, or on which cloud systems?
- Why does your feature request or product need machine learning?
- How much compute time and compute resources are available to model the data?
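The benchmark and readiness questions above become easier to answer once the benchmark is written down as code. The sketch below is one possible gate, assuming a classification task, accuracy as the metric, and a majority-class baseline; the `min_lift` threshold of 0.05 is an arbitrary placeholder that your team would need to agree on.

```python
from collections import Counter


def accuracy(y_true: list, y_pred: list) -> float:
    """Share of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true: list) -> list:
    """Predict the most frequent label for every example."""
    most_common_label, _ = Counter(y_true).most_common(1)[0]
    return [most_common_label] * len(y_true)


def ready_for_production(y_true: list, y_pred: list, min_lift: float = 0.05) -> bool:
    """True if the model beats the majority baseline by at least min_lift."""
    model_score = accuracy(y_true, y_pred)
    baseline_score = accuracy(y_true, majority_baseline(y_true))
    return model_score - baseline_score >= min_lift
```

In practice `y_true` and `y_pred` should come from held-out data the model has not seen, and a different metric may suit your product better; the point is that the person who sets the benchmark can review it as a single function.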
Step 5: Data Maintenance
Your machine learning model has exceeded benchmarks, and you have put your solution into production with your data engineer and software engineers.
But now what?
All machine learning models and data degrade in quality over time. Skilled data scientists monitor their models to verify results and maintain quality in production.
Apply these questions to better monitor your data:
- Who is responsible for making changes to data models when performance changes?
- What triggers, pipelines or data jobs do you implement to monitor the quality of your data in production?
- When performance falls below required benchmarks, what data governance processes do you action?
- Where will you commit time in your schedule on a recurring basis to monitor your data pipeline for quality control?
- Why are your data modeling results reducing in quality in production?
- How do you communicate data modeling results to your product managers, data engineers, and software engineers and with what frequency?
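A simple trigger for the "what" and "when" questions above is a rolling average of a production metric compared against the required benchmark. This is a minimal sketch, assuming you already compute one score per evaluation period; `PerformanceMonitor` is a hypothetical class, and the window size is an assumption to tune.

```python
from collections import deque


class PerformanceMonitor:
    """Flags when the rolling mean of recent scores drops below a benchmark."""

    def __init__(self, benchmark: float, window: int = 3):
        self.benchmark = benchmark
        self.scores = deque(maxlen=window)  # keeps only the latest `window` scores

    def record(self, score: float) -> bool:
        """Store a new score; return True if an alert should be raised."""
        self.scores.append(score)
        # Wait for a full window so one bad period does not trigger an alert.
        window_full = len(self.scores) == self.scores.maxlen
        rolling_mean = sum(self.scores) / len(self.scores)
        return window_full and rolling_mean < self.benchmark


monitor = PerformanceMonitor(benchmark=0.8, window=3)
for weekly_score in [0.9, 0.7, 0.7]:
    if monitor.record(weekly_score):
        print("Rolling performance is below benchmark; start the review process.")
```

What happens after the alert (who is notified, whether the model is retrained or rolled back) is the governance decision the questions above ask you to settle in advance.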
In Summary:
For your current and next data science product features, think about all 5 Steps of Design Thinking in your Data Science workflow: (1) Data Collection, (2) Data Refinement, (3) Data Expansion, (4) Data Learning, and (5) Data Maintenance.
With Design Thinking applied to your data science workflow, you will be a better data scientist starting today.
Works Cited:
More from David Yakobovitch:
Listen to the HumAIn Podcast | Subscribe to my newsletter