Python in Computationally-Intensive Areas: Machine Learning

Category: IT Technology · Published: 4 years ago

Even the most ingenious learning algorithm will not suffice if it never completes.

Jun 19 · 5 min read

Machine learning tends to be a computationally-intensive task for many practical use cases. It is vital that your learning algorithm performs well, or at the very least, completes.

Do not get me wrong, there are many practical algorithms and ideas that arise from Computational Learning Theory.

In fact, I have written about a few of them: Defining Goodness in Machine Learning Algorithms and What To Do If Learning Fails.

If performance is key in practical machine learning use cases, why is Python one of the most commonly used languages in data science?

Introduction

Let us set the scene before we dive into answering this question. Back when I was a student in my introductory computer science course, the primary language we learned was Java (a compiled language). A year later, the same course had switched to teaching Python (an interpreted language) as its primary language. Why the switch?

Since Python is not the fastest for every problem, my hypothesis is that Python is just easier to learn and use. This hypothesis can be extended to data science, where people with diverse engineering backgrounds are creating great machine learning models, often through a back-and-forth process of prototyping and running experiments.

Still, performance is important. So let us dive into how we can use Python in a computationally-intensive area like machine learning.

Global Temperature Prediction using Least Squares Polynomial Fit with NumPy, SciPy and Matplotlib

Let us step through the creation of a simple Global Temperature Predictor, where we will stop along the way to discuss how Python’s libraries are key in assisting us in machine learning.

To start off, we will be using NumPy.

NumPy is an extension package to Python for multi-dimensional arrays. It is designed for scientific computation and is a memory-efficient container that provides fast numerical operations. Since NumPy is mostly written in C (a very fast compiled language), it can off-load its computationally-intensive work to that lower layer.

Here is a quick comparison of loop performance:

Python in Computationally-Intensive Areas: Machine Learning — Python Machine Learning Colab Notebook

Python: 1000 loops, best of 3: 237 µs per loop. NumPy: 1000000 loops, best of 3: 1.22 µs per loop.
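The measured code is not shown above, so here is a minimal sketch of how such a comparison could be run. It assumes the task is squaring 1000 integers, once with a pure Python list comprehension and once with a NumPy array operation; the function names are hypothetical, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 1000                      # assumed problem size
values = range(n)
array = np.arange(n)


def square_python():
    """Square every element with a pure Python list comprehension."""
    return [i ** 2 for i in values]


def square_numpy():
    """Square every element with one vectorized NumPy operation."""
    return array ** 2


repeats, loops = 3, 1000
# Best-of-3 average time per call, mirroring the "best of 3" wording above
python_time = min(timeit.repeat(square_python, repeat=repeats, number=loops)) / loops
numpy_time = min(timeit.repeat(square_numpy, repeat=repeats, number=loops)) / loops

print(f"Python: {python_time * 1e6:.2f} µs per loop")
print(f"NumPy:  {numpy_time * 1e6:.2f} µs per loop")
```

Both functions produce the same numbers; the difference is that NumPy runs the loop in compiled code instead of in the interpreter.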

Aside from the potentially increased performance, NumPy has a plethora of useful tools. Here are some of my favorites:

numpy.reshape: Gives a new shape to an array without changing its data.

numpy.copy: Returns a true copy of an array rather than a view.

numpy.ndarray.flatten: Returns a copy of the array collapsed into one dimension.

numpy.empty: Creates an array without initializing its values (not even to zero), and may therefore be marginally faster.

numpy.ma: Provides masked arrays for dealing with (propagation of) missing data.

numpy.genfromtxt: Loads data from text files and deals with missing values.

numpy.linspace: Returns evenly spaced numbers over a specified interval.

numpy.clip: Limits values to a given interval, which can be used to trim outliers.
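As a quick illustration of the tools listed above, here is a small sketch. All values are made up for demonstration purposes.

```python
import io

import numpy as np

grid = np.reshape(np.arange(6), (2, 3))   # reshape: new shape, same data
independent = np.copy(grid)               # copy: a true copy, not a view
independent[0, 0] = 99                    # does not affect `grid`
flat = grid.flatten()                     # flatten: 1-D copy of the array

scratch = np.empty(3)                     # empty: contents are arbitrary...
scratch[:] = 0.0                          # ...so fill the array before reading it

months = np.linspace(0, 11, 12)           # linspace: 12 evenly spaced values in [0, 11]
clipped = np.clip(np.array([-50, 5, 120]), -10, 40)   # clip: limit values to [-10, 40]

# ma: mask invalid entries so that computations skip them
masked = np.ma.masked_invalid(np.array([1.0, np.nan, 3.0]))

# genfromtxt: a missing field in text data is read as NaN
table = np.genfromtxt(io.StringIO("1,2,3\n4,,6"), delimiter=",")

print(flat, clipped, masked.mean(), table, sep="\n")
```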

We will begin by using sample data before we get to the real data. Here we are generating temperature data as a function of month of the year.
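The sample data itself is not reproduced here, so the following is only a sketch of one way to generate it. It assumes a cosine-shaped seasonal cycle with added noise; the mean, amplitude and noise level are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)       # fixed seed so the sample is reproducible

months = np.arange(12)                    # 0 = January, ..., 11 = December
# Assumed seasonal cycle: mean 10 °C, amplitude 15 °C, warmest at month index 6
seasonal = 10 + 15 * np.cos(2 * np.pi * (months - 6) / 12)
# Add measurement noise to make the sample look less idealized
temperatures = seasonal + rng.normal(scale=1.5, size=months.size)

print(temperatures.round(1))
```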

We can use Matplotlib, a Python library, to easily visualize our data.
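A minimal plotting sketch could look like the following. The data is a noise-free placeholder cycle, and the file name is hypothetical; in a notebook we would drop the backend line and the figure would display inline.

```python
import matplotlib

matplotlib.use("Agg")                     # headless backend; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(12)
# Placeholder temperatures (assumed seasonal cycle, not real measurements)
temperatures = 10 + 15 * np.cos(2 * np.pi * (months - 6) / 12)

fig, ax = plt.subplots()
ax.plot(months, temperatures, "o")        # one marker per month
ax.set_xlabel("Month")
ax.set_ylabel("Temperature (°C)")
ax.set_title("Sample temperature by month")
fig.savefig("sample_temperatures.png")    # hypothetical output file name
```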

We can then use SciPy to fit our data to a periodic function using its optimize module. No need to reinvent the wheel here.

scipy: A scientific toolkit for linear algebra, interpolation, optimization and fitting, statistics and random numbers, numerical integration, fast Fourier transforms, signal processing, and image manipulation.
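Here is a sketch of such a fit using scipy.optimize.curve_fit. The model function, its parameter names and the starting guess are assumptions made for illustration, and the data is synthetic.

```python
import numpy as np
from scipy import optimize


def yearly_cycle(month, mean, amplitude, phase):
    """Assumed periodic model with a fixed 12-month period."""
    return mean + amplitude * np.cos(2 * np.pi * (month - phase) / 12)


rng = np.random.default_rng(seed=0)
months = np.arange(12)
# Synthetic observations: a known cycle plus noise
temperatures = yearly_cycle(months, 10, 15, 6) + rng.normal(scale=1.0, size=12)

# p0 is a rough starting guess; curve_fit refines it by least squares
params, covariance = optimize.curve_fit(
    yearly_cycle, months, temperatures, p0=[10, 10, 5]
)
mean, amplitude, phase = params
print(f"mean={mean:.1f}, amplitude={amplitude:.1f}, phase={phase:.1f}")
```

A reasonable starting guess matters for periodic models, since a poor one can land the optimizer in a different local minimum.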

Let us extend the idea for our global temperature model to real data. NumPy can easily load our real data from the NASA GLOBAL Land-Ocean Temperature Index (in units of 0.01 degrees Celsius, base period 1951–1980). Some of this data is missing, but NumPy can still load it without our assistance, representing the gaps as NaN values.
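The loading step might look like the sketch below. The inline rows are made up and are not NASA measurements, and the "year followed by 12 monthly values" comma-separated layout is an assumption; the real file may need different genfromtxt arguments.

```python
import io

import numpy as np

# Made-up rows in an assumed "year, 12 monthly values" layout (NOT real data).
# An empty field marks a missing month.
raw = io.StringIO(
    "1990,10,12,14,16,18,20,22,20,18,16,14,12\n"
    "1991,11,13,,17,19,21,23,,19,17,15,13\n"
)

data = np.genfromtxt(raw, delimiter=",")  # missing fields become NaN
years = data[:, 0].astype(int)
anomalies = data[:, 1:] * 0.01            # convert 0.01 °C units to °C

# NaN-aware reductions skip the missing months
yearly_mean = np.nanmean(anomalies, axis=1)
print(years, yearly_mean)
```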

We can plot a heat map with Matplotlib to get intuition about the trends in our data.
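One way to draw such a heat map is with imshow, as sketched below. The grid is placeholder data with an artificial upward drift, and the color map and file name are arbitrary choices.

```python
import matplotlib

matplotlib.use("Agg")                     # headless backend; omit inside a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=0)
years = np.arange(1990, 2000)
# Placeholder anomalies (years x months) with a slow upward drift; not real data
drift = 0.02 * (years - years[0])[:, None]
anomalies = drift + rng.normal(scale=0.1, size=(years.size, 12))
anomalies[3, 7] = np.nan                  # a missing month is drawn as a blank cell

fig, ax = plt.subplots()
image = ax.imshow(
    anomalies,
    aspect="auto",
    cmap="coolwarm",
    # extent = [left, right, bottom, top]; the first year is placed at the top
    extent=[0.5, 12.5, years[-1] + 0.5, years[0] - 0.5],
)
ax.set_xlabel("Month")
ax.set_ylabel("Year")
fig.colorbar(image, ax=ax, label="Temperature anomaly (°C)")
fig.savefig("temperature_heatmap.png")    # hypothetical output file name
```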

We can flatten our data with NumPy so that we have an easy data set to work with. Next, we split the data into train and test sets, but we want to preserve the order here so we can do predictions on the “future”. Error will be calculated as the squared distance of the model’s prediction to the real data.
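These steps can be sketched as follows. The grid is placeholder data, the 80/20 split ratio is an assumption, and squared_error is a hypothetical helper name.

```python
import numpy as np


def squared_error(model, x, y):
    """Sum of squared distances between the model's predictions and the data."""
    return np.sum((model(x) - y) ** 2)


# Placeholder grid (years x months); substitute the loaded temperature data
grid = np.arange(120, dtype=float).reshape(10, 12)
grid[2, 5] = np.nan                       # simulate one missing month

y = grid.flatten()                        # row-major, so chronological order is kept
x = np.arange(y.size)                     # month index since the start of the record
valid = ~np.isnan(y)                      # drop the missing months
x, y = x[valid], y[valid]

split = int(0.8 * x.size)                 # assumed 80/20 split, no shuffling
x_train, y_train = x[:split], y[:split]   # the "past"
x_test, y_test = x[split:], y[split:]     # the "future"

print(len(x_train), len(x_test))
```

Because the data is not shuffled, every test point lies after every training point in time.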

Finally, we can perform training and plot our graph. Here we are training with SciPy’s least squares polynomial fit, where the outcome is a polynomial that minimizes the sum of the squared distances between the model’s predictions and the real data. The polynomial’s coefficients define the model that performs our predictions. In the graph below, the higher-degree polynomial performs best on the test set.
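A sketch of the training step is shown below. Note that it calls numpy.polyfit directly, on the assumption that this is the least squares polynomial fit meant here; the data is synthetic, and the degrees compared are arbitrary.

```python
import numpy as np


def squared_error(model, x, y):
    """Sum of squared distances between the model's predictions and the data."""
    return np.sum((model(x) - y) ** 2)


rng = np.random.default_rng(seed=0)
# Synthetic series on a rescaled time axis (keeps the fit well conditioned)
x = np.linspace(0, 1, 200)
y = 0.5 * x ** 2 + rng.normal(scale=0.05, size=x.size)

split = 160                               # ordered split: train on the past
x_train, y_train = x[:split], y[:split]
x_test, y_test = x[split:], y[split:]

models, train_errors, test_errors = {}, {}, {}
for degree in (1, 2, 3):
    coefficients = np.polyfit(x_train, y_train, degree)   # least squares fit
    models[degree] = np.poly1d(coefficients)               # callable polynomial
    train_errors[degree] = squared_error(models[degree], x_train, y_train)
    test_errors[degree] = squared_error(models[degree], x_test, y_test)
    print(degree, train_errors[degree], test_errors[degree])
```

Training error can only go down as the degree grows, so the test error on the held-out “future” is what tells us which polynomial generalizes best.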

Conclusion

What Python lacks in performance, it makes up for in ease of use with its robust libraries. In addition, these libraries improve Python’s performance in many use cases.

Please see the linked Colab Notebook for the associated Python source code.

References

http://scipy-lectures.org

Building Machine Learning Systems with Python by Willi Richert and Luis Pedro Coelho
