Step-by-Step Data Analysis with Python

Category: Python · Published: 5 years ago

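Editor's note:
This article comes from cnblogs (博客园). It covers setting up a programming environment and then learning to use the IPython notebook. We hope it helps your learning.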

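You have decided to learn Python, but you have no prior programming experience. With so much Python to learn, it is easy to feel confused about where to start. These are the questions beginners most often ask when they start using Python for data analysis:

How long does it take to learn Python?

How much Python do I need to learn before I can do data analysis?

What are the best books or courses for learning Python?

Do I have to become an expert Python programmer to work with data sets?

This kind of confusion is understandable when you start learning a new skill, as the author of "Learn Anything in 20 Hours" says. Don't worry: I will show you how to get started quickly without becoming a Python programming "ninja".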

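Don't make the mistake I made

Before I started using Python, I had a misconception about data analysis with Python: I believed I had to become highly proficient in Python programming first. So I took Udacity's introductory Python programming course, completed the Python track on Codecademy, and read several Python programming books. This went on for 3 months (about 3 hours a day), during which I learned Python by building small software projects. Writing code was fun, but my goal was to use Python for data analysis, not to become a Python developer. I eventually realized that I had spent a lot of time learning software development with Python rather than data analysis.

After several hours of careful thought, I found that I needed to learn 5 Python libraries to solve a wide range of data analysis problems effectively. I then started learning these libraries one by one.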

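Learning path

Start with Codecademy and complete all of its exercises. At 3 hours a day, you should finish them within 20 days. Codecademy covers the basic Python concepts. It is not as project-oriented as Udacity, but that does not matter, because your goal is to do data science, not to build software with Python.

After completing the Codecademy exercises, look at this IPython notebook:

Python Essentials Tutorial (I have provided the download link in the Summary section).

It covers some concepts that Codecademy does not mention. You can work through this tutorial in 1 to 2 hours.

You now know enough of the basics to start learning the Python libraries.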

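Numpy

Start with Numpy, because it is the fundamental package for scientific computing with Python. A good grasp of Numpy will help you use other tools, such as Pandas, effectively.

I have prepared an IPython notebook that covers the basic concepts of Numpy. The tutorial includes the most frequently used Numpy operations, such as N-dimensional arrays, indexing, array slicing, integer indexing, array transposition, universal functions, data processing with arrays, and common statistical methods.

Numpy Basics Tutorial

Index Numpy: recommended for looking up how to use an unfamiliar Numpy function.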
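
As a quick taste of the operations listed above, here is a minimal sketch. It is not taken from the linked notebook; the array values and variable names are made up for illustration.

```python
import numpy as np

# A 2-D array (3 rows, 4 columns) built from the integers 0..11.
arr = np.arange(12).reshape(3, 4)

first_row = arr[0]            # indexing: the first row
block = arr[:2, 1:3]          # slicing: rows 0-1, columns 1-2
picked = arr[[0, 2]]          # integer indexing: rows 0 and 2
transposed = arr.T            # transposition: shape becomes (4, 3)
roots = np.sqrt(arr)          # universal function, applied element-wise

# Common statistical methods.
total = arr.sum()             # sum of all elements
col_means = arr.mean(axis=0)  # mean of each column

print(block)
print(total, col_means)
```

Each line maps to one of the topics in the tutorial, so it can double as a checklist while you work through the notebook.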

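Pandas

Pandas provides high-level data structures and manipulation tools that make data analysis in Python faster and easier.

The tutorial covers Series, DataFrames, dropping entries from an axis, handling missing data, and more.

Pandas Basics Tutorial

Index Pandas: recommended for looking up how to use an unfamiliar function.

Pandas tutorial on Baidu Jingyan (百度经验)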
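
To make the topics above concrete, here is a small sketch of a Series, a DataFrame, dropping entries from an axis, and handling missing data. The column names and values are invented for illustration and do not come from the linked tutorial.

```python
import numpy as np
import pandas as pd

# A Series is a one-dimensional labelled array.
s = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# A DataFrame is a table of columns; np.nan marks a missing value.
df = pd.DataFrame(
    {"year": [2000, 2001, 2002], "pop": [1.5, np.nan, 3.6]},
    index=["one", "two", "three"],
)

without_row = df.drop("two")           # drop a label from the row axis
without_col = df.drop("pop", axis=1)   # drop a label from the column axis

cleaned = df.dropna()                  # discard rows that contain missing values
filled = df.fillna(0.0)                # or replace missing values with a constant

print(s["a"])
print(cleaned)
```

Note that `drop`, `dropna` and `fillna` return new objects here, so `df` itself is left unchanged.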

Matplotlib

This is a four-part Matplotlib tutorial.

Part 1:

Part 1 introduces the basic functionality of Matplotlib and the basic figure types.

Simple Plotting example

In [113]:
 %matplotlib inline 
 import matplotlib.pyplot as plt #importing matplot lib library
 import numpy as np 
 x = range(100) 
 #print x, print and check what is x
 y =[val**2 for val in x] 
 #print y
 plt.plot(x,y) #plotting x and y
 Out[113]:
 [<matplotlib.lines.Line2D at 0x7857bb0>] 

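fig, axes = plt.subplots(nrows=1, ncols=2) # assumed setup; this snippet does not show how fig and axes were created
for ax in axes:
    ax.plot(x, y, 'r')
    ax.set_xlabel('x')
    ax.set_ylabel('y')
    ax.set_title('title')

fig.tight_layout() # adjust the spacing so that neighbouring subplots do not overlap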

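x = np.linspace(0, 5, 10) # assumed sample data; x must be a Numpy array for x**2 to work
fig, ax = plt.subplots() # assumed setup; this snippet does not show how ax was created
ax.plot(x, x**2, label="y = x**2")
ax.plot(x, x**3, label="y = x**3")
ax.legend(loc=2) # upper left corner
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_title('title')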

fig, axes = plt.subplots(1, 2, figsize=(10,4))
axes[0].plot(x, x**2, x, np.exp(x))
axes[1].plot(x, x**2, x, np.exp(x))
axes[1].set_yscale("log")
axes[1].set_title("Logarithmic scale (y)");

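n = np.array([0,1,2,3,4,5])
xx = np.linspace(-0.75, 1., 100) # assumed sample data; xx is not defined in this snippet
fig, axes = plt.subplots(1, 4, figsize=(12,3)) # assumed setup; this snippet does not show how axes was created

In [47]:
 axes[0].scatter(xx, xx + 0.25*np.random.randn(len(xx)))
 axes[0].set_title("scatter")
 axes[1].step(n, n**2, lw=2)
 axes[1].set_title("step")
 axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5)
 axes[2].set_title("bar")
 axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
 axes[3].set_title("fill_between");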

Using Numpy

In [17]:
 x = np.linspace(0, 2*np.pi, 100)
 y =np.sin(x)
 plt.plot(x,y)
 Out[17]:
 [<matplotlib.lines.Line2D at 0x579aef0>] 

In [24]:
 x= np.linspace(-3,2, 200)
 Y = x ** 2 - 2 * x + 1.
 plt.plot(x,Y)
 Out[24]:
 [<matplotlib.lines.Line2D at 0x6ffb310>] 

In [32]:
 # plotting multiple plots
 x =np.linspace(0, 2 * np.pi, 100)
 y = np.sin(x)
 z = np.cos(x)
 plt.plot(x,y)
 plt.plot(x,z)
 # Matplotlib picks a different color for each plot.

In [35]:
 cd C:\Users\tk\Desktop\Matplot
 
 C:\Users\tk\Desktop\Matplot
 In [39]:
 data = np.loadtxt('numpy.txt')
 plt.plot(data[:,0], data[:,1]) # plotting column 1 vs column 2
 # The text in the numpy.txt should look like this
 # 0 0
 # 1 1
 # 2 4
 # 4 16
 # 5 25
 # 6 36
 Out[39]:
 [<matplotlib.lines.Line2D at 0x740f090>] 
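
The cell above assumes that numpy.txt already exists in the working directory. The sketch below is one way to try `np.loadtxt` without that file: it first writes the six sample rows from the comments into a temporary directory. The temporary location and the variable names are my own choices, not part of the original notebook.

```python
import os
import tempfile

import numpy as np

# The six sample rows listed in the comments above: x and x**2.
sample = "0 0\n1 1\n2 4\n4 16\n5 25\n6 36\n"

# Write the file into a temporary directory (a hypothetical location) so the
# example does not depend on the current working directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "numpy.txt")
with open(path, "w") as handle:
    handle.write(sample)

data = np.loadtxt(path)   # whitespace-separated text -> 2-D float array
x_col = data[:, 0]        # first column of the file
y_col = data[:, 1]        # second column of the file
print(data.shape)         # (6, 2)
```

Passing `x_col` and `y_col` to `plt.plot` then reproduces the plot from the cell above.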

In [56]:
 data1 = np.loadtxt('scipy.txt') # load the file
 print(data1.T) # each row of data1.T is one column of the file
 for val in data1.T: # loop over the rows of data1.T, that is, the columns of the file
     plt.plot(data1[:,0], val) # data1[:,0] is the first column of the file (the first row of data1.T)
 # data in scipy.txt looks like this:
 # 0 0 6
 # 1 1 5
 # 2 4 4
 # 4 16 3
 # 5 25 2
 # 6 36 1
 [[ 0. 1. 2. 4. 5. 6.]
 [ 0. 1. 4. 16. 25. 36.]
 [ 6. 5. 4. 3. 2. 1.]]

Scatter Plots and Bar Graphs

In [64]:
 sct = np.random.rand(20, 2)
 print(sct)
 plt.scatter(sct[:,0], sct[:,1]) # I am plotting a scatter plot.
 
 [[ 0.51454542 0.61859101]
 [ 0.45115993 0.69774873]
 [ 0.29051205 0.28594808]
 [ 0.73240446 0.41905186]
 [ 0.23869394 0.5238878 ]
 [ 0.38422814 0.31108919]
 [ 0.52218967 0.56526379]
 [ 0.60760426 0.80247073]
 [ 0.37239096 0.51279078]
 [ 0.45864677 0.28952167]
 [ 0.8325996 0.28479446]
 [ 0.14609382 0.8275477 ]
 [ 0.86338279 0.87428696]
 [ 0.55481585 0.24481165]
 [ 0.99553336 0.79511137]
 [ 0.55025277 0.67267026]
 [ 0.39052024 0.65924857]
 [ 0.66868207 0.25186664]
 [ 0.64066313 0.74589812]
 [ 0.20587731 0.64977807]]
 Out[64]:
 <matplotlib.collections.PathCollection at 0x78a7110> 

In [65]:
 ghj =[5, 10 ,15, 20, 25]
 it =[ 1, 2, 3, 4, 5]
 plt.bar(ghj, it) # simple bar graph
 Out[65]:
 <Container object of 5 artists> 

In [74]:
 ghj =[5, 10 ,15, 20, 25]
 it =[ 1, 2, 3, 4, 5]
 plt.bar(ghj, it, width =5)# you can change the thickness of a bar, by default the bar will have a thickness of 0.8 units
 Out[74]:
 <Container object of 5 artists> 

In [75]:
 ghj =[5, 10 ,15, 20, 25]
 it =[ 1, 2, 3, 4, 5]
 plt.barh(ghj, it) # barh is a horizontal bar graph
 Out[75]:
 <Container object of 5 artists>

In [95]:
 new_list = [[5., 25., 50., 20.], [4., 23., 51., 17.], [6., 22., 52., 19.]]
 x = np.arange(4)
 plt.bar(x + 0.00, new_list[0], color ='b', width =0.25)
 plt.bar(x + 0.25, new_list[1], color ='r', width =0.25)
 #plt.show()

In [100]:
 #Stacked Bar charts
 p = [5., 30., 45., 22.]
 q = [5., 25., 50., 20.]
 x =range(4)
 plt.bar(x, p, color ='b')
 plt.bar(x, q, color ='y', bottom =p) 
 Out[100]:
 <Container object of 4 artists> 

In [35]:
 # plotting more than 2 values
 A = np.array([5., 30., 45., 22.])
 B = np.array([5., 25., 50., 20.])
 C = np.array([1., 2., 1., 1.])
 X = np.arange(4)
 plt.bar(X, A, color = 'b')
 plt.bar(X, B, color = 'g', bottom = A)
 plt.bar(X, C, color = 'r', bottom = A + B) # for the third argument, I use A+B
 plt.show() 
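
The stacking trick above (passing `A` and then `A + B` as `bottom`) generalizes to any number of series if you keep a running total. Here is a sketch under that assumption; the loop and the names `series`, `colors` and `bottom` are mine, while the data is the same as in the cell above.

```python
import numpy as np
import matplotlib.pyplot as plt

series = [
    np.array([5., 30., 45., 22.]),
    np.array([5., 25., 50., 20.]),
    np.array([1., 2., 1., 1.]),
]
colors = ["b", "g", "r"]
positions = np.arange(4)

fig, ax = plt.subplots()
bottom = np.zeros(4)              # running height of each stack
for values, color in zip(series, colors):
    ax.bar(positions, values, color=color, bottom=bottom)
    bottom = bottom + values      # the next series starts on top of this one
```

After the loop, `bottom` holds the total height of each stack. In a notebook with `%matplotlib inline` the figure is displayed automatically; in a script, call `plt.show()` at the end.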

In [94]:
 black_money = np.array([5., 30., 45., 22.]) 
 white_money = np.array([5., 25., 50., 20.])
 z = np.arange(4)
 plt.barh(z, black_money, color ='g')
 plt.barh(z, -white_money, color ='r')# - notation is needed for generating, back to back charts
 Out[94]:
 <Container object of 4 artists> 

Other Plots

In [114]:
 #Pie charts
 y = [5, 25, 45, 65]
 plt.pie(y)
 Out[114]:
 ([<matplotlib.patches.Wedge at 0x7a19d50>,
 <matplotlib.patches.Wedge at 0x7a252b0>,
 <matplotlib.patches.Wedge at 0x7a257b0>,
 <matplotlib.patches.Wedge at 0x7a25cb0>],
 [<matplotlib.text.Text at 0x7a25070>,
 <matplotlib.text.Text at 0x7a25550>,
 <matplotlib.text.Text at 0x7a25a50>,
 <matplotlib.text.Text at 0x7a25f50>]) 

In [115]:
 #Histograms
 d = np.random.randn(100)
 plt.hist(d, bins = 20)
 Out[115]:
 (array([ 2., 3., 2., 1., 2., 6., 5., 7., 10., 12., 9.,
 12., 11., 5., 6., 4., 1., 0., 1., 1.]),
 array([-2.9389701 , -2.64475645, -2.35054281, -2.05632916, -1.76211551,
 -1.46790186, -1.17368821, -0.87947456, -0.58526092, -0.29104727,
 0.00316638, 0.29738003, 0.59159368, 0.88580733, 1.18002097,
 1.47423462, 1.76844827, 2.06266192, 2.35687557, 2.65108921,
 2.94530286]),
 <a list of 20 Patch objects>) 
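
The tuple in `Out[115]` holds the count in each bin, the bin edges, and the drawn patches. If you only need the numbers, `np.histogram` computes counts and edges without drawing anything. A small sketch follows; the fixed seed is an arbitrary choice of mine so that the run is repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)     # fixed seed, chosen arbitrarily
d = rng.randn(100)

counts, edges = np.histogram(d, bins=20)

print(counts.sum())    # every sample falls into exactly one bin
print(len(edges))      # 20 bins are delimited by 21 edges
```

The first and last edges are the minimum and maximum of the data, which is why the edge array in the output above starts and ends at the extreme samples.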

In [116]:
 d = np.random.randn(100)
 plt.boxplot(d)
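 #1) The bar inside the box is the median of the distribution (red in older Matplotlib versions, orange in the current default style)
 #2) The box spans the lower quartile to the upper quartile, so it holds the middle 50 percent of the data.
 # The median always lies inside the box, though not necessarily at its exact center.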
 Out[116]:
 {'boxes': [<matplotlib.lines.Line2D at 0x7cca090>],
 'caps': [<matplotlib.lines.Line2D at 0x7c02d70>,
 <matplotlib.lines.Line2D at 0x7cc2c90>],
 'fliers': [<matplotlib.lines.Line2D at 0x7cca850>,
 <matplotlib.lines.Line2D at 0x7ccae10>],
 'medians': [<matplotlib.lines.Line2D at 0x7cca470>],
 'whiskers': [<matplotlib.lines.Line2D at 0x7c02730>,
 <matplotlib.lines.Line2D at 0x7cc24b0>]} 
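
To see what the box represents, you can compute the quartiles yourself with `np.percentile`. This is only a sketch of the idea, with an arbitrary seed and my own variable names; it does not reproduce how Matplotlib draws whiskers or outliers.

```python
import numpy as np

rng = np.random.RandomState(1)     # fixed seed, chosen arbitrarily
d = rng.randn(100)

# Lower quartile, median and upper quartile of the sample.
q1, median, q3 = np.percentile(d, [25, 50, 75])

# Number of samples between the two quartiles, i.e. inside the box.
inside_box = np.sum((d >= q1) & (d <= q3))

print(q1, median, q3)
print(inside_box)
```

With 100 samples, half of them fall between `q1` and `q3`, which is the range the box covers.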

In [118]:
 d = np.random.randn(100, 5) # generating multiple box plots
 plt.boxplot(d)
 Out[118]:
 {'boxes': [<matplotlib.lines.Line2D at 0x7f49d70>,
 <matplotlib.lines.Line2D at 0x7ea1c90>,
 <matplotlib.lines.Line2D at 0x7eafb90>,
 <matplotlib.lines.Line2D at 0x7ebea90>,
 <matplotlib.lines.Line2D at 0x7ece990>],
 'caps': [<matplotlib.lines.Line2D at 0x7f2b3b0>,
 <matplotlib.lines.Line2D at 0x7f49990>,
 <matplotlib.lines.Line2D at 0x7ea14d0>,
 <matplotlib.lines.Line2D at 0x7ea18b0>,
 <matplotlib.lines.Line2D at 0x7eaf3d0>,
 <matplotlib.lines.Line2D at 0x7eaf7b0>,
 <matplotlib.lines.Line2D at 0x7ebe2d0>,
 <matplotlib.lines.Line2D at 0x7ebe6b0>,
 <matplotlib.lines.Line2D at 0x7ece1d0>,
 <matplotlib.lines.Line2D at 0x7ece5b0>],
 'fliers': [<matplotlib.lines.Line2D at 0x7e98550>,
 <matplotlib.lines.Line2D at 0x7e98930>,
 <matplotlib.lines.Line2D at 0x7ea8470>,
 <matplotlib.lines.Line2D at 0x7ea8a10>,
 <matplotlib.lines.Line2D at 0x7eb6370>,
 <matplotlib.lines.Line2D at 0x7eb6730>,
 <matplotlib.lines.Line2D at 0x7ec6270>,
 <matplotlib.lines.Line2D at 0x7ec6810>,
 <matplotlib.lines.Line2D at 0x8030170>,
 <matplotlib.lines.Line2D at 0x8030710>],
 'medians': [<matplotlib.lines.Line2D at 0x7e98170>,
 <matplotlib.lines.Line2D at 0x7ea8090>,
 <matplotlib.lines.Line2D at 0x7eaff70>,
 <matplotlib.lines.Line2D at 0x7ebee70>,
 <matplotlib.lines.Line2D at 0x7eced70>],
 'whiskers': [<matplotlib.lines.Line2D at 0x7f2bb50>,
 <matplotlib.lines.Line2D at 0x7f491b0>,
 <matplotlib.lines.Line2D at 0x7e98cf0>,
 <matplotlib.lines.Line2D at 0x7ea10f0>,
 <matplotlib.lines.Line2D at 0x7ea8bf0>,
 <matplotlib.lines.Line2D at 0x7ea8fd0>,
 <matplotlib.lines.Line2D at 0x7eb6cd0>,
 <matplotlib.lines.Line2D at 0x7eb6ed0>,
 <matplotlib.lines.Line2D at 0x7ec6bd0>,
 <matplotlib.lines.Line2D at 0x7ec6dd0>]} 

Part 2:

 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt
In [22]:
 p =np.random.standard_normal((50,2))
 q =np.random.standard_normal((50,2))
 q += np.array((1,1)) # center the distribution at (1, 1)
 plt.scatter(p[:,0], p[:,1], color ='.25')
 plt.scatter(q[:,0], q[:,1], color = '.75')
 Out[22]:
 <matplotlib.collections.PathCollection at 0x71dab90>

In [34]:
 dd =np.random.standard_normal((50,2))
 plt.scatter(dd[:,0], dd[:,1], color ='1.0', edgecolor ='0.0') # edge color controls the color of the edge
 Out[34]:
 <matplotlib.collections.PathCollection at 0x7336670> 

Custom colors for bar charts, pie charts and box plots:

In [9]:
 vals = np.random.randint(1, 100, size =50) # integers from 1 to 99; random_integers is deprecated in Numpy
 color_set = ['.00', '.25', '.50','.75']
 color_lists = [color_set[(len(color_set)* val) // 100] for val in vals]
 c = plt.bar(np.arange(50), vals, color = color_lists)

In [8]:
 hi =np.random.randint(1, 9, size =10) # ten integers from 1 to 8; random_integers is deprecated in Numpy
 color_set =['.00', '.25', '.50', '.75']
 plt.pie(hi, colors = color_set) # the colors argument accepts a sequence of colors
 plt.show()
 # If there are fewer colors than values, pyplot.pie() simply cycles through the color list.
 # Here a list of four colors is used for a pie chart of ten values, so the colors repeat in order.

In [27]:
 values = np.random.randn(100)
 w = plt.boxplot(values)
 for att, lines in w.items(): # items() replaces the Python 2 iteritems()
     for l in lines:
         l.set_color('k')

Color Maps

In [34]:
 # how to color scatter plots
 # Colormaps are defined in the matplotlib.cm module. This module provides
 # functions to create and use colormaps, along with a wide choice of predefined colormaps.
 import matplotlib.cm as cm
 N = 256
 angle = np.linspace(0, 8 * 2 * np.pi, N)
 radius = np.linspace(.5, 1., N)
 X = radius * np.cos(angle)
 Y = radius * np.sin(angle)
 plt.scatter(X,Y, c=angle, cmap = cm.hsv)
 Out[34]:
 <matplotlib.collections.PathCollection at 0x714d9f0>

In [44]:
 #Color in bar graphs
 import matplotlib.cm as cm
 import matplotlib.colors as col # needed for col.Normalize below
 vals = np.random.randint(1, 100, size =50) # integers from 1 to 99; random_integers is deprecated in Numpy
 cmap = cm.ScalarMappable(col.Normalize(0,99), cm.binary)
 plt.bar(np.arange(len(vals)),vals, color =cmap.to_rgba(vals))
 Out[44]:
 <Container object of 50 artists>
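
It can help to look at what the `ScalarMappable` does on its own, without a plot. In this sketch the three input values are chosen by me: `Normalize` rescales 0..99 onto 0..1, and `to_rgba` looks each rescaled value up in the `binary` colormap, which runs from white to black.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as col

norm = col.Normalize(vmin=0, vmax=99)      # maps 0..99 linearly onto 0..1
mapper = cm.ScalarMappable(norm=norm, cmap=cm.binary)

vals = np.array([0, 50, 99])
rgba = mapper.to_rgba(vals)                # one (r, g, b, a) row per value

print(rgba.shape)      # (3, 4)
print(rgba)
```

The smallest value maps to white and the largest to black, so taller bars in the cell above are drawn darker.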

Line Styles

In [4]:
 def pq(I, mu, sigma):
     a = 1. / (sigma * np.sqrt(2. * np.pi))
     b = -1. / (2. * sigma ** 2)
     return a * np.exp(b * (I - mu) ** 2)
 I =np.linspace(-6,6, 1024)
 plt.plot(I, pq(I, 0., 1.), color = 'k', linestyle ='solid')
 plt.plot(I, pq(I, 0., .5), color = 'k', linestyle ='dashed')
 plt.plot(I, pq(I, 0., .25), color = 'k', linestyle ='dashdot')
 Out[4]:
 [<matplotlib.lines.Line2D at 0x562ffb0>]
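
The function `pq` is the density of a normal distribution with mean `mu` and standard deviation `sigma`. A quick sanity check, sketched here with my own function name and a simple Riemann sum, is that the area under each of the three curves is close to 1, even though the narrower curves are much taller.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    a = 1. / (sigma * np.sqrt(2. * np.pi))
    b = -1. / (2. * sigma ** 2)
    return a * np.exp(b * (x - mu) ** 2)

x = np.linspace(-6, 6, 1024)
dx = x[1] - x[0]                 # spacing between neighbouring sample points

areas = {}
peaks = {}
for sigma in (1., .5, .25):
    y = normal_pdf(x, 0., sigma)
    areas[sigma] = np.sum(y) * dx    # Riemann-sum approximation of the integral
    peaks[sigma] = y.max()           # height of the curve near x = 0

print(areas)
print(peaks)
```

Halving `sigma` roughly doubles the peak height while the area stays the same, which matches the shapes of the solid, dashed and dash-dot curves.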

In [12]:
 N = 15
 A = np.random.random(N)
 B= np.random.random(N)
 X = np.arange(N)
 plt.bar(X, A, color ='.75')
 plt.bar(X, A+B , bottom = A, color ='w', edgecolor ='k', linestyle ='dashed') # lowercase 'w' for white; edgecolor makes the dashed outline visible
 plt.show()

In [20]:
 def gf(X, mu, sigma):
     a = 1. / (sigma * np.sqrt(2. * np.pi))
     b = -1. / (2. * sigma ** 2)
     return a * np.exp(b * (X - mu) ** 2) # Gaussian density, same form as pq() above
 X = np.linspace(-6, 6, 1024)
 for i in range(64):
     samples = np.random.standard_normal(50)
     mu,sigma = np.mean(samples), np.std(samples)
     plt.plot(X, gf(X, mu, sigma), color = '.75', linewidth = .5)
 plt.plot(X, gf(X, 0., 1.), color ='.00', linewidth = 3.)
 Out[20]:
 [<matplotlib.lines.Line2D at 0x59fbab0>]

Fill surfaces with pattern

In [27]:
 N = 15
 A = np.random.random(N)
 B= np.random.random(N)
 X = np.arange(N)
 plt.bar(X, A, color ='w', hatch ='x')
 # some other hatch patterns are: / \ | - + x o O . *
 Out[27]:
 <Container object of 15 artists>

Marker styles

In [29]:

cd C:\Users\tk\Desktop\Matplot

C:\Users\tk\Desktop\Matplot

In [14]:
 X= np.linspace(-6,6,1024)
 Ya = np.sinc(X) # assumed definition; Ya is not defined anywhere in this notebook excerpt
 Yb = np.sinc(X) +1
 plt.plot(X, Ya, marker ='o', color ='.75')
 plt.plot(X, Yb, marker ='^', color='.00', markevery= 32) # draws a marker on every 32nd point
 Out[14]:
 [<matplotlib.lines.Line2D at 0x7063150>]

Own Marker Shapes- come back to this later

In [31]:
 # Marker Size
 A = np.random.standard_normal((50,2))
 B = np.random.standard_normal((50,2))
 B += np.array((1, 1))
 plt.scatter(A[:,0], A[:,1], color ='k', s =25.0)
 plt.scatter(B[:,0], B[:,1], color ='g', s = 100.0) # size of the marker is specified using the 's' argument
 Out[31]:
 <matplotlib.collections.PathCollection at 0x7d015f0>

In [20]:
 import matplotlib as mpl
 from cycler import cycler # cycler is installed together with Matplotlib
 mpl.rc('lines', linewidth =3)
 mpl.rc('xtick', color ='w') # color of x axis numbers
 mpl.rc('ytick', color = 'w') # color of y axis numbers
 mpl.rc('axes', facecolor ='g', edgecolor ='y') # color of axes
 mpl.rc('figure', facecolor ='.00',edgecolor ='w') # color of figure
 mpl.rc('axes', prop_cycle = cycler('color', ['y','r'])) # color of plots; prop_cycle replaces the older color_cycle setting
 x = np.linspace(0, 7, 1024)
 plt.plot(x, np.sin(x))
 plt.plot(x, np.cos(x))
 Out[20]:
 [<matplotlib.lines.Line2D at 0x7b0fb70>]
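
Keep in mind that `mpl.rc` changes the settings for every plot that follows in the session. If you want a style for one figure only, one option is `mpl.rc_context`, sketched below with two of the settings from the cell above. The variable names are mine.

```python
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt

before = mpl.rcParams["lines.linewidth"]

# Settings passed to rc_context apply only inside the with-block.
with mpl.rc_context({"lines.linewidth": 3, "axes.facecolor": "g"}):
    fig, ax = plt.subplots()
    x = np.linspace(0, 7, 1024)
    (line,) = ax.plot(x, np.sin(x))
    width_inside = line.get_linewidth()

after = mpl.rcParams["lines.linewidth"]
print(width_inside, before == after)
```

The line drawn inside the block picks up the width of 3, and the global setting is restored as soon as the block ends. To undo earlier `mpl.rc` calls instead, `mpl.rcdefaults()` resets everything to Matplotlib's defaults.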

Part 3:

Annotating figures: working with multiple figures, and controlling axis ranges, aspect ratio and axes.

Annotation

In [1]:
 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt
 In [28]:
 X =np.linspace(-6,6, 1024)
 Y =np.sinc(X)
 plt.title('A simple marker exercise')# a title notation
 plt.xlabel('array variables') # adding xlabel
 plt.ylabel(' random variables') # adding ylabel
 plt.text(-5, 0.4, 'Matplotlib') # -5 is the x value and 0.4 is y value
 plt.plot(X,Y, color ='r', marker ='o', markersize =9, markevery = 30, markerfacecolor='w', linewidth = 3.0, markeredgecolor = 'b')
 Out[28]:
 [<matplotlib.lines.Line2D at 0x84b6430>] 

In [39]:
 def pq(I, mu, sigma):
     a = 1. / (sigma * np.sqrt(2. * np.pi))
     b = -1. / (2. * sigma ** 2)
     return a * np.exp(b * (I - mu) ** 2)
 I =np.linspace(-6,6, 1024)
 plt.plot(I, pq(I, 0., 1.), color = 'k', linestyle ='solid')
 plt.plot(I, pq(I, 0., .5), color = 'k', linestyle ='dashed')
 plt.plot(I, pq(I, 0., .25), color = 'k', linestyle ='dashdot')
 # a dictionary of styles for the text box
 design = {
     'facecolor' : 'y', # color used for the text box
     'edgecolor' : 'g',
     'boxstyle' : 'round'
 }
 plt.text(-4, 1.5, 'Matplot Lib', bbox = design)
 plt.plot(X, Y, c='k') # X and Y come from the previous cell
 plt.show()
 # 'boxstyle': This sets the style of the box, which can either be 'round' or 'square'
 # 'pad': If 'boxstyle' is set to 'square', it defines the amount of padding between the text and the box's sides

Alignment Control

The vertical alignment options are as follows:

'center': This is relative to the center of the textbox

'top': This is relative to the upper side of the textbox

'bottom': This is relative to the lower side of the textbox

'baseline': This is relative to the text's baseline

The sketch below shows where the text sits relative to the anchor line for each vertical alignment:

align = 'bottom'    align = 'baseline'

------------------------ align = 'center' ------------------------

align = 'top'

The horizontal alignment options are as follows:

'center': This is relative to the center of the textbox

'left': This is relative to the left side of the textbox

'right': This is relative to the right-hand side of the textbox

In [41]:
 cd C:\Users\tk\Desktop
 C:\Users\tk\Desktop
In [44]:
 from IPython.display import Image
 Image(filename='text alignment.png')
 Out[44]:

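In [76]:
 X = np.linspace(-4, 4, 1024)
 Y = .25 * (X + 4.) * (X + 1.) * (X - 2.) # assumed curve; Y is not defined in this cell, and this polynomial is the one used in the ticks example later in Part 3
 plt.annotate('Big Data',
     ha ='center', va ='bottom',
     xytext =(-1.5, 3.0), xy =(0.75, -2.7),
     arrowprops ={'facecolor': 'green', 'shrink':0.05, 'edgecolor': 'black'}) # arrow properties
 plt.plot(X, Y)
 Out[76]:
 [<matplotlib.lines.Line2D at 0x9d1def0>]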

In [74]:

from IPython.display import Image

Image(filename='arrows.png')

Out[74]:

Legend properties:

'loc': This is the location of the legend. The default value is 'best', which will place it automatically. Other valid values are

'shadow': This can be either True or False, and it renders the legend with a shadow effect.

'fancybox': This can be either True or False and renders the legend with a rounded box.

'title': This renders the legend with the title passed as a parameter.

'ncol': This forces the passed value to be the number of columns for the legend

In [101]:
 x =np.linspace(0, 6,1024)
 y1 =np.sin(x)
 y2 =np.cos(x)
 plt.xlabel('Sin Wave')
 plt.ylabel('Cos Wave')
 plt.plot(x, y1, c='b', lw =3.0, label ='Sin(x)') # labels are specified
 plt.plot(x, y2, c ='r', lw =3.0, ls ='--', label ='Cos(x)')
 plt.legend(loc ='best', shadow = True, fancybox = False, title ='Waves', ncol =1) # displays the labels
 plt.grid(True, lw = 2, ls ='--', c='.75') # adds grid lines to the figure
 plt.show()

Shapes

In [4]:
 import matplotlib.patches as patches
 # Several kinds of shapes are available in the matplotlib.patches module
 dis = patches.Circle((0,0), radius = 1.0, color ='.75' )
 plt.gca().add_patch(dis) # adds the shape to the current axes so that it is drawn
 dis = patches.Rectangle((2.5, -.5), 2.0, 1.0, color ='.75') # patches.Rectangle((x, y) of the lower-left corner, width, height)
 plt.gca().add_patch(dis)
 dis = patches.Ellipse((0, -2.0), 2.0, 1.0, angle =45, color ='.00')
 plt.gca().add_patch(dis)
 dis = patches.FancyBboxPatch((2.5, -2.5), 2.0, 1.0, boxstyle ='roundtooth', color ='g')
 plt.gca().add_patch(dis)
 plt.grid(True)
 plt.axis('scaled') # scales the axes equally and fits the limits to the shapes
 plt.show()
 # FancyBboxPatch: This is like a rectangle but takes an additional boxstyle parameter
 # (either 'larrow', 'rarrow', 'round', 'round4', 'roundtooth', 'sawtooth', or 'square')

In [22]:
 import matplotlib.patches as patches
 theta = np.linspace(0, 2 * np.pi, 8) # generates an array of 8 angles
 vertical = np.vstack((np.cos(theta), np.sin(theta))).transpose() # stack the two arrays vertically, then transpose into rows of (x, y)
 # print(vertical) to see how the array looks
 plt.gca().add_patch(patches.Polygon(vertical, color ='y'))
 plt.axis('scaled')
 plt.grid(True)
 # The matplotlib.patches.Polygon() constructor takes a list of coordinates as its input, that is, the vertices of the polygon

In [34]:
 # a polygon can be inscribed in a circle
 theta = np.linspace(0, 2 * np.pi, 6) # generates an array
 vertical = np.vstack((np.cos(theta), np.sin(theta))).transpose() # vertical stack clubs the two arrays. 
 #print vertical, print and see how the array looks
 plt.gca().add_patch(plt.Circle((0,0), radius =1.0, color ='b'))
 plt.gca().add_patch(plt.Polygon(vertical, fill =None, lw =4.0, ls ='dashed', edgecolor ='w'))
 plt.axis('scaled')
 plt.grid(True)
 plt.show() 

In [54]:
 #In matplotlib, ticks are small marks on both the axes of a figure
 import matplotlib.ticker as ticker
 X = np.linspace(-12, 12, 1024)
 Y = .25 * (X + 4.) * (X + 1.) * (X - 2.)
 pl =plt.axes() #the object that manages the axes of a figure
 pl.xaxis.set_major_locator(ticker.MultipleLocator(5))
 pl.xaxis.set_minor_locator(ticker.MultipleLocator(1))
 plt.plot(X, Y, c = 'y')
 plt.grid(True, which ='major') # which can take three values: minor, major and both
 plt.show() 

In [59]:
 name_list = ('Omar', 'Serguey', 'Max', 'Zhou', 'Abidin')
 value_list = np.random.randint(0, 99, size = len(name_list))
 pos_list = np.arange(len(name_list))
 ax = plt.axes()
 ax.xaxis.set_major_locator(ticker.FixedLocator((pos_list)))
 ax.xaxis.set_major_formatter(ticker.FixedFormatter((name_list)))
 plt.bar(pos_list, value_list, color = '.75',align = 'center')
 plt.show()
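
For the common case of naming the bars, the Axes methods `set_xticks` and `set_xticklabels` are a shorter route to the same kind of result as the locator and formatter pair above. This is a sketch with made-up values in place of the random ones, not a replacement the original author proposed.

```python
import numpy as np
import matplotlib.pyplot as plt

name_list = ("Omar", "Serguey", "Max", "Zhou", "Abidin")
value_list = np.array([12, 45, 7, 63, 30])   # made-up values for illustration
pos_list = np.arange(len(name_list))

fig, ax = plt.subplots()
ax.bar(pos_list, value_list, color=".75", align="center")
ax.set_xticks(pos_list)          # put one tick under each bar
ax.set_xticklabels(name_list)    # use the names as the tick labels

fig.canvas.draw()                # render once so the tick labels are final
labels = [tick.get_text() for tick in ax.get_xticklabels()]
print(labels)
```

The explicit `ticker` objects remain useful when you need finer control, such as the major and minor locators in the previous cell.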

Part 4:

This part covers some more complex figures.

Working with figures

In [4]:
 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt
 In [5]:
 T = np.linspace(-np.pi, np.pi, 1024) #
 fig, (ax0, ax1) = plt.subplots(ncols =2)
 ax0.plot(np.sin(2 * T), np.cos(0.5 * T), c = 'k')
 ax1.plot(np.cos(3 * T), np.sin(T), c = 'k')
 plt.show() 

Setting aspect ratio

In [7]:
 T = np.linspace(0, 2 * np.pi, 1024)
 plt.plot(2. * np.cos(T), np.sin(T), c = 'k', lw = 3.)
 plt.gca().set_aspect('equal') # gca() returns the current axes; remove this line and see how the figure looks
 plt.show() 

In [12]:
 X = np.linspace(-6, 6, 1024)
 Y1, Y2 = np.sinc(X), np.cos(X)
 plt.figure(figsize=(10.24, 2.56)) #sets size of the figure
 plt.plot(X, Y1, c='r', lw = 3.)
 plt.plot(X, Y2, c='.75', lw = 3.)
 plt.show() 

In [8]:
 X = np.linspace(-6, 6, 1024)
 plt.ylim(-.5, 1.5)
 plt.plot(X, np.sinc(X), c = 'k')
 plt.show() 

In [16]:
 X = np.linspace(-6, 6, 1024)
 Y = np.sinc(X)
 X_sub = np.linspace(-3, 3, 1024)#coordinates of subplot
 Y_sub = np.sinc(X_sub) # coordinates of sub plot
 plt.plot(X, Y, c = 'b') 
 sub_axes = plt.axes([.6, .6, .25, .25]) # [left, bottom, width, height] of the inset frame, as fractions of the figure
 sub_axes.plot(X_sub, Y_sub, c = 'r')
 plt.show() 

Log Scale

In [20]:
 X = np.linspace(1, 10, 1024)
 plt.yscale('log') # set the y scale to log; use plt.xscale('log') for the x axis
 plt.plot(X, X, c = 'k', lw = 2., label = r'$f(x)=x$')
 plt.plot(X, 10 ** X, c = '.75', ls = '--', lw = 2., label = r'$f(x)=10^x$')
 plt.plot(X, np.log(X), c = '.75', lw = 2., label = r'$f(x)=\log(x)$')
 plt.legend()
 # The logarithm base is 10 by default, but it can be changed with the optional base parameter (basex and basey in older Matplotlib versions).

Polar Coordinates

In [23]:
 T = np.linspace(0 , 2 * np.pi, 1024)
 plt.axes(polar = True) # show polar coordinates
 plt.plot(T, 1. + .25 * np.sin(16 * T), c= 'k')
 plt.show() 

In [25]:
 import matplotlib.patches as patches # import patch module from matplotlib
 ax = plt.axes(polar = True)
 theta = np.linspace(0, 2 * np.pi, 8, endpoint = False)
 radius = .25 + .75 * np.random.random(size = len(theta))
 points = np.vstack((theta, radius)).transpose()
 plt.gca().add_patch(patches.Polygon(points, color = '.75'))
 plt.show() 

In [2]:
 x = np.linspace(-6,6,1024)
 y= np.sin(x)
 plt.plot(x,y)
 plt.savefig('bigdata.png', transparent = True) # the savefig function writes the figure to a file
 # This creates a file named bigdata.png. Its size in pixels is the figure size in inches times the dpi,
 # for example 800 x 600 pixels for an 8 x 6 inch figure at 100 dpi, with 8 bits per color channel

In [3]:
 theta =np.linspace(0, 2 *np.pi, 8)
 points =np.vstack((np.cos(theta), np.sin(theta))).T
 plt.figure(figsize =(6.0, 6.0))
 plt.gca().add_patch(plt.Polygon(points, color ='r'))
 plt.axis('scaled')
 plt.grid(True)
 plt.savefig('pl.png', dpi =300) # try 'pl.pdf' or 'pl.svg'
 # dpi is dots per inch. The figure is 6 x 6 inches, so 6*300 x 6*300 = 1800 x 1800 pixels
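
The relation between figure size, dpi and pixel size can be checked by reading the saved file back. In this sketch the temporary output path and the simple line plot are my own choices; the figure size and dpi match the cell above.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

fig = plt.figure(figsize=(6.0, 6.0))        # figure size in inches
plt.plot([0, 1], [0, 1])

# Save into a temporary directory (a hypothetical location).
out_path = os.path.join(tempfile.mkdtemp(), "pl.png")
fig.savefig(out_path, dpi=300)

# Read the PNG back; the array shape is (height, width, channels).
height, width = mpimg.imread(out_path).shape[:2]
print(width, height)    # 1800 1800
```

Each side is 6 inches times 300 dots per inch. Vector formats such as PDF and SVG have no fixed pixel size, so dpi matters mainly for raster output like PNG.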

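Summary

One of the easiest mistakes to make when learning Python is trying to learn too many libraries at once. When you try to learn everything in one go, you spend a lot of time switching between different concepts, get frustrated, and eventually move on to something else.

So stay focused on this process:

1. Understand the Python basics

2. Learn Numpy

3. Learn Pandas

4. Learn Matplotlib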

