介绍
SQL是用于访问和处理数据库的标准的计算机语言。
常用于数据库管理系统(RDBMS)中,
这类数据库包括 MySQL 、 SQL Server 、 Oracle 等。
Pandas是一个开源的,为 Python 提供高性能的,数据结构以及数据分析工具。
在熟练地使用 SQL 的同时,为满足一些的业务需求,常常需要我们将数据提取后,再对数据进行统计分析,那应该如何使用 Pandas 达到和 SQL 一样的效果呢?
下面的速查表将会逐一使用 Pandas 对常见的 SQL 语句进行映射。
本篇内容
本篇将解构下面的 SQL 查询句式, 使用 Pandas 进行实现
SQL 查询句式
SELECT DISTINCT [字段]
FROM [表] JOIN [bin] ON [连接条件]
WHERE [过滤条件]
GROUP BY [字段]
HAVING [条件]
ORDER BY [字段] DESC
LIMIT [个数] OFFSET [个数]
读取测试数据
import pandas as pd
import pymysql
conn = pymysql.connect(host='127.0.0.1',
user='root',
password='12345678',
db='test_db')
df = pd.read_sql(sql="select * from student", con=conn)
数据预览
df
| id | name | age | sex | city | money | |
|---|---|---|---|---|---|---|
| 0 | 1 | 赵雷 | 1990-01-01 | 男 | 北京 | 20.0 |
| 1 | 2 | 钱电 | 1990-12-21 | 男 | 天津 | 30.0 |
| 2 | 3 | 孙风 | 1990-12-20 | 男 | 成都 | 2.0 |
| 3 | 4 | 李云 | 1990-12-06 | 男 | 北京 | 100.0 |
| 4 | 5 | 周梅 | 1991-12-01 | 女 | 成都 | 50.0 |
| 5 | 6 | 吴兰 | 1992-01-01 | 女 | 北京 | 3.0 |
| 6 | 7 | 郑竹 | 1989-01-01 | 女 | 成都 | 200.0 |
| 7 | 8 | 张三 | 2017-12-20 | 女 | 天津 | 20.0 |
| 8 | 9 | 李四 | 2017-12-25 | 女 | 西安 | 35.0 |
| 9 | 10 | 李四 | 2012-06-06 | 女 | 北京 | 40.0 |
| 10 | 11 | 赵六 | 2013-06-13 | 女 | 成都 | 5.0 |
| 11 | 12 | 孙七 | 2014-06-01 | 女 | 天津 | 210.0 |
SELECT
SQL
SELECT * FROM student
SELECT id, name, sex FROM student
Pandas
df
df[['id','name','sex']]
DISTINCT
SQL
SELECT DISTINCT name FROM student
Pandas
df['name'].unique()
COUNT & SUM & MAX & MIN & AVG
SQL
SELECT COUNT(*) FROM student
SELECT SUM(money) FROM student
SELECT id, MAX(money) FROM student
SELECT id, MIN(money) FROM student
SELECT AVG(money) FROM student
Pandas
df['id'].count()
df['money'].sum()
df[df['money'] == df['money'].max()]
df[df['money'] == df['money'].min()]
df['money'].mean()
描述性统计数据
In [1]: df['money'].describe()
Out[1]: count 12.000000
mean 59.583333
std 72.963825
min 2.000000
25% 16.250000
50% 32.500000
75% 62.500000
max 210.000000
Name: money, dtype: float64
WHERE
例子: =
SQL
SELECT * FROM student WHERE sex = '男'
Pandas
df[df['sex'] == ('男')]
例子: in & not in
SQL
SELECT * FROM student WHERE id IN (2,4,6,8,10)
SELECT * FROM student WHERE id NOT IN (2,4,6,8,10)
Pandas
df[df['id'].isin((2,4,6,8))]
df[~df['id'].isin((2,4,6,8))]
多个条件
SQL
SELECT * FROM student WHERE sex = '男' and id IN (2,4,6,8,10)
Pandas
df[(df['sex'] == ('男')) & (df['id'].isin((2,4,6,8)))]
LIMIT OFFSET
SQL
SELECT * FROM student ORDER BY id DESC LIMIT 3
SELECT * FROM student ORDER BY id DESC LIMIT 3 OFFSET 2
Pandas
df.sort_values('id',ascending=False).head(3)
df.nlargest(2 + 3, 'id').tail(3)
SELECT & WHERE & LIMIT
SQL
SELECT * FROM student WHERE sex = '男' LIMIT 3
SELECT id, name, sex FROM student WHERE sex ='男' LIMIT 3
Pandas
df[df['sex'] == ('男')].head(3)
df[df['sex'] == ('男')][['id','name','sex']].head(3)
ORDER BY
SQL
SELECT * FROM student ORDER BY age
SELECT * FROM student ORDER BY age DESC
Pandas
df.sort_values('age')
df.sort_values('age', ascending=False)
GROUP BY
GROYP BY & COUNT
SQL
SELECT city, COUNT(*) FROM student GROUP BY city
Pandas
df.groupby(['city']).size().to_frame('size').reset_index()
GROYP BY & SUM
SQL
SELECT city, SUM(money) FROM student GROUP BY city
Pandas
df.groupby(['city'])['money'].agg('sum').reset_index()
GROUP BY & ORDER BY & COUNT
GROUP BY 单字段
SQL
SELECT city, COUNT(*) FROM student GROUP BY sex ORDER BY city
Pandas
df.groupby(['city']).size().to_frame('size').reset_index().sort_values('city')
GROUP BY 多字段
SQL
SELECT city, sex, COUNT(*) FROM student GROUP BY city, sex ORDER BY city
Pandas
df.groupby(['city','sex']).size().to_frame('size').reset_index().sort_values('city')
HAVING
SQL
SELECT city, COUNT(*) FROM student GROUP BY city HAVING count(*) > 3
Pandas
df.groupby('city').filter(lambda x:len(x)>3).groupby('city').size().to_frame('size').reset_index()
听说好看的人才能点
以上所述就是小编给大家介绍的《SQL to Pandas 速查表(一)》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Natural Language Processing with Python
Steven Bird、Ewan Klein、Edward Loper / O'Reilly Media / 2009-7-10 / USD 44.99
This book offers a highly accessible introduction to Natural Language Processing, the field that underpins a variety of language technologies, ranging from predictive text and email filtering to autom......一起来看看 《Natural Language Processing with Python》 这本书的介绍吧!