4 Less Known Pandas Functions That Can Make Your Work Easier

Learn Pandas for Data Science

Supercharge your data science projects

Photo by Jérémy Stenuit on Unsplash

Many data scientists use Python as their programming language of choice. As an open-source language, Python has gained considerable popularity by offering a variety of data science libraries. In particular, the pandas library is arguably the most widely used toolbox among Python-based data scientists.

The pandas library is so well developed that it provides a very large collection of functions for all kinds of operations. The drawback of such a powerful toolbox, however, is that some useful functions remain less known, especially to beginners. In this article, I would like to share four of them.

1. The where() Function

Most datasets we work with need some conversion before the data are in an analyzable format. The where() function is useful for replacing values that don’t satisfy a condition. Let’s consider the following example. As with any data manipulation step, we first import pandas and numpy.

Use where() With Series

In the above figure, we created a Series and applied the where() function. The typical call is where(cond, other). The cond argument evaluates to boolean values. Where they’re True, the original values are kept; where they’re False, the value specified by the other argument is used instead. In our case, values below 1000 were kept, while values equal to or greater than 1000 were replaced with 1000.
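
The figure’s code is not reproduced here, so here is a minimal sketch of the Series case. The values in `s` are made up for illustration, and the names `s` and `capped` are hypothetical.

```python
import pandas as pd

# Made-up values; not the data from the original figure
s = pd.Series([120, 850, 1500, 999, 2300])

# Keep each value where the condition is True (below 1000);
# everywhere else, substitute the value passed as `other` (1000)
capped = s.where(s < 1000, 1000)
print(capped.tolist())  # [120, 850, 1000, 999, 1000]
```

For this particular capping case, `s.clip(upper=1000)` gives the same result. where() is the more general tool because the condition can be any boolean expression.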

This function can be used not only with a Series but also with a DataFrame. In the example below, all odd numbers in the DataFrame df0 are incremented by 1, while the even values are kept.

Use where() With DataFrame
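
A minimal sketch of the DataFrame case follows. The contents of `df0` are made up, and the column names `a` and `b` are hypothetical. The key point is that `other` can itself be a DataFrame of the same shape, so each replaced cell takes its value from the matching cell of `df0 + 1`.

```python
import pandas as pd

# Made-up DataFrame standing in for df0 from the original figure
df0 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Condition: the value is even. Even values are kept as-is;
# odd values are replaced by the matching entry of `other` (df0 + 1)
df1 = df0.where(df0 % 2 == 0, df0 + 1)
print(df1)
#    a  b
# 0  2  4
# 1  2  6
# 2  4  6
```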

2. The pivot_table() Function

Unlike where(), pivot_table() is available for DataFrame but not for Series. It creates a spreadsheet-style pivot table, which makes it a great tool for summarizing, analyzing, and presenting data in a straightforward layout. Its power is best shown with a more realistic example.

Use pivot_table() With DataFrame

In the above figure, we created a DataFrame of salary and bonus records together with each employee’s gender and department. We then built a pivot table with the pivot_table() function. We passed the salary and bonus columns as the values argument, the department column as index, the gender column as columns, and [np.mean, np.median, np.amax] as aggfunc.

The output is a pivot table of 2 (gender) by 2 (department) cells, giving the mean, median, and maximum of the salary and bonus variables. Some interesting observations: in Department A, women have higher salaries than men, while the pattern is reversed in Department B. In both departments, women and men have similar bonuses.
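
Here is a sketch of the same kind of call. The records are invented, so they will not reproduce the figures’ numbers or the observations above, and the column names are assumptions. The sketch passes the string aliases "mean", "median", and "max" rather than NumPy functions. pandas accepts these aliases, and recent pandas versions emit a FutureWarning when callables such as np.mean are passed instead.

```python
import pandas as pd

# Invented records; column names are assumptions, not the original data
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M", "F", "M", "F", "M"],
    "department": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "salary": [72000, 68000, 75000, 70000, 65000, 71000, 66000, 73000],
    "bonus": [5000, 5200, 4800, 5100, 5000, 4900, 5300, 5000],
})

# Rows: department; columns: (statistic, variable, gender)
table = df.pivot_table(
    values=["salary", "bonus"],
    index="department",
    columns="gender",
    aggfunc=["mean", "median", "max"],
)
print(table)

# Pick a single cell: mean salary of women in Department A
print(table.loc["A", ("mean", "salary", "F")])  # 73500.0
```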

3. The qcut() Function

When a dataset involves ordinal measures, it sometimes makes more sense to group them into quantile-based categories to identify possible patterns, instead of analyzing the ordinal measures parametrically. In principle, we could calculate the quantile cutoffs ourselves and map the data using these cutoffs to create a new categorical variable.

However, this operation is easily done with the qcut() function, which discretizes a variable into equal-sized buckets (e.g., quartiles or deciles) based on rank or sample quantiles. Let’s see how this function works with the following example.
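
To make the comparison concrete, here is a sketch on made-up scores. It first takes the manual route: compute the quartile cutoffs with Series.quantile() and apply them with pd.cut(). It then makes the equivalent one-line qcut() call. With labels=False, both return integer bin codes, with 0 for the lowest quartile.

```python
import pandas as pd

# Made-up ordinal scores
s = pd.Series([3, 8, 1, 9, 4, 7, 2, 6])

# Manual route: compute the quartile cutoffs, then bin with pd.cut.
# include_lowest=True makes the first bin include the minimum value.
edges = s.quantile([0, 0.25, 0.5, 0.75, 1.0]).to_numpy()
manual = pd.cut(s, bins=edges, include_lowest=True, labels=False)

# qcut computes the same cutoffs and does the binning in one call
auto = pd.qcut(s, q=4, labels=False)

print(edges.tolist())   # [1.0, 2.75, 5.0, 7.25, 9.0]
print(manual.tolist())  # [1, 3, 0, 3, 1, 2, 0, 2]
print(auto.tolist())    # [1, 3, 0, 3, 1, 2, 0, 2]
```

Each of the four bins ends up with two of the eight values, which is the "equal-sized" property qcut is designed for.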

Use qcut() With DataFrame

In the above figure, we created a DataFrame with 3 columns and wanted to generate quartiles for the var2 column. Thus, we set the q argument to 4 (use 10 if you want deciles). We also passed a list to the labels argument to name these quartiles.
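
Below is a sketch of that step on invented data. Only var2 comes from the text. The other columns (var1, var3), the new column name var2_quartile, and the labels "Q1" to "Q4" are hypothetical placeholders.

```python
import pandas as pd

# Invented 3-column DataFrame; only "var2" is named in the text
df = pd.DataFrame({
    "var1": list("abcdefgh"),
    "var2": [15, 42, 8, 30, 23, 50, 11, 37],
    "var3": [1.2, 3.4, 2.2, 0.9, 4.1, 2.8, 3.3, 1.7],
})

# q=4 -> quartiles; labels must have exactly q entries
df["var2_quartile"] = pd.qcut(df["var2"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])
print(df[["var2", "var2_quartile"]])
```

The result is a categorical column. Because there are 8 distinct values and 4 bins, each label is assigned to exactly two rows.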

4. The melt() Function

Depending on the tools they use, some data scientists prefer the “wide” format (e.g., one row per subject, with multiple variables), while others prefer the “long” format (e.g., multiple rows per subject, with one variable per row). Thus, we often need to convert data between these formats.

Unlike the T attribute, which transposes the entire DataFrame, the melt() function is designed specifically for converting data from the wide to the long format. Let’s see how it works with the following example.

Use melt() With DataFrame

In the above figure, we created a DataFrame in the wide format. Specifically, we had two measures, one taken before and one after taking the medicine. We then used the melt() function to produce a long-format DataFrame. We specified SubjectID as id_vars and the two measures as value_vars, and renamed the resulting columns to be more meaningful.
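
A minimal sketch of this reshaping uses invented numbers. SubjectID comes from the text. The measure columns "Before" and "After" and the output names "Timepoint" and "Measure" are assumptions. Here the renaming is done through melt()’s var_name and value_name parameters. The original figure may have renamed the columns differently.

```python
import pandas as pd

# Invented wide-format data: one row per subject
wide = pd.DataFrame({
    "SubjectID": [1, 2, 3],
    "Before": [120, 135, 128],
    "After": [112, 130, 121],
})

# Long format: one row per subject per measurement
long = wide.melt(
    id_vars="SubjectID",
    value_vars=["Before", "After"],
    var_name="Timepoint",  # new column holding the old column names
    value_name="Measure",  # new column holding the values
)
print(long)
print(long.shape)  # (6, 3)
```

melt() stacks the value_vars one after another, so all "Before" rows come first, followed by all "After" rows, with SubjectID repeated for each.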

Before You Go

There are many more functions in pandas to explore. In this article, we covered four functions that many of us may not know well but that can be very useful in our daily data manipulation work.

I hope you enjoyed reading this piece. You can find the code on GitHub.

About the Author

I write blogs about Python and data processing and analysis. Just in case you’ve missed some of my earlier blogs, here are the links to some articles that are relevant to the current one.