4 Less Known Pandas Functions That Can Make Your Work Easier

Category: IT Technology · Published: 4 years ago

Learn Pandas for Data Science

Supercharge your data science projects

Photo by Jérémy Stenuit on Unsplash

Many data scientists use Python as their programming language of choice. As an open-source language, Python has gained considerable popularity by providing a variety of data science-related libraries. In particular, the pandas library is arguably the most widely used toolbox among Python-based data scientists.

The pandas library is so well developed that it provides a very large collection of functions for various operations. The drawback of such a powerful toolbox, however, is that some useful functions remain little known to beginners. In this article, I would like to share four such functions.

1. The where() Function

Most of the time, the dataset that we’re working with needs some conversion to get the data into an analyzable format. The where() function is useful for replacing the values that don’t satisfy a condition. Let’s consider the following example of its usage. As with all data manipulation steps, we first need to import pandas and numpy.

Use where() With Series

In the above figure, we created a Series and applied the where() function. Specifically, the typical call of this function is where(condition, other). In this call, the condition argument evaluates to boolean values: where they’re True, the original values are kept, and where they’re False, the value specified by the other argument is used. In our case, any values below 1000 were kept, while those equal to or greater than 1000 were set to 1000.

This function can be used not only with Series but also with DataFrame. Let’s see a similar usage with a DataFrame. In the example below, all odd numbers in the DataFrame df0 are incremented by 1, while the even values are kept.
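
Since the original figure is not reproduced here, the following is a minimal sketch of the same idea. The Series values are made up for illustration and are not the ones from the original example.

```python
import pandas as pd

# Hypothetical values; the original article's data is not available here.
s = pd.Series([250, 800, 1000, 1500, 3200])

# Keep values where the condition is True (below 1000);
# replace everything else with 1000.
capped = s.where(s < 1000, 1000)

print(capped.tolist())  # [250, 800, 1000, 1000, 1000]
```

Note that where() returns a new Series and leaves the original one unchanged.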

Use where() With DataFrame
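
A possible version of the DataFrame case is sketched below. The name df0 comes from the text, but its contents here are an assumption, as is the choice to pass df0 + 1 as the other argument.

```python
import pandas as pd

# Hypothetical contents for df0; the original values are not shown in the text.
df0 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Even values satisfy the condition and are kept;
# odd values are replaced by the matching cell of df0 + 1.
df1 = df0.where(df0 % 2 == 0, df0 + 1)

print(df1)
```

Here the other argument is itself a DataFrame of the same shape, so each replaced cell takes the value at the same position.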

2. The pivot_table() Function

Unlike the where() function, the pivot_table() function is only available for DataFrame. It creates a spreadsheet-style pivot table, which makes it a great tool for summarizing, analyzing, and presenting data in a straightforward manner. Its power is best shown with a more realistic example.

Use pivot_table() With DataFrame

In the above figure, we created a DataFrame that consisted of salary and bonus records together with the employees’ gender and department information. We then created a pivot table using the pivot_table() function. Specifically, we passed the salary and bonus columns as the values argument, the department as the index argument, the gender as the columns argument, and [np.mean, np.median, np.amax] as the aggfunc argument.

In the output, you can see a pivot table made up of 2 (gender) by 2 (department) tables of the mean, median, and maximum values for the salary and bonus variables. Some interesting observations include that in Department A, women have higher salaries than men, while the pattern is reversed in Department B. In both departments, women and men have similar bonuses.
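
As a rough sketch of such a call, the snippet below builds a small DataFrame and pivots it. The column names and all figures are invented for illustration, so the numbers will not match the original output. The string names "mean", "median", and "max" are used here in place of np.mean, np.median, and np.amax, on the assumption that you are running a recent pandas version, where string names are the preferred way to request these aggregations.

```python
import pandas as pd

# Hypothetical employee records; every value here is made up.
df = pd.DataFrame({
    "department": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "gender": ["F", "F", "M", "M", "F", "F", "M", "M"],
    "salary": [5200, 5600, 5000, 5100, 4800, 5000, 5300, 5500],
    "bonus": [500, 520, 510, 500, 480, 500, 490, 510],
})

# One row per department, one column per
# (aggregation, variable, gender) combination.
table = df.pivot_table(
    values=["salary", "bonus"],
    index="department",
    columns="gender",
    aggfunc=["mean", "median", "max"],
)

print(table)
```

The resulting columns form a three-level MultiIndex, so a single cell can be read with something like table.loc["A", ("mean", "salary", "F")].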

3. The qcut() Function

When a dataset involves continuous or ordinal numeric measures, it sometimes makes more sense to bin them into categorical quantiles to identify possible patterns instead of examining the raw measures parametrically. In theory, we can calculate the quantile cutoffs ourselves and map the data onto these cutoffs to create the new categorical variable.

However, this operation is easy to perform with the qcut() function, which discretizes a variable into equal-sized buckets (e.g., quartiles or deciles) based on rank. Let’s see how this function works with the following example.

Use qcut() With DataFrame

In the above figure, we created a DataFrame with 3 columns. We were interested in generating the quartiles for the var2 column, so we set the q argument to 4 (it can be 10 if you want deciles). We also passed a list of labels to name these quartiles.
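
The call described above might look like the sketch below. Only the column name var2 and q=4 come from the text. The other column names, the data, the label names, and the new column name var2_quartile are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical three-column DataFrame; only "var2" is named in the text.
df = pd.DataFrame({
    "var1": [1, 2, 3, 4, 5, 6, 7, 8],
    "var2": [12, 45, 7, 88, 53, 31, 69, 24],
    "var3": [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5],
})

# Split var2 into four equal-sized, rank-based buckets and label them.
df["var2_quartile"] = pd.qcut(df["var2"], q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(df[["var2", "var2_quartile"]])
```

With eight distinct values and q=4, each quartile receives exactly two rows.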

4. The melt() Function

Depending on the tools that data scientists use, some prefer the “wide” format (e.g., one row per subject with multiple variables), while others prefer the “long” format (e.g., multiple rows per subject with one variable). Thus, it’s not uncommon that we need to transform data between these formats.

Unlike the T attribute, which transposes the DataFrame entirely, the melt() function is particularly useful for converting data from the wide to the long format. Let’s see how it works with the following example.

Use melt() With DataFrame

In the above figure, we created a DataFrame in a wide format. Specifically, we have two measures, taken before and after the medicine. We then used the melt() function to produce a long-format DataFrame. We specified SubjectID as the id_vars and the two measures as the value_vars, and renamed the columns to be more meaningful.
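
One way to write this conversion is sketched below. SubjectID is the only name taken from the text. The measure column names, the values, and the new column names Treatment and Measure are hypothetical, and using var_name and value_name for the renaming is an assumption about how the original did it.

```python
import pandas as pd

# Hypothetical wide-format data: one row per subject, two measures each.
wide_df = pd.DataFrame({
    "SubjectID": [100, 101, 102],
    "before_meds": [7.9, 8.1, 7.4],
    "after_meds": [6.8, 7.0, 6.9],
})

# Stack the two measure columns into rows, keeping SubjectID as the
# identifier; var_name and value_name name the two new columns.
long_df = wide_df.melt(
    id_vars="SubjectID",
    value_vars=["before_meds", "after_meds"],
    var_name="Treatment",
    value_name="Measure",
)

print(long_df)
```

Each subject now appears on two rows, one per measure, which is the layout that many plotting and modeling tools expect.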

Before You Go

There are many more functions in pandas that we can explore. In this article, we covered just four functions that some of us don’t know too well but that can be very useful in our daily data manipulation work.

I hope that you enjoyed reading this piece. You can find the code on GitHub.

About the Author

I write blogs about Python and data processing and analysis. Just in case you’ve missed some of my earlier blogs, here are the links to some articles that are relevant to the current one.

