2 Easy Ways To Avoid Racial Discrimination in Your Model

Category: IT Technology · Published: 5 years ago


Photo By Mars Martinez on Unsplash

A high-level goal of many AI projects is to address the ethical implications of algorithms, particularly with regard to fairness and discrimination.

Why would we care about fairness?

It is well documented that algorithms can facilitate illegal discrimination.

For example, it should come as no surprise that every investor wants to put more capital into loans with a high return on investment and low risk.

A modern approach is to use a machine learning model trained on the limited information known about the outcomes of past loans. The model decides which future loan requests give the best chance that the borrower repays in full, while also achieving the best trade-off with high returns (a high interest rate).

There is one problem: the model is trained on historical data. In that data, poor and less-educated borrowers, who are often racial minorities, and people with less work experience have historically been more likely than the general population to have their loans charged off.

So if our model tries to maximize return on investment, it may end up favoring white borrowers, people in specific zip codes, and people with work experience. This de facto denies the rest of the population fair access to loans.

Such behavior would be illegal.

There could be two points of failure here:

  • we could have unwittingly encoded biases into the model based on a biased exploration of the data,
  • the data itself could encode biases due to human decisions made to create it.

Luckily, checking a model for disparate treatment and disparate impact is easy.

Method 1: Disparate Treatment Check

Although there is no widely agreed-upon definition of fairness, we can use statistical parity to test for fairness with respect to a protected attribute such as race. This is a disparate treatment check.

Let P be the population of borrowers who applied for a loan, and let B be the known subset of Black borrowers within that population.

We assume there is some distribution D over P that gives the probability of each borrower being picked for evaluation by our model.

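Our model is a classifier $m: X \to \{0, 1\}$ that assigns labels to borrowers, where $X$ is the set of borrowers as the model sees them. If $m(x) = 1$, the model predicts that the borrower will charge off on the loan. If $m(x) = 0$, it predicts that the borrower will pay the loan in full.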

The bias, or statistical imparity, of m on B with respect to X and D is the difference between two probabilities: the probability that a random Black borrower drawn from D is labeled 1, minus the probability that a random non-Black borrower is labeled 1.
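Written out formally, the definition reads as follows. This is a sketch in the notation above, with probabilities taken over borrowers x drawn from D. The symbol $\operatorname{bias}_D(m, B)$ is just our shorthand.

$$
\operatorname{bias}_D(m, B) = \Pr_{x \sim D}\left[\, m(x) = 1 \mid x \in B \,\right] - \Pr_{x \sim D}\left[\, m(x) = 1 \mid x \notin B \,\right]
$$

When D is uniform over a finite set of borrowers, each term is simply the fraction of that group the model labels 1 (charge off). A positive value means Black borrowers are flagged more often.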

If the statistical imparity is small, we say that our model achieves statistical parity. This metric describes how fair our model is with respect to the protected subpopulation B.

The function for this check takes two inputs. The first is an array of binary values (1 if the loan was requested by a Black borrower, 0 otherwise). The second is an array of binary values (1 if the model predicted that the loan will charge off, 0 otherwise).
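Here is a minimal Python sketch of such a check. It assumes the distribution D is uniform over the sample, so each probability becomes the fraction of that group labeled 1. This is not the original article's code. The name `statistical_imparity` and its signed return value are hypothetical choices made for illustration.

```python
from typing import Sequence


def statistical_imparity(is_black: Sequence[int], predicted_chargeoff: Sequence[int]) -> float:
    """Return P(m=1 | Black) - P(m=1 | non-Black), estimated from the sample.

    Hypothetical helper. Assumes D is uniform over the sample, so each
    probability is the fraction of borrowers in that group whom the model
    labeled 1 (charge off).
    """
    if len(is_black) != len(predicted_chargeoff):
        raise ValueError("input arrays must have the same length")
    protected = [p for g, p in zip(is_black, predicted_chargeoff) if g == 1]
    others = [p for g, p in zip(is_black, predicted_chargeoff) if g != 1]
    if not protected or not others:
        raise ValueError("both groups must contain at least one borrower")
    # Positive values mean Black borrowers are labeled "charge off" more often.
    return sum(protected) / len(protected) - sum(others) / len(others)


if __name__ == "__main__":
    is_black = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted = [1, 1, 0, 1, 0, 1, 0, 0]
    print(statistical_imparity(is_black, predicted))  # 0.5
```

A value close to 0 indicates statistical parity. The article does not fix a cutoff for "small," so any tolerance you compare the result against is your own choice.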

Method 2: Disparate Impact Check

Disparate treatment is often described as intentional discrimination, whereas disparate impact is unintentional. In United States labor law, disparate impact refers to “practices in employment, housing, and other areas that adversely affect one group of people of a protected characteristic more than another, even though rules applied by employers or landlords are formally neutral.”

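Disparate impact is measured as the ratio of two conditional probabilities: the probability that a member of the majority class receives a particular outcome, and the probability that a member of the protected class receives it. The legal definition, known as the 80 percent (or four-fifths) rule, uses a threshold of 80%.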

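If $\frac{P(\text{charge-off} \mid \text{White})}{P(\text{charge-off} \mid \text{Black})} \le 0.8$, the definition of disparate impact is satisfied. In other words, the model is considered to have a disparate impact on Black borrowers.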
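Here is a worked example with hypothetical numbers, which are not results from the article. Suppose the model predicts charge-off for 20% of White applicants and 30% of Black applicants. The ratio of these charge-off rates is

$$
\frac{0.20}{0.30} \approx 0.67 \le 0.8,
$$

so under the 80 percent rule the model would be flagged for disparate impact.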

The disparate impact function takes the same two inputs. The first is an array of binary values (1 if the loan was requested by a Black borrower, 0 otherwise). The second is an array of binary values (1 if the model predicted that the loan will charge off, 0 otherwise).

Its output is True if the model demonstrates discrimination and False otherwise. It also reports the degree of discrimination as a value between 0 and 1.
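Here is a minimal Python sketch of the 80 percent rule. It follows the ratio described above: the predicted charge-off rate of the majority group divided by that of the protected group. This is not the original article's code. The name `disparate_impact`, the `threshold` parameter, and the treatment of every non-Black borrower as the majority group are all assumptions. The sketch also differs from the article in one respect. It returns the raw ratio, which stays between 0 and 1 only while Black borrowers' predicted charge-off rate is at least the majority's. The article does not say how its degree of discrimination is scaled.

```python
import math
from typing import Sequence, Tuple


def disparate_impact(
    is_black: Sequence[int],
    predicted_chargeoff: Sequence[int],
    threshold: float = 0.8,  # hypothetical parameter; 0.8 is the 80 percent rule
) -> Tuple[bool, float]:
    """Return (flagged, ratio), ratio = P(charge-off | non-Black) / P(charge-off | Black)."""
    if len(is_black) != len(predicted_chargeoff):
        raise ValueError("input arrays must have the same length")
    protected = [p for g, p in zip(is_black, predicted_chargeoff) if g == 1]
    majority = [p for g, p in zip(is_black, predicted_chargeoff) if g != 1]
    if not protected or not majority:
        raise ValueError("both groups must contain at least one borrower")

    rate_protected = sum(protected) / len(protected)
    rate_majority = sum(majority) / len(majority)

    if rate_protected == 0.0:
        # Black borrowers never receive the adverse prediction, so they cannot be
        # disadvantaged by it; equal (zero) rates are treated as a ratio of 1.
        ratio = 1.0 if rate_majority == 0.0 else math.inf
        return False, ratio

    ratio = rate_majority / rate_protected
    # Flag disparate impact when the majority's adverse-outcome rate is at most
    # `threshold` (80% by default) of the protected group's rate.
    return ratio <= threshold, ratio


if __name__ == "__main__":
    is_black = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
    flagged, ratio = disparate_impact(is_black, predicted)
    print(flagged, round(ratio, 3))  # True 0.667
```

Because the outcome being measured (charge-off) is adverse, the majority group's rate goes in the numerator. A small ratio therefore means the protected group receives the adverse prediction disproportionately often.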

Conclusion

In this article, we introduced statistical parity as a metric that characterizes the degree of discrimination between groups, where groups are defined with respect to some protected class (e.g., the Black population). We also covered the 80 percent rule for measuring disparate impact.

Both methods are an easy starting point for checking the fairness of a classifier. For a more advanced treatment, see this tutorial on fairness in machine learning.


