Abstract
Increasingly, discrimination by algorithms is perceived as a societal and legal problem. As a response, a number of criteria for implementing algorithmic fairness in machine learning have been developed in the literature. This paper proposes the continuous fairness algorithm (CFA\(\theta\)) which enables a continuous interpolation between different fairness definitions. More specifically, we make three main contributions to the existing literature. First, our approach allows the decision maker to continuously vary between specific concepts of individual and group fairness. As a consequence, the algorithm enables the decision maker to adopt intermediate “worldviews” on the degree of discrimination encoded in algorithmic processes, adding nuance to the extreme cases of “we’re all equal” and “what you see is what you get” proposed so far in the literature. Second, we use optimal transport theory, and specifically the concept of the barycenter, to maximize decision maker utility under the chosen fairness constraints. Third, the algorithm is able to handle cases of intersectionality, i.e., of multi-dimensional discrimination of certain groups on grounds of several criteria. We discuss three main examples (credit applications; college admissions; insurance contracts) and map out the legal and policy implications of our approach. The explicit formalization of the trade-off between individual and group fairness allows this post-processing approach to be tailored to different situational contexts in which one or the other fairness criterion may take precedence. Finally, we evaluate our model experimentally.
Notes
We would like to point out that our framework also covers cases of non-algorithmic decision making and thus applies widely to decisions implicating fairness.
In particular, this requires that the raw data contain sufficiently fine degrees of evaluation. If this is granted, then adding some stochastic noise to the evaluation of the raw data will remove undesired “concentration” effects, if necessary.
This is because a necessary precondition for the need to randomize is that different individuals have exactly the same fair score; this is excluded in a continuous setting, and can be neglected in the discrete setting if the evaluation procedure is sufficiently fine-grained; see the discussion above.
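In practice, the tie-breaking by noise described above is a one-liner; the following sketch is ours, with an arbitrary jitter scale chosen small relative to the score granularity:

```python
import numpy as np

rng = np.random.default_rng(42)
scores = np.array([0.3, 0.3, 0.3, 0.7, 0.7])   # coarse raw scores with ties
# add stochastic noise far below the score granularity: ties are broken
# almost surely, while the ranking between distinct raw scores is preserved
jitter = rng.uniform(-1e-6, 1e-6, size=scores.shape)
fine_scores = scores + jitter
```

After this perturbation, no two individuals share exactly the same score, so no randomization between them is needed.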
In the one-dimensional case, a kind of barycenter has been used by Feldman et al. (2015), but with respect to the Wasserstein-1 distance.
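For one-dimensional empirical score distributions, the Wasserstein-2 barycenter can be computed by averaging the group quantile functions, a standard fact for measures on the line. The sketch below is ours, not the paper's implementation; all names and the toy data are illustrative:

```python
import numpy as np

def barycenter_1d(groups, weights, n_quantiles=1000):
    """Wasserstein-2 barycenter of 1-D empirical measures: the quantile
    function of the barycenter is the weighted average of the group
    quantile functions (evaluated on a common grid)."""
    qs = np.linspace(0.0, 1.0, n_quantiles)
    quantiles = np.stack([np.quantile(g, qs) for g in groups])
    return (np.asarray(weights)[:, None] * quantiles).sum(axis=0)

# two groups with shifted score distributions, equal population shares
rng = np.random.default_rng(0)
g1 = rng.normal(0.4, 0.1, 5000)
g2 = rng.normal(0.6, 0.1, 5000)
bary = barycenter_1d([g1, g2], [0.5, 0.5])
```

The returned array is the barycenter's quantile function on an even grid; mapping each individual to the barycenter quantile at their within-group rank yields a group-blind fair score.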
These conditions include absolute continuity of the raw distribution, which can be arbitrarily approximated by discrete distributions in Wasserstein space, as noted above; and a quadratic cost (or utility) function, a condition that can be fulfilled by initially transforming the utility function of the decision maker appropriately.
CJEU Case C-443/15 Parris ECLI:EU:C:2016:897.
In many cases, \(n=1\) may be sufficient. Indeed, in most real-life applications, at some point all the information about the individuals must be brought into a linear ordering.
We use the term “probability measure” in the sense of “normed measure”; no randomness is insinuated by this terminology.
The pullback measure \(dx^n\circ S_k^{-1}\) is defined by \(dx^n\circ S_k^{-1}(B):=dx^n(\{x\in \mathbb {R}^n: S_k(x)\in B\})\), which gives the proportion of individuals whose score lies in the set B.
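Empirically, the pushforward \(dx^n\circ S_k^{-1}\) is simply the distribution of realised scores: the mass of a set B is the fraction of individuals whose score falls in B. A minimal numpy sketch, with a toy linear score map of our own choosing standing in for \(S_k\):

```python
import numpy as np

# raw features x_i in R^2 and a toy linear score map standing in for S_k
rng = np.random.default_rng(1)
X = rng.uniform(size=(10000, 2))

def S_k(x):
    return 0.7 * x[:, 0] + 0.3 * x[:, 1]

scores = S_k(X)
# dx^n ∘ S_k^{-1}(B) for B = [0.4, 0.6): the proportion of individuals
# whose score lies in B
B_mass = np.mean((scores >= 0.4) & (scores < 0.6))
```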
This is typically the case for discrete measures, so the notion of a transport map needs to be relaxed to that of a transport plan. In the context of algorithmic fairness, this leads to the necessity of randomization, as in Dwork et al. (2012).
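The need for a transport plan, and hence for randomization, can be seen concretely: an atom of the source measure may have to be split across several target atoms. For measures on the line the monotone (northwest-corner) coupling of the sorted atoms is optimal for convex costs; the sketch below is ours:

```python
import numpy as np

def monotone_coupling(a, b):
    """Optimal 1-D transport plan between two discrete measures with
    sorted supports (monotone / northwest-corner coupling, optimal for
    convex costs). Entry (i, j) is the mass moved from source atom i
    to target atom j."""
    plan = np.zeros((len(a), len(b)))
    a, b = np.array(a, dtype=float), np.array(b, dtype=float)
    i = j = 0
    while i < len(a) and j < len(b):
        m = min(a[i], b[j])
        plan[i, j] = m
        a[i] -= m
        b[j] -= m
        if a[i] <= 1e-12:
            i += 1
        if b[j] <= 1e-12:
            j += 1
    return plan

# a source atom of mass 1/2 must split across two targets of mass 1/4:
# the individuals at that atom are randomized between the two outputs
plan = monotone_coupling([0.5, 0.5], [0.25, 0.25, 0.5])
```

Because the first row of the plan has two nonzero entries, no deterministic map can realise this coupling: individuals carrying that mass must be assigned to one of the two targets at random, exactly the randomization discussed in the note.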
Recall that the map \(T^\theta \) is defined in Definition 2.3.
This may be significant in order to handle intersectionality, as explained in the introduction.
The main obstruction to continuity of transport maps is the possible non-connectedness of the support of the target measure (cf. Theorem 1.27 in Ambrosio and Gigli 2013). This is the case if, within one group, there are further subgroups with very different score statistics.
If the group size is odd, the college will have to randomize its admission decision for the lowest ranking pair of individuals at or above score 0.5. Of course this issue is not visible in our continuum framework.
See, e.g., https://www.kreditech.com/.
This pattern is found, for example, in the empirical study by Ayres and Siegelman (1995), p. 309 et seq., on offers made by car dealers to members of the respective groups. In other settings, of course, women may be the group discriminated against.
CJEU Case C-236/09 Test-Achats ECLI:EU:C:2011:100, paras. 28–34; on this, see Tobler (2011).
See, e.g., the guidelines by the European Commission stating that “[t]he use of risk factors which might be correlated with gender [...] remains possible, as long as they are true risk factors in their own right” (European Commission, Guidelines on the application of Council Directive 2004/113/EC to insurance, in the light of the judgment of the Court of Justice of the European Union in Case C-236/09 (Test-Achats), OJ 2012 C 11/1, para. 17). Thus, the legal admissibility of correlated factors crucially depends on whether these factors can plausibly be related to the risks covered by the insurance. In machine learning contexts, where specific factors may not always be reconstructable from the output (particularly in deep neural networks), insurers can “play it safe” by approximating male and female scores.
This becomes apparent in the very definition of indirect discrimination, for example in Art. 2(b) of Directive 2004/113/EC on sex [i.e., gender] discrimination: “where an apparently neutral [...] practice would put persons of one sex at a particular disadvantage compared with persons of the other sex, unless that [...] practice is objectively justified by a legitimate aim and the means of achieving that aim are appropriate and necessary.”
CJEU, Case C-450/93, Kalanke, EU:C:1995:322, para 22.
CJEU, Case C-409/95, Marschall, EU:C:1997:533, para 33.
CJEU, Case C-450/93, Kalanke, EU:C:1995:322, para 22; US Supreme Court, Fisher II, 136 S. Ct., p. 2210.
Cf. CJEU, Case C-158/97, Badeck, EU:C:2000:163, paras. 55 and 63 (concerning selection for training and interview).
We calculate these thresholds as follows: Each group has a share of 16.6 % of the entire population. Hence, to reach a disparity measure of more than 0.8, the groups need to have a positive selection rate of at least 13.28 %. Group 3 reaches this threshold from rank 53,000 on, with 13.42 %; Group 5 at rank 64,000 with 13.5 %; and Group 6 at rank 95,000 with 13.6 %. For each of the other fairness evaluations, we calculate the thresholds correspondingly.
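The bookkeeping in this note amounts to the four-fifths rule: a group's positive selection rate is compared against its population share, and the ratio must exceed 0.8. A minimal sketch using the figures quoted above (we assume the disparity measure is this selection-rate ratio, which matches the arithmetic in the note):

```python
# each of the six groups holds 16.6% of the population; under the
# four-fifths rule a group clears the disparity threshold once its
# positive selection rate reaches 0.8 * 16.6% = 13.28%
share = 16.6
threshold = 0.8 * share

# selection rates at the ranks quoted in the note (illustrative check)
rates = {"group 3": 13.42, "group 5": 13.5, "group 6": 13.6}
passes = {g: r >= threshold for g, r in rates.items()}
```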
References
Agueh M, Carlier G (2011) Barycenters in the Wasserstein space. SIAM J Math Anal 43:904–924
Ambrosio L, Gigli N (2013) A user’s guide to optimal transport. Springer, Berlin
Ayres I, Siegelman P (1995) Race and gender discrimination in bargaining for a new car. Am Econ Rev 85:304–321
Barocas S, Selbst A (2016) Big data’s disparate impact. Calif Law Rev 104:671–732
Berk R et al (2018) Fairness in criminal justice risk assessments: the state of the art. Sociol Methods Res 1–42
Bent J (2019) Is algorithmic affirmative action legal? Georget. Law J (forthcoming). https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3372690. Accessed on 19 June 2019
Binns R (2018) Fairness in machine learning: lessons from political philosophy. J Mach Learn Res 81:1–11
Brenier Y (1987) Décomposition polaire et réarrangement monotone des champs de vecteurs. C R Acad Sci Paris 305:805–808
Brenier Y (1991) Polar factorization and monotone rearrangement of vector-valued functions. Commun Pure Appl Math 44:375–417
Calders T et al (2013) Controlling attribute effect in linear regression. In: 2013 IEEE 13th international conference on data mining, pp 71–80
Calders T, Verwer S (2010) Three naive Bayes approaches for discrimination-free classification. Data Min Knowl Discov 21:277–292
Calders T, Žliobaitė I (2013) Why unbiased computational processes can lead to discriminative decision procedures. In: Custers T et al (eds) Discrimination and privacy in the information society. Springer, Berlin, pp 43–57
Calmon F et al (2017) Optimized pre-processing for discrimination prevention. Adv Neural Inf Process Syst 30:3995–4004
Chen D et al (2017) Enhancing transparency and control when drawing data-driven inferences about individuals. Big Data 5:197–212
Chouldechova A (2017) Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5:153–163
Chouldechova A, Roth A (2018) The frontiers of fairness in machine learning. arXiv:1810.08810
Craig P, de Búrca G (2011) EU law, 5th edn. Oxford University Press, Oxford
Datta A et al (2015) Automated experiments on ad privacy settings. Proc Priv Enhan Technol 1:92–112
del Barrio E et al (2019) Obtaining fairness using optimal transport theory. In: Proceedings of the 36th international conference on machine learning, vol PMLR 97, pp 2357–2365
Dwork C et al (2012) Fairness through awareness. In: Proceedings of the 3rd innovations in theoretical computer science conference, pp 214–226
EEOC (2015) Uniform guidelines on employment selection procedures, 29 C.F.R. § 1607
Feldman M et al (2015) Certifying and removing disparate impact. In: Proceedings of the 21st ACM SIGKDD international conference on knowledge discovery and data mining, pp 259–268
Fish B et al (2016) A confidence-based approach for balancing fairness and accuracy. In: Proceedings of the 2016 SIAM international conference on data mining, pp 144–152
Flamary R, Courty N (2017) POT python optimal transport library. https://github.com/rflamary/POT. Accessed 1 July 2019
Friedler S et al (2016) On the (im)possibility of fairness. arXiv:1609.07236
Fukuchi K et al (2015) Prediction with model-based neutrality. IEICE Trans Inf Syst E98–D(8):1503–1516
Fuster A et al (2018) Predictably unequal? The effects of machine learning on credit markets. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3072038. Accessed 19 June 2019
German Federal Ministry of Justice (2018) Sachverständigenrat für Verbraucherfragen. Verbrauchergerechtes Scoring, Report
Gillen S et al (2018) Online learning with an unknown fairness metric. Adv Neural Inf Process Syst 32:2600–2609
Guan L et al (2016) From physical to cyber: escalating protection for personalized auto insurance. In: Proceedings of the 14th ACM conference on embedded network sensor systems, pp 42–55
Hacker Ph (2017) Personal data, exploitative contracts, and algorithmic fairness: autonomous vehicles meet the internet of things. Int Data Priv Law 7:266–286
Hacker Ph (2018) Teaching fairness to artificial intelligence: existing and novel strategies against algorithmic discrimination under EU law. Common Mark Law Rev 55:1143–1186
Hardt M et al (2016) Equality of opportunity in supervised learning. Adv Neural Inf Process Syst 29:3315–3323
Hurley M, Adebayo J (2016) Credit scoring in the era of big data. Yale JL Tech 18:148–216
Järvelin K, Kekäläinen J (2002) Cumulated gain-based evaluation of IR techniques. ACM Trans Inf Syst 20:422–446
Johndrow J, Lum K (2019) An algorithm for removing sensitive information: application to race-independent recidivism prediction. Ann Appl Stat 13:189–220
Joseph M et al (2016) Fairness in learning: classic and contextual bandits. Adv Neural Inf Process Syst 29:325–333
Kamishima T et al (2018) Recommendation independence. Proc Mach Learn Res 81:1–15
Kim P (2017) Auditing algorithms for discrimination. Univ PA Law Rev Online 166:189–203
Kleinberg J et al (2016) Inherent trade-offs in the fair determination of risk scores. arXiv:1609.05807
Kleinberg J et al (2018) Algorithmic fairness. In: AEA papers and proceedings, vol 108, pp 22–27
Kroll JA et al (2017) Accountable algorithms. Univ Pa Law Rev 165:633–705
Lowry S, Macpherson G (1988) A blot on the profession. Br Med J 296:657–658
Malamud D (2015) The strange persistence of affirmative action under title VII. West Va Law Rev 118:1–22
Moses M (2016) Living with moral disagreement: the enduring controversy about affirmative action. University of Chicago Press, Chicago
Nalbandian J (2010) The US Supreme Court’s “consensus” on affirmative action. In: Broadnax W (ed) Diversity and affirmative action in public service. Westview Press, Boulder, pp 111–125
Pasquale F (2015) The black box society: the secret algorithms that control money and information. Harvard University Press, Cambridge
Pérez-Suay A et al (2017) Fair kernel learning. In: Joint European conference on machine learning and knowledge discovery in databases. Springer, Cham, pp 339–355
Poueymirou M (2017) Schuette v. coalition to defend affirmative action and the death of the political process doctrine. UC Irvine Law Rev 7:167–194
Qin T (2010) LETOR: a benchmark collection for research on learning to rank for information retrieval. Inf Retr 13:346–374
Reed Ch et al (2016) Responsibility, autonomy and accountability: legal liability for machine learning. Queen Mary School of Law Legal Studies Research Paper No. 243/2016, https://ssrn.com/abstract=2853462. Accessed 19 June 2019
Reuters (2018) Amazon ditched AI recruiting tool that favored men for technical jobs, The Guardian (11 October 2018). https://www.theguardian.com/technology/2018/oct/10/amazon-hiring-ai-gender-bias-recruiting-engine. Accessed 19 June 2019
Robinson K (2016) Fisher’s cautionary tale and the urgent need for equal access to an excellent education. Harv Law Rev 130:185–240
Romei A, Ruggieri S (2014) A multidisciplinary survey on discrimination analysis. Knowl Eng Rev 29:582–638
Rothmann R et al (2014) Credit Scoring in Oesterreich, Report of the Austrian Academy of Sciences
Selbst A (2017) Disparate impact in big data policing. Ga Law Rev 52:109–195
Sullivan C (2005) Disparate impact: looking past the desert palace mirage. William Mary Law Rev 47:911–1002
Tobler C (2005) Indirect discrimination. Intersentia, Cambridge
Tobler Ch (2011) Case C-236/09, Association belge des Consommateurs Test-Achats ASBL, Yann van Vugt, Charles Basselier v. Conseil des ministres, Judgment of the Court of Justice (Grand Chamber) of 1 March 2011. Common Mark Law Rev 48:2041–2060
Tutt A (2017) An FDA for algorithms. Adm Law Rev 69:83–125
Veale M, Binns R (2017) Fairer machine learning in the real world: mitigating discrimination without collecting sensitive data. Big Data Soc July–December:1–17
Wachter S et al (2017) Why a right to explanation of automated decision-making does not exist in the general data protection regulation. Int Data Priv Law 7:76–99
Waddington L, Bell M (2001) More equal than others: distinguishing European Union equality directives. Common Mark Law Rev 38:587–611
Wightman L (1998) LSAC national longitudinal bar passage study, LSAC research report series
Yang K et al (2018) A nutritional label for rankings. In: Proceedings of the 2018 international conference on management of data, vol 18, pp 1773–1776
Zafar M et al (2017) Fairness constraints: mechanisms for fair classification. Artif Intell Stat 20:962–970
Zehlike M et al (2017) FA*IR: a fair top-k ranking algorithm. In: 26th ACM international conference on information and knowledge management, pp 1569–1578
Zemel R et al (2013) Learning fair representations. In: Proceedings of the 30th international conference on machine learning, pp 325–333
Žliobaitė I (2015) On the relation between accuracy and fairness in binary classification. FATML 2015
Žliobaitė I (2017) Measuring discrimination in algorithmic decision making. Data Min Knowl Discov 31:1060–1089
Responsible editor: Toon Calders.
Zehlike, M., Hacker, P. & Wiedemann, E. Matching code and law: achieving algorithmic fairness with optimal transport. Data Min Knowl Disc 34, 163–200 (2020). https://doi.org/10.1007/s10618-019-00658-8