AI Bias

Notes from Amanda Levendowski's Paper

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3024938


Abstract

AI systems can reflect or exacerbate societal bias, from racist facial recognition to sexist NLP.

These biases threaten to overshadow AI's technological gains and potential benefits.

Many sources of bias:

Role of law itself: copyright

AI systems learn to 'think' by reading, viewing, and listening to copies of human works.

Copyright law's exclusion of access to certain copyrighted source materials may create or promote biased AI systems.

Copyright law limits bias mitigation techniques, such as testing AI through reverse engineering, algorithmic accountability processes, and competing to convert customers.

The rules of copyright law also privilege access to certain works over others, encouraging AI creators to use easily available, legally low-risk sources of data for teaching AI, even when those data are demonstrably biased.

Very similar issue with public medical imaging datasets: the sources are very limited, typically from only a few sites in the US, yet much of the world relies on them (plus local hospital data, for those who are researchers or companies).

A different part of copyright law -- the fair use doctrine -- has traditionally been used to address similar concerns in other tech fields, and may be capable of addressing AI bias.

This is in large part because the normative values embedded within traditional fair use ultimately align with the goals of mitigating AI bias, and quite literally may help create fairer AI systems.


word2vec word embeddings

man is to computer programmer as woman is to homemaker
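The analogy above falls out of simple vector arithmetic over learned embeddings. A minimal sketch with made-up toy vectors (not real word2vec weights; the words and values are chosen purely for illustration) shows how "man : programmer :: woman : ?" is answered by nearest-neighbor search over offset vectors:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 3-d embeddings, NOT real word2vec vectors -- constructed
# so a gendered direction is easy to see.
emb = {
    "man":        np.array([1.0, 0.0, 0.2]),
    "woman":      np.array([0.0, 1.0, 0.2]),
    "programmer": np.array([1.0, 0.1, 0.9]),
    "engineer":   np.array([0.9, 0.0, 0.9]),
    "homemaker":  np.array([0.1, 1.0, 0.9]),
}

# Analogy arithmetic: programmer - man + woman ~= ?
query = emb["programmer"] - emb["man"] + emb["woman"]
best = max((w for w in emb if w not in ("programmer", "man", "woman")),
           key=lambda w: cosine(query, emb[w]))
print(best)
```

With these toy vectors the nearest remaining word is "homemaker", mirroring the bias the paper highlights: the gendered offset learned from the training corpus drives the answer.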

Garbage in, garbage out

Copyright law causes friction that limits access to training data and restricts who can use certain data.

This friction is a significant contributor to biased AI.

The friction caused by copyright law encourages AI creators to use biased, low-friction data (BLFD) for training AI systems, like word2vec, despite those demonstrable biases.

This friction also prevents bias mitigation techniques, like reweighting algorithmic inputs or supplementing datasets with additional data.
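Reweighting is one of the simplest of these mitigation techniques: give examples from underrepresented groups proportionally larger weights so that each group contributes equally to the training objective. A minimal sketch with hypothetical group labels (the labels and counts are illustrative, not from the paper):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each example inversely to its group's frequency so that
    every group contributes the same total weight to training."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]

# Hypothetical imbalanced dataset: 3 examples from one group, 1 from another.
labels = ["male", "male", "male", "female"]
weights = inverse_frequency_weights(labels)
# Each majority-group example gets 4/(2*3) ~= 0.67; the minority example
# gets 4/(2*1) = 2.0, so both groups sum to the same total weight.
```

These per-example weights would then be passed to the loss function during training; the point is that applying them requires access to (and often copies of) the underlying data.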

Copyright law can even preclude potential competitors from converting the customers of dominant AI players.


Part I - Teaching Systems to be Artificially Intelligent

Toy datasets like MNIST and Cats / Dogs may not seem consequential, but when serious systems are deployed with unintended biases, they cause real problems.

The implicit biases resulting in Type 1 and Type 2 errors become important and even dangerous.

The internet may be full of cats, but it does not follow that the photographs and videos featuring those cats are free for anyone to use.

It remains an open question whether copies created for purposes of training AI systems constitute "copies" under the Copyright Act, which defines "copies" as "material objects . . . in which a work is fixed by any method now known or later developed, and from which the work can be perceived, reproduced, or otherwise communicated, either directly or with the aid of a machine or device."

Thus, certain "copies" may be so fleeting that they are not considered copies at all. Google, for example, has developed a technique called federated learning, which localizes training data to the originating mobile device rather than copying data to a centralized server. It remains far from settled that decentralized training data stored in random access memory (RAM) would not be considered "copies" under the Copyright Act.
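The federated learning idea can be sketched in a few lines: each device runs a training step on data that never leaves it, and the server averages only the resulting model weights. A toy federated-averaging round with a hypothetical linear model and synthetic per-device data (a simplification of Google's actual protocol):

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient-descent step on a device's private data
    (linear model, squared-error loss). The raw data stays here."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_average(weights, device_data):
    """One round of federated averaging: each device trains locally,
    and the server aggregates only the resulting weight vectors."""
    updates = [local_update(weights.copy(), X, y) for X, y in device_data]
    return np.mean(updates, axis=0)

# Hypothetical per-device datasets, standing in for data held in each
# device's local storage or RAM.
rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 2)), rng.normal(size=20)) for _ in range(3)]
w = federated_average(np.zeros(2), devices)
```

The copyright question the notes raise maps onto this sketch directly: the only "copies" of the works ever made live transiently in each device's memory during `local_update`, and only the weight vector crosses the network.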

Thus, the rules of copyright law can be understood as causing two kinds of friction: competition and access. From a competition perspective, copyright law can limit implementation of bias mitigation techniques on existing AI systems and constrain competition to create less biased systems. And from an access perspective, copyright law can privilege the use of certain works over others, inadvertently encouraging AI creators to use easily available, legally low-risk works as training data, even when those data are demonstrably biased.

Part III - Invoking Fair Use to Create Fairer AI Systems

🌅