
Data isn't always neutral: how data can deepen racial inequalities

An examination of systems, justice and global politics


There’s a common myth that ‘facts are facts’ and those facts ‘don’t lie’.

The top UK data experts will tell you otherwise. In reality, facts can lie - and worse, they can mirror and deepen racial inequalities.

Algorithms created from data collection can be biased against ethnic minorities, which can have serious effects on a person’s life, including their freedom, their level of healthcare and their country’s economic wealth.

How can fact-driven systems be racist? 🤔

Big data and AI (Artificial Intelligence) are supposed to make everything from healthcare to law enforcement more efficient. Many also thought that big data would make society fairer, because it would remove human biases around class, gender and, crucially, race.

But David Gillborn, professor of Critical Race Studies at the University of Birmingham, says that, if anything, data itself has perpetuated bias.

‘Numbers are constructed by researchers in exactly the same way as interview data. Which numbers to collect? When to collect them? Where to look? Who to ask? Every one of these questions will change the number that you end up with’, he explained.

Essentially, the racial biases of researchers can be passed on to the data they collect. They can choose a data sample with a certain demographic based on personal preference, or even only choose to present data that supports their existing beliefs.

The real problem is that biased data can inform the data-driven technology we use in everyday life, solidifying and worsening systemic racism in our society. If you’re confused about how, you’re definitely not alone. In this article we’ll go through a few examples of how this happens, and what we can do about it.

Predicting whether criminals will reoffend 🎲

One of the most unnerving ways biased data determines people’s lives is by predicting the likelihood of criminals reoffending. A ProPublica study found that the algorithms used across America to predict whether people will reoffend after being released from prison are deeply biased against black people.

These algorithms are used at every stage of the criminal justice system to decide who can be set free, to assign bond amounts and, in general, to determine the defendant’s freedom.

The study revealed that the algorithm was assigning many false positives to black people, and many false negatives to white people. In one example, the algorithm assigned a ‘high risk’ of reoffending to a black teenage girl who took someone else’s bicycle when she was late to collect her little sister, and a ‘low risk’ to a 41-year-old white man who had shoplifted and had a previous conviction for armed robbery. Two years later, the black teenager had not reoffended, and the white man had been sentenced to eight years in prison after another robbery.
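To make ‘false positives’ and ‘false negatives’ concrete, here is a minimal Python sketch of how error rates can be compared between groups. It is not ProPublica’s method or code: the function name `error_rates_by_group`, the group labels and the toy records are all made up for illustration.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Compute false positive and false negative rates per group.

    records: iterable of (group, predicted_high_risk, reoffended) tuples.
    A false positive is someone flagged as high risk who did not reoffend.
    A false negative is someone flagged as low risk who did reoffend.
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted_high_risk, reoffended in records:
        if predicted_high_risk and not reoffended:
            counts[group]["fp"] += 1
        elif not predicted_high_risk and not reoffended:
            counts[group]["tn"] += 1
        elif not predicted_high_risk and reoffended:
            counts[group]["fn"] += 1
        else:
            counts[group]["tp"] += 1

    rates = {}
    for group, c in counts.items():
        did_not_reoffend = c["fp"] + c["tn"]
        did_reoffend = c["fn"] + c["tp"]
        rates[group] = {
            "false_positive_rate": c["fp"] / did_not_reoffend if did_not_reoffend else None,
            "false_negative_rate": c["fn"] / did_reoffend if did_reoffend else None,
        }
    return rates


# Invented toy records, NOT real data: (group, predicted_high_risk, reoffended)
TOY_RECORDS = [
    ("group_a", True, False),
    ("group_a", True, False),
    ("group_a", False, False),
    ("group_a", True, True),
    ("group_b", False, True),
    ("group_b", False, True),
    ("group_b", True, True),
    ("group_b", False, False),
]

for group, group_rates in error_rates_by_group(TOY_RECORDS).items():
    print(group, group_rates)
```

In this invented data, the algorithm’s mistakes about group_a are mostly wrongful ‘high risk’ labels, while its mistakes about group_b are mostly wrongful ‘low risk’ labels. That is the shape of the disparity described above, even though the numbers here are fictional.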

Predicting whether children will become criminals 🚸

Similarly, predictive algorithms like the Netherlands’ Top 600 and Top 400, which predict how likely children under 12 are to become criminals, base their predictions on class- and race-related variables.

Calculating the criminal potential of children based on data collection is already pretty ethically questionable, but an ENAR study found that minority children are also unfairly discriminated against: data collection is mostly deployed in predominantly non-white and low-income areas, and the results end up being presented in a misleading and disproportionate way.

Convicting people 🚨

Another highly concerning study, by Runnymede, revealed that since 2004 UK police have collected the DNA 🔬 and fingerprints of everyone who is arrested and stored them in the ‘National DNA Database’, even if the person was proven innocent or no charges were pressed. This database is checked every day to compare these people’s DNA profiles with DNA collected at crime scenes. Of course, people of colour ‘are picked up more frequently, and oftentime for no cause’.

Appearing in this database will show up on someone’s advanced criminal record check, and they could be refused a visa or a job. Twenty-five private commercial companies have accessed this database in the past ten years, which is a clear violation of data privacy laws.

In the report, Matilda MacAttram says the database ‘poses one of the biggest threats to race relations in Britain… the numbers of innocent people from Britain’s black communities profiled on this database far outstrips that of any other group’. Part of the problem is that this large-scale ‘biosurveillance’ can threaten your fundamental right to privacy and, in reality, it hasn’t led to an increase in convictions through DNA.

Deciding who gets public healthcare 🏥

Unfortunately, certain uses of data embed systemic racism in the algorithms used in US public healthcare, too. An algorithm used to decide whether millions of patients get access to a high-risk care management program was found by researchers to have a ‘significant’ racial bias against black patients, choosing healthier white patients over sicker black patients for the program.

The algorithm assessed how high someone’s medical bills were, assuming that a higher bill probably meant a worse health condition. In actuality, using this data massively skewed the results. Because of the effects of systemic racism, ‘ranging from distrust of the health-care system to direct racial discrimination by health-care providers’, white people were more likely to have higher medical bills because they could better access and afford healthcare. The black patients who were discriminated against were found to have lower bills but, in general, were in greater need.
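A small Python sketch can show why ranking people by cost is not the same as ranking them by need. This is not the algorithm the researchers studied: the `Patient` fields, the `select_for_program` helper, the use of a chronic-condition count as a stand-in for need, and every number below are assumptions invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    name: str
    group: str
    chronic_conditions: int  # hypothetical stand-in for actual health need
    annual_cost: float       # the proxy: how much was spent on their care


def select_for_program(patients, score, slots):
    """Return the `slots` patients with the highest score."""
    return sorted(patients, key=score, reverse=True)[:slots]


# Invented toy patients, NOT real data. The 'worse_access' group is sicker
# but has lower bills because less care was accessed or afforded.
TOY_PATIENTS = [
    Patient("p1", "better_access", 2, 9000.0),
    Patient("p2", "better_access", 3, 12000.0),
    Patient("p3", "worse_access", 5, 7000.0),
    Patient("p4", "worse_access", 4, 6000.0),
]

by_cost = select_for_program(TOY_PATIENTS, lambda p: p.annual_cost, slots=2)
by_need = select_for_program(TOY_PATIENTS, lambda p: p.chronic_conditions, slots=2)

print("Chosen by cost:", [p.name for p in by_cost])
print("Chosen by need:", [p.name for p in by_need])
```

With these made-up numbers, scoring by cost fills both places with the healthier, better-served patients, while scoring by need picks the sicker ones. The choice of what to measure decides who gets help, before any ‘prediction’ happens at all.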

'The Big Picture': digital colonialism explained 🌍

If we take a step back, a lot of ‘digital wealth’, or emerging ICTs (Information and Communication Technologies), is concentrated in a handful of nations.

This is what’s called ‘digital colonialism’; rich 'developed' economies have a monopoly on data technology and emerging countries become increasingly dependent on their services. It's important to note that data is now a highly valuable resource for every country.

A real problem is that not all countries are included in the ‘global’ conversations on data norm setting and data protection regulation in cyber security. At the G20 summit, countries like India, Indonesia and South Africa refused to sign an international declaration on global data flows because they were never consulted on their own data interests beforehand.

A working example of data colonialism is the fact that India has more Facebook users than any other country, yet none of Facebook's data centres are based in India. Data localisation in developing countries like India would mean fewer countries dependent on, and paying for the ‘import’ of, foreign-owned digital infrastructure, and companies like Facebook would have to pay taxes in these countries. Indonesia also has a large Facebook user population, but the US threatened to end preferential trade agreements if it implemented localisation regulations.

So, how can we fix this? 💪

In law enforcement, Rebekah Delsol, author of ‘Stop and Search: The anatomy of a Police Power’, talks about how ‘more robust data gathering’ that’s publicly available, covering ‘all stops and searches and detentions’ and ‘the race and faith of the individual’, could better reveal the bigger picture of racial bias in the UK and shine a light on the scale of the issue. The government of Ontario similarly published ‘Data Standards for the Identification and Monitoring of Systemic Racism’ to set out how governments can use data collection to ‘help identify and monitor systemic racism and racial disparities within the public sector’. Data collection can be used for explicitly anti-racist purposes.

In healthcare, although the NHS is largely reliant on paper files, Reform says that if algorithms are built expertly we could actually use AI in the NHS to help predict which ‘groups of individuals’ are at risk of which illnesses, and allow the NHS to target treatment more effectively towards them.

With data colonialism, one solution may be global conversations about the allocation of digital resources that include many more countries.

Final thoughts 💭

Data and algorithms are only as fair as we build them to be. Data-driven systems can reinforce, and have reinforced, systemic racism and other prejudices present in our societies.

What’s more, the monopolisation of data storage by a few countries has created an unequal and exploitative big data landscape worldwide. We need to be more aware of the impacts of biased data, and work towards solutions that tackle them.

We've also got a fantastic piece coming up on how data can be used to combat societal inequalities like this, so stay tuned🌟
