Why bias is among the data science problems

The demand for data scientists is huge. But the risks of bad, biased data are also huge. Data scientist Cathy O'Neil makes the case for creating a more ethical data scientist.

You might not know it, but there's a potential dark side to the field of data science, and it's something many companies overlook. At a time when the sheer volume of big data is driving heavy demand for data scientists, Cathy O'Neil, a data scientist herself, has just written a book called Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Her concern is that in the rush to take advantage of big data, built-in biases might be missed, making the conclusions drawn from it inaccurate and potentially quite damaging. She sat down with senior technology editor Valerie Silverthorne to talk about data science problems and why companies need to have a data science ethics policy in place.

Cathy O'Neil: I bristle at the idea that you can solve anything with data. I call myself a data skeptic.

What about bias and other data science problems companies need to be afraid of?

O'Neil: We have a trust problem. Not enough scrutiny. Everybody needs data scientists. But we need to be putting a lot more types of people on these teams with data scientists to make sure the choices are well thought out. Data scientists are not trained to think ethically or think through these things. There can be unintended consequences that a sociologist might see but data scientists would be completely stupid about. Our justice and our predictive policing data is based on Jim Crow laws and if you're using that historical data to train our current models they're going to be racist. The assumption is that once you've done something with data it is automatically free of values and objectives. Social scientists know better than that. Data scientists don't know better than that.

What's at risk if we don't pay attention to bias in the field of data science?

O'Neil: There's a risk in this process that we'd actually be automating bias in. If no one on the team asks the right questions you could get algorithms that are biased against women or people of color or older people. Companies building internal algorithms that assess employees may very soon be facing litigation for discriminatory processes. This is not just a dream I'm having. We need to monitor these things and make sure they're not discriminatory if we want to be better than we are.
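
As one illustration of what that kind of monitoring might look like in practice, the sketch below compares how often an algorithm selects people from different groups. It is not a method O'Neil describes in the interview. The group labels and records are hypothetical, and the 0.8 threshold is borrowed from the "four-fifths" rule of thumb used in U.S. employment guidelines; it is a screening heuristic, not a legal or statistical verdict.

```python
from collections import defaultdict


def selection_rates(records):
    """Return {group: share of that group selected} from (group, selected) pairs."""
    totals = defaultdict(int)
    chosen = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            chosen[group] += 1
    return {group: chosen[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Divide the lowest group selection rate by the highest."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so there is no gap to measure
    return min(rates.values()) / highest


# Hypothetical outcomes from an employee-assessment model (assumed data).
records = (
    [("group_a", True)] * 6 + [("group_a", False)] * 4
    + [("group_b", True)] * 3 + [("group_b", False)] * 7
)

rates = selection_rates(records)
ratio = disparate_impact_ratio(rates)

# Assumed threshold: the four-fifths rule of thumb.
needs_review = ratio < 0.8
print(rates, ratio, needs_review)
```

A low ratio does not prove discrimination, and a high one does not rule it out. The point is only that outcomes get measured per group on a regular basis, so that someone on the team is prompted to ask why a gap exists.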

Is this just an internal issue?

O'Neil: It's more obvious when it comes to things like hiring, but you can create algorithms that are customer-facing too. If your business has anything to do with making loans, that's an obvious place where discrimination could be a factor. There are lots of examples.

So what can be done about data science problems, such as bias?

O'Neil: Some universities are starting to teach ethics classes to data scientists. But there has not been a lot of regulation in this area. There are lots of rules and ethics for experimentation in biomedicine and you have to give your consent. That kind of thing doesn't exist in the world of big data. We are all constantly being A/B tested and most of the time it's for stupid things like "What color is this advertisement?" We don't want to have to consent to those things. It's not really the testing that bothers me; it's the fact that we're literally throwing out these algorithms into the wild and thinking they're perfect. There is no reason to think they work at all. I liken it to having a car company put cars on the road with no safety tests. No one measures the results. We have to measure.
