The global development industry is generating a lot of data on the ‘developing’ world, data that has not always been available. As technology has made data collection easier and more scalable, many in the development industry have already established that monitoring (i.e., data collection) is much easier than evaluating (i.e., deriving insights from data). However, both sides of M&E require sound methodologies to ensure the data are accurately represented.
Despite making my living working with data, I am somewhat of a data skeptic. Specifically, I am skeptical of the notion that numbers and data are truth. Much like geographer Doreen Massey’s conceptualization of space as a product of social relations, data embodies social relations and biases. In other words, it is difficult to guarantee the neutrality of data and numbers in terms of how they are collected, what they show, and how they are analyzed. All of this is subject to human bias, intentional or unintentional: in the way humans label data, in the limitations of finite data samples, and in human-designed technology that can reinforce biases.
The way humans label data
Does the way we identify data represent cultural bias? In some ways, yes. Labels can be culturally problematic both in how we classify data and in how people interpret those classifications. For example, if a survey limits gender to two categories when collecting demographic information, we can reinforce our own notion of gender categories and unintentionally bias the data. India and Nepal, for example, both recognize a third gender on official documents; M&E data in these countries, however, do not always reflect this change. Moritz Hardt, a researcher at IBM, notes five ways that big data is unfair. Along with different cultural understandings and the consistent, if unintentional, representation of social categories (e.g., race and gender), Hardt notes sample size as a problem.
Limited sample sizes of data
The issue of certain groups not being represented in the data is a particular problem for global development. A recent study by the Global Web Index highlights that geolocation can lead to groups in the ‘developing’ world not being counted by web analytics. Virtual private networks (VPNs), a common tool for accessing blocked sites, and shared devices are some of the main culprits. Additionally, privacy concerns can change how people respond, skewing the data and limiting the sample of quality responses. For example, in some societies, even if a woman owns a cell phone, she is not always free to respond without having her calls and text messages monitored.
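To see how this plays out, here is a minimal sketch of the VPN problem. The regions, user counts, and VPN shares are entirely invented for illustration; the point is only that analytics which drop or misattribute VPN traffic will systematically undercount regions where VPN use is common.

```python
# Hypothetical populations: two regions of equal size, but region_Y relies
# heavily on VPNs (e.g., to reach blocked sites), so its traffic is
# misattributed or dropped by geolocation-based analytics.
population = {
    "region_X": {"users": 1000, "vpn_share": 0.05},
    "region_Y": {"users": 1000, "vpn_share": 0.60},
}

# Analytics only "see" the non-VPN users in each region.
counted = {region: round(p["users"] * (1 - p["vpn_share"]))
           for region, p in population.items()}

print(counted)  # region_Y appears less than half the size of region_X
```

The true populations are identical, yet the measured sample makes region_Y look far smaller, and any conclusions drawn from the analytics inherit that distortion.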
Are we training machines to mimic our cultural biases that are in data?
This human bias within data is of particular concern for predictive modeling and big data, both of which are starting to enter development, as seen in reports by UN Pulse and the World Economic Forum. An algorithm for predictive modeling simply trains a machine on the data it is given. So if the data are biased, the predictions will be biased. In a Wired Magazine interview, Danielle Citron, a University of Maryland law professor, notes that humans can trust algorithms too much, in that “[…]we think of them as objective, whereas the reality is that humans craft those algorithms and can embed in them all sorts of biases and perspectives.”
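The "biased data in, biased predictions out" point can be demonstrated with a toy model. Everything here is invented: two groups with identical true outcome rates, and a hypothetical labeling process that flips some of group B's positive outcomes to negative (standing in for human labeling bias). A model that simply learns each group's rate from those labels reproduces the bias faithfully.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_RATE = {"A": 0.5, "B": 0.5}  # both groups truly identical

def biased_labels(group, n, flip=0.0):
    """Draw n outcomes; a biased labeling process flips a fraction
    of group B's positives to negatives before they reach the model."""
    labels = []
    for _ in range(n):
        y = 1 if random.random() < TRUE_RATE[group] else 0
        if y == 1 and random.random() < flip:
            y = 0  # the bias enters here, in the labels, not the model
        labels.append(y)
    return labels

# "Training": the model just learns each group's rate from its labels.
train = {"A": biased_labels("A", 10_000, flip=0.0),
         "B": biased_labels("B", 10_000, flip=0.3)}
learned_rate = {g: sum(ys) / len(ys) for g, ys in train.items()}

# The model now predicts a much lower rate for group B, even though
# the two groups are identical in reality.
print(learned_rate)
```

The algorithm itself is neutral; it is the labeled data that carries the bias, and the model faithfully passes it on.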
So what does data bias mean for global development and M&E professionals?
Global development needs to continue being data-driven. This is emphasized by one of the principles for digital development being focused on data-driven decision making. It is equally important that we recognize and understand the biases we incorporate into datasets and the biases of the datasets we use.
At the end of the day, Tech for M&E begins with the humans behind the data. With the vast amounts of data provided by modern digital data collection tools, M&E practitioners need to understand how they can act as gatekeepers, noting the biases we embed in our data.
Interested in data in global development and measuring results? Join our top-selling online course on Technology for Monitoring & Evaluation, which begins April 20, 2015.