Questions about Diversity to Answer in Every Analysis
Written by: Lindsey Elhart - Data Analyst (2018-2020)
Data is a powerful tool to initiate change and bring accountability. This was a resounding takeaway from two recent diversity events, the Women in Tech Regatta and Tableau’s Data + Diversity webinar. We cannot overturn centuries of oppression in a single day. However, we can make progress when we choose to speak up, seek out data that is more representative of the population we are serving, and use our influence to create a more inclusive environment.
One way I can take action to confront racism and inequality in my own life is to recognize the influence I have in data projects. I have a system in place to validate the accuracy, reliability, and relevance of an analysis, but it can be strengthened by embedding a diversity lens. Below are questions I will ask myself and the team to help me realize this vision:
- Who is represented in the data set, and who is not? Why could this be, and can I influence it? Caroline Criado Perez walks through this series of questions in her book Invisible Women, where she highlights how the gender data gap perpetuates systemic discrimination against women. This is just one example of how ‘missing data’ can lead to real consequences.
- How can I make diversity data actionable? A panelist at the Women in Tech Regatta shared a real-life example of how her company is addressing the lack of diversity in its workforce. She explained that we must move away from broad categories and instead look across the whole company. For instance: what does representation look like at different level bands and in promotion distributions, who is leaving and why, and how can this information inform new metrics in annual reviews?
- How can I share granular data while abiding by data use agreements? This highlights the tension between disaggregating data enough that it can be acted on and taking steps to limit the identifiability of sensitive data.
- Who is part of the project team? A better solution can be created when the team brings together many perspectives and acknowledges the values and needs of the consumer. A diversity of voices at the table is required to generate inclusive and comprehensive conclusions and solutions.
- Have potential negative consequences that could result from this analysis been discussed and do we have a mitigation plan? It is important to remember that data science that involves human data is human subjects research. Analysts need to be aware of how their analyses, data sets, and algorithms might affect and even be weaponized against oppressed communities.
- How am I continually learning to be a better ally for my BIPOC (Black, Indigenous, and people of color) colleagues and friends, and am I implementing what I learn in my work and day-to-day life?
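To make the questions about actionable and granular data a little more concrete, here is a minimal sketch of one common way to handle that tension: count representation by level band and group, then mask any cell that is too small to share safely. Everything in it is an assumption for illustration, including the field names `level_band` and `group`, the threshold of 5, and the made-up records. A real threshold should come from the applicable data use agreement.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer people than this are masked.
# The right value depends on the data use agreement, not on this sketch.
MIN_CELL_SIZE = 5


def representation_by_band(records, min_cell_size=MIN_CELL_SIZE):
    """Count people per (level_band, group); small cells become None."""
    counts = Counter((r["level_band"], r["group"]) for r in records)
    return {
        cell: (n if n >= min_cell_size else None)
        for cell, n in counts.items()
    }


# Made-up illustrative records, not real data.
records = (
    [{"level_band": "junior", "group": "A"}] * 12
    + [{"level_band": "junior", "group": "B"}] * 7
    + [{"level_band": "senior", "group": "A"}] * 9
    + [{"level_band": "senior", "group": "B"}] * 2
)

table = representation_by_band(records)
for (band, group), n in sorted(table.items()):
    print(band, group, "suppressed" if n is None else n)
```

Masking small cells is only a first step. If row or column totals are published alongside the table, a suppressed count can often be recovered by subtraction, so totals may need to be rounded or withheld as well.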
I anticipate these questions will evolve over time. One effort that I look forward to following, and that I expect will inform such growth, is the work funded by the recent NCI grant awarded to Fred Hutch to build genetic diversity into cancer research. The goal is to create and share colorectal cancer risk-prediction models for multi-ethnic populations. “We currently have a cancer risk prediction score that works in people of European descent. But it doesn’t predict cancer risk well in Latinx populations or Asians or African Americans or Native populations or others,” states Dr. Ulrike Peters, a Hutch principal investigator. This research will be used to inform screening and prevention strategies, as well as personalized medicine for all racial groups.
Confronting racial inequity in data analytics requires ongoing attention. Author and scholar Ibram Kendi visualizes this continual effort when he describes how being a racist or anti-racist is like a peelable nametag. It reflects the very moment before you. We must care now and care into the future. We must do more than state our commitment; it must show in our actions. We must prioritize learning over knowing (read more about this Coop Community Value here).