This year, the term “data science” turns 21 years old. The research field itself stretches far back into previous decades, long before this milestone birthday, before the invention of the World Wide Web, before Harvard Business Review decreed the study of data science “the sexiest job of the 21st century.” Following the same path as statistics, data science extracts useful information and knowledge from the enormous amounts of raw data produced in fields as wildly divergent as banking, health care, and manufacturing. The ability to analyze and decipher that volume of variables has become increasingly necessary, as new technology and faster processing speeds generate 2.5 quintillion bytes of data every day.
That means the analysis of “big data” is big business. It identifies patterns, trends, and correlations that can translate to profitability, competitiveness, and industry-specific technological developments for scrappy start-ups, big-name companies like Google and Netflix, and others in between.
In just five years, the field skyrocketed in the United States, from 1,700 jobs in 2016 to more than 10,000 in 2021 – a stunning 480% increase. But Black scientists, scholars, and researchers make up just 3% of the professionals who interpret data and analytics.
The Low Stats of Black Data Analysts
That dearth of perspective, says William Southerland, PhD, a professor of biochemistry and molecular biology at the Howard University College of Medicine and interim director of Howard’s new Center for Applied Data Science and Analytics (CADSA), is a constant opportunity for data to be misused regarding communities that have already been historically underserved and marginalized.
“If somebody is applying for a credit card or bank account, even a cell phone, they get a background check to see if they paid their bills last month, right? There are algorithms that look at the data that you provide and inform for the institution whether you’re a good risk or a bad risk. Those algorithms tend to lump people together, and the way they do isn’t always to the advantage of minority groups,” Southerland explains. “The person applying is impacted by data science because it’s the data science that analyzes their history. So we need to get people in the data science industry who understand the algorithms and can therefore make appropriate adjustments as needed to make them sensitive to a granular population.”
Across post-secondary campuses in the U.S., the number of data science degree programs is on the rise, up from a paltry 13 in 2014 to more than 50 in 2022, nearly half of them at either the master’s or certificate level. In January 2022, Howard announced plans for its own master’s degree in applied data science, developed with a $5 million grant from Mastercard in partnership with CADSA. The wide-ranging research and instructional activities of CADSA will examine how data science can help eliminate racial and algorithmic biases in financial services and economic disparities that challenge and impede Black communities.
One critical thing to remember about the field of study is how the interpretation of data and statistics is absorbed into our day-to-day lives, from whether we can get a mortgage to what type of health insurance we qualify for to whether our child gets into a school, adds Naniette Coleman, a PhD candidate at the University of California, Berkeley and founder of SICSS-Howard/Mathematica, the Summer Institutes in Computational Social Science’s first HBCU training.
“On a personal level all the way up to statistics for communities, every part of our world connects to data and how it’s fashioned, aggregated, and analyzed. It’s one of the more important issues of our time. With rapid, rampant misinformation, it’s difficult to trust the information that we come across. So having a skillset whereby you know how to interrogate what you see and also consider the origin story of data is critical,” Coleman says. “In addition to those decisions being made on our behalf with data, we have to be in a place where we can wield data in a way to make our points, especially when those points are not the current status quo or challenge existing systemic inequalities, sexism, or homophobia. So that skillset – having the ability to find, understand, analyze, and present data – is, I think, one of the more important superpowers.”
Howard Widens the Pie
Howard is preparing to introduce more future Black data scientists to what Glassdoor ranks as the third-best occupation in the U.S. for the seventh year in a row. Funded by the National Institutes of Health, the University’s inaugural Virtual Applied Data Science Training Institute (VADSTI), an eight-week remote data science training series, introduced researchers from more than 20 schools nationwide – and some from other countries – to the foundations of programming and the critical data analytic skills they’ll need to plan and conduct big data research for themselves. VADSTI centers a biomedical, clinical, and genomic focus on diseases that disproportionately impact minority populations, and in 2022, the institute hosted a Summer session with another planned for Fall.
“We want to combine the technical aspects of data science with the social sensitivity and responsibility that you get just from breathing Howard’s air,” Southerland adds. “We want to create a cadre of data science professionals who will take this data science training and go back to their first discipline – whether that’s sociology, history, medicine, pharmacy, or English, whatever it is – and be agents for data equity.”
The ability to program and interpret data accurately is critical across disciplines. At Howard, business, medicine, public health, economics, and engineering have all added at least one requisite statistics course that aspiring graduates must pass. Still, it’s important to engage data science as more than a cryptic math-and-numbers thing, experts insist. Data scientists apply predictive modeling, for example, to project the possibility of future events or outcomes using the analyzed patterns they find in data. Law enforcement has used predictive policing to identify – and in almost innumerable cases, target and harass – areas that correlate with a likelihood of illegal activity in the future. In hospitals and physician offices, predictive models based on patient records often guide clinical decision-making and can perpetuate statistical biases and inequities in the quality of and access to care.
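For readers curious what predictive modeling looks like in practice, here is a minimal sketch in Python. It is only an illustration, not a method used by anyone quoted here: the numbers are invented, and the function names `fit_line` and `predict` are hypothetical. It fits a straight line to past observations and extends that line one step into the future.

```python
# Toy illustration of predictive modeling: fit a straight line to past
# observations, then project it forward. All data below is invented.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Project an outcome for x using the fitted line."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical yearly counts (made up purely for illustration).
years = [1, 2, 3, 4, 5]
counts = [10, 12, 14, 16, 18]

model = fit_line(years, counts)
projection = predict(model, 6)  # the model's guess for year 6
print(projection)
```

The projection is nothing more than the past pattern extended forward. If the historical records are skewed or incomplete, the model repeats that skew in its forecasts, which is why the source of the data matters as much as the math.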
“Data is becoming more and more crucial serving as the basis for decision-making. Whoever is basing their strategies on data have more chances to prevail,” says Jiang Li, PhD, an associate professor in the Department of Electrical Engineering and Computer Science, who researches the importance of interdisciplinary studies in data science, adding that basic computer science is becoming a must-have skill to survive in the era of big data. In the process of data collection and analysis, he explains, “each step needs a different expertise.”
More Than Math
Collaboration across departments, disciplines, and fields is necessary not only for the validity of the data but also to preserve the authentic connection to the community the data research is supposed to serve, especially when that community is Black, adds Kailande Cassamajor (BS ’01), a Howard alum pursuing a master’s degree in data science at Columbia University. In fact, she guarantees a problem in perspective absent a multidisciplinary approach to data collection and analysis.
“The majority of people in my program are coming from math and computer science, but I notice a problem when that becomes the centerpiece and data is seen as the end-all, be-all fact without narrative context,” Cassamajor says. “It can only benefit when you have individuals from different backgrounds coming from history, Africanist studies, sociology, social work, public policy, political science. I really believe we need to stop being perceived as a mathematics and computer science-adjacent discipline to being a discipline that requires the holistic view of learning and problem solving.”
Because the future of data science is all-encompassing and far-reaching, the possibilities – as cliché as it may sound – are almost literally endless. Lessening the impact of climate change on low-income communities of color, reducing the disproportionate number of Black students suspended and expelled from school each year, even finding solutions for the blitz of pet owners returning their adopted dogs and cats to rescue shelters – all issues start with valid, credible data before they can be corrected and remedied. In a recent study, 75 executives were guided through an exercise to measure data quality in their respective departments. In the end, only 3% found those departments scored the minimum of “acceptable” in maintaining correct data records. Data is only as viable as the person and process doing the collecting and analysis.
“Context made me realize how powerful a tool data science and data-related work is, but also how important it is to make sure that we’re always tying people and real individuals in real occurrences behind that data, that it’s not just the quantitative analysis – it’s the context, it’s the stories, it’s the real lives that provide the data in the first place, and [reflecting that] in the work,” Cassamajor adds. “My experience at Howard really helped me come into this program with that perspective so that I can continue on with the goal of doing meaningful work that is purposeful.”
David Blackwell, a Data Science Architect
In the upper echelons of data science academia, David Blackwell is a bespectacled superhero. He dedicated his nearly 50-year career as an American statistician and mathematician to pioneering game theory, information theory, and probability theory, and made exceptional contributions to economics, statistics, and the study of what would become data science.
A historymaker, Blackwell distinguished himself early as just the seventh African American to earn a PhD in mathematics when, at 22 years old, he graduated from the University of Illinois at Urbana-Champaign, where he’d previously received both his bachelor’s and master’s degrees. His doctorate completed, the Centralia, Illinois native wrote letters to 104 historically Black colleges and universities in search of a teaching position. He received a total of three offers.
After a year at Southern and another at what’s now Clark Atlanta, Blackwell joined the faculty at Howard in 1944. Three years later, the two-time author was appointed head of the mathematics department. Blackwell made his academic home at Howard for 10 years before moving to the University of California, Berkeley, where he remained until his retirement in 1988.
In his legacy of work, Blackwell masterminded dynamic programming, the renewal theorem, and the Rao-Blackwell Theorem, which are used in finance and science, in engineering, and in statistics, respectively. He was the first African American inducted into the National Academy of Sciences; held honorary degrees from 12 universities, including Howard, Yale, and Harvard; and in 2012, was posthumously awarded the National Medal of Science by President Obama.