Interviewed by Mark Couch
(April 2021) Casey Greene, PhD, joined the University of Colorado School of Medicine in November 2020 to lead the newly created Center for Health Artificial Intelligence and to help establish a new department devoted to data science and informatics.
He joined CU from the University of Pennsylvania Perelman School of Medicine, where he was an associate professor of systems pharmacology and director of the Childhood Cancer Data Lab for Alex’s Lemonade Stand Foundation.
Greene is an experienced leader in the field of data analytics. After completing his PhD in genetics at Dartmouth College in 2009, Greene was a postdoctoral fellow at the Lewis-Sigler Institute for Integrative Genomics at Princeton University until 2012. He joined the Dartmouth faculty that year and moved to the University of Pennsylvania Perelman School of Medicine in 2015.
His research lab develops algorithms that integrate publicly available data from multiple datasets to help model and understand complex biological systems. This approach allows investigators to infer the key contextual information required to interpret the data, and facilitates the process of asking and answering basic science and translational research questions.
What are the goals for the Center for Health Artificial Intelligence?
We’re awash in data. The challenge is figuring out how to use these data in ways that enhance our campus missions. I think about the goals of the center as enhancing research, practice, and education through advanced analytics. Each presents its own challenges and opportunities.
Research is a process of exploring the unknown. We plan and set out on a voyage, recording what we find, and what we find at each step changes our trajectory. It’s very much a fractal process. What we observe at one step changes our next one, so even small hunches can send us off in a new direction. Serendipity is a key part of the process. I view the research mission of the Center for Health Artificial Intelligence as making serendipity routine: we develop the methods and tools to help investigators find unexpected but valuable connections in the sea of data that we’re confronted with. Traditionally, serendipity has depended on chance collisions: people talking in the hallway or finding just the right paper in the table of contents of a research journal. I think artificial intelligence methods can be deployed to reveal these opportunities in an intentional way, whether that’s helping to put our data into the context of everyone else’s data or revealing a key finding in the biomedical literature that, if only we knew it, would send our research program in a new and productive direction.
On the practice side, there are major opportunities to combine analytics with data that we gather as care gets delivered to improve processes and enable care teams to work more efficiently. There are providers on this campus who are already leaders in using data to improve care, and I expect members of the center to continue to complement that mission through analytical advances. There are also opportunities to use AI-based analytics to bring advances from research, for example in genomic profiling, to enhance care in the clinic.
With respect to our education mission, the pervasive nature of large-scale data means that many more people could benefit from applying these new analytical approaches. Even though we know their potential is great, it’s often difficult to figure out how to put them to use for the problems that we face every day. Data analysis works best when it happens close to the data, and that means the people who generate the data should be able to think about them in these same ways. We need a multifaceted set of education programs – those designed for people for whom advanced analytics will be their primary career, those for whom these methods will complement their primary focus, and those who need to make decisions about analytics but who are unlikely to be running analyses themselves. I expect that faculty we recruit will contribute to these efforts for one or more of these audiences. If we’re going to achieve our potential on this campus, advanced analytical methods should be applied routinely and the capabilities should be available to everyone.
It seems hard to measure the outcomes, but you have a vast body of work, so there are ways to measure this. How do you evaluate success?
You’re absolutely right that it’s hard to measure outcomes. In research, we can perform experiments that are guided by these analytical approaches and others based on traditional methods, and we can measure the hit rate of these. At the scale of a campus, what we really want to measure are items like: are advanced analytical approaches being deployed more frequently, are data being collected that can drive the next wave of these methods, and are these methods creating connections that otherwise wouldn’t have been observed. All of these are hard to measure.
What we can measure are proxies. We can examine the extent to which we’re recruiting faculty for whom these analytics are part of their research program, and we can examine their success in terms of both technical advances and scientific discoveries as well as progress in their careers. We can measure education programs to examine learner perceptions and, if we’re careful about how we collect data, we can potentially examine outcomes after training. Ultimately, we can begin to measure connections by examining the extent to which faculty on campus submit and receive more multiple-principal-investigator grants and more program project grants. We can look for indicators that connections are being established beyond individual research labs. In the ideal world, two people poised for success if only they joined forces would find each other and team up more often. I agree with you: that’s hard to measure.
Does AI pose risks to privacy, and how can we protect individual identities?
Like many technologies, AI poses risks but can also present solutions. In some of our research a few years ago, we wanted to see if we could develop an approach for privacy-preserving data generation using AI techniques. We used neural networks. Neural networks in the computer science sense are essentially just groups of mathematical functions that get strung together and trained over time through exposure to data. In this case we created two neural networks, and we trained them against each other. One of the neural networks was tasked with creating entirely new data. The other was trained to try to figure out whether the data were real or fake. We trained this pair of networks until the real and fake data couldn’t be distinguished.
Because neural networks can be very complex, there was a risk that the neural network tasked with creating fake data could simply memorize the real data. We introduced the use of a technique called ‘differential privacy’ in this setting, which enabled us to control how much the neural networks could learn from any one record and prevent them from memorizing the data.
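To make that setup concrete, here is a minimal sketch of the adversarial training loop in PyTorch. Everything in it (the network sizes, learning rates, and clipping and noise parameters) is an illustrative assumption, not the code from the study, and the privacy step is deliberately simplified: true differentially private training clips each record’s gradient individually before adding noise.

```python
import torch
import torch.nn as nn

N_FEATURES = 20   # assumed width of one synthetic "record"
NOISE_DIM = 8     # assumed size of the generator's random input

# Generator: turns random noise into a fake record.
generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 32), nn.ReLU(),
    nn.Linear(32, N_FEATURES),
)
# Discriminator: scores whether a record looks real or generated.
discriminator = nn.Sequential(
    nn.Linear(N_FEATURES, 32), nn.ReLU(),
    nn.Linear(32, 1),
)

bce = nn.BCEWithLogitsLoss()
g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

CLIP_NORM = 1.0   # bound on the discriminator's gradient norm
NOISE_MULT = 1.1  # scale of the Gaussian noise added to gradients

def train_step(real_batch):
    batch_size = real_batch.shape[0]
    # Discriminator step: learn to tell real records from fake ones.
    fake_batch = generator(torch.randn(batch_size, NOISE_DIM)).detach()
    d_opt.zero_grad()
    d_loss = (bce(discriminator(real_batch), torch.ones(batch_size, 1))
              + bce(discriminator(fake_batch), torch.zeros(batch_size, 1)))
    d_loss.backward()
    # Simplified differential-privacy-style step: clip the gradient norm,
    # then add Gaussian noise so no single record dominates the update.
    # (Real DP-SGD clips per-record gradients; this clips the batch gradient.)
    torch.nn.utils.clip_grad_norm_(discriminator.parameters(), CLIP_NORM)
    for p in discriminator.parameters():
        p.grad += torch.randn_like(p.grad) * NOISE_MULT * CLIP_NORM / batch_size
    d_opt.step()
    # Generator step: produce records the discriminator scores as real.
    g_opt.zero_grad()
    fake_batch = generator(torch.randn(batch_size, NOISE_DIM))
    g_loss = bce(discriminator(fake_batch), torch.ones(batch_size, 1))
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()

# e.g., one step on a batch of 64 records with values in [0, 1]:
# train_step(torch.rand(64, N_FEATURES))
```

Training alternates until generated records are indistinguishable from real ones, while the clipped, noised gradients limit how much the discriminator can learn from any single record.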
This is a long way to say: yes, AI and advanced analytical methods pose risks to privacy. I can see very clear opportunities for interactions between researchers focused on building new technologies and those focused on the ethical deployment of technologies in research. We also need to be thinking about how AI-based technologies can be deployed to reduce risks to privacy.
The other challenge that we haven’t discussed, but that falls along the same lines, is that of AI models that launder systemic biases. Often these machine learning models are trained to carry out some past behavior more efficiently, for example by training a model on prior observations to suggest potential courses of treatment. If the training data are biased, the model will be too. It’s clear that AI-based techniques are going to become widely deployed in the years ahead. It will be critically important to have researchers on campus examining these models for bias and developing approaches that counteract, rather than promote, inequity.
Were you always into computers and technology when you were a kid? How did you get interested in this area of research and work?
There are pictures of me using the keyboard of a home-built computer from before the time I can remember, and I’ve always found computers fascinating. When I was an undergraduate student, I enjoyed genetics. I worked in a Drosophila lab, and, while I enjoyed the scientific question, I struggled with the immediate task at hand, which was counting sternopleural bristles on the sides of fruit flies. I’d look at the side of the fly and count, ‘one, two . . .’ and then flip it over, look at the other side, and start again. I eventually reached my limit of sternopleural bristles.
I next spent a summer working with a Drosophila lab at the University of Georgia. Instead of working in the lab, they took advantage of my programming experience and had me work with a computational grad student. I thought, oh wow, I can study genetics… without counting bristles. This is amazing. So that’s what I have ended up doing ever since.
Do you have any favorite findings from work that you’ve done?
I think the next finding is always the most fun one! One example that comes to mind came out of a collaboration with Deborah Hogan at Dartmouth. We were just on a call this morning. For almost a decade, we’ve been working with Deb’s lab to understand gene regulation in Pseudomonas aeruginosa. Around the time we started working together, there was a group of researchers at Google who developed an approach to take still images from YouTube videos and show them to a neural network. They could block out certain parts of an image and train a neural network to reconstruct the original. One of the things they showed was that the neural network developed a neuron capable of recognizing cats without ever being told what a cat was. So we did that, but for Pseudomonas gene expression data.
The neural network ended up learning patterns of co-regulated genes. Many were recognizable, but some were less so. We used the approach to examine the response to starvation for a key nutrient. When we looked across all public data, we identified one setting where the results just didn’t make sense: the presence or absence of a second gene, which wasn’t supposed to be related, made a huge difference in how Pseudomonas responded. It took many follow-up experiments to understand the details, but that “aha” moment doesn’t go away.
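As a rough illustration of the masking-and-reconstruction idea, here is a minimal denoising autoencoder sketch in PyTorch. The gene counts, layer sizes, and masking rate are illustrative assumptions rather than the lab’s actual code; the point is that hiding part of each expression profile and scoring the network on recovering the whole forces the hidden units to capture genes that move together.

```python
import torch
import torch.nn as nn

N_GENES = 5000   # assumed number of genes per expression profile
N_HIDDEN = 50    # hidden units that can come to represent co-regulated gene sets

autoencoder = nn.Sequential(
    nn.Linear(N_GENES, N_HIDDEN), nn.Sigmoid(),   # encoder: compress to patterns
    nn.Linear(N_HIDDEN, N_GENES), nn.Sigmoid(),   # decoder: rebuild the profile
)
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(profiles, mask_fraction=0.1):
    """profiles: a batch of expression vectors scaled to [0, 1]."""
    # "Block out" a random subset of genes, as the video frames were masked.
    mask = (torch.rand_like(profiles) > mask_fraction).float()
    corrupted = profiles * mask
    optimizer.zero_grad()
    reconstruction = autoencoder(corrupted)
    # The network is scored on recovering the *original* profile, so hidden
    # units must learn which genes tend to rise and fall together.
    loss = loss_fn(reconstruction, profiles)
    loss.backward()
    optimizer.step()
    return loss.item()

# e.g., one step on a batch of 32 profiles:
# train_step(torch.rand(32, N_GENES))
```

After training, inspecting which genes a hidden unit weights heavily is one way such a network can surface co-regulated groups like those described here.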
Have you done any COVID-related work?
We’ve certainly done a few things. A postdoc in our group has spent the last year leading a large-scale collaborative review of the COVID literature. It’s now more than 100,000 words covering more than 1,000 papers and preprints. It has essentially become a book! I also worked with a team putting together a symptom self-reporting app. The idea was to use surveys to gain an early perspective on the COVID situation in each zip code – this was before testing was widely available. The app is called How We Feel. Lady Gaga actually tweeted about it. I think COVID has really revealed how urgent it is that we be able to accelerate the pace of biomedical research, and I see AI being a critical part of this in the years ahead.