Dual Degree Program Opens New Horizons for the Next Generation of Scientists
Today, the entire body of digital data in the world exceeds 44 zettabytes. That’s roughly equal to all the data contained in the Library of Congress if it were three billion times its size.
While big data was once considered the next great frontier, the tools available to tame such overwhelming amounts of data are making it possible to manage and explore a resource so vast it was unimaginable just a generation or two ago. However, there is still a shortage of researchers with the deep analytical skills necessary to use that data to make meaningful new discoveries and important scientific advancements.
For those driven to be pioneers in this new age of exploration, the University of Virginia’s Graduate School of Arts & Sciences and its School of Data Science have partnered to create a dual degree program. The new degree allows Ph.D. candidates to combine research in their field with the increasingly essential data-management skills that an M.S. in Data Science provides.
By adding 11 months and approximately 32 credit hours to their graduate studies, students in the program expand their research capabilities with state-of-the-art tools and methods for predictive modeling, data mining, machine learning, artificial intelligence and data visualization that are applicable in almost any field of study.
Students in the program are using these skills to make significant strides in a wide variety of endeavors, from understanding the complexities of the human brain, to discovering innovative ways to restore damaged ecosystems, to reimagining the shape of our galaxy.
“Data management and analysis are increasingly becoming required training for scientists, as much of the most exciting research and studies that push the boundaries of our knowledge create huge data sets,” said Commonwealth Professor of Biology Laura Galloway, who serves as associate dean for the sciences at the College. “This is true across fields, from genomics to telescope survey data, to image files of the brain and development, to chemical models that predict the structure of new pharmaceuticals.”
“These fellowships give students the opportunity to bring data science skills to their research in ways that were just not there before, and that speaks to the future of that research,” added Phil Bourne, the founding dean of the School of Data Science at UVA. “Conversely, the expertise those students bring to the broad field of data science can also be valuable to other disciplines. It's a bi-directional relationship that [explains why we call ourselves] a school without walls.”
Machine Learning in the Lab
At a certain point after he began working toward his Ph.D. in chemistry, Yibo Wang realized he needed to teach himself data science.
Wang studies thin layers of bacteria — called biofilm — through a microscope, trying to better understand why bacterial cells form such complex structures. Key to that process is image analysis, he said. “Traditionally, that was done by mathematical image processing techniques,” he explained. “But nowadays it’s through machine learning.”
So Wang started to explore the field of data science. “I found it quite interesting, and the techniques themselves are very powerful,” he said. “I wanted to learn more.”
The dual degree program helped Wang build on what he had taught himself and develop “a separate expertise,” he said. He was even able to apply what he learned to a study he co-authored, which was published in the acclaimed science journal Nature Communications in 2020. The study outlines a new method for teaching computers to recognize individual cells within images of biofilm.
Wang refers to himself not as a data scientist, but as a scientist practicing data science, and he said he’s excited by the field’s potential to advance disciplines like his.
“Traditional scientists often don’t know much about machine learning and deep learning techniques,” he said. “But if you have data science expertise, you can do so much more.”
Decoding the Brain
For his capstone project as a dual degree fellow in cognitive psychology, Andrew Graves focused on basic human gestures, like spreading your toes and making a fist. Using machine learning, he took brain signals and translated them into prediction models, trying to determine what a person’s hands and feet were doing at any given moment.
“The idea is that [these models] could augment patients’ lives, if they’re used in something like prosthetic limb technology,” Graves said.
When Graves began his Ph.D. in psychology at the University of Virginia, he found himself becoming increasingly interested in statistics and programming — two fields he had never explored in depth. That prompted him to apply for the dual degree program with the School of Data Science.
“I'm interested in brain-computer interfaces, and that's very much a machine learning problem,” he said. “Data science is one of those things that psychologists don't receive a lot of training in.” So when he was accepted to the program, “I felt very fortunate,” he added.
Graves is intrigued by data science’s potential to advance psychology in areas such as mental health. Since completing his fellowship, however, he has been considering a career shift: once he finishes his dissertation, he will likely pursue a data scientist position.
“I've just become fascinated with data science and machine learning in general,” he said. “I really enjoy that type of work.”
Saving the Environment with Data
Contrary to what people outside the field might believe, “data science is not limited to statistics,” said Ruoyu Zhang, a Ph.D. candidate in the College’s Department of Environmental Sciences. “It’s about so much more than that.”
For Zhang, it’s about restoring ecosystems. As an ecohydrologist, Zhang spends a lot of time creating simulations, and he said the dual degree program has given him fresh methods to use. For one chapter of his dissertation, for example, he’s now planning to have a neural network simulate where trees might be planted to help remove nitrates from water in the Chesapeake Bay region.
Neural networks have many uses in Zhang’s field. “There are a lot of people using them to forecast floods,” he said. “Rather than predict the flood at a daily scale, they want to do it at an hourly scale.” It’s harder to make those kinds of finely tuned predictions with traditional models, he explained: “That’s why a lot of people are exploring alternative data science approaches, to get better accuracy.”
Zhang hopes his research can help local decision-makers with a range of environmental efforts, like building green infrastructure and restoring streams. Moving forward, “I want to keep using machine learning to advance my research on improving water quality and [addressing] flood issues,” he said, “especially in the climate-changing future.”
Seeing the Stars from a New Perspective
You can’t take a picture of a whole galaxy from inside it. To construct an image of the Milky Way in its entirety “would take millions of years,” said second-year astronomy Ph.D. student Xinlun Cheng. “It’s not going to happen.”
So instead, Cheng is collecting data about the individual stars within our galaxy: the elements inside them, their positions and the speed at which they’re moving. Ultimately, “we're trying to say, okay. What does our galaxy look like? What did it look like in the past and what will it be like in the future?” he explained.
The data science techniques Cheng learned in the dual degree program, like machine learning and expression statistics, have been very helpful as he attempts to answer that question, he said. Because the Milky Way contains hundreds of billions of stars, he is working with massive data sets.
But astronomy isn’t the only discipline where that’s the case. “We're gathering more and more data in every field,” Cheng noted. He believes the ability to analyze data at a more advanced level is essential, regardless of what you’re studying.
“Data science shows us new sides of nature and ourselves,” he said. “It’s important to pushing every field forward.”