Q&A with Data Science co-chairs John Bay and Anton Schick
BingUNews recently sat down with the co-chairs of the Data Science initiative, which is one of the four Road Map Renewal University Initiatives. Here’s an update on the initiative’s status:
Data Science Initiative
The application of data science theory and practice is rapidly expanding across almost every discipline in higher education. Recruiting faculty and building facilities to support this evolving area will enhance our research and education offerings across the campus.
Co-chairs: John Bay, associate dean for research and graduate studies at the Thomas J. Watson School of Engineering and Applied Science, and Anton Schick, professor of mathematical sciences
Project manager: JoAnn Navarro, vice president for operations
Can you explain what this initiative is trying to accomplish?
Anton Schick: The goal is to create an intellectual and physical infrastructure that addresses the many aspects of data science. This includes fundamental research in the core sciences underlying data science, such as statistics and computer science, as well as the study of the societal impacts of the ongoing data revolution. It requires venues to train domain researchers in modern data science techniques, to bring together domain researchers with colleagues possessing needed technical skills in statistical and computational analysis, and to form teams to deal with the transdisciplinary nature of data science. It demands a proper computing infrastructure for high-performance computing, data storage and retrieval, and data security. It encompasses the training of future data scientists by creating appropriate degrees at all levels.
Why now?
John Bay: We currently have a committee working on a proposal for a Transdisciplinary Area of Excellence (TAE) in Data Science. It includes geographers, historians, electrical engineers, mathematicians, statisticians, biologists, anthropologists, political scientists and management faculty. Nobody is left out. Although some people work in data science because of the technical issues that they study, the diversity of our working group demonstrates that if you’re a researcher in any field nowadays, you have web access to every bit of research published since 1900 at your fingertips, and you need a way to make sense of all that. That’s why it doesn’t matter what your discipline is. You have to be able to reduce the data down to what is usable, and we’re working to address that.
AS: Nowadays, in order to be funded, grant proposals in most fields need to address aspects of data science including statistical design, data analysis, data storage and computational complexities. The problem is, do domain researchers know what expertise is available on campus and how to find it? The recently created Statistical Consulting Services can play a role in this, but it needs to be expanded significantly. The Data Salons organized by the working group of the proposed TAE in Data Science are another important step in this direction, bringing together researchers with diverse skills. What is needed is a structure that makes it easy to find out about available resources and how to access them.
What is the current status of the initiative?
JB: We only got started in October, and we’ve come a long way. With the help of the working group, we’ve defined what the problems are and what the research areas are. I feel like we got a head start because of the efforts of the working group, though we are separate teams.
Part of what we’re doing with this initiative is using some of the seed funds to implement a service. Say a psychologist conducts interviews with 10,000 people. How can that scientist make sense of the data in a statistically meaningful way? We’ll provide the expertise and support. Initially this will be done with one-time funds, but eventually, we’ll have permanent consultants with that expertise.
AS: We already know who is interested, but we also need the proper computing environment with facilities. And we hope to form a bigger working group with smaller parts. There is good energy here with people feeling that this is important, and this energy will help in moving forward.
JB: We’ve already established a set of resources that people can access and the working group recently gave out small seed grants just to get the ball rolling and get people involved. And we already have the seminar series that we’ve been contributing to.
There are computing tools and hardware to be bought. We also found out that there are security issues when dealing with patient records under HIPAA (the Health Insurance Portability and Accountability Act). For example, one of our faculty members has electronic records concerning children that can’t be on a system that is connected to the internet.
AS: Not everyone is in this situation, but for those who work with medical information or human subjects, security becomes an issue. And certain things are going on now with the proposed TAE. People are meeting to discuss these issues, so we’ll go in various directions at the same time. A data park may be part of discussions because we need security and lab space that is locked.
JB: We plan to talk about space, and we have subcommittees looking at different aspects, including one for infrastructure. There are NIST (National Institute of Standards and Technology) and agency requirements, such as two types of locks for the room and other security constraints. In addition to the infrastructure subcommittee, we have one for research, and the TAE working group will be a large part of the research component. There is also one for industrial outreach, covering service to local companies, economic development and data, and one for educational degree programs. There is a student Data Science Club on campus with 300 members. Some say we’re at the point where all students should have to take some sort of computer programming course, and it could be that data analytics goes the same way.
AS: We need to create degrees at all levels. We have an MS in data analytics in process, but there is a clear demand for additional degrees, so we need to develop them.
JB: This creates big challenges though. How do we develop a graduate degree program with a common set of prerequisites when you have a computer scientist sitting next to a pharmacist?