Seed Grant Program | Data Science | Binghamton University

Seed grants are awarded with funding provided by the Binghamton University Road Map through the Provost's Office and the Division of Research.

The goal of these seed grants is to encourage faculty to develop collaborative projects that stimulate the advancement of new ideas that can build Binghamton University's expertise toward a national reputation in the broad area of data science. This competitive, peer-reviewed program provides initial support for proposed long-term programs of collaborative research that have strong potential to attract external funding.

Information on how to apply for seed grant funding for the 2025–2026 academic year can be found on the TAE Seed Grant Program landing page. The deadline for proposals is 5 p.m. Friday, February 14, 2025. Completed proposal packages must be submitted electronically to the Binghamton University Internal Opportunities Portal ("InfoReady") platform.

The deadline for the Data Science-TAE Letter of Intent was December 16, 2024, and has passed.

The Transdisciplinary Area of Excellence (TAE) invites letters of intent (LOI) for proposals for the 2025-26 TAE Seed Grant program. Teams whose requested budget exceeds the common $15,000 limit must submit an LOI. It is strongly recommended that all teams submit an LOI regardless of the budget amount.

Completed LOI packages must be submitted electronically to the Binghamton University Internal Opportunities Portal ("InfoReady") at this link: Data Science-TAE LOI: https://binghamton.infoready4.com/#freeformCompetitionDetail/1956635

For the 2024-2025 academic year, the seed grants listed below were awarded.
Additionally, a "Thematic Programs" initiative was introduced; see the Thematic Programs at this link:
https://www.binghamton.edu/transdisciplinary-areas-of-excellence/data-science/call-for-ideas.html

Rural Romans: Modeling the Mobility of Ancient Romans using Biomolecular, Bioarchaeological, and Machine Learning Techniques

Matthew V. Emery, anthropology; Laure Spake, anthropology; Yuan Fang, school of pharmacy/pharmaceutical sciences; Michel Shamoon-Pour, first-year research immersion, molecular anthropology

Traditional methods of investigating ancient migration patterns have relied on the analysis of the historic/epigraphic record, artifacts and grave goods, and the stable isotopic signatures obtained from bones and teeth. These approaches, while informative, offer limited insights into the complex patterns of migration and interaction in ancient societies, especially at the individual level. Our proposed research seeks to overcome these limitations by applying unsupervised machine learning (ML) techniques to synthesize data from genomic sequences, isotopic analyses, and bioarchaeological evidence. Focusing on Classical period sites at Vagnari (southern Italy), Leptiminus (Tunisia), and Apollonia (Bulgaria), this study will use ancient DNA analysis to explore ancestry and integrate these findings with isotopic data to infer geographic origins and mobility patterns. Our research aims to provide novel insights into the movements and interactions of ancient Mediterranean individuals by combining these diverse data sets. Our interdisciplinary project aligns with the transformative goals of data science, bridging the gap between the social and natural sciences, and showcases the potential of ML to unlock new perspectives in archaeological research.

Digital Echo Chambers: Parasocial Interactions and Their Role in Radicalizing Narratives

Yu Chen, electrical & computer engineering; Seden Akcinaroglu, political science; Thi Tran, management information systems; Ekrem Karakoc, political science

Social media is critical in disseminating information, particularly in circumventing censorship in non-democratic regimes, underscoring its capacity to empower movements. The complexities of social media's influence on radicalization, public attitudes, and behavior are evident in high-stakes conflicts, highlighting the pervasive nature of prejudicial messaging. Despite efforts by scholars, government agencies, and social media platform leaders to address these issues, the effectiveness of such initiatives is debatable, with varied social impacts observed. Leveraging the cross-disciplinary expertise of the investigators, including information authentication, political science, and machine learning (ML), this project seeks support to explore these dynamics, focusing on the recent Hamas-Israel conflict as a case study of international political tension. Based on the two theoretical pillars of social media’s echo chamber effects and the six components of the theory of influence, our work will unpack the mechanisms by which hate speech spreads and is consumed, and extract insights to optimize efforts to minimize harm from digital hate speech. A successful project will yield a better understanding of the complex and dynamic characteristics of radicalizing narratives on social media and prepare the team to submit solid proposals to federal agencies, namely NSF and AFOSR.

Advancing a statistical framework for the design of optimal experiments to evaluate formal psychological models of human category learning

Rakhi Singh, mathematics and statistics; Kenneth Kurtz, psychology

In cognitive science, a key aim is understanding how agents (humans and machines) form concepts from labeled examples and employ these concepts to interpret stimuli and make inferences beyond the available information. Currently, diverse theoretical perspectives exist, yet research often lacks systematicity, relying instead on somewhat arbitrary comparisons of models. To address this, we propose a statistical framework for designing optimal experiments to pinpoint the most effective cognitive models and machine learning classifiers and their optimal conditions. Successful implementation would galvanize research in computational cognition theory and practical applications by promoting successful development and guiding selection of machine learning classifiers for prediction tasks, clarifying underlying design principles of human information processing, and providing a platform for the design of optimized training procedures for inductive learning in instructional settings.

Dual Optimization of Efficiency and Security of Modern Deep Learning Framework Running on FPGA Platform

Adnan Siraj Rakin, computer science; Wenfeng Zhao, electrical and computer engineering

Ubiquitous adoption of state-of-the-art deep learning (DL) models encounters unique challenges from both efficiency and security perspectives. Emerging multiplication-less deep learning models provide a feasible pathway to improve DL accelerators' hardware and energy efficiencies. Despite promising results, the security aspects of multiplication-less deep learning models still need to be explored. This TAE project seeks to delve into interdisciplinary research to establish the foundation of secure and efficient AI. In particular, our transdisciplinary team proposes to investigate emerging multiplier-less deep learning models, namely AdderNet, by adopting a software-hardware co-design approach. We focus on designing optimal hardware accelerators, uncovering the potential security loopholes, and studying promising countermeasures to enable secure and hardware-efficient AdderNet designs. This project possesses the potential to address the distinctive challenges in both secure and efficient AI, diversify the research landscape, and contribute to advancing the mission and vision of the Data Science TAE at BU. The proposed research will bring two PIs from transdisciplinary areas and provide the ideal platform to collaborate on future grant applications, which, in the long run, has the potential to transform AI security and efficiency research.

For the 2023-2024 academic year, the following seed grants were awarded:

Advanced data analytic approach toward enabling autonomous nanofabrication platform

Zimo Wang, systems science and industrial engineering; Rakhi Singh, mathematical sciences and statistics; Jia Deng, systems science and industrial engineering

The trend of rapidly shrinking electronic equipment, as nano-electro-mechanical systems (NEMS) evolved from MEMS, has driven the capability of nanofabrication. Nanofabrication realized by the atomic force microscope (AFM) has been rapidly growing and gaining popularity among researchers and industry because of its cost-effective nature and exquisite tunability. However, existing AFM-based nanofabrication faces critical obstacles to large-scale production because it requires extensive experimentation for parameter searching and real-time morphology characterization. This is because, at the nanoscale, process uncertainties play a significant role in determining surface quality. Conventional deterministic models fail to capture such nonstationarities, necessitating the implementation of machine learning approaches for autonomy-enabled cyber-physical nanomachining. This project aims to establish an autonomous nanomachining platform with a demonstration of the AFM-based nanofabrication process. The proposed task promotes the implementation of data science in nanomachining by 1) acting as the cognizant core to enable autonomous nanopatterning, 2) providing an analytic tool for understanding the fundamental mechanisms of the machining process at the nanoscale, and 3) achieving self-learning for the platform to advance the setup capability, i.e., increasing nanofabrication resolution/precision and significantly improving productivity.

A physics-informed machine learning approach to athletic footwear fit estimation

Congyu Wu, systems science and industrial engineering; Yu Jin, systems science and industrial engineering; Vipul Lugade, physical therapy

The design of sportswear and specialty occupational garments has largely failed to duly incorporate female anatomy and female-specific kinesiological patterns, resulting in higher risk of injuries for women. In this project, we propose to use running shoes as an example case to study gender differences in sportswear design. Specifically, we aim to (1) identify the differences in female participants’ self-reported fit and sensor-measured kinesiological patterns (plantar pressure, standing balance, walking/running pronation) wearing commercially available women’s vs. men’s running shoes, and (2) establish the feasibility of using hybrid physics-based machine learning to ascertain the relationship between physical measurements (e.g., foot characteristics, shoe design parameters) and kinesiological patterns. Under these aims we will gather currently limited quantitative evidence on women’s experience with commercially available footwear advertised for different genders and evaluate a novel data-driven solution that automatically estimates women’s kinesiological patterns based on physical measurements of the foot and the shoe, a relatively inexpensive and convenient source of data. Both aims are necessary as a preliminary step to establish a research framework that incorporates gender differences in the engineering design of sportswear or occupational garments, which goes beyond running shoes.

For the 2022-2023 academic year, the following seed grants were awarded:

Data-Driven Models of Polarization Dynamics over Realistic Networks

Emrah Akyol, electrical and computer engineering; Zeynap Ertem, systems science and industrial engineering; and Andreas Pape, economics

We propose a data-driven framework to quantify and analyze polarization dynamics within a community, as well as to construct methods to mitigate polarization. As a byproduct, we will generate a computational model of (mis-)information spread over social networks that takes the specifics of the polarization problem into account. Two features substantially differentiate the proposed research from recent studies: (1) we consider the impact of relevant behavioral aspects of human communication, such as confirmation bias, and (2) we use data-driven, real network scenarios, including dynamic, multilayer networks, whose parameters are inferred from real-life data.

Machine learning and computational assisted nonlinear ptychography for ultrahigh sensitive phase imaging

Kenneth Chiu, computer science; and Fake (Frank) Lu, biomedical engineering

This project will develop machine learning and computational methods for a new type of bioimaging that combines ptychography with Stimulated Raman Scattering (SRS). The resulting method will be able to detect the chemical/molecular structure within a single live cell without the use of additional stains or dyes, something that is currently unachievable. Such specimens are effectively transparent, and thus cannot be imaged with conventional techniques. Ptychography is a quantitative phase imaging technique that uses phase differences to image structure in otherwise transparent specimens. SRS is a nonlinear optical technique that selectively stimulates scattering from different chemical bonds, and is thus able to differentiate different molecules within cellular structures. Used together, these two modalities have the potential to provide an unprecedented level of detail of cellular structures. Fully realizing the potential, however, requires high-throughput identification (segmentation) and quantification of the cellular structures of interest, which are difficult and impractical for humans to do manually. In this project, we will apply deep learning and parallel computing techniques to enable high-throughput identification and quantification of non-linear, chemically-sensitive ptychographic imaging.

Longitudinal home-based assessment of balance in older adults

Vipul Lugade, physical therapy; Suzanne O'Brien, physical therapy; and Lijun Yin, computer science

With the world’s elderly population rapidly increasing, falls among older adults are a significant health concern. Approximately one-third of older adults report one or more falls each year, with devastating physical, psychological, social, and financial consequences. Resulting fractures, reduced activity levels, and loss of function lead to greatly altered lifestyles, morbidity, and mortality. As falls routinely occur during locomotion, gait analysis has typically been utilized to assess balance control during ambulation. Conventional evaluations performed in a controlled laboratory environment utilize sophisticated measurement tools such as motion capture systems, instrumented walkways, or body-worn monitoring. Unfortunately, such tools are bulky, time-consuming, dependent on participants travelling to a research setting, and reliant on qualified technicians to properly collect and evaluate the data. Furthermore, these one-time gait and balance assessments cannot monitor longitudinal changes in ambulatory strategies and do not reflect performance in real-life environments, where falls commonly occur. While recent developments in smartphone-based evaluations have demonstrated great utility and accuracy in assessing gait, it is vital to evaluate participant compliance, ease-of-use, and feasibility of smartphone technology in the home environment. With the current burdens on the health care system and the burgeoning population of older adults, it is essential that tools be provided to older adults that are easy to follow, attractive, and improve balance performance.

Therefore, the objective of this project is to investigate the utility of a valid, easily accessible, smartphone-based tool to monitor balance for older adults as a stand-alone, field-based medical device. The aims of this proposal are to 1) utilize a smartphone application to longitudinally evaluate gait, standing balance, and cognitive performance over a 12-month period in the home environment among 15 healthy older adults and 15 older adults with a history of falls; and 2) assess fall outcomes in older adults for up to one year. An evaluation of compliance, fidelity, and the ability of psychosocial, muscle strength, and balance measures to predict fall outcomes will indicate the feasibility of using a smartphone-based protocol for assessing older adults in their home environment. The long-term goal of this project is to provide a holistic home-based gait monitoring and intervention tool for integration in routine clinical evaluations.

Accurate Characterization of the Heterogeneous Stiffness Map of the Human Brain White Matter

Mir Jalil Razavi, mechanical engineering; and Dehao Liu, mechanical engineering

The objective of this proposal is to accurately characterize and quantify the white matter stiffness map of the human brain. Finding localized stress/strain or stiffness maps in the white matter is of notable importance to many applications such as traumatic brain injury (TBI), diffuse axonal injury (DAI), brain tumors, and neurodegenerative brain disorders such as Alzheimer’s and Parkinson’s. Magnetic resonance elastography (MRE), the current method for quantifying the brain stiffness map, suffers from serious limitations that result in contradictions between the reported regional and global stiffness maps of the white matter. To fill the existing gap, this proposal aims to characterize and quantify the heterogeneous stiffness map of the white matter in the human brain by combining physics-based mechanical modeling and data-driven deep learning. For the first time, the heterogeneous stiffness map of the brain white matter will be characterized by using the independent material properties of the brain microstructure and available data from magnetic resonance imaging (MRI), diffusion tensor imaging (DTI), and fiber tractography of brain samples. The successful completion of the study will produce a promising pipeline as an alternative to MRE. The outcomes of the study will directly benefit the brain research community with various applications in TBI, DAI, normal aging, neurodegenerative diseases, and neurosurgery.

A Crowdsourcing Application for Historical Health Crises in Pre-Modern Mexico

Bradley Skopyk, history; and David Mixter, environmental studies and anthropology

This project creates a crowdsourcing application to build a historical database of geo-located health events for Mexico before the twentieth century, adding to an existing database of some 7,000 known pre-modern Mexican health events. Knowledge of historical health events is important to understand health outcomes and predictors in the pre-modern world. A larger and more evenly distributed set of records would enable geo-statistical tests against paleoclimatological and other datasets to explore correlations between pre-modern health and socio-ecological phenomena. The crowdsourced database will facilitate international collaboration on a data-intensive health-related project and will be an important teaching resource for undergraduate classrooms. The crowdsourcing application uses an Angular frontend with Spring Boot microservices connected to a PostgreSQL database. In a subsequent phase of this project, the crowdsourcing application and database will be used to feed a website for data visualization and summary of key health crises in pre-modern Mexico. The same web portal will make the database freely available to researchers for purposes of analysis, download, and output of custom cartography and other visualizations. We believe that the proof of concept shown by the crowdsourcing application will attract external funds from the National Endowment for the Humanities and the American Council of Learned Societies to build such a sophisticated data-driven website.

A Multimodal Data-Driven Approach to Improving the Effectiveness of Virtual Group Collaborations for Entrepreneurship Development

Chou-Yu Tsai, School of Management; Cynthia Maupin, School of Management; and Hiroki Sayama, systems science and industrial engineering

The number of entrepreneurship programs worldwide has increased significantly with the advancement of computer information technology and the COVID-19 pandemic. This rise forces entrepreneurship educators to overcome challenges associated with technology-mediated communications that limit social participation and interactions that substantially constitute students’ entrepreneurship development. In the current proposal, we plan to promote inclusion in virtual group collaborations to enhance the quality of entrepreneurship development. Specifically, we investigate this objective via two studies: (1) we will apply a multimodal data-driven approach to unveil the mechanisms of virtual group collaborations, and (2) based on the results from the first study, we will implement an experimental design to explore the impact of different virtual tools on entrepreneurship development. We will leverage the insights gained from the multimodal and experimental data to develop strong external grant proposals (to be submitted to two NSF programs: Human Networks and Data Science and Network-to-Network Collaborations).

Diagnosis and evolutionary characterization of solitary pulmonary nodules via machine learning-enabled liquid biopsy and automated CT techniques

Yuan Wan, biomedical engineering; and Kenneth Chiu, computer science

The detection of malignant lung nodules (MLN) at an early stage is crucial because it can significantly improve patients’ survival rates. Currently, computed tomography (CT) imaging is a major approach used for lung disease screening. However, the differential diagnosis of MLN from benign lung nodules (BLN) is very challenging because both may demonstrate similar size and morphology. In general, the misdiagnosis rate can reach 10-30%. Moreover, the reading quality of CT images is inconsistent, as it depends on the radiologist’s experience and skill. Therefore, we propose to develop a program that can assist radiologists in reading CT images. It is noteworthy that traditional machine learning may not be able to identify and differentiate hand-crafted features. Deep learning includes feature selection and classifier optimization, and thus has been widely used for medical image segmentation and object detection. In this study, we propose a deep learning-based lung nodule classification method for CT images. There are 2,765 total lung nodules (with subset 1 being 616 BLN and subset 2 being 2,149 MLN). Each subset was divided into training, validation, and testing datasets (in a 60%:10%:30% ratio). For each nodule, we picked one 2D slice through the middle of the nodule as the representative sample. To classify lung nodules, we first designed a UNet-like neural network to extract the region of interest (ROI) of the lung nodule, and then used the extracted ROI as the input to the classification network. The output of the classification network was a binary value of either MLN or BLN. Transfer learning was used here: a pre-trained GoogLeNet was adopted and then re-trained 300 times on our data sets for lung nodule classification. Currently, our classification network achieves an accuracy of 0.87 with a false positive rate of 0.14 and a false negative rate of 0.10.
Results show that deep learning can work well for lung nodule differential diagnosis. The classification performance can be further improved by optimizing the architecture of neural networks in the future.
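
As a rough illustration of the 60%:10%:30% split described above, the following Python sketch computes per-subset training, validation, and testing sizes from the reported nodule counts. The function name `split_counts` and the rounding rule (round the training and validation sizes, give the remainder to testing) are assumptions made for this example; the abstract does not say how fractional counts were handled.

```python
def split_counts(n, percents=(60, 10, 30)):
    """Return (train, validation, test) sizes for n samples.

    Assumption: training and validation sizes are rounded to the nearest
    integer and the test set takes whatever remains, so the three sizes
    always add up to n.
    """
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    train = (n * percents[0] + 50) // 100       # integer round-half-up
    validation = (n * percents[1] + 50) // 100
    test = n - train - validation
    return train, validation, test


# Nodule counts reported in the abstract.
subsets = {"BLN": 616, "MLN": 2149}

# Split each subset separately, as the abstract describes.
counts = {name: split_counts(n) for name, n in subsets.items()}

for name, (train, validation, test) in counts.items():
    print(f"{name}: train={train}, validation={validation}, test={test}")
```

Splitting each subset separately keeps the benign-to-malignant ratio roughly the same in all three datasets, which matters here because the two classes are unevenly represented.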

For the 2021-2022 academic year, the following seed grants were awarded:

Deploying ArcGIS Server for Collaborative Geospatial Data and Open Maps

Lead scientists: Brad Skopyk, history; Carl Lipo, anthropology and environmental studies; and John Cheng, Asian and Asian American studies

This project builds new geospatial data infrastructure on campus by adding to our existing resources in ArcGIS Online and desktop applications like ArcGIS Pro. We will create an ArcGIS Server that will allow teams of researchers, whether located on campus or dispersed across the world, to simultaneously edit and add spatial data to the same database while still maintaining the integrity of the data. The ArcGIS Server also makes available many other new computational capabilities, such as OGC-compliant web map services and a customized portal for collectives of scholars working with spatial data. The first allows researchers at Binghamton University to create OGC-compliant web map services for integration into open-source web platforms. The second offers a portal to display and make available the web mapping services created by scholarly collectives on campus. A curated grid of such services will be on display for visitors to the university. The Portal for ArcGIS Server gives us the ability to customize web design and to add any necessary branding. We hope that the ArcGIS Server will undergird many collaborative projects within and beyond our campus.

An integration of physical simulation model and data-driven approaches on an Internet of Things platform for the realization of metal additive manufacturing digital twin

Lead scientists: Yu Jin and Fuda Ning, both systems science and industrial engineering

The objective of this project is to investigate the communication and prediction challenges and establish a fundamental framework for the real-time digitalization of a physical metal additive manufacturing (AM), or 3D printing, system toward a digital twin (DT). To realize this objective, two research tasks will be performed in this project: 1) establishing an Internet of Things (IoT) platform for a metal AM process, directed-energy deposition (DED), by connecting all the manufacturing and sensing equipment through the internet and integrating the various data sources on the cloud, and 2) developing an algorithm for sophisticated integration of the physical simulation model and data-driven approaches, especially for geometric integrity and microstructure prediction. The proposed IoT platform integrates the AM resources within the university to support future research on the real-time monitoring and control of AM processes. Moreover, the IoT platform, as well as the algorithm for integrating different knowledge bases, will also provide strong support for seeking external resources to finalize the steps toward DT and broaden the impact of metal AM in the era of Industry 4.0 with more reliable performance in the cyber domain.

Mechanics of Brain Folding: Predicting the Cortical Folding Patterns of the Human Brain

Lead scientists: Mir Jalil Razavi, mechanical engineering; and Guifang Fu, mathematical sciences

The objective of this project is to discover the roles and hierarchy of the key parameters that modulate the morphology of the developing human brain. Currently, no study at the scale of the whole human brain shows how the interplay between different determinant factors, i.e., differential tangential growth and neural wiring, regulates the folding patterns in the developing human brain. Accordingly, there is a critical need to discover the role and hierarchy of the mechanistic parameters in the formation and modulation of brain folding patterns. The lack of knowledge of the physical interplay between cortical folding and its contributors is a critical barrier to a fundamental understanding of the relationship between cortical folding, brain connectivity, and brain function in different neurodevelopmental stages. To fill this gap, image-based computational simulations of the growth and folding of the brain, accompanied by a machine learning technique, offer a fundamentally novel method to predict the folded morphology of the human brain from its smooth fetal morphology. Outcomes of the proposed study are expected to have an important positive impact on the understanding of cortical folding and its morphogenesis, which is the key to interpreting the normal development of the human brain during the early stages of growth.

For the 2020-2021 academic year, the following seed grants were awarded:

Physics-Guided Machine Learning for Quantum Mechanical Problems

Lead scientists: Wei-Cheng Lee, physics; and Kenneth Chiu, computer science

The objective of this project is to develop a fundamentally new paradigm of machine learning, namely physics-guided machine learning (PGML), and to employ this new paradigm to assist theoretical physics research that has a limited amount of data due to high demands on computational resources. In recent years, while we have witnessed great success in applying machine learning methods to a number of important commercial problems such as image and voice recognition, the impact of machine learning on scientific discovery remains limited. The key reason is that traditional ML methods rely solely on data and ignore existing scientific knowledge, which makes them susceptible to overfitting and may slow down learning. Overfitting is learning spurious relationships that do not generalize well outside the data they are trained on, and this problem is worsened by data scarcity. To overcome these challenges, PGML is designed to incorporate guidance based on physical constraints from scientific knowledge into the learning process, enabling novel ML methods for more accurate predictions and better generalizability even with data scarcity. The framework of PGML developed in this project can be applied to other research areas in which the physical constraint can be written as a well-defined equation.
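
One common way to add a physical constraint to a learning objective is a penalty term added to the usual data-fitting loss. The Python sketch below shows that general idea only; it is not the project's actual PGML formulation, which the abstract does not specify. The names `physics_guided_loss`, `normalization_residual`, and `weight`, and the choice of constraint (predicted probabilities must sum to 1), are hypothetical and chosen for illustration.

```python
def normalization_residual(probs):
    """Hypothetical physical constraint: probabilities must sum to 1.

    Returns 0.0 when the constraint is satisfied exactly.
    """
    return sum(probs) - 1.0


def physics_guided_loss(pred, target, constraint_residual, weight=1.0):
    """Mean squared error plus a squared-residual penalty.

    pred, target: equal-length sequences of floats.
    constraint_residual: function mapping predictions to a residual that
        is zero when the physical constraint holds.
    weight: how strongly constraint violations are penalized (assumed
        hyperparameter).
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    data_term = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    physics_term = constraint_residual(pred) ** 2
    return data_term + weight * physics_term


# A prediction that fits the data and satisfies the constraint costs nothing.
consistent = physics_guided_loss([0.5, 0.5], [0.5, 0.5], normalization_residual)

# A prediction that violates the constraint is penalized beyond its data error.
violating = physics_guided_loss([0.6, 0.6], [0.5, 0.5], normalization_residual)
```

Because the penalty depends only on the predictions, it can also be evaluated on inputs with no labels, which is one reason this style of constraint is attractive when data are scarce.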

Data Science Modeling for Shape and Genome Data

Lead scientists: Guifang Fu, mathematical sciences; and Lijun Yin, computer science

Shape variation reflects both a response to, and a source of, natural selection. As shape has been reported to have high heritability, unraveling the genetic mysteries of shape has attracted a lot of attention in various disciplines. The newest cutting-edge genotyping technologies have dramatically changed the landscape of contemporary data science research. With the number of single nucleotide polymorphisms (SNPs) increasing from thousands to millions, Shape-GWAS represents one of the newest but largely unexplored research directions because of a series of “double big data” challenges that have not yet been fully addressed in the existing literature. Prevailing approaches either describe shape by a small number of landmark points or model each SNP individually, in isolation from the others, and hence greatly limit their potential for new findings. The overarching objective of the proposed research is to develop novel statistical models that will provide ground-breaking methodological support to Shape-GWAS data analyses. The novelty includes new methodologies, new applications, newly collected human face datasets, and a new collaboration team. Successful accomplishment of this research will bridge the gap between theory and application, and ensure that data analytical strategies keep pace with high-end technologies that generate datasets, while boosting the progress of multidisciplinary collaborations.

Understanding and Predicting Chronic Absenteeism from School in Autism Spectrum Disorder: A Data-Driven Approach

Lead scientists: Daehan Won, systems science and industrial engineering; and Jennifer Gillis, psychology

Absenteeism from school is a serious public health issue for educators, mental health professionals, and families. Chronic absenteeism (CA), defined as missing 10% or more of school days due to absence for any reason, makes it hard for a student to keep pace with school. Although CA at the middle and high school levels has received much attention, recent studies indicate that it is even more severe in preschool and elementary school. In addition, children with developmental disabilities are far more likely to miss many school days, with a CA rate more than twice that of typically developing (TD) children. However, school absenteeism among children with autism spectrum disorder (ASD) has received less attention, with only a small number of studies examining older children with ASD and higher cognitive abilities. The proposed project will be the first study to analyze school absenteeism in children with lower cognitive abilities while assessing their time-dependent behaviors and school performance, an area that remains unexplored. Through data-driven approaches including artificial intelligence, this work will transform knowledge of how students with disabilities attend school and how attendance relates to in-school activities, and will introduce predictive modeling to support early intervention for improving attendance and educational outcomes.

Learning-aided Distributed Anomaly Detection in Internet-of-Things

Lead scientists: Jian Li, electrical and computer engineering; and Ping Yang, computer science

The Internet of Things (IoT) refers to the networked interconnection of everyday objects such as physical devices, sensors, and home appliances, which are often equipped with ubiquitous intelligence. The advent of smart environments and the massive number of devices connected through IoT have led to an unprecedented generation of large amounts of data. The environment evolves rapidly, and the capability of IoT systems may be degraded. Therefore, an IoT system must be able to quickly identify environmental changes and actively adapt to avoid any disruption in its function and performance, so as to be resilient to adversarial perturbations and robust to uncertainty. Detecting an abrupt change in data collected from nodes in IoT systems is a fundamental problem in various applications, such as fraud detection and environmental monitoring. Sensor data in IoT systems are observed sequentially over time, and an anomaly may occur at any time. State-of-the-art systems use a centralized approach, with a fusion center gathering information from all nodes for decision making, which is inefficient and hard to implement. In addition, the fusion center is a single point of failure for the entire system, which makes the centralized approach even less feasible. This proposal aims to address these issues by developing real-time, learning-aided distributed anomaly detection algorithms that are mathematically well-founded and robust in an adversarial environment.
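To make the idea of detecting an abrupt change in sequentially observed data concrete, the sketch below applies a classical one-sided CUSUM test to a single stream of readings. This is a textbook technique chosen for illustration, not the learning-aided distributed algorithm the proposal aims to develop; the function name, parameter values, and sample readings are all assumptions.

```python
def cusum_alarm(samples, baseline_mean, drift, threshold):
    """Return the index of the first sample at which a one-sided CUSUM
    statistic exceeds `threshold`, or None if no alarm is raised.

    Illustrative only: a single-stream classical test with hypothetical
    parameters, not the distributed method described in the abstract.
    """
    statistic = 0.0
    for index, value in enumerate(samples):
        # Accumulate evidence of an upward shift in the mean; the
        # statistic is clipped at zero so old evidence does not linger.
        statistic = max(0.0, statistic + (value - baseline_mean - drift))
        if statistic > threshold:
            return index
    return None


# Made-up readings whose mean jumps from about 0 to about 2 at index 5.
readings = [0.1, -0.2, 0.0, 0.1, -0.1, 2.1, 1.9, 2.0, 2.2]
alarm_index = cusum_alarm(readings, baseline_mean=0.0, drift=0.5, threshold=2.5)
print(alarm_index)  # 6: the alarm fires one sample after the change
```

Each node could run such a test locally on its own readings; how nodes would share evidence without a fusion center is the open question the proposal targets and is not addressed by this sketch.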

For the 2018–2019 academic year, the following seed grant was awarded:

Using Data Science to Decipher Processing-Structure-Property-Performance Relationships of Additively Manufactured Metals

Lead scientist: Congrui Jin, mechanical engineering

During the last decade, various additive manufacturing techniques have been developed for the processing of complex metallic components. However, our understanding of the Processing-Structure-Property-Performance (PSPP) relationships of additively manufactured metals has not kept pace with the proliferation of the systems put into service. In particular, for additively manufactured high-temperature components, accurate prediction of mechanical properties such as creep rupture and fatigue strength becomes a fundamentally significant issue. The overarching goal of the proposed research is to explore data science techniques to decipher the PSPP relationships of additively manufactured metals, and especially to predict the creep rupture and fatigue strength of additively manufactured high-temperature components based on processing parameters and material microstructures. The proposed project will be the first application of data science techniques to study additively manufactured materials. Successful accomplishment of this research will result in highly reliable causal linkages among processing parameters, material microstructures, and mechanical properties, which can be used to provide multiple optimal solutions for a specific application. This interdisciplinary effort couples the expertise of Congrui Jin and Pu Zhang in additive manufacturing with the expertise of Sanjeena Dang in data science. This work will provide the preliminary results necessary to aggressively seek external grants.

For the 2017-2018 academic year, the following seed grants were awarded:

Adaptive Network Modeling of Real-World Temporal Social Networks

Lead scientist: Hiroki Sayama, systems science and industrial engineering

The objective of this proposal is to develop algorithms and software that can overcome the challenges identified in existing temporal network analysis methods and effectively produce mechanistic, dynamical models from real-world temporal social network data. The data can involve temporally varying network size and state-topology coevolution, which existing analytical methods do not capture.

Modeling and analysis of temporal social networks have attracted a lot of attention in various disciplines. A number of research methods have been proposed for temporal network analysis, but they are limited in capturing certain temporal dynamics, such as addition or removal of nodes, changes of node states, transitions of mesoscopic structures, and state-topology coevolution. An illustrative example is a customer network: new customers may join, some old customers may leave, their preferences may change because of social influence, and their social ties may also change based on their preferences. These temporal social network dynamics are essential to understanding customers' behaviors, but they are not fully captured by existing methods. What is currently missing is a modeling and analysis tool for generating more detailed, more mechanistic dynamical models that can describe those nontrivial temporal social network dynamics in a uniform, tractable way.

To meet the aforementioned need, the PIs have adopted a unique, unconventional approach to model temporal network dynamics as a "computational" process, represented by repeated extraction and replacement of subgraphs. Prototype versions of algorithms and software have demonstrated promising results for small-scale, simulated network data, yet there are still algorithmic challenges: How can one handle a high volume of noise and temporal sparseness of real-world temporal social network data, and how can one automatically discover nontrivial dynamical models beyond user-provided ones and generalize them to unobserved situations? The proposed project aims to address these challenges.

Automated Generation of Urban Land Use Data by Integrating Remote Sensing and Social Sensing

Lead scientist: Chengbin Deng, geography

Land use and land cover (LULC) data provide invaluable, spatially explicit and functional information about urban lands transformed by human beings. A heterogeneous urban environment contains a large number of detailed land use types, including single-family, multi-family, commercial, industrial, transportation, and civic land. Such information is helpful to city administrators, scholars and researchers, public health officials, and especially urban planners for a variety of purposes. Detailed land use data have served as an important input in socioeconomic studies and planning practices. Today, detailed urban land use information relies heavily on manual digitizing, local knowledge from field surveys, and other data sources (e.g., building permit records, appraisal materials, census information). Rapid urban expansion requires frequent updates of urban land use data, which are always time-consuming and labor-intensive. Public information such as tax payment or tax status is also updated and included in the latest databases. It is still very difficult, and almost impossible, to implement automated urban land use updates. Therefore, generating accurate and timely urban land use products in a more manageable time frame can provide a more intelligent approach for a variety of applied practices and urban studies.

In this proposal, we propose a new method to address the major gaps in traditional urban land use acquisition. This will be done using state-of-the-art statistical learning methods, including random forests, to integrate and analyze geospatial and social big data. On the one hand, remote sensing data provide information about urban physical environments. On the other hand, social media data provide ample information about human activities. Our long-term goal is to automatically generate and update land use products by integrating such geospatial open datasets. This will significantly improve the efficiency of LULC mapping to support sustainable urban planning and other practices.

Development of an Intelligent Mental Disease Prediction System Prototype based on Dietary Pattern Analysis: a Pilot Study

Lead scientist: Lina Begdache, health and wellness studies

Nutrition and mental health research is an emerging interdisciplinary field. Nutrition is one of the modifiable risk factors for mental health. Traditionally, studies on the association of diet and mental distress have focused on single nutrients; however, current trends in nutritional epidemiology research are leaning toward assessing dietary patterns in relation to comorbidities. This rationale considers the complexity of nutrient interaction and the daily variation in diet. The human brain changes continuously during development and with age. Therefore, dietary changes may be necessary with age. Our lab has established a prototype that describes the relationship between a healthy diet, exercise, healthy practices and mental wellbeing. Eating healthy may promote healthy habits and mental wellbeing by elevating dopamine levels in the brain. Mental wellbeing then acts as positive reinforcement for further healthy diet, healthy practices and exercise to improve health. This loop can become a virtuous cycle that optimizes mental health. When healthy diet, exercise or healthy practices are absent, lower dopamine levels depress mood, which in turn reduces healthy diet, exercise and healthy practices. The result is a vicious cycle, which reflects that mental distress is multidimensional. In addition, individuals have genetic variations, so the "one size fits all" approach is losing ground, as medications often don't work effectively. Personalized therapy is at the forefront of Precision Medicine, an emerging approach for disease treatment. The significance of this research is that it will support the development of targeted nutritional interventions to improve mood, which will increase the precision of other therapies.

Multitask Transfer Learning Enhanced Rare Event Detection using Sensing Data

Lead scientist: Changqing Cheng, systems science and industrial engineering

Rare events are those that occur at low frequency but with catastrophic consequences, e.g., seismic activity, stock market flash crashes, and terrorist attacks. While most such events are not preventable, accurate and timely detection will enable prompt actions that significantly reduce the severity of the effect and the associated cost. Recently, the widespread adoption of wireless sensors and smart devices has offered an unprecedented opportunity to monitor various complex systems, from manufacturing to healthcare. Remarkably, time series sensing data contain considerable causal information about the underlying dynamics and enable us to harness fundamental patterns for diagnosis, prognosis and decision making. Thus, the objective of this study is to design an integrated platform for process monitoring, particularly rare event detection, using sensing data. Nonetheless, the inherent nonlinearity and nonstationarity of sensing data have become a persistent challenge for sensing-driven process monitoring. Therefore, we propose to design a multitask transfer learning approach that fuses information from multiple sensing sources to enhance the monitoring resolution.

In particular, abrupt changes in ultra-precision machining exemplify an immense challenge faced by modern advanced manufacturing. As shown in the figure, the machining process experiences a scratch on the workpiece surface at time index 10,000. Offline measurement indicates that the surface roughness deteriorates from 35 nm before the scratch occurs to 82 nm afterward. Timely detection of such events from the in situ vibration signals will enable corrective actions to avoid escalating cost.

The (Data) Science of Gerrymandering

Lead scientist: Daniel B. Magleby, political science

In early October 2017, the Supreme Court of the United States heard oral arguments in the case of Gill v. Whitford. The case calls into question the constitutionality of partisan gerrymandering, the practice of drawing district boundaries in such a way that one political party receives an unfair advantage. The plaintiffs in the case argued that data scientists brought to bear a set of tools that allowed Wisconsin's legislature and governor to draw and enact maps that favored the Republican party. By every estimation, the Republicans' strategy was extremely effective. For example, in the 2012 elections, Republican candidates received just 48% of the vote while managing to carry 60% of the districts in the state.

Just as data science can be used to build an unfair advantage, data science can be used to identify and remedy these unfair redistricting practices in Wisconsin and elsewhere. With the seed grant from the Binghamton University Data Science Transdisciplinary Working Group, the team will implement an algorithm that PI Magleby developed with Daniel Mosesson on a high performance computing cluster. In a forthcoming paper, the team shows that the algorithm produces maps without any indication of bias. Moreover, the method the team proposes is vastly more efficient than alternatives. Access to a cluster will allow the team to use the algorithm to draw hundreds of millions of hypothetical maps. That large number of counterfactual maps will allow the team to make inferences about the impact of certain redistricting criteria that mapmakers have used as a defense of partisan outcomes. In particular, it will allow the team to understand the ways that considerations like race, communities of interest, and other political jurisdictions interact with the biased, partisan outcomes that analysts have observed in recent decades.