Projects

Funded in the 2018-2019 cycle

Using Data Science to Decipher Processing-Structure-Property-Performance Relationships of Additively Manufactured Metals

Lead Data Scientist: Congrui Jin (Mechanical Engineering)

Figure 1 

The overarching goal of the proposed research is to explore data science techniques to decipher Processing-Structure-Property-Performance (PSPP) relationships of additively manufactured metals, in particular to predict the creep rupture and fatigue strength of additively manufactured high-temperature components from the manufacturing processing parameters and material micro-structures. The PSPP relationships of additively manufactured materials are shown in the figure, in which the deductive science relationships of cause and effect flow from left to right, whereas the inductive engineering relationships of goals and means flow from right to left. Note that each relationship from left to right is many-to-one, and consequently the ones from right to left are one-to-many. In other words, different processing routes can potentially lead to the same structure of the material, and the same material property can potentially be achieved by different structures.

Each experimental observation or simulation result can be thought of as a data point for a forward model, e.g., a measurement or calculation of a property based on the given processing and structure parameters. Data science techniques can use a database of such data points to build data-driven forward models that run in a small fraction of the time it takes to conduct experiments and/or simulations. This acceleration of forward models can not only help guide future simulations and experiments, but also make it possible to realize the inverse models, which are far more challenging and critical for material design. The construction of inverse models is typically formulated as an optimization problem, in which a property or performance metric of interest is maximized or minimized subject to various constraints on the processing and/or structure of the material. The optimization process usually involves multiple invocations of the forward model, and thus having a fast forward model is extremely valuable.
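The interplay between a fast forward model and the inverse optimization can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not part of the project: the function expensive_forward_model stands in for an experiment or simulation, the single processing parameter "power" and its range are hypothetical, the surrogate is a simple piecewise-linear interpolant, and the inverse step is a plain grid search over the feasible range.

```python
from bisect import bisect_right


def expensive_forward_model(power):
    """Hypothetical stand-in for an experiment or simulation: maps one
    processing parameter to a property value (arbitrary units)."""
    return -(power - 250.0) ** 2 / 100.0 + 500.0


def build_surrogate(xs, ys):
    """Return a fast piecewise-linear interpolant through the data points.
    xs must be sorted in increasing order."""
    def surrogate(x):
        # Clamp to the sampled range, then interpolate between neighbors.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        i = bisect_right(xs, x)
        t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
        return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return surrogate


def inverse_design(surrogate, lower, upper, steps=1000):
    """Inverse model as optimization: grid search for the processing value
    in [lower, upper] that maximizes the surrogate-predicted property."""
    candidates = [lower + (upper - lower) * k / steps for k in range(steps + 1)]
    return max(candidates, key=surrogate)


# "Database" built from a handful of expensive evaluations (hypothetical).
samples = [100.0, 150.0, 200.0, 250.0, 300.0, 350.0, 400.0]
database = [expensive_forward_model(p) for p in samples]
fast_model = build_surrogate(samples, database)

# Inverse problem with a processing constraint: power limited to [100, 220].
best_power = inverse_design(fast_model, 100.0, 220.0)
print(best_power, fast_model(best_power))
```

The grid search calls the surrogate about a thousand times, which is exactly the situation in which a cheap forward model pays off: the same number of calls to a real experiment or simulation would be impractical.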

The proposed project will be the first application of data science techniques to study creep rupture of additively manufactured materials. Successful accomplishment of this research will result in highly reliable causal linkages among processing parameters, material micro-structures, and mechanical properties, which can be used to provide multiple optimal solutions for a specific application. This interdisciplinary effort couples the expertise of Dr. Congrui Jin and Dr. Pu Zhang in additive manufacturing with the expertise of Dr. Sanjeena Dang in data science. This work will provide the necessary preliminary results to aggressively seek external grants.

Funded in the 2017-2018 cycle

Adaptive Network Modeling of Real-World Temporal Social Networks

Lead Data Scientist: Hiroki Sayama (Systems Science and Industrial Engineering)

The objective of this proposal is to develop algorithms and software that can overcome the challenges identified in existing temporal network analysis methods and effectively produce mechanistic, dynamical models from real-world temporal social network data. The data can involve temporally varying network size and state-topology coevolution, which existing analytical methods do not capture.

Modeling and analysis of temporal social networks have attracted a lot of attention in various disciplines. A number of research methods have been proposed for temporal network analysis, but they are limited in capturing certain temporal dynamics, such as addition or removal of nodes, changes of node states, transitions of mesoscopic structures, and state-topology coevolution. An illustrative example is a customer network: new customers may join, some old customers may leave, their preferences may change because of social influence, and their social ties may also change based on their preferences. These temporal social network dynamics are essential to understanding customers' behaviors, but they are not fully captured by existing methods. What is currently missing is a modeling/analysis tool for generating more detailed, more mechanistic dynamical models that can describe those nontrivial temporal social network dynamics in a uniform, tractable way.

To meet the aforementioned need, the PIs have adopted a unique, unconventional approach to model temporal network dynamics as a "computational" process, represented by repeated extraction and replacement of subgraphs. Prototype versions of algorithms and software have demonstrated promising results for small-scale, simulated network data, yet there are still algorithmic challenges: How can one handle a high volume of noise and temporal sparseness of real-world temporal social network data, and how can one automatically discover nontrivial dynamical models beyond user-provided ones and generalize them to unobserved situations? The proposed project aims to address these challenges.
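The idea of repeated extraction and replacement of subgraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the PIs' algorithm or software: the network representation (a dictionary of neighbor sets), the extraction rule (pick the highest-degree node) and the replacement rule (rewrite that node into itself plus a new neighbor that inherits its state) are all hypothetical choices. A rule of this kind changes network size and node states within the same rewriting step.

```python
def extract(adjacency):
    """Extraction: select a one-node subgraph, here the highest-degree node
    (ties broken by smallest node id). Hypothetical rule."""
    return min(adjacency, key=lambda n: (-len(adjacency[n]), n))


def replace(adjacency, states, node):
    """Replacement: rewrite the extracted node into itself plus a new
    neighbor that inherits its state (network growth). Hypothetical rule."""
    new_node = max(adjacency) + 1
    adjacency[new_node] = {node}
    adjacency[node].add(new_node)
    states[new_node] = states[node]
    return new_node


def step(adjacency, states):
    """One time step of the dynamics: extract a subgraph, then replace it."""
    return replace(adjacency, states, extract(adjacency))


# Small example network: node 0 is connected to nodes 1 and 2.
adjacency = {0: {1, 2}, 1: {0}, 2: {0}}
states = {0: "A", 1: "B", 2: "B"}

for _ in range(3):
    step(adjacency, states)

print(sorted(adjacency[0]), states)
```

Fitting a model of this form to observed data would amount to inferring the extraction and replacement rules from a sequence of network snapshots, which is where the noise and sparseness challenges described above arise.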

Automated Generation of Urban Land Use Data by Integrating Remote Sensing and Social Sensing

Lead Data Scientist: Chengbin Deng (Geography)

Land use and land cover (LULC) data provides invaluable spatially explicit and functional information about urban lands transformed by human beings. There are a large number of detailed land use types in a heterogeneous urban environment, including single-family, multi-family, commercial, industrial, transportation, and civic land. Such information is helpful to city administrators, scholars and researchers, public health officials, and especially urban planners for a variety of purposes. Detailed land use data has served as an important input in socioeconomic studies and planning practices. Nowadays, detailed urban land use information relies heavily on manual digitizing, local knowledge from field surveys, and other data sources (e.g., building permit records, appraisal materials, census information). Rapid urban expansion requires frequent updates of urban land use data, which are always time consuming and labor intensive. Public information such as tax payment or tax status is also updated and included in the latest databases. It is still very difficult, and almost impossible, to implement automated urban land use updates. Therefore, generating accurate and timely urban land use products in a more manageable time frame can provide a more intelligent approach for a variety of applied practices and urban studies.

In this proposal, we propose a new method to address the major gaps in traditional urban land use acquisition. The method uses state-of-the-art statistical learning techniques, including random forests, to integrate and analyze geospatial and social big data. On the one hand, remote sensing data provides information about the urban physical environment. On the other hand, social media data provides ample information about human activities. Our long-term goal is to automatically generate and update land use products by integrating such geospatial open datasets. This will significantly improve the efficiency of LULC mapping to support sustainable urban planning and other practices.

Development of an Intelligent Mental Disease Prediction System Prototype based on Dietary Pattern Analysis: a Pilot Study

Lead Data Scientist: Lina Begdache (Decker College of Nursing and Health Sciences)

Reinforcing loops that describe mental health, healthy diet, exercise and healthy practices

Nutrition and mental health research is an emerging interdisciplinary field. Nutrition is one of the modifiable risk factors for mental health. Traditionally, studies on the association of diet and mental distress have focused on single nutrients; however, current trends in nutritional epidemiology research are leaning toward assessing dietary patterns in relation to comorbidities. This rationale considers the complexity of nutrient interaction and the daily variation in diet. The human brain is continuously changing during development and with age. Therefore, dietary changes may be necessary with age. Our lab has established a prototype that describes the relationship between a healthy diet, exercise, healthy practices and mental wellbeing. Eating healthy may promote healthy habits and mental wellbeing by elevating dopamine levels in the brain. Mental wellbeing then acts as a positive reinforcement to further healthy diet, healthy practices and exercise to improve health. This loop can become a virtuous cycle optimizing mental health. When healthy diet, exercise or healthy practices are absent, lower dopamine levels depress mood, which in turn reduces healthy diet, exercise and healthy practices. The resulting vicious cycle reflects the multidimensional nature of mental distress. In addition, individuals have genetic variations, and so the "one size fits all" approach is losing ground, as medications often don't work effectively. Personalized therapy is at the forefront of Precision Medicine, an emerging approach for disease treatment. The significance of this research is that it will support development of targeted nutritional interventions to improve mood, which will increase the precision of other therapies.

 

Multitask Transfer Learning Enhanced Rare Event Detection using Sensing Data

Lead Data Scientist: Changqing Cheng (Systems Science and Industrial Engineering)

Rare events are those that occur at low frequency but with catastrophic consequences, e.g., seismic activity, stock market flash crashes, and terrorist attacks. While most such events are not preventable, accurate and timely detection will enable prompt actions to significantly reduce the severity of the effect and the associated cost. Recently, the widespread adoption of wireless sensors and smart devices has offered an unprecedented opportunity to monitor various complex systems, from manufacturing to healthcare. Remarkably, time series sensing data contain considerable causal information about the underlying dynamics, and enable us to harness fundamental patterns for diagnosis, prognosis and decision making. Thus, the objective of this study is to design an integrated platform for process monitoring, particularly rare event detection, using the sensing data. Nonetheless, the inherent nonlinearity and nonstationarity of the sensing data have increasingly become a persistent challenge for sensing-driven process monitoring. Therefore, we propose to design a multitask transfer learning approach to fuse information from multiple sensing sources to enhance the monitoring resolution.

In particular, abrupt changes in ultra-precision machining exemplify an immense challenge faced by modern advanced manufacturing. As shown in the figure, the machining process experiences a scratch on the workpiece surface at time index 10,000. Offline measurement indicates that the surface roughness deteriorates from 35 nm before the scratch to 82 nm after it. Timely detection of such events from the in situ vibration signals will enable corrective actions to avoid escalating cost.
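To make the detection task concrete, the following Python sketch flags an abrupt change in a vibration-like signal with a simple windowed RMS threshold. This is a deliberately basic baseline and not the multitask transfer learning approach proposed above. The signal is synthetic, and the amplitude jump, window length, number of baseline windows and threshold ratio are all hypothetical values chosen for illustration; only the change location (time index 10,000) mirrors the example in the text.

```python
import math


def window_rms(signal, window):
    """RMS of each consecutive, non-overlapping window of the signal."""
    return [
        math.sqrt(sum(v * v for v in signal[i:i + window]) / window)
        for i in range(0, len(signal) - window + 1, window)
    ]


def detect_change(signal, window, baseline_windows, ratio):
    """Return the start index of the first window whose RMS exceeds `ratio`
    times the mean RMS of the initial baseline windows, or None."""
    rms = window_rms(signal, window)
    baseline = sum(rms[:baseline_windows]) / baseline_windows
    for k in range(baseline_windows, len(rms)):
        if rms[k] > ratio * baseline:
            return k * window
    return None


# Synthetic signal: the amplitude jumps at index 10,000 (hypothetical values).
change_at = 10_000
signal = [
    (1.0 if i < change_at else 2.5) * math.sin(0.3 * i)
    for i in range(15_000)
]

detected = detect_change(signal, window=500, baseline_windows=5, ratio=1.5)
print(detected)
```

A fixed threshold like this assumes a stationary baseline, which is precisely what real sensing data often violates; that limitation is the motivation for the learning-based approach described above.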

The (Data) Science of Gerrymandering

Lead Data Scientist: Daniel B. Magleby (Political Science)

In early October 2017, the Supreme Court of the United States heard oral arguments in the case of Gill v. Whitford. The case calls into question the constitutionality of partisan gerrymandering -- the practice of drawing boundaries of districts in such a way that one political party receives an unfair advantage. The plaintiffs in the case argued that data scientists brought to bear a set of tools that allowed Wisconsin's legislature and governor to draw and enact maps that favored the Republican party. By every estimation, the Republicans' strategy was extremely effective. For example, in the 2012 elections, Republican candidates received just 48% of the vote while managing to carry 60% of the districts in the state.
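The gap between vote share and seat share can be reproduced with simple arithmetic. The Python sketch below uses entirely hypothetical district data (ten equal-sized districts of 100 voters each, not actual Wisconsin results) to show how a party can carry 60% of the districts with 48% of the vote when its opponents are concentrated in a few districts.

```python
def vote_and_seat_share(districts):
    """districts: list of (party_a_votes, party_b_votes), one per district.
    Returns party A's share of the total vote and of the districts won."""
    total_a = sum(a for a, _ in districts)
    total = sum(a + b for a, b in districts)
    seats_a = sum(1 for a, b in districts if a > b)
    return total_a / total, seats_a / len(districts)


# Hypothetical map: party A wins six districts narrowly and loses four badly.
districts = [(55, 45)] * 6 + [(37, 63), (38, 62), (37, 63), (38, 62)]

vote_share, seat_share = vote_and_seat_share(districts)
print(f"vote share: {vote_share:.0%}, seat share: {seat_share:.0%}")
```

In this toy map party A's supporters are spread efficiently across narrow wins, while party B's supporters are packed into four lopsided districts.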

Just as data science can be used to build an unfair advantage, data science can be used to identify and remedy these unfair redistricting practices in Wisconsin and elsewhere. With the seed grant from the Binghamton University Data Science Transdisciplinary Working Group, the team will implement, on a high-performance computing cluster, an algorithm that PI Magleby developed with Daniel Mosesson. In a forthcoming paper, the team shows that the algorithm produces maps without any indication of bias. Moreover, the method the team proposes is vastly more efficient than the alternatives. Access to a cluster will allow the team to use the algorithm to draw hundreds of millions of hypothetical maps. That large number of counterfactual maps will allow the team to make inferences about the impact of certain redistricting criteria that mapmakers have used as a defense of partisan outcomes. In particular, it will allow the team to understand the ways that considerations like race, communities of interest, and other political jurisdictions interact with the biased, partisan outcomes that analysts have observed in recent decades.