Algorithms satisfy hunger for real-time data
The world today moves at a fast pace, and most of us don’t have time to wait around. Twitter users monitor what’s trending now, not last month. Drivers check the road conditions for the morning commute. Air-traffic controllers track the locations of thousands of planes simultaneously, and investors conduct high-frequency trading. Much of the data that’s collected can’t be sent to sit in a warehouse; it requires nearly instantaneous computer processing and feedback.
“Real-time data is very dynamic and unpredictable,” says Kyoung-Don Kang, associate professor of computer science at Binghamton University. For example, Kang explains, a traffic-monitoring system might not see much activity at midnight Sunday, but it will generate tremendous amounts of data during Monday’s morning rush hour. That data could slow down the processing system, right when it’s needed most. “It is very challenging to process this data in a timely manner,” he says.
When real-time computing fails, it can compromise safety or lead to financial loss. That’s why Kang is working to make this fast-paced data processing more efficient, with help from a National Science Foundation grant of nearly $250,000.
“It is an important research area, especially at this point when we have lots of critical systems depending on continuous streams of real-time data from zillions of sensors deployed in the environment,” says Sang H. Son, a computer scientist at the University of Virginia.
Why not just design systems that are capable of processing massive amounts of data all the time? It’s not practical, Kang says, because most of the time a system will need to process only small amounts of data, and capacity that sits idle is a waste of resources. Data volumes also keep growing, so even today’s top-notch system will eventually be outpaced.
Kang says the key to using real-time data applications is to cut your losses. If the amount of data is more than the system can handle, then some of it must be dropped, he says: “Some data is more important than others.”
That’s why he’ll be developing algorithms and software solutions that process the most vital data streams first. Kang will use simple yet powerful rules to prioritize some operations over others, with the goal of building more efficient load-shedding and continuous query processing techniques. For instance, if an input data stream is important, then the query results produced from that stream are likely to be important as well. This approach can be applied to detect important events, such as unusual traffic patterns or homeland security issues, in real time.
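To make the idea of priority-based load shedding concrete, here is a minimal Python sketch. It is an illustration only, not Kang's algorithm: the stream names, the importance scores in `STREAM_PRIORITY`, and the functions `shed_load` and `detect_unusual` are all hypothetical. When a batch of readings exceeds what the system can handle, the sketch drops readings from the least important streams first, and each query result inherits the importance of the stream it came from.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    stream: str   # name of the input stream (hypothetical)
    value: float  # measured value, e.g. vehicles per minute


# Assumed importance score per input stream (higher = more important).
STREAM_PRIORITY = {
    "highway_sensor": 3,
    "side_street_sensor": 1,
    "parking_sensor": 0,
}


def priority_of(reading):
    """Importance of a reading; unknown streams get the lowest score."""
    return STREAM_PRIORITY.get(reading.stream, 0)


def shed_load(readings, capacity):
    """Keep at most `capacity` readings, preferring important streams."""
    if len(readings) <= capacity:
        return list(readings)
    # sorted() is stable, so arrival order is kept within a priority level.
    ranked = sorted(readings, key=priority_of, reverse=True)
    return ranked[:capacity]


def detect_unusual(readings, threshold):
    """Toy continuous query: flag readings above `threshold`.

    Each result carries the priority of its input stream, following the
    rule that output from an important stream is likely important too.
    """
    return [(r, priority_of(r)) for r in readings if r.value > threshold]
```

In use, a caller would pass each incoming batch through `shed_load` with the current capacity and then run `detect_unusual` on what remains. A real system would also have to decide how capacity is measured and how priorities change over time, which this sketch leaves out.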
More efficient processing of real-time data could one day enable other technological advances, including directing intelligent transportation or managing green buildings and smart grids. “There are many potential applications,” Kang says. “The challenge is being ready for anything.”