In today's digital era, an immense volume of data is generated every second by an ever-growing network of sensors embedded in devices ranging from smartphones and home appliances to industrial machinery and environmental monitoring systems. Machine learning (ML) plays a critical role in decoding these time series: predicting future trends, monitoring systems, diagnosing issues, and optimizing performance across many sectors. However, the complexities of time-varying data, in which not only the sensor measurements but also the number of sensors and their underlying patterns can change over time, make it demanding to derive useful predictions with ML.

This thesis is a deep dive into ML and transfer learning techniques applied to time-varying data. We tackled three significant challenges in the field: imbalance in time-series regression, increasing input dimensions, and covariate shift. For each challenge, we provided innovative solutions and rigorously tested them in real-world scenarios.

The first part of our research compared traditional ML models with deep learning techniques on a predictive maintenance task, elucidating the strengths and weaknesses of the state of the art in both categories. We also conducted a similar study focused on transfer learning, analyzing its shortcomings in predicting extubation outcomes for COVID-19 patients early in the pandemic, when only limited data were available.

One of the critical gaps we observed was in dealing with imbalance in time-series regression. To address this, we developed a robust iterative framework that allows practitioners to balance their predictions' focus and to interact with domain experts at each step. The framework uses sampling methods and model performance comparisons to handle data imbalance efficiently, adding substantial flexibility not found in existing methods.
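To make the resampling idea concrete, here is a minimal, generic sketch of one common building block of such frameworks: oversampling the rare region of a continuous target so the model sees it more often. The function name, the quantile-based rarity rule, and the duplication factor are illustrative assumptions, not the thesis's actual framework.

```python
import numpy as np

def oversample_rare_targets(X, y, quantile=0.9, factor=3, rng=None):
    """Replicate samples whose target falls in the rare upper tail.

    Illustrative only: treats targets above the given quantile as the
    "rare" regime and duplicates those samples (factor - 1) extra times,
    so a regression model is exposed to them more frequently.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X)
    y = np.asarray(y, dtype=float)
    threshold = np.quantile(y, quantile)
    rare = y > threshold
    # Duplicate the rare samples.
    X_extra = np.repeat(X[rare], factor - 1, axis=0)
    y_extra = np.repeat(y[rare], factor - 1)
    X_new = np.vstack([X, X_extra])
    y_new = np.concatenate([y, y_extra])
    # Shuffle so the duplicates are not clustered at the end.
    order = rng.permutation(len(y_new))
    return X_new[order], y_new[order]
```

In an iterative framework of the kind described above, a step like this would be followed by retraining, comparing model performance across target ranges, and letting a domain expert adjust the rarity threshold before the next round.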
This thesis also sheds light on the prevalent issue of expanding input dimensions in datasets, offering a novel solution. We introduced a transfer learning method that adds new inputs to an existing prediction task by separating the historical data and the new data into source and target datasets. The approach is theoretically sound, robust against negative transfer, easy to implement, and able to handle tasks even when data for the new inputs is scarce; it demonstrated superior performance on multiple real-life datasets, making it a valuable addition to the field.

The third challenge we tackled was covariate shift, which arises when the distribution of the input data changes over time. Our research led to a transfer learning method that uses the minimum error entropy (MEE) criterion, known for its robustness against various noise types, as its loss function. The method handled covariate shift effectively, offering promising results in time-series regression tasks while remaining robust to various types of noise, a shortcoming of other similar methods.

Finally, we validated the proposed methodologies on real-world applications, such as predicting the temperature of a conveyor belt engine, illustrating our iterative framework's effectiveness in handling imbalanced time-series forecasting. Our MEE-based transfer learning approach was likewise applied to several challenging transfer learning datasets.
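The MEE criterion mentioned above is a standard information-theoretic loss: it minimizes Rényi's quadratic entropy of the prediction errors, estimated with a Parzen (Gaussian kernel) density. The sketch below shows only this generic criterion, not the thesis's full transfer learning method; the function name and kernel width are illustrative assumptions.

```python
import numpy as np

def mee_loss(errors, sigma=1.0):
    """Renyi's quadratic entropy of the errors (the MEE criterion).

    The error density is estimated with a Parzen window of width
    `sigma`, giving the information potential
        V(e) = (1/N^2) * sum_{i,j} G_sigma(e_i - e_j).
    Minimizing H2 = -log V(e) concentrates the error distribution,
    which is what makes the criterion robust to heavy-tailed and
    impulsive noise compared with mean squared error.
    """
    e = np.asarray(errors, dtype=float)
    diffs = e[:, None] - e[None, :]  # pairwise error differences
    g = np.exp(-diffs**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    v = g.mean()                     # information potential V(e)
    return -np.log(v)
```

Note that the criterion is shift-invariant (it penalizes the spread of the errors, not their location), so in practice the errors are typically re-centered after training.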
DOI: 10.5463/thesis.1570