TECNALIA Publications Repository :: Browsing by Keyword "Concept drift"

Now showing 1 - 13 of 13

Concept tracking and adaptation for drifting data streams under extreme verification latency
(Springer Verlag, 2018) Arostegi, Maria; Torre-Bastida, Ana I.; Lobo, Jesus L.; Bilbao, Miren Nekane; Del Ser, Javier; IA; Tecnalia Research & Innovation; HPA
When analyzing large-scale streaming data towards resolving classification problems, it is often assumed that true labels of the incoming data are available right after being predicted. This assumption allows online learning models to efficiently detect and accommodate non-stationarities in the distribution of the arriving data (concept drift). However, this assumption does not hold in many practical scenarios where a delay exists between predicted and class labels, to the point of lacking this supervision for an infinite period of time (extreme verification latency). In this case, the development of learning algorithms capable of adapting to drifting environments without any external supervision remains a challenging research area to date. In this context, this work proposes a simple yet effective learning technique to classify non-stationary data streams under extreme verification latency. The intuition motivating the design of our technique is to predict the trajectory of concepts in the feature space. The estimation of the region where concepts may reside in the future can be then exploited for producing more updated predictions for newly arriving examples, ultimately enhancing its accuracy during this unsupervised drifting period. Our approach is compared to a benchmark of incremental and static learning methods over a set of public non-stationary synthetic datasets. Results obtained by our passive learning method are promising and encourage further research aimed at generalizing its applicability to other types of drifts.
CURIE: a cellular automaton for concept drift detection
(2021-11) Lobo, Jesus L.; Del Ser, Javier; Osaba, Eneko; Bifet, Albert; Herrera, Francisco; IA; Quantum
Data stream mining extracts information from large quantities of data flowing fast and continuously (data streams). Such streams are usually affected by changes in the data distribution, giving rise to a phenomenon referred to as concept drift. Thus, learning models must detect and adapt to such changes, so as to exhibit a good predictive performance after a drift has occurred. In this regard, the development of effective drift detection algorithms becomes a key factor in data stream mining. In this work we propose CURIE, a drift detector relying on cellular automata. Specifically, in CURIE the distribution of the data stream is represented in the grid of a cellular automaton, whose neighborhood rule can then be utilized to detect possible distribution changes over the stream. Computer simulations are presented and discussed to show that CURIE, when hybridized with other base learners, renders a competitive behavior in terms of detection metrics and classification accuracy. CURIE is compared with well-established drift detectors over synthetic datasets with varying drift characteristics.
DRED: An evolutionary diversity generation method for concept drift adaptation in online learning environments
(2018-07) L. Lobo, Jesus; Del Ser, Javier; Bilbao, Miren Nekane; Perfecto, Cristina; Salcedo-Sanz, Sancho; IA
Nowadays fast-arriving information flows lay the basis of many data mining applications. Such data streams are usually affected by non-stationary events that eventually change their distribution (concept drift), causing predictive models trained over these data to become obsolete and fail to adapt suitably to the new distribution. Especially in online learning scenarios, there is a pressing need for new algorithms that adapt to this change as fast as possible, while maintaining good performance scores. Recent studies have revealed that a good strategy is to construct highly diverse ensembles and use them shortly after the drift (independently of the type of drift) to obtain good performance scores. However, the existence of the so-called trade-off between stability (performance over stable data concepts) and plasticity (recovery and adaptation after drift events) implies that the construction of the ensemble model should account simultaneously for these two conflicting objectives. In this regard, this work presents a new approach to artificially generate an optimal diversity level when building prediction ensembles shortly after a drift occurs. The approach uses a Kernel Density Estimation (KDE) method to generate synthetic data, which are subsequently labeled by means of a multi-objective optimization method that allows training each model of the ensemble with a different subset of synthetic samples. Computational experiments reveal that the proposed approach can be hybridized with other traditional diversity generation approaches, yielding optimized levels of diversity that render an enhanced recovery from drifts.
Drift detection over non-stationary data streams using evolving spiking neural networks
(Springer Verlag, 2018) Lobo, Jesus L.; Del Ser, Javier; Laña, Ibai; Bilbao, Miren Nekane; Kasabov, Nikola; IA
Drift detection in changing environments is a key factor for those active adaptive methods which require trigger mechanisms for drift adaptation. Most approaches rely on a base learner that provides accuracies or error rates to be analyzed by an algorithm. In this work we propose the use of evolving spiking neural networks as a new form of drift detection, which resorts to the architectural changes of this particular class of models to estimate the drift location without requiring any external base learner. By virtue of its inherent simplicity and lower computational cost, this embedded approach can be suitable for adoption in online learning scenarios with severe resource constraints. Experiments with synthetic datasets show that the proposed technique is very competitive when compared to other drift detection techniques.
Evolving Spiking Neural Networks for online learning over drifting data streams
(2018-12) Lobo, Jesus L.; Laña, Ibai; Del Ser, Javier; Bilbao, Miren Nekane; Kasabov, Nikola; IA
Nowadays huge volumes of data are produced in the form of fast streams, which are further affected by non-stationary phenomena. The resulting lack of stationarity in the distribution of the produced data calls for efficient and scalable algorithms for online analysis capable of adapting to such changes (concept drift). The online learning field has lately turned its focus on this challenging scenario, by designing incremental learning algorithms that avoid becoming obsolete after a concept drift occurs. Despite the noted activity in the literature, a need for new efficient and scalable algorithms that adapt to the drift still prevails as a research topic deserving further effort. Surprisingly, Spiking Neural Networks, one of the major exponents of the third generation of artificial neural networks, have not been thoroughly studied as an online learning approach, even though they are naturally suited to easily and quickly adapting to changing environments. This work covers this research gap by adapting Spiking Neural Networks to meet the processing requirements that online learning scenarios impose. In particular the work focuses on limiting the size of the neuron repository and making the most of this limited size by resorting to data reduction techniques. Experiments with synthetic and real data sets are discussed, leading to the empirically validated assertion that, by virtue of a tailored exploitation of the neuron repository, Spiking Neural Networks adapt better to drifts, obtaining higher accuracy scores than naive versions of Spiking Neural Networks for online learning environments.
LUNAR: Cellular automata for drifting data streams
(2021-01-08) L. Lobo, Jesus; Del Ser, Javier; Herrera, Francisco; IA
With the advent of fast data streams, real-time machine learning has become a challenging task, demanding many processing resources. In addition, such streams can be affected by the concept drift effect, by which learning methods have to detect changes in the data distribution and adapt to these evolving conditions. Several emerging paradigms such as the so-called Smart Dust, Utility Fog, or Swarm Robotics are in need of efficient and scalable solutions for real-time scenarios where computing resources are usually constrained. Cellular automata, as low-bias and robust-to-noise pattern recognition methods with competitive classification performance, meet the requirements imposed by the aforementioned paradigms mainly due to their simplicity and parallel nature. In this work we propose LUNAR, a streamified version of cellular automata devised to successfully meet the aforementioned requirements. LUNAR is able to act as a real incremental learner while adapting to drifting conditions. Furthermore, LUNAR is highly interpretable, as its cellular structure directly represents the mapping between the feature space and the labels to be predicted. Extensive simulations with synthetic and real data provide evidence of its competitive behavior in terms of classification performance when compared to long-established and successful online learning methods.
On the Connection between Concept Drift and Uncertainty in Industrial Artificial Intelligence
(Institute of Electrical and Electronics Engineers Inc., 2023) Lobo, Jesus L.; Lana, Ibai; Osaba, Eneko; Del Ser, Javier; IA; Quantum
AI-based digital twins are at the leading edge of the Industry 4.0 revolution, which is technologically empowered by the Internet of Things and real-time data analysis. Information collected from industrial assets is produced in a continuous fashion, yielding data streams that must be processed under stringent timing constraints. Such data streams are usually subject to non-stationary phenomena that may change their data distribution, so that the knowledge captured by the models used for data analysis may become obsolete (leading to the so-called concept drift effect). The early detection of the change (drift) is crucial for updating the model's knowledge, which is challenging especially in scenarios where the ground truth associated with the stream data is not readily available. Among many other techniques, the estimation of the model's confidence has been tentatively suggested in a few studies as a criterion for detecting drifts in unsupervised settings. The goal of this manuscript is to confirm and solidly expose the connection between the model's confidence in its output and the presence of a concept drift, showcasing it experimentally and advocating for greater consideration of uncertainty estimation in comparative studies to be reported in the future.
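The link between model confidence and drift described in this abstract can be illustrated with a minimal sketch. It is not the procedure evaluated in the paper: the class name `ConfidenceDriftMonitor`, the sliding-window comparison, and the `window` and `drop` parameters are all assumptions chosen for illustration. The idea is only that, without labels, a sustained fall in the model's average confidence relative to an initial reference period may signal a drift.

```python
from collections import deque


class ConfidenceDriftMonitor:
    """Hypothetical unsupervised drift flag based on a drop in model confidence."""

    def __init__(self, window=50, drop=0.15):
        self.window = window              # samples per window (assumed value)
        self.drop = drop                  # tolerated fall in mean confidence (assumed value)
        self.reference = []               # confidences from the initial, assumed-stable period
        self.recent = deque(maxlen=window)  # most recent confidences

    def update(self, confidence):
        """Feed one confidence in [0, 1]; return True when a drift is flagged."""
        # Fill the reference window first; nothing can be flagged yet.
        if len(self.reference) < self.window:
            self.reference.append(confidence)
            return False
        self.recent.append(confidence)
        if len(self.recent) < self.window:
            return False
        ref_mean = sum(self.reference) / len(self.reference)
        recent_mean = sum(self.recent) / len(self.recent)
        # Flag when recent confidence sits well below the reference level.
        return ref_mean - recent_mean > self.drop
```

In use, `confidence` would typically be the highest class probability the classifier assigns to each incoming sample; no true labels are needed at any point.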
On the creation of diverse ensembles for nonstationary environments using bio-inspired heuristics
(Springer Verlag, 2017) Lobo, Jesus L.; Del Ser, Javier; Villar-Rodriguez, Esther; Bilbao, Miren Nekane; Salcedo-Sanz, Sancho; IA; Quantum
Recently the relevance of adaptive models for dynamic data environments has turned into a hot topic due to the vast number of scenarios generating nonstationary data streams. When a change (concept drift) in data distribution occurs, the ensembles of models trained over these data sources become obsolete and do not adapt suitably to the new distribution of the data. Although most of the research in the field is focused on the detection of this drift to re-train the ensemble, the importance of diversity in the ensemble shortly after the drift, in order to reduce the initial drop in accuracy, is widely known. In a Big Data scenario in which data can be huge (and also the number of past models), achieving the most diverse ensemble implies computing all possible combinations of models, which is not an easy task to carry out quickly in the long term. This challenge can be formulated as an optimization problem, for which bio-inspired algorithms can play a key role in these adaptive algorithms. This is precisely the goal of this manuscript: to validate the relevance of diversity right after drifts, and to unveil how to achieve a highly diverse ensemble by using a self-learning optimization technique.
Optimization and Prediction Techniques for Self-Healing and Self-Learning Applications in a Trustworthy Cloud Continuum
(2021-07-30) Alonso, Juncal; Orue-Echevarria, Leire; Osaba, Eneko; López Lobo, Jesús; Martinez, Iñigo; Diaz de Arcaya, Josu; Etxaniz, Iñaki; Tecnalia Research & Innovation; HPA; Quantum; IA
The current IT market is more and more dominated by the “cloud continuum”. In the “traditional” cloud, computing resources are typically homogeneous in order to facilitate economies of scale. In contrast, in edge computing, computational resources are widely diverse, commonly with scarce capacities and must be managed very efficiently due to battery constraints or other limitations. A combination of resources and services at the edge (edge computing), in the core (cloud computing), and along the data path (fog computing) is needed through a trusted cloud continuum. This requires novel solutions for the creation, optimization, management, and automatic operation of such infrastructure through new approaches such as infrastructure as code (IaC). In this paper, we analyze how artificial intelligence (AI)-based techniques and tools can enhance the operation of complex applications to support the broad and multi-stage heterogeneity of the infrastructural layer in the “computing continuum” through the enhancement of IaC optimization, IaC self-learning, and IaC self-healing. To this extent, the presented work proposes a set of tools, methods, and techniques for applications’ operators to seamlessly select, combine, configure, and adapt computation resources all along the data path and support the complete service lifecycle covering: (1) optimized distributed application deployment over heterogeneous computing resources; (2) monitoring of execution platforms in real time including continuous control and trust of the infrastructural services; (3) application deployment and adaptation while optimizing the execution; and (4) application self-recovery to avoid compromising situations that may lead to an unexpected failure.
Predictive Maintenance on the Machining Process and Machine Tool
(2020-01-01) Jimenez-Cortadi, Alberto; Irigoien, Itziar; Boto, Fernando; Sierra, Basilio; Rodriguez, German; Tecnalia Research & Innovation; FACTORY; FABRIC_INTEL
This paper presents the process required to implement data-driven Predictive Maintenance (PdM), not only in the machine decision making, but also in data acquisition and processing. A short review of the different approaches and techniques in maintenance is given. The main contribution of this paper is a solution for the predictive maintenance problem in a real machining process. Several steps are needed to reach the solution, which are carefully explained. The obtained results show that the Preventive Maintenance (PM), which was carried out in a real machining process, could be changed into a PdM approach. A decision making application was developed to provide a visual analysis of the Remaining Useful Life (RUL) of the machining tool. This work is a proof of concept of the presented methodology on a single process, but it is replicable for most serial production processes.
Rank Aggregation for Non-stationary Data Streams
(Springer, 2021-09-11) Irurozki, Ekhine; Perez, Aritz; Lobo, Jesus; Del Ser, Javier; Oliver, Nuria; Pérez-Cruz, Fernando; Kramer, Stefan; Read, Jesse; Lozano, Jose A.; IA
The problem of learning over non-stationary ranking streams arises naturally, particularly in recommender systems. The rankings represent the preferences of a population, and the non-stationarity means that the distribution of preferences changes over time. We propose an algorithm that learns the current distribution of ranking in an online manner. The bottleneck of this process is a rank aggregation problem. We propose a generalization of the Borda algorithm for non-stationary ranking streams. As a main result, we bound the minimum number of samples required to output the ground truth with high probability. Besides, we show how the optimal parameters are set. Then, we generalize the whole family of weighted voting rules (the family to which Borda belongs) to situations in which some rankings are more reliable than others. We show that, under mild assumptions, this generalization can solve the problem of rank aggregation over non-stationary data streams.
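As a rough illustration of Borda-style aggregation over a drifting ranking stream, the sketch below weights each ranking by its age so that recent preferences dominate. It is not the algorithm proposed in the paper: the exponential forgetting scheme, the function name `weighted_borda`, and the `decay` parameter are assumptions made only for this example.

```python
from collections import defaultdict


def weighted_borda(rankings, decay=0.9):
    """Aggregate a stream of rankings (oldest first) into one consensus ranking.

    Each ranking lists items from most to least preferred. A ranking that is
    `age` steps old receives weight decay**age (hypothetical forgetting scheme).
    """
    scores = defaultdict(float)
    n = len(rankings)
    for t, ranking in enumerate(rankings):
        weight = decay ** (n - 1 - t)  # the newest ranking has weight 1
        m = len(ranking)
        for position, item in enumerate(ranking):
            # Classic Borda points: m-1 for the top item, 0 for the last.
            scores[item] += weight * (m - 1 - position)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], str(item)))
```

With `decay=1.0` this reduces to the plain Borda count over the whole stream, while smaller values let a recent change in the population's preferences override a longer history of older rankings.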
SLAYER: A Semi-supervised Learning Approach for Drifting Data Streams under Extreme
(2021) Arostegi, Maria; Lobo, Jesus L.; Del Ser, Javier; IA; Tecnalia Research & Innovation
Classification models learned from data streams often assume the availability of true labels after predicting new examples, either instantly or with some delay with respect to inference time. However, in many real-world scenarios comprising sensors, actuators and robotic swarms, this assumption may not realistically hold, since the supervision of newly classified samples can be unfeasible to achieve in practice. The extreme case where such a supervision is never available is referred to as extreme verification latency. Furthermore, streaming data is also known to undergo the effects of exogenous non-stationary phenomena, by which patterns to be learned from the streams can evolve over time, thereby requiring the adaptation of the classifier for its knowledge to match to the prevailing concept. When these two circumstances (extreme verification latency and concept drift) concur in a given scenario, adapting the model to the evolving dynamics of stream data becomes a challenging task, as the lack of supervision requires rethinking this functionality from a semi-supervised perspective. In this context we present SLAYER, a semi-supervised learning approach capable of tracking the evolution of concepts in the feature space, and analyzing their characteristics towards alleviating the effects of concept drift in the classification accuracy. Besides its continuous adaptation to evolving concepts, another advantage of SLAYER is its resilience against the appearance and disappearance of concepts over time, adapting its knowledge seamlessly when it occurs. We assess the performance of SLAYER over several datasets and compare it to that of state-of-the-art approaches proposed to deal with this stream learning setup. The discussion on the obtained results is conclusive: SLAYER offers a competitive behavior, performing best over several of the datasets considered in the benchmark.
Spiking Neural Networks and online learning: An overview and perspectives
(2020-01) Lobo, Jesus L.; Del Ser, Javier; Bifet, Albert; Kasabov, Nikola; IA
Applications that generate huge amounts of data in the form of fast streams are becoming increasingly prevalent, making it necessary to learn in an online manner. These conditions usually impose memory and processing time restrictions, and they often turn into evolving environments where a change may affect the input data distribution. Such a change causes predictive models trained over these stream data to become obsolete and fail to adapt suitably to new distributions. Especially in these non-stationary scenarios, there is a pressing need for new algorithms that adapt to these changes as fast as possible, while maintaining good performance scores. Unfortunately, most off-the-shelf classification models need to be retrained if they are used in changing environments, and fail to scale properly. Spiking Neural Networks have revealed themselves as one of the most successful approaches to model the behavior and learning potential of the brain, and to exploit them to undertake practical online learning tasks. Besides, some specific flavors of Spiking Neural Networks can overcome the necessity of retraining after a drift occurs. This work intends to merge both fields by serving as a comprehensive overview, motivating further developments that embrace Spiking Neural Networks for online learning scenarios, and being a friendly entry point for non-experts.
