Browsing by Keyword "Supervised learning"
Now showing 1 - 3 of 3
Item
A feature selection method for author identification in interactive communications based on supervised learning and language typicality (2016-11-01)
Villar-Rodriguez, Esther; Del Ser, Javier; Bilbao, Miren Nekane; Salcedo-Sanz, Sancho; Tecnalia Research & Innovation; Quantum; IA
Authorship attribution, conceived as the identification of the origin of a text between different authors, has been a very active area of research in the scientific community mainly supported by advances in Natural Language Processing (NLP), machine learning and Computational Intelligence. This paradigm has been mostly addressed from a literary perspective, aiming at identifying the stylometric features and writeprints which unequivocally typify the writer patterns and allow their unique identification. On the other hand, the upsurge of social networking platforms and interactive messaging have undoubtedly made the anonymous expression of feelings, the sharing of experiences and social relationships much easier than in other traditional communication media. Unfortunately, the popularity of such communities and the virtual identification of their users deploy a rich substrate for cybercrimes against unsuspecting victims and other forms of illegal uses of social networks that call for the activity tracing of accounts. In the context of one-to-one communications this manuscript postulates the identification of the sender of a message as a useful approach to detect impersonation attacks in interactive communication scenarios. In particular this work proposes to select linguistic features extracted from messages via NLP techniques by means of a novel feature selection algorithm based on the dissociation between essential traits of the sender and receiver influences.
The performance and computational efficiency of different supervised learning models when incorporating the proposed feature selection method are shown to be promising with real SMS data in terms of identification accuracy, and pave the way towards future research lines focused on applying the concept of language typicality in the discourse analysis field.

Item
Hybridizing Cartesian Genetic Programming and Harmony Search for adaptive feature construction in supervised learning problems (2016)
Elola, Andoni; Del Ser, Javier; Bilbao, Miren Nekane; Perfecto, Cristina; Alexandre, Enrique; Salcedo-Sanz, Sancho; IA
The advent of the so-called Big Data paradigm has motivated a flurry of research aimed at enhancing machine learning models by following very diverse approaches. In this context this work focuses on the automatic construction of features in supervised learning problems, which differs from the conventional selection of features in that new characteristics with enhanced predictive power are inferred from the original dataset. In particular this manuscript proposes a new iterative feature construction approach based on a self-learning meta-heuristic algorithm (Harmony Search) and a solution encoding strategy (correspondingly, Cartesian Genetic Programming) suited to represent combinations of features by means of constant-length solution vectors. The proposed feature construction algorithm, coined as Adaptive Cartesian Harmony Search (ACHS), incorporates modifications that allow exploiting the estimated predictive importance of intermediate solutions and, ultimately, attaining a better convergence rate in its iterative learning procedure. The performance of the proposed ACHS scheme is assessed and compared to that rendered by the state of the art in a toy example and three practical use cases from the literature.
The excellent performance figures obtained in these problems shed light on the widespread applicability of the proposed scheme to supervised learning with legacy datasets composed of already refined characteristics.

Item
The role of local urban traffic and meteorological conditions in air pollution: A data-based case study in Madrid, Spain (2016-11-01)
Laña, Ibai; Del Ser, Javier; Padró, Ales; Vélez, Manuel; Casanova-Mateo, Carlos; IA; CALIDAD Y CONFORT AMBIENTAL
Urban air pollution is a matter of growing concern for both public administrations and citizens. Road traffic is one of the main sources of air pollutants, though topography characteristics and meteorological conditions can make pollution levels increase or diminish dramatically. In this context an upsurge of research has been conducted towards functionally linking variables of such domains to measured pollution data, with studies dealing with up to one-hour resolution meteorological data. However, the majority of such reported contributions do not deal with traffic data or, at most, simulate traffic conditions jointly with the consideration of different topographical features. The aim of this study is to further explore this relationship by using high-resolution real traffic data. This paper describes a methodology based on the construction of regression models to predict levels of different pollutants (i.e. CO, NO, NO2, O3 and PM10) based on traffic data and meteorological conditions, from which the predictive relevance (importance) of each utilized feature can be estimated by virtue of their particular training procedure. The study was conducted with one-hour resolution meteorological, traffic and pollution historical data in roadside and background locations of the city of Madrid (Spain) captured over 2015.
The obtained results reveal that the impact of vehicular emissions on the pollution levels is overshadowed by the effects of stable meteorological conditions of this city.