A feature selection method for author identification in interactive communications based on supervised learning and language typicality

dc.contributor.author: Villar-Rodriguez, Esther
dc.contributor.author: Del Ser, Javier
dc.contributor.author: Bilbao, Miren Nekane
dc.contributor.author: Salcedo-Sanz, Sancho
dc.contributor.institution: Tecnalia Research & Innovation
dc.contributor.institution: Quantum
dc.contributor.institution: IA
dc.date.issued: 2016-11-01
dc.description: Publisher Copyright: © 2016 Elsevier Ltd
dc.description.abstract: Authorship attribution, understood as identifying which of several candidate authors wrote a given text, has been a very active research area, driven mainly by advances in Natural Language Processing (NLP), machine learning and Computational Intelligence. The problem has mostly been addressed from a literary perspective, aiming to identify the stylometric features and writeprints that unequivocally characterize a writer's patterns and allow the writer to be uniquely identified. At the same time, the rise of social networking platforms and interactive messaging has made the anonymous expression of feelings, the sharing of experiences and the building of social relationships much easier than in traditional communication media. Unfortunately, the popularity of such communities and the virtual identities of their users provide fertile ground for cybercrimes against unsuspecting victims and for other illegal uses of social networks, which call for tracing the activity of accounts. In the context of one-to-one communications, this manuscript proposes identifying the sender of a message as a means of detecting impersonation attacks in interactive communication scenarios. Specifically, this work selects linguistic features extracted from messages via NLP techniques by means of a novel feature selection algorithm that dissociates the essential traits of the sender from the influence of the receiver. Experiments with real SMS data show promising identification accuracy and computational efficiency for different supervised learning models that incorporate the proposed feature selection method, and pave the way towards future research on applying the concept of language typicality to discourse analysis. [en]
dc.description.status: Peer reviewed
dc.format.extent: 10
dc.format.extent: 566454
dc.identifier.citation: Villar-Rodriguez, E, Del Ser, J, Bilbao, MN & Salcedo-Sanz, S 2016, 'A feature selection method for author identification in interactive communications based on supervised learning and language typicality', Engineering Applications of Artificial Intelligence, vol. 56, pp. 175-184. https://doi.org/10.1016/j.engappai.2016.09.004
dc.identifier.doi: 10.1016/j.engappai.2016.09.004
dc.identifier.issn: 1873-6769
dc.identifier.other: researchoutputwizard: 11556/354
dc.identifier.url: http://www.scopus.com/inward/record.url?scp=84988025909&partnerID=8YFLogxK
dc.language.iso: eng
dc.relation.ispartof: Engineering Applications of Artificial Intelligence
dc.rights: info:eu-repo/semantics/openAccess
dc.subject.keywords: Authorship identification
dc.subject.keywords: Natural Language Processing
dc.subject.keywords: Supervised learning
dc.subject.keywords: Feature selection
dc.subject.keywords: Impersonation
dc.subject.keywords: Role identification
dc.subject.keywords: Control and Systems Engineering
dc.subject.keywords: Artificial Intelligence
dc.subject.keywords: Electrical and Electronic Engineering
dc.subject.keywords: Funding Info
dc.subject.keywords: This work has been partially supported by the Basque Government under the ETORTEK (Grant IE14-382) and the ELKARTEK (BID3A project, Grant ref. KK-2015/0000080) funding programs.
dc.title: A feature selection method for author identification in interactive communications based on supervised learning and language typicality [en]
dc.type: journal article
Files
Original bundle
Name: EAAI-evillar-second submission-29072016.pdf
Size: 553.18 KB
Format: Adobe Portable Document Format
Description: Main article