RT Dissertation/Thesis
T1 Advanced Machine Learning Techniques and Meta-Heuristic Optimization for the Detection of Masquerading Attacks in Social Networks
A1 Villar-Rodriguez, Esther
AB According to the report published by the online protection firm Iovation in 2012, cyber fraud ranged from 1 percent of Internet transactions in North America to 7 percent in Africa, most of it involving credit card fraud, identity theft, and account takeover or hacking attempts. This kind of crime is still growing due to the advantages offered by a non-face-to-face channel in which an increasing number of unsuspecting victims divulge sensitive information. Interpol classifies these illegal activities into three types:
• Attacks against computer hardware and software.
• Financial crimes and corruption.
• Abuse, in the form of grooming or “sexploitation”.
Most research efforts have focused on the target of the crime, developing different strategies depending on the specific case. Thus, for the well-known phishing attacks, stored blacklists or crime signals in the text are employed, eventually yielding ad hoc detectors that hardly carry over to other scenarios even when the background is widely shared. Identity theft or masquerading can be described as a criminal activity oriented towards the misuse of stolen credentials to obtain goods or services by deception. On March 4, 2005, a million pieces of personal and sensitive information, such as credit card and social security numbers, were collected by White Hat hackers at Seattle University who simply surfed the Web for less than 60 minutes using the Google search engine.
As a consequence, they proved the vulnerability and lack of protection with a mere group of sophisticated search terms typed into the engine, whose large data warehouse still exposed temporarily cached data from company or government websites. As mentioned above, platforms that connect distant people through indirect interaction offer an entry point for unauthorized third parties who impersonate the legitimate user in an attempt to go unnoticed while pursuing malicious, not necessarily economic, interests. In fact, the last item in the list above, abuse, has become a major and serious risk together with bullying, whether through threats, harassment or even self-incrimination, which can drive someone to suicide, depression or helplessness. California Penal Code Section 528.5 states:
“Notwithstanding any other provision of law, any person who knowingly and without consent credibly impersonates another actual person through or on an Internet Web site or by other electronic means for purposes of harming, intimidating, threatening, or defrauding another person is guilty of a public offense punishable pursuant to subdivision [...]”.
Therefore, impersonation consists of any criminal activity in which someone assumes a false identity and acts as his or her assumed character with the intent to obtain a pecuniary benefit or cause harm. User profiling, in turn, is the process of harvesting user information in order to construct a rich template with all the attributes that are advantageous in the field at hand and for a specific purpose. User profiling is often employed as a mechanism for recommending items or useful information that the client has not yet considered.
Nevertheless, the tendencies or preferences derived for a user can also be exploited to define his or her inherent behavior and to address the problem of impersonation by detecting outliers or unusual deviations that may signal a potential attack.
This dissertation elaborates on impersonation attacks from a profiling perspective, eventually developing a two-stage environment that embraces two levels of privacy intrusion, and provides the following contributions:
• The inference of behavioral patterns from connection time traces, aiming to avoid appropriating more confidential information. Compared to previous approaches, this procedure refrains from impinging on user privacy by accessing the content of messages, since it relies only on time statistics of the user sessions rather than on their content.
• The application and subsequent discussion of two selected algorithms for solving the previous point:
– A commonly employed supervised algorithm executed as a binary classifier, which in turn forced us to devise a method to deal with the absence of labeled instances representing an identity theft.
– A meta-heuristic algorithm that searches for the most convenient parameters to arrange the instances within a high-dimensional space into properly delimited clusters, so that an unsupervised clustering algorithm can finally be applied.
• The analysis of message content, which encroaches on more private information but eases user identification by mining discriminative features with Natural Language Processing (NLP) techniques. As a consequence, a new feature extraction algorithm based on linguistic theories has been developed, motivated by the massive number of features often gathered when it comes to texts.
In summary, this dissertation aims to go beyond the typical, ad hoc approaches adopted by previous identity theft and authorship attribution research.
Specifically, it proposes tailored solutions to this particular and extensively studied paradigm with the aim of introducing a generic approach from a profiling view, not tightly bound to a single application field. In addition, technical contributions have been made in the course of formulating the solution, intended to optimize familiar methods for better versatility towards the problem at hand. Overall, this Thesis establishes an encouraging research basis towards unveiling subtle impersonation attacks in Social Networks by means of intelligent learning techniques.
PB Universidad de Alcalá
YR 2015
FD 2015-12-11
LK http://hdl.handle.net/11556/182
UL http://hdl.handle.net/11556/182
LA eng
DS TECNALIA Publications
RD 3 jul 2024