Scalable Data Profiling for Quality Analytics Extraction

dc.contributor.author: Nikolakopoulos, Anastasios
dc.contributor.author: Chondrogiannis, Efthymios
dc.contributor.author: Karanastasis, Efstathios
dc.contributor.author: Osa, María José López
dc.contributor.author: Aroca, Jordi Arjona
dc.contributor.author: Kefalogiannis, Michalis
dc.contributor.author: Apostolopoulou, Vasiliki
dc.contributor.author: Deligeorgi, Efstathia
dc.contributor.author: Siopidis, Vasileios
dc.contributor.author: Varvarigou, Theodora
dc.contributor.editor: Maglogiannis, Ilias
dc.contributor.editor: Iliadis, Lazaros
dc.contributor.editor: Karydis, Ioannis
dc.contributor.editor: Papaleonidas, Antonios
dc.contributor.editor: Chochliouros, Ioannis
dc.contributor.institution: HPA
dc.date.accessioned: 2024-09-06T09:35:06Z
dc.date.available: 2024-09-06T09:35:06Z
dc.date.issued: 2024
dc.description: Publisher Copyright: © IFIP International Federation for Information Processing 2024.
dc.description.abstract: In today’s modern society, data play an integral role in the development of global industry, since they have become a valuable asset for companies, institutions, governments, and others. At the same time, the data generated daily at a global scale require significant resources to pre-process, filter and store. When acquiring such stored data, it is essential to understand beforehand which dataset fits the needs of the user. One particularly important factor is the quality of a dataset, which can be determined from a series of quality-related attributes derived from it. Generating such attributes constitutes “Profiling”, the process of obtaining information from a data sample that relates to the complete dataset’s quality. However, in the era of Big Data, the ability to apply profiling techniques to complete large datasets should also be considered, in order to obtain complete quality insights. This paper attempts to address this consideration by presenting “DaQuE”, a scalable framework for efficient profiling and quality analytics extraction on complete datasets of all volumes. [en]
dc.description.status: Peer reviewed
dc.format.extent: 13
dc.identifier.citation: Nikolakopoulos, A, Chondrogiannis, E, Karanastasis, E, Osa, M J L, Aroca, J A, Kefalogiannis, M, Apostolopoulou, V, Deligeorgi, E, Siopidis, V & Varvarigou, T 2024, Scalable Data Profiling for Quality Analytics Extraction. in I Maglogiannis, L Iliadis, I Karydis, A Papaleonidas & I Chochliouros (eds), Artificial Intelligence Applications and Innovations. AIAI 2024 IFIP WG 12.5 International Workshops - MHDW 2024, 5G-PINE 2024, and AI4GD 2024, Proceedings. IFIP Advances in Information and Communication Technology, vol. 715 IFIPAICT, Springer Science and Business Media Deutschland GmbH, pp. 177-189, 13th Mining Humanistic Data Workshop, MHDW 2024, 9th Workshop on 5G-Putting Intelligence to the Network Edge, 5G-PINE 2024 and 1st Workshop on AI in Applications for Achieving the Green Deal Targets, AI4GD 2024 held as parallel events of the IFIP WG 12.5 International Workshops on Artificial Intelligence Applications and Innovations, AIAI 2024, Corfu, Greece, 27/06/24. https://doi.org/10.1007/978-3-031-63227-3_12
dc.identifier.citationconference
dc.identifier.doi: 10.1007/978-3-031-63227-3_12
dc.identifier.isbn: 9783031632266
dc.identifier.issn: 1868-4238
dc.identifier.uri: https://hdl.handle.net/11556/4844
dc.identifier.url: http://www.scopus.com/inward/record.url?scp=85199206369&partnerID=8YFLogxK
dc.language.iso: eng
dc.publisher: Springer Science and Business Media Deutschland GmbH
dc.relation.ispartof: Artificial Intelligence Applications and Innovations. AIAI 2024 IFIP WG 12.5 International Workshops - MHDW 2024, 5G-PINE 2024, and AI4GD 2024, Proceedings
dc.relation.ispartofseries: IFIP Advances in Information and Communication Technology
dc.rights: info:eu-repo/semantics/restrictedAccess
dc.subject.keywords: Big Data
dc.subject.keywords: Big Data analysis
dc.subject.keywords: Data profiling
dc.subject.keywords: Data quality
dc.subject.keywords: Information Systems and Management
dc.title: Scalable Data Profiling for Quality Analytics Extraction [en]
dc.type: conference output