RT Conference Proceedings
T1 Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy
A1 Niño-Adan, Iratxe
A1 Landa-Torres, Itziar
A1 Portillo, Eva
A1 Manjarres, Diana
A2 Martínez Álvarez, Francisco
A2 Troncoso Lora, Alicia
A2 Quintián, Héctor
A2 Sáez Muñoz, José António
A2 Corchado, Emilio
AB Normalization methods are widely employed for transforming the variables or features of a given dataset. In this paper three classical feature normalization methods, Standardization (St), Min-Max (MM) and Median Absolute Deviation (MAD), are studied in different synthetic datasets from UCI repository. An exhaustive analysis of the transformed features’ ranges and their influence on the Euclidean distance is performed, concluding that knowledge about the group structure gathered by each feature is needed to select the best normalization method for a given dataset. In order to effectively collect the features’ importance and adjust their contribution, this paper proposes a two-stage methodology for normalization and supervised feature weighting based on a Pearson correlation coefficient and on a Random Forest Feature Importance estimation method. Simulations on five different datasets reveal that our two-stage proposed methodology, in terms of accuracy, outperforms or at least maintains the K-means performance obtained if only normalization is applied.
PB Springer Verlag
SN 9783030200541
SN 2194-5357
YR 2020
FD 2020
LK https://hdl.handle.net/11556/2102
UL https://hdl.handle.net/11556/2102
LA eng
NO Niño-Adan, I, Landa-Torres, I, Portillo, E & Manjarres, D 2020, Analysis and Application of Normalization Methods with Supervised Feature Weighting to Improve K-means Accuracy. In F Martínez Álvarez, A Troncoso Lora, H Quintián, J A Sáez Muñoz & E Corchado (eds), 14th International Conference on Soft Computing Models in Industrial and Environmental Applications SOCO 2019, Proceedings. Advances in Intelligent Systems and Computing, vol. 950, Springer Verlag, pp. 14-24, 14th International Conference on Soft Computing Models in Industrial and Environmental Applications, SOCO 2019, Seville, Spain, 13/05/19. https://doi.org/10.1007/978-3-030-20055-8_2
NO conference
NO Publisher Copyright: © 2020, Springer Nature Switzerland AG.
NO Acknowledgement. This work has been supported in part by the ELKARTEK program (SeNDANEU KK-2018/00032), the HAZITEK program (DATALYSE ZL-2018/00765) of the Basque Government and a TECNALIA Research and Innovation PhD Scholarship.
DS TECNALIA Publications
RD 28 jul 2024