Authors
Abstract(s)
A Diabetes Mellitus é uma das doenças crónicas com crescimento mais acelerado no mundo, demandando soluções eficazes para diagnóstico e prevenção. Neste
contexto, técnicas de Machine Learning (ML) apresentam potencial significativo na
identificação de padrões relevantes ao controlo da doença. Este estudo utilizou a
metodologia CRISP-DM para analisar dados do Diabetes Health Indicators Dataset,
contendo informações sociodemográficas, clínicas e comportamentais.
Na fase de pré-processamento, aplicou-se o equilíbrio de classes por subamostragem (NearMiss) devido à baixa proporção de indivíduos diabéticos. Técnicas de
seleção de características, como Eliminação Recursiva de Características (RFE) e
Análise de Componentes Principais (PCA), foram utilizadas para avaliar a relevância das variáveis e reduzir a dimensionalidade. Avaliaram-se seis modelos: Floresta
Aleatória, Gradient Boosting, KNN, Regressão Logística, Perceptron Multicamadas
(MLP) e Redes Neuronais Recorrentes (RNN).
Os resultados mostraram que o equilíbrio das classes melhorou significativamente
o desempenho, destacando-se a RNN, com acurácia acima de 86% e F1-score próximo
a 0,87. A combinação da seleção RFE com MLP também apresentou resultados
robustos. Conclui-se que ML e DL são promissores para priorizar acompanhamento
clínico e apoiar políticas públicas, sendo necessário ampliar a representatividade
dos dados, incorporar técnicas de Explainable AI para maior interpretabilidade, e
ajustar limiares decisórios visando minimizar falsos negativos.
Diabetes Mellitus is one of the fastest-growing chronic diseases in the world, requiring effective solutions for diagnosis and prevention. In this context, Machine Learning (ML) techniques have significant potential for identifying patterns relevant to disease control. This study used the CRISP-DM methodology to analyze data from the Diabetes Health Indicators Dataset, containing sociodemographic, clinical and behavioral information.
In the pre-processing phase, class balancing by undersampling (NearMiss) was applied due to the low proportion of diabetic individuals. Feature selection techniques, such as Recursive Feature Elimination (RFE) and Principal Component Analysis (PCA), were used to assess the relevance of the variables and reduce dimensionality. Six models were evaluated: Random Forest, Gradient Boosting, KNN, Logistic Regression, Multilayer Perceptron (MLP) and Recurrent Neural Networks (RNN).
The results showed that class balancing significantly improved performance, with RNN standing out with accuracy above 86% and an F1-score near 0.87. The combination of RFE feature selection with MLP also yielded robust results. It is concluded that ML and DL are promising for prioritizing clinical follow-up and supporting public policies. However, it is necessary to increase data representativeness, incorporate Explainable AI techniques for greater interpretability, and adjust decision-making thresholds aiming to minimize false negatives.
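The abstract names NearMiss undersampling but does not state which variant or library was used. As an illustration only, the following is a minimal pure-Python sketch of the NearMiss-1 idea: keep the majority-class samples whose mean distance to their k nearest minority-class samples is smallest, until both classes have the same size. The function name `nearmiss1`, the parameter `k`, the binary-label assumption, and the synthetic two-feature data are all assumptions made for this sketch; none of them come from the study.

```python
import math
import random


def nearmiss1(X, y, minority_label=1, k=3):
    """Simplified NearMiss-1 for a binary problem (hypothetical helper).

    Keeps every minority sample and the majority samples with the smallest
    mean distance to their k nearest minority samples.
    """
    minority = [x for x, label in zip(X, y) if label == minority_label]
    majority = [(x, label) for x, label in zip(X, y) if label != minority_label]

    def mean_dist_to_nearest_minority(x):
        # Euclidean distances to all minority samples, smallest first.
        dists = sorted(math.dist(x, m) for m in minority)
        nearest = dists[:k]
        return sum(nearest) / len(nearest)

    # Rank majority samples by closeness to the minority class.
    ranked = sorted(majority, key=lambda pair: mean_dist_to_nearest_minority(pair[0]))
    kept = ranked[:len(minority)]  # balance the two classes

    X_bal = minority + [x for x, _ in kept]
    y_bal = [minority_label] * len(minority) + [label for _, label in kept]
    return X_bal, y_bal


# Synthetic toy data, NOT taken from the Diabetes Health Indicators Dataset:
# 90 majority samples (label 0) and 10 minority samples (label 1).
rng = random.Random(0)
X_major = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(90)]
X_minor = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(10)]
X = X_major + X_minor
y = [0] * 90 + [1] * 10

X_bal, y_bal = nearmiss1(X, y, minority_label=1, k=3)
print(len(X_bal), y_bal.count(0), y_bal.count(1))  # prints: 20 10 10
```

Because undersampling discards majority samples, it would normally be applied to the training split only, so that evaluation metrics such as accuracy and F1-score still reflect the original class proportions.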
Description
Keywords
Diabetes Mellitus; Deep Learning; Machine Learning; Redes Neuronais Recorrentes; Seleção de Características
Diabetes Mellitus; Machine Learning; Deep Learning; Recurrent Neural Networks; Feature Selection