ESTGV - DEMGI - Articles in scientific journals indexed in WoS/Scopus
Browsing ESTGV - DEMGI - Articles in scientific journals indexed in WoS/Scopus, by author "Abbasi, Maryam"
Showing 1 - 10 of 13
- Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study
  Publication. Abbasi, Maryam; Bernardo, Marco V.; Vaz, Paulo; Silva, José; Martins, Pedro
  The increasing complexity of managing modern database systems, particularly in terms of optimizing query performance for large datasets, presents significant challenges that traditional methods often fail to address. This paper proposes a comprehensive framework for integrating advanced machine learning (ML) models within the architecture of a database management system (DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised and unsupervised learning techniques to predict query execution times, optimize performance, and dynamically manage workloads. Unlike existing solutions that address specific optimization tasks in isolation, our framework provides a unified platform that supports real-time model inference and automatic database configuration adjustments based on workload patterns. A key contribution of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless interaction between the ML models and the query optimization process. This integration allows for the automatic retraining of models and dynamic workload management, resulting in substantial improvements in both query response times and overall system throughput. Our evaluations using the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution times and a 74% improvement in throughput compared with traditional approaches. Additionally, we address challenges such as potential conflicts in tuning recommendations and the performance overhead associated with ML integration, providing insights for future research directions. This study is motivated by the need for autonomous tuning mechanisms to manage large-scale, heterogeneous workloads while answering key research questions, such as the following: (1) How can machine learning models be integrated into a DBMS to improve query optimization and workload management? (2) What performance improvements can be achieved through dynamic configuration tuning based on real-time workload patterns? Our results suggest that the proposed framework significantly reduces the need for manual database administration while effectively adapting to evolving workloads, offering a robust solution for modern large-scale data environments.
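The query-latency prediction idea above can be illustrated with a toy regression. This is a sketch under invented assumptions (a single "rows scanned" feature and a synthetic execution log), not the paper's actual framework:

```python
# Toy sketch of the kind of model such a framework could train for
# query-latency prediction. The feature (rows scanned) and the training
# log below are invented for illustration.

def fit_ols(xs, ys):
    """Closed-form simple linear regression: y ≈ a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# Synthetic execution log: (rows scanned, observed latency in ms).
history = [(1_000, 12.0), (10_000, 55.0), (50_000, 240.0), (100_000, 470.0)]
a, b = fit_ols([r for r, _ in history], [t for _, t in history])

def predict_latency_ms(rows_scanned):
    """Estimate latency for a candidate plan; a DBMS hook could compare
    such estimates across plans or trigger retraining when they drift."""
    return a + b * rows_scanned
```

In a real deployment the model would consume many plan and workload features; the point here is only the train-predict-retrain loop the abstract describes.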
- Comprehensive Evaluation of Deepfake Detection Models: Accuracy, Generalization, and Resilience to Adversarial Attacks
  Publication. Abbasi, Maryam; ANTUNES VAZ, PAULO JOAQUIM; Silva, José; Martins, Pedro
  The rise of deepfakes—synthetic media generated using artificial intelligence—threatens digital content authenticity, facilitating misinformation and manipulation. Deepfakes can depict real or entirely fictitious individuals and leverage state-of-the-art techniques such as generative adversarial networks (GANs) and emerging diffusion-based models. Existing detection methods face challenges with generalization across datasets and vulnerability to adversarial attacks. This study focuses on subsets of frames extracted from the DeepFake Detection Challenge (DFDC) and FaceForensics++ videos to evaluate three convolutional neural network architectures—XCeption, ResNet, and VGG16—for deepfake detection. Performance metrics include accuracy, precision, F1-score, AUC-ROC, and Matthews Correlation Coefficient (MCC), combined with an assessment of resilience to adversarial perturbations via the Fast Gradient Sign Method (FGSM). Among the tested models, XCeption achieves the highest accuracy (89.2% on DFDC), strong generalization, and real-time suitability, while VGG16 excels in precision and ResNet provides faster inference. However, all models exhibit reduced performance under adversarial conditions, underscoring the need for enhanced resilience. These findings indicate that robust detection systems must consider advanced generative approaches, adversarial defenses, and cross-dataset adaptation to effectively counter evolving deepfake threats.
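Of the metrics listed, the Matthews Correlation Coefficient is the least commonly implemented by hand. A minimal self-contained version, with hypothetical confusion-matrix counts:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    0 when the denominator vanishes (a common convention)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0

# Hypothetical counts for a binary deepfake/real classifier.
score = mcc(tp=90, tn=85, fp=15, fn=10)
```

MCC is often preferred over accuracy for imbalanced frame-level datasets because it uses all four cells of the confusion matrix.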
- Data Privacy and Ethical Considerations in Database Management
  Publication. Pina, Eduardo; Ramos, José; Jorge, Henrique; Vaz, Paulo; Silva, José; Wanzeller, Cristina; Abbasi, Maryam; Martins, Pedro
  Data privacy and ethical considerations ensure the security of databases by respecting individual rights and upholding ethical standards when collecting, managing, and using information. Despite regulations that help protect citizens and organizations, thousands of data breaches, unauthorized accesses, and instances of data misuse continue to affect individuals and organizations. In this paper, we propose ethical considerations and best practices associated with critical data and the role of the database administrator in protecting it. First, we suggest best practices for database administrators regarding data minimization, anonymization, pseudonymization and encryption, access controls, data retention guidelines, and stakeholder communication. Then, we present a case study that illustrates the application of these ethical implementations and best practices in a real-world scenario, showing the approach in action and its privacy benefits. Finally, the study highlights the importance of a comprehensive approach to data protection challenges and provides valuable insights for future research and developments in this field.
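Pseudonymization, one of the practices recommended above, can be sketched with a keyed hash. The key value and token length here are illustrative choices, not a prescription from the paper:

```python
import hmac
import hashlib

# Hypothetical key for illustration only: in practice, keep it in a
# secrets manager and rotate it under a documented policy.
SECRET_KEY = b"rotate-me-regularly"

def pseudonymize(identifier: str) -> str:
    """Map an identifier to a stable token. Unlike a plain hash, a keyed
    HMAC resists dictionary attacks on low-entropy IDs (names, SSNs):
    without the key, tokens cannot be recomputed or reversed."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated token; same input, same token
```

Because the mapping is deterministic, joins across pseudonymized tables still work, which is the usual reason to prefer pseudonymization over full anonymization.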
- Enhancing Visual Perception in Immersive VR and AR Environments: AI-Driven Color and Clarity Adjustments Under Dynamic Lighting Conditions
  Publication. Abbasi, Maryam; Silva, José; Martins, Pedro; ANTUNES VAZ, PAULO JOAQUIM
  The visual fidelity of virtual reality (VR) and augmented reality (AR) environments is essential for user immersion and comfort. Dynamic lighting often leads to chromatic distortions and reduced clarity, causing discomfort and disrupting user experience. This paper introduces an AI-driven chromatic adjustment system based on a modified U-Net architecture, optimized for real-time applications in VR/AR. This system adapts to dynamic lighting conditions, addressing the shortcomings of traditional methods like histogram equalization and gamma correction, which struggle with rapid lighting changes and real-time user interactions. We compared our approach with state-of-the-art color constancy algorithms, including Barron’s Convolutional Color Constancy and STAR, demonstrating superior performance. Experimental results from 60 participants show significant improvements, with up to 41% better color accuracy and 39% enhanced clarity under dynamic lighting conditions. The study also included eye-tracking data, which confirmed increased user engagement with AI-enhanced images. Our system provides a practical solution for developers aiming to improve image quality, reduce visual discomfort, and enhance overall user satisfaction in immersive environments. Future work will focus on extending the model’s capability to handle more complex lighting scenarios.
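For contrast with the learned approach, the classical gray-world heuristic (a traditional color-constancy baseline, not one of the methods evaluated in the paper) fits in a few lines:

```python
def gray_world(pixels):
    """Gray-world color constancy: assume the scene averages to gray, so
    scale each RGB channel so its mean matches the overall mean.
    pixels: list of (r, g, b) floats in [0, 1]."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m else 1.0 for m in means]
    # Apply per-channel gains, clipping to the valid range.
    return [tuple(min(1.0, p[c] * gains[c]) for c in range(3)) for p in pixels]
```

Heuristics like this fail exactly where the abstract says traditional methods do: a rapidly changing, strongly colored light source violates the gray-average assumption, which motivates the learned per-scene adjustment.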
- Head-to-Head Evaluation of FDM and SLA in Additive Manufacturing: Performance, Cost, and Environmental Perspectives
  Publication. Abbasi, Maryam; ANTUNES VAZ, PAULO JOAQUIM; Martins, Pedro; Silva, José
  This paper conducts a comprehensive experimental comparison of two widely used additive manufacturing (AM) processes, Fused Deposition Modeling (FDM) and Stereolithography (SLA), under standardized conditions using the same test geometries and protocols. FDM parts were printed with both Polylactic Acid (PLA) and Acrylonitrile Butadiene Styrene (ABS) filaments, while SLA used a general-purpose photopolymer resin. Quantitative evaluations included surface roughness, dimensional accuracy, tensile properties, production cost, and energy consumption. Additionally, environmental considerations and process reliability were assessed by examining waste streams, recyclability, and failure rates. The results indicate that SLA achieves superior surface quality (Ra ≈ 2 µm vs. 12–13 µm) and dimensional tolerances (±0.05 mm vs. ±0.15–0.20 mm), along with higher tensile strength (up to 70 MPa). However, FDM provides notable advantages in cost (approximately 60% lower on a per-part basis), production speed, and energy efficiency. Moreover, from an environmental perspective, FDM is more favorable when using biodegradable PLA or recyclable ABS, whereas SLA resin waste is hazardous. Overall, the study highlights that no single process is universally superior. FDM offers a rapid, cost-effective solution for prototyping, while SLA excels in precision and surface finish. By presenting a detailed, data-driven comparison, this work guides engineers, product designers, and researchers in choosing the most suitable AM technology for their specific needs.
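The per-part cost gap can be made concrete with a simple material-plus-energy model. All prices and machine parameters below are invented for illustration and omit amortization, labor, and post-processing:

```python
def part_cost(material_g, price_per_kg, print_hours, power_kw, kwh_price=0.20):
    """Naive per-part cost: material consumed plus electricity.
    All inputs are hypothetical; real costing also needs machine
    amortization, failure rates, and post-processing consumables."""
    material_cost = material_g / 1000.0 * price_per_kg
    energy_cost = print_hours * power_kw * kwh_price
    return material_cost + energy_cost

# Illustrative (not measured) inputs: cheap filament vs. pricier resin.
fdm_cost = part_cost(material_g=15, price_per_kg=20, print_hours=3.0, power_kw=0.12)
sla_cost = part_cost(material_g=18, price_per_kg=60, print_hours=2.5, power_kw=0.06)
```

Even this crude model shows how filament price dominates: the resin's higher per-kilogram cost outweighs SLA's lower power draw, in the same direction as the paper's roughly 60%-lower FDM figure.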
- Machine Learning Approaches for Predicting Maize Biomass Yield: Leveraging Feature Engineering and Comprehensive Data Integration
  Publication. Abbasi, Maryam; Vaz, Paulo; Silva, José; Martins, Pedro
  The efficient prediction of corn biomass yield is critical for optimizing crop production and addressing global challenges in sustainable agriculture and renewable energy. This study employs advanced machine learning techniques, including Gradient Boosting Machines (GBMs), Random Forests (RFs), Support Vector Machines (SVMs), and Artificial Neural Networks (ANNs), integrated with comprehensive environmental, soil, and crop management data from key agricultural regions in the United States. A novel framework combines feature engineering, such as the creation of a Soil Fertility Index (SFI) and Growing Degree Days (GDDs), and the incorporation of interaction terms to address complex non-linear relationships between input variables and biomass yield. We conduct extensive sensitivity analysis and employ SHAP (SHapley Additive exPlanations) values to enhance model interpretability, identifying SFI, GDDs, and cumulative rainfall as the most influential features driving yield outcomes. Our findings highlight significant synergies among these variables, emphasizing their critical role in rural environmental governance and precision agriculture. Furthermore, an ensemble approach combining GBMs, RFs, and ANNs outperformed individual models, achieving an RMSE of 0.80 t/ha and R² of 0.89. These results underscore the potential of hybrid modeling for real-world applications in sustainable farming practices. Addressing the concerns of passive farmer participation, we propose targeted incentives, education, and institutional support mechanisms to enhance stakeholder collaboration in rural environmental governance. While the models assume rational decision-making, the inclusion of cultural and political factors warrants further investigation to improve the robustness of the framework. Additionally, a map of the study region and improved visualizations of feature importance enhance the clarity and relevance of our findings. This research contributes to the growing body of knowledge on predictive modeling in agriculture, combining theoretical rigor with practical insights to support policymakers and stakeholders in optimizing resource use and addressing environmental challenges. By improving the interpretability and applicability of machine learning models, this study provides actionable strategies for enhancing crop yield predictions and advancing rural environmental governance.
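Growing Degree Days, one of the engineered features named above, has a standard single-day formula. The 10 °C base and 30 °C cap are common conventions for maize but are assumptions here (the paper may use different thresholds), and the season data are synthetic:

```python
def growing_degree_days(t_max, t_min, t_base=10.0, t_cap=30.0):
    """One day's GDD (simple averaging method): cap the daily maximum,
    average with the minimum, subtract the base, and floor at zero.
    10/30 °C are common maize thresholds, assumed here for illustration."""
    capped_max = min(t_max, t_cap)
    return max(0.0, (capped_max + t_min) / 2.0 - t_base)

# Synthetic three-day record of (t_max, t_min) in °C.
season = [(28, 15), (32, 18), (20, 8)]
cumulative_gdd = sum(growing_degree_days(hi, lo) for hi, lo in season)
```

Cumulative GDD over the growing season is what would enter the feature matrix alongside the Soil Fertility Index and rainfall totals.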
- Optimizing Database Performance in Complex Event Processing through Indexing Strategies
  Publication. Abbasi, Maryam; Bernardo, Marco V.; ANTUNES VAZ, PAULO JOAQUIM; Silva, José; Martins, Pedro
  Complex event processing (CEP) systems have gained significant importance in various domains, such as finance, logistics, and security, where the real-time analysis of event streams is crucial. However, as the volume and complexity of event data continue to grow, optimizing the performance of CEP systems becomes a critical challenge. This paper investigates the impact of indexing strategies on the performance of databases handling complex event processing. We propose a novel indexing technique, called Hierarchical Temporal Indexing (HTI), specifically designed for the efficient processing of complex event queries. HTI leverages the temporal nature of event data and employs a multi-level indexing approach to optimize query execution. By combining temporal indexing with spatial- and attribute-based indexing, HTI aims to accelerate the retrieval and processing of relevant events, thereby improving overall query performance. In this study, we evaluate the effectiveness of HTI by implementing complex event queries on various CEP systems with different indexing strategies. We conduct a comprehensive performance analysis, measuring the query execution times and resource utilization (CPU, memory, etc.), and analyzing the execution plans and query optimization techniques employed by each system. Our experimental results demonstrate that the proposed HTI indexing strategy outperforms traditional indexing approaches, particularly for complex event queries involving temporal constraints and multi-dimensional event attributes. We provide insights into the strengths and weaknesses of each indexing strategy, identifying the factors that influence performance, such as data volume, query complexity, and event characteristics. Furthermore, we discuss the implications of our findings for the design and optimization of CEP systems, offering recommendations for indexing strategy selection based on the specific requirements and workload characteristics. Finally, we outline the potential limitations of our study and suggest future research directions in this domain.
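The multi-level temporal idea behind HTI can be sketched, much simplified and not the authors' implementation, as a two-level index: coarse time buckets, each holding a sorted list for fine-grained range lookups:

```python
from collections import defaultdict
import bisect

class TemporalIndex:
    """Toy two-level temporal index inspired by (not identical to) HTI:
    level 1 groups events into coarse time buckets; level 2 keeps each
    bucket as a sorted (timestamp, event) list, so a range query only
    scans the buckets that overlap the query window."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        self.buckets = defaultdict(list)

    def insert(self, ts, event):
        bucket = ts // self.bucket_seconds
        bisect.insort(self.buckets[bucket], (ts, event))

    def range_query(self, start, end):
        """Return events with start <= timestamp <= end, in time order."""
        out = []
        for b in range(start // self.bucket_seconds,
                       end // self.bucket_seconds + 1):
            for ts, ev in self.buckets.get(b, []):
                if start <= ts <= end:
                    out.append(ev)
        return out
```

A full HTI would add the spatial- and attribute-based levels the abstract mentions; the point of the sketch is how bucketing bounds the scan to the temporal constraint of a query.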
- Performance and Scalability of Data Cleaning and Preprocessing Tools: A Benchmark on Large Real-World Datasets
  Publication. Martins, Pedro; Cardoso, Filipe; Vaz, Paulo; Silva, José; Abbasi, Maryam
  Data cleaning remains one of the most time-consuming and critical steps in modern data science, directly influencing the reliability and accuracy of downstream analytics. In this paper, we present a comprehensive evaluation of five widely used data cleaning tools—OpenRefine, Dedupe, Great Expectations, TidyData (PyJanitor), and a baseline Pandas pipeline—applied to large-scale, messy datasets spanning three domains (healthcare, finance, and industrial telemetry). We benchmark each tool on dataset sizes ranging from 1 million to 100 million records, measuring execution time, memory usage, error detection accuracy, and scalability under increasing data volumes. Additionally, we assess qualitative aspects such as usability and ease of integration, reflecting real-world adoption concerns. We incorporate recent findings on parallelized data cleaning and highlight how domain-specific anomalies (e.g., negative amounts in finance, sensor corruption in industrial telemetry) can significantly impact tool choice. Our findings reveal that no single solution excels across all metrics; while Dedupe provides robust duplicate detection and Great Expectations offers in-depth rule-based validation, tools like TidyData and baseline Pandas pipelines demonstrate strong scalability and flexibility under chunk-based ingestion. The choice of tool ultimately depends on domain-specific requirements (e.g., approximate matching in finance and strict auditing in healthcare) and the magnitude of available computational resources. By highlighting each framework’s strengths and limitations, this study offers data practitioners clear, evidence-driven guidance for selecting and combining tools to tackle large-scale data cleaning challenges.
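Chunk-based ingestion with a domain rule (here, the negative-amount anomaly the abstract flags for finance) can be sketched in plain Python. Field names and the cleaning rule are illustrative, not taken from the benchmark:

```python
def clean_chunk(rows):
    """Apply illustrative cleaning rules to one chunk of dict records:
    drop rows with negative amounts and strip whitespace from strings."""
    return [
        {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        for row in rows
        if row.get("amount", 0) >= 0
    ]

def chunked(iterable, size):
    """Yield fixed-size chunks so memory stays bounded regardless of
    total record count — the property that lets simple pipelines scale."""
    chunk = []
    for item in iterable:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

records = [{"amount": 10, "name": " alice "},
           {"amount": -5, "name": "bob"},        # finance anomaly: dropped
           {"amount": 3, "name": "carol"}]
cleaned = [r for chunk in chunked(records, 2) for r in clean_chunk(chunk)]
```

The same shape scales to 100 million records when `records` is a generator over a file or database cursor instead of an in-memory list.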
- Performance Comparison of Python-Based Complex Event Processing Engines for IoT Intrusion Detection: Faust Versus Streamz
  Publication. Abbasi, Maryam; Cardoso, Filipe; ANTUNES VAZ, PAULO JOAQUIM; Silva, José; Sá, Filipe; Martins, Pedro
  The proliferation of Internet of Things (IoT) devices has intensified the need for efficient real-time anomaly and intrusion detection, making the selection of an appropriate Complex Event Processing (CEP) engine a critical architectural decision for security-aware data pipelines. Python-based CEP frameworks offer compelling advantages through the seamless integration with data science and machine learning ecosystems; however, rigorous comparative evaluations of such frameworks under realistic IoT security workloads remain absent from the literature. This study presents the first systematic comparative evaluation of Faust and Streamz—two Python-native CEP engines representing fundamentally different architectural philosophies—specifically in the context of IoT network intrusion detection. Faust was selected for its actor-based stateful processing model with native Kafka integration and distributed table support, while Streamz was selected for its reactive, lightweight pipeline design targeting high-throughput stateless processing, making them representative of the two dominant paradigms in Python stream processing. Although both engines target different application niches, their performance characteristics under realistic CEP workloads have never been rigorously compared, leaving practitioners without empirical guidance. The primary evaluation employs an IoT network intrusion dataset comprising 583,485 events from 83 heterogeneous devices. To assess whether the observed performance characteristics are specific to this single dataset or generalize across different workload profiles, a secondary IoT-adjacent benchmark is included: the PaySim financial transaction dataset (6.4 million records), selected because its event schema, fraud-pattern temporal structure, and volume differ substantially from the intrusion dataset, providing a stress test for cross-workload robustness rather than a claim of domain equivalence. A second IoT-specific intrusion dataset (such as TON_IoT or Bot-IoT) would provide a more directly comparable validation; this is identified as a priority for future work. The load levels used in scalability experiments (up to 5000 events per second) intentionally exceed the dataset’s natural rate to stress-test each engine’s architectural ceiling and identify saturation thresholds relevant to large-scale or multi-sensor IoT deployments. We conducted controlled experiments with comprehensive statistical analysis. Our results demonstrate that Streamz achieves superior throughput at 4450 events per second with 89% efficiency and minimal resource consumption (40 MB memory, 12 ms median latency), while Faust provides robust intrusion pattern detection with 93–98% accuracy and stable, predictable resource utilization (1.4% CPU standard deviation). A multi-framework comparison including Apache Kafka Streams and offline scikit-learn baselines confirms that Faust achieves detection quality competitive with JVM-based alternatives (Faust: 96.2%; Kafka Streams: 96.8%; absolute difference of 0.6 percentage points, not statistically significant at p = 0.318) while retaining the Python ecosystem advantages. Statistical analysis confirms significant performance differences across all metrics (p < 0.001, Cohen’s d > 0.8). Critical scalability thresholds are identified: Streamz maintains efficiency above 95% up to 3500 events per second, while Faust degrades beyond 2500 events per second. These findings provide IoT security engineers and system architects with actionable, empirically grounded guidance for CEP engine selection, establish a reproducible benchmarking methodology applicable to future Python-based stream processing evaluations, and advance theoretical understanding of the accuracy–throughput trade-off in stateful versus stateless Python CEP architectures.
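A throughput-and-latency harness of the kind such a benchmark needs can be reduced to a few lines. This is a generic micro-harness, not the study's methodology; the lambda predicate stands in for a real detection rule:

```python
import time
import statistics

def benchmark(process, events):
    """Run `process` over each event, recording per-event latency and
    overall throughput — a much-simplified analogue of a CEP benchmark
    (no queueing, backpressure, or multi-process engine involved)."""
    latencies = []
    start = time.perf_counter()
    for ev in events:
        t0 = time.perf_counter()
        process(ev)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    return {
        "events_per_sec": len(events) / elapsed,
        "median_latency_ms": statistics.median(latencies) * 1000.0,
    }

# Placeholder detection rule over synthetic packet-size events.
stats = benchmark(lambda ev: ev["bytes"] > 1500,
                  [{"bytes": b} for b in range(1000)])
```

Real engine comparisons additionally pin CPU affinity, warm up the JIT/interpreter, and repeat runs for statistical tests, which is where the p-values and effect sizes in the abstract come from.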
- A Practical Performance Benchmark of Post-Quantum Cryptography Across Heterogeneous Computing Environments
  Publication. Abbasi, Maryam; Cardoso, Filipe; Vaz, Paulo; Silva, José; Martins, Pedro
  The emergence of large-scale quantum computing presents an imminent threat to contemporary public-key cryptosystems, with quantum algorithms such as Shor’s algorithm capable of efficiently breaking RSA and elliptic curve cryptography (ECC). This vulnerability has catalyzed accelerated standardization efforts for post-quantum cryptography (PQC) by the U.S. National Institute of Standards and Technology (NIST) and global security stakeholders. While theoretical security analysis of these quantum-resistant algorithms has advanced considerably, comprehensive real-world performance benchmarks spanning diverse computing environments—from high-performance cloud infrastructure to severely resource-constrained IoT devices—remain insufficient for informed deployment planning. This paper presents the most extensive cross-platform empirical evaluation to date of NIST-selected PQC algorithms, including CRYSTALS-Kyber and NTRU for key encapsulation mechanisms (KEMs), alongside BIKE as a code-based alternative, and CRYSTALS-Dilithium and Falcon for digital signatures. Our systematic benchmarking framework measures computational latency, memory utilization, key sizes, and protocol overhead across multiple security levels (NIST Levels 1, 3, and 5) in three distinct hardware environments and various network conditions. Results demonstrate that contemporary server architectures can implement these algorithms with negligible performance impact (<5% additional latency), making immediate adoption feasible for cloud services. In contrast, resource-constrained devices experience more significant overhead, with computational demands varying by up to 12× between algorithms at equivalent security levels, highlighting the importance of algorithm selection for edge deployments. Beyond standalone algorithm performance, we analyze integration challenges within existing security protocols, revealing that naive implementation of PQC in TLS 1.3 can increase handshake size by up to 7× compared to classical approaches. To address this, we propose and evaluate three optimization strategies that reduce bandwidth requirements by 40–60% without compromising security guarantees. Our investigation further encompasses memory-constrained implementation techniques, side-channel resistance measures, and hybrid classical-quantum approaches for transitional deployments. Based on these comprehensive findings, we present a risk-based migration framework and algorithm selection guidelines tailored to specific use cases, including financial transactions, secure firmware updates, vehicle-to-infrastructure communications, and IoT fleet management. This practical roadmap enables organizations to strategically prioritize systems for quantum-resistant upgrades based on data sensitivity, resource constraints, and technical feasibility. Our results conclusively demonstrate that PQC is deployment-ready for most applications, provided that implementations are carefully optimized for the specific performance characteristics and security requirements of target environments. We also identify several remaining research challenges for the community, including further optimization for ultra-constrained devices, standardization of hybrid schemes, and hardware acceleration opportunities.
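The handshake-size pressure the abstract describes can be checked with simple byte arithmetic. The sizes below are the published ML-KEM-768 (Kyber) and X25519 parameter sizes as quoted from memory; verify against FIPS 203 before relying on them:

```python
# Illustrative key-exchange size arithmetic (bytes). ML-KEM-768 values
# are quoted from memory from the FIPS 203 parameter tables; X25519
# public keys are 32 bytes (RFC 7748). Verify before reuse.
X25519_SHARE = 32            # one side's public key share
MLKEM768_PUBLIC_KEY = 1184   # encapsulation key sent by the client
MLKEM768_CIPHERTEXT = 1088   # ciphertext returned by the server

classical_kex_bytes = 2 * X25519_SHARE
pqc_kex_bytes = MLKEM768_PUBLIC_KEY + MLKEM768_CIPHERTEXT

# Growth of the key-exchange messages alone. The paper's "up to 7×"
# figure is for the whole TLS 1.3 handshake, where certificates and
# signatures dominate the classical baseline, so the overall ratio is
# far smaller than this raw key-share ratio.
growth = pqc_kex_bytes / classical_kex_bytes

# A hybrid share (e.g., X25519 + ML-KEM-768) simply concatenates both.
hybrid_client_share = X25519_SHARE + MLKEM768_PUBLIC_KEY
```

Arithmetic like this is what makes the bandwidth-reduction strategies in the paper worthwhile: most of the added bytes are in two messages, so compressing or restructuring them pays off directly.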
