Browse by author "Cardoso, Filipe"
Showing 1 - 5 of 5
- Design of Data Management Service Platform for Intelligent Electric Vehicle Charging Controller Multi-charger Model
  Publication. Baptista, Pedro; Rosado, José; Caldeira, Filipe; Cardoso, Filipe
  Abstract: Electric charging solutions for the residential market often require an increase in the contracted power to allow an efficient charging cycle, one that starts when the charger is connected and ends when the EV battery is fully charged. However, increasing the contracted power is not always the best route to faster and more efficient charging. With a focus on the residential market, the presented architecture suits both single-use and shared connection points, which are becoming common in apartment buildings without a closed garage, allowing the available electrical connections to the grid to be shared. The multi-charger architecture allows one or several common charging points to be used by applying a mesh network of intelligent chargers orchestrated by a residential gateway. Managing the generated data load involves enabling data flow between several independent data producers and consumers; the data stream ingestion system must therefore be scalable, resilient, and extendable.
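The abstract above describes a data flow between independent producers (the chargers) and consumers (the gateway's ingestion layer). Below is a minimal sketch of that pattern using only the Python standard library; the charger IDs, telemetry fields, and in-process queue acting as broker are illustrative assumptions, not the paper's actual implementation.

```python
import asyncio
import json
import random
import time

# Hypothetical illustration: each charger in the mesh publishes telemetry
# events to a shared queue, and an ingestion consumer forwards them downstream.
# Field names and rates are assumptions, not the paper's actual schema.

async def charger(charger_id: str, queue: asyncio.Queue) -> None:
    """Simulated intelligent charger producing periodic telemetry events."""
    for _ in range(5):
        event = {
            "charger_id": charger_id,
            "timestamp": time.time(),
            "power_kw": round(random.uniform(1.4, 7.4), 2),
            "state": random.choice(["charging", "idle"]),
        }
        await queue.put(event)
        await asyncio.sleep(0.1)

async def ingestion_consumer(queue: asyncio.Queue, n_events: int) -> None:
    """Stand-in for the gateway's stream ingestion layer (e.g. a broker client)."""
    for _ in range(n_events):
        event = await queue.get()
        print(json.dumps(event))
        queue.task_done()

async def main() -> None:
    queue: asyncio.Queue = asyncio.Queue()
    producers = [charger(f"charger-{i}", queue) for i in range(3)]
    # 3 producers x 5 events each = 15 events for the consumer to drain.
    await asyncio.gather(*producers, ingestion_consumer(queue, 15))

if __name__ == "__main__":
    asyncio.run(main())
```

In a real deployment the in-process queue would be replaced by a scalable message broker, so that producers and consumers stay independently deployable and the ingestion layer can be scaled out.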
- Performance and Scalability of Data Cleaning and Preprocessing Tools: A Benchmark on Large Real-World Datasets
  Publication. Martins, Pedro; Cardoso, Filipe; Vaz, Paulo; Silva, José; Abbasi, Maryam
  Abstract: Data cleaning remains one of the most time-consuming and critical steps in modern data science, directly influencing the reliability and accuracy of downstream analytics. In this paper, we present a comprehensive evaluation of five widely used data cleaning tools—OpenRefine, Dedupe, Great Expectations, TidyData (PyJanitor), and a baseline Pandas pipeline—applied to large-scale, messy datasets spanning three domains (healthcare, finance, and industrial telemetry). We benchmark each tool on dataset sizes ranging from 1 million to 100 million records, measuring execution time, memory usage, error detection accuracy, and scalability under increasing data volumes. Additionally, we assess qualitative aspects such as usability and ease of integration, reflecting real-world adoption concerns. We incorporate recent findings on parallelized data cleaning and highlight how domain-specific anomalies (e.g., negative amounts in finance, sensor corruption in industrial telemetry) can significantly impact tool choice. Our findings reveal that no single solution excels across all metrics; while Dedupe provides robust duplicate detection and Great Expectations offers in-depth rule-based validation, tools like TidyData and baseline Pandas pipelines demonstrate strong scalability and flexibility under chunk-based ingestion. The choice of tool ultimately depends on domain-specific requirements (e.g., approximate matching in finance and strict auditing in healthcare) and the magnitude of available computational resources. By highlighting each framework’s strengths and limitations, this study offers data practitioners clear, evidence-driven guidance for selecting and combining tools to tackle large-scale data cleaning challenges.
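To make the benchmarked setup concrete, here is a minimal sketch of a chunk-based baseline Pandas pipeline that records execution time and peak traced memory, two of the metrics listed above. The input file, column names, and cleaning rules are hypothetical, not the study's actual configuration.

```python
import time
import tracemalloc

import pandas as pd

def clean_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning rules; real benchmarks would be domain-specific."""
    chunk = chunk.drop_duplicates()
    chunk = chunk.dropna(subset=["amount"])       # drop incomplete records
    chunk = chunk[chunk["amount"] >= 0]           # e.g. negative amounts in finance
    chunk["category"] = chunk["category"].str.strip().str.lower()
    return chunk

def run_pipeline(path: str, chunksize: int = 1_000_000) -> None:
    """Chunk-based ingestion so memory stays bounded for very large files."""
    tracemalloc.start()
    start = time.perf_counter()
    rows_in = rows_out = 0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        rows_in += len(chunk)
        rows_out += len(clean_chunk(chunk))
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()     # peak Python-level allocations
    tracemalloc.stop()
    print(f"{rows_in} -> {rows_out} rows in {elapsed:.1f}s, peak {peak / 1e6:.1f} MB")

# run_pipeline("transactions.csv")  # hypothetical input file
```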
- Performance Comparison of Python-Based Complex Event Processing Engines for IoT Intrusion Detection: Faust Versus Streamz
  Publication. Abbasi, Maryam; Cardoso, Filipe; ANTUNES VAZ, PAULO JOAQUIM; Silva, José; Sá, Filipe; Martins, Pedro
  Abstract: The proliferation of Internet of Things (IoT) devices has intensified the need for efficient real-time anomaly and intrusion detection, making the selection of an appropriate Complex Event Processing (CEP) engine a critical architectural decision for security-aware data pipelines. Python-based CEP frameworks offer compelling advantages through seamless integration with data science and machine learning ecosystems; however, rigorous comparative evaluations of such frameworks under realistic IoT security workloads remain absent from the literature. This study presents the first systematic comparative evaluation of Faust and Streamz—two Python-native CEP engines representing fundamentally different architectural philosophies—specifically in the context of IoT network intrusion detection. Faust was selected for its actor-based stateful processing model with native Kafka integration and distributed table support, while Streamz was selected for its reactive, lightweight pipeline design targeting high-throughput stateless processing, making them representative of the two dominant paradigms in Python stream processing. Although the two engines target different application niches, their performance characteristics under realistic CEP workloads have never been rigorously compared, leaving practitioners without empirical guidance. The primary evaluation employs an IoT network intrusion dataset comprising 583,485 events from 83 heterogeneous devices. To assess whether the observed performance characteristics are specific to this single dataset or generalize across different workload profiles, a secondary IoT-adjacent benchmark is included: the PaySim financial transaction dataset (6.4 million records), selected because its event schema, fraud-pattern temporal structure, and volume differ substantially from the intrusion dataset, providing a stress test for cross-workload robustness rather than a claim of domain equivalence. Validation against a second IoT-specific intrusion dataset (such as TON_IoT or Bot-IoT) would be more directly comparable and is identified as a priority for future work. The load levels used in scalability experiments (up to 5000 events per second) intentionally exceed the dataset’s natural rate to stress-test each engine’s architectural ceiling and identify saturation thresholds relevant to large-scale or multi-sensor IoT deployments. We conducted controlled experiments with comprehensive statistical analysis. Our results demonstrate that Streamz achieves superior throughput at 4450 events per second with 89% efficiency and minimal resource consumption (40 MB memory, 12 ms median latency), while Faust provides robust intrusion pattern detection with 93–98% accuracy and stable, predictable resource utilization (1.4% CPU standard deviation). A multi-framework comparison including Apache Kafka Streams and offline scikit-learn baselines confirms that Faust achieves detection quality competitive with JVM-based alternatives (Faust: 96.2%; Kafka Streams: 96.8%; an absolute difference of 0.6 percentage points, not statistically significant at p = 0.318) while retaining the Python ecosystem advantages. Statistical analysis confirms significant performance differences across all metrics (p < 0.001, Cohen’s d > 0.8). Critical scalability thresholds are identified: Streamz maintains efficiency above 95% up to 3500 events per second, while Faust degrades beyond 2500 events per second. These findings provide IoT security engineers and system architects with actionable, empirically grounded guidance for CEP engine selection, establish a reproducible benchmarking methodology applicable to future Python-based stream processing evaluations, and advance theoretical understanding of the accuracy–throughput trade-off in stateful versus stateless Python CEP architectures.
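As a concrete illustration of the stateless, reactive style attributed to Streamz above, here is a minimal sketch of an event pipeline that flags suspicious traffic. It assumes the streamz package is installed; the event fields and the packet-rate threshold are made-up examples, not the study's actual detection rules.

```python
from streamz import Stream  # pip install streamz

def parse(raw: dict) -> dict:
    """Project a raw network event into the fields the detector needs."""
    return {"device": raw["device_id"], "pps": raw["packets_per_second"]}

def is_suspicious(event: dict) -> bool:
    """Assumed anomaly rule: unusually high packet rate from one device."""
    return event["pps"] > 1000

# Stateless reactive pipeline: parse -> filter -> alert sink.
source = Stream()
alerts = source.map(parse).filter(is_suspicious)
alerts.sink(lambda e: print(f"ALERT {e['device']}: {e['pps']} pps"))

# Feed a few synthetic events through the pipeline.
source.emit({"device_id": "cam-01", "packets_per_second": 120})
source.emit({"device_id": "lock-07", "packets_per_second": 4800})
```

The stateful Faust counterpart would express comparable logic as a Kafka-backed agent that keeps per-device state in a distributed table, which is where the accuracy-versus-throughput trade-off discussed above comes from.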
- A Practical Performance Benchmark of Post-Quantum Cryptography Across Heterogeneous Computing Environments
  Publication. Abbasi, Maryam; Cardoso, Filipe; Vaz, Paulo; Silva, José; Martins, Pedro
  Abstract: The emergence of large-scale quantum computing presents an imminent threat to contemporary public-key cryptosystems, with quantum algorithms such as Shor’s algorithm capable of efficiently breaking RSA and elliptic curve cryptography (ECC). This vulnerability has catalyzed accelerated standardization efforts for post-quantum cryptography (PQC) by the U.S. National Institute of Standards and Technology (NIST) and global security stakeholders. While theoretical security analysis of these quantum-resistant algorithms has advanced considerably, comprehensive real-world performance benchmarks spanning diverse computing environments—from high-performance cloud infrastructure to severely resource-constrained IoT devices—remain insufficient for informed deployment planning. This paper presents the most extensive cross-platform empirical evaluation to date of NIST-selected PQC algorithms, including CRYSTALS-Kyber and NTRU for key encapsulation mechanisms (KEMs), alongside BIKE as a code-based alternative, and CRYSTALS-Dilithium and Falcon for digital signatures. Our systematic benchmarking framework measures computational latency, memory utilization, key sizes, and protocol overhead across multiple security levels (NIST Levels 1, 3, and 5) in three distinct hardware environments and various network conditions. Results demonstrate that contemporary server architectures can implement these algorithms with negligible performance impact (<5% additional latency), making immediate adoption feasible for cloud services. In contrast, resource-constrained devices experience more significant overhead, with computational demands varying by up to 12× between algorithms at equivalent security levels, highlighting the importance of algorithm selection for edge deployments. Beyond standalone algorithm performance, we analyze integration challenges within existing security protocols, revealing that naive implementation of PQC in TLS 1.3 can increase handshake size by up to 7× compared to classical approaches. To address this, we propose and evaluate three optimization strategies that reduce bandwidth requirements by 40–60% without compromising security guarantees. Our investigation further encompasses memory-constrained implementation techniques, side-channel resistance measures, and hybrid classical-quantum approaches for transitional deployments. Based on these comprehensive findings, we present a risk-based migration framework and algorithm selection guidelines tailored to specific use cases, including financial transactions, secure firmware updates, vehicle-to-infrastructure communications, and IoT fleet management. This practical roadmap enables organizations to strategically prioritize systems for quantum-resistant upgrades based on data sensitivity, resource constraints, and technical feasibility. Our results conclusively demonstrate that PQC is deployment-ready for most applications, provided that implementations are carefully optimized for the specific performance characteristics and security requirements of target environments. We also identify several remaining research challenges for the community, including further optimization for ultra-constrained devices, standardization of hybrid schemes, and hardware acceleration opportunities.
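For readers who want to reproduce a small slice of this kind of measurement, the sketch below times KEM encapsulation using the liboqs-python bindings (module `oqs`). The mechanism name "Kyber512" is an assumption that depends on the installed liboqs build (newer releases expose it as "ML-KEM-512"), and this harness is far simpler than the paper's full benchmarking framework.

```python
import statistics
import time

import oqs  # liboqs-python bindings; pip install liboqs-python

def bench_kem(mechanism: str = "Kyber512", runs: int = 100) -> None:
    """Time KEM encapsulation for an assumed mechanism name over several runs."""
    encap_times = []
    with oqs.KeyEncapsulation(mechanism) as kem:
        public_key = kem.generate_keypair()
        for _ in range(runs):
            start = time.perf_counter()
            ciphertext, shared_secret = kem.encap_secret(public_key)
            encap_times.append(time.perf_counter() - start)
        # Sanity check: decapsulation with the held secret key recovers the secret.
        assert kem.decap_secret(ciphertext) == shared_secret
    print(f"{mechanism}: median encapsulation "
          f"{statistics.median(encap_times) * 1e6:.0f} µs over {runs} runs")

# bench_kem()  # requires liboqs and the Python bindings to be installed
```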
- Unified Data Governance in Heterogeneous Database Environments: An API-Driven Architecture for Multi-Platform Policy Enforcement
  Publication. Abbasi, Maryam; ANTUNES VAZ, PAULO JOAQUIM; Silva, José; Cardoso, Filipe; Sá, Filipe; Martins, Pedro
  Abstract: Modern organizations increasingly rely on heterogeneous database environments that combine relational, document-oriented, and key-value storage systems to optimize performance for diverse application requirements. However, this technological diversity creates significant challenges for implementing consistent data governance policies, regulatory compliance, and access control across disparate systems. Traditional governance approaches that operate within individual database silos fail to provide unified policy enforcement and create compliance gaps that expose organizations to regulatory and operational risks. This paper presents a novel API-driven architecture that enables unified data governance across heterogeneous database environments without requiring database-specific modifications or vendor lock-in. The proposed framework implements a centralized governance layer that coordinates policy enforcement across PostgreSQL, MongoDB, and Amazon DynamoDB systems through RESTful API interfaces. Key architectural components include differentiated access control through hierarchical API key management, automated compliance workflows for regulatory requirements such as GDPR, real-time audit trail generation, and comprehensive data quality monitoring with automated improvement mechanisms. Comprehensive experimental evaluation demonstrates the framework’s effectiveness across multiple operational dimensions. The system achieved 95.2% accuracy in access control enforcement across different data classification levels, while automated GDPR compliance workflows demonstrated 98.6% success rates with average processing times of 2.9 h. Performance evaluation reveals acceptable overhead characteristics with linear scaling patterns for PostgreSQL operations (R² = 0.89), consistent sub-20 ms response times for MongoDB logging operations, and sustained throughput rates ranging from 38.9 to 142.7 requests per second across the integrated system. Data quality improvements ranged from 16.1% to 34.3% across accuracy, completeness, consistency, and timeliness dimensions over a 12-week monitoring period, with accuracy improving by 17.8 percentage points, completeness by 13.2 percentage points, consistency by 19.7 percentage points, and timeliness by 24.5 percentage points. The duplicate detection system achieved 94.6% precision and 95.6% recall across various duplicate types, including cross-database redundancy identification. The results demonstrate that API-driven governance architectures can effectively address the persistent challenges of policy fragmentation in multi-database environments while maintaining operational performance and enabling measurable improvements in data quality and regulatory compliance. The framework provides a practical migration path for organizations seeking to implement comprehensive governance capabilities without replacing existing database infrastructure investments.
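Below is a minimal sketch of the kind of API-level enforcement point the abstract describes, written with Flask: an API key maps to a clearance tier, and the governance layer rejects requests before any backend (PostgreSQL, MongoDB, or DynamoDB) is touched. The keys, tier names, and dataset classifications are illustrative assumptions, not the framework's actual policy model.

```python
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

# Hypothetical hierarchical API keys mapped to data-classification clearance.
API_KEYS = {"key-admin": "restricted", "key-analyst": "internal", "key-app": "public"}
CLEARANCE = {"public": 0, "internal": 1, "restricted": 2}

# Hypothetical routing table: each dataset has a classification and lives in
# one of the underlying stores (PostgreSQL, MongoDB, DynamoDB).
DATASET_CLASSIFICATION = {
    "customers": "restricted",
    "telemetry": "internal",
    "catalog": "public",
}

@app.route("/data/<dataset>")
def read_dataset(dataset: str):
    tier = API_KEYS.get(request.headers.get("X-API-Key", ""))
    if tier is None:
        abort(401)                       # unknown or missing key
    required = DATASET_CLASSIFICATION.get(dataset)
    if required is None:
        abort(404)                       # dataset not governed here
    if CLEARANCE[tier] < CLEARANCE[required]:
        abort(403)                       # denied before touching any backend
    # A real implementation would route to the backing store and emit an audit event.
    return jsonify({"dataset": dataset, "classification": required, "granted_to": tier})

if __name__ == "__main__":
    app.run(port=8080)
```

Centralizing the check in one HTTP layer is what lets the same policy apply uniformly across otherwise incompatible database engines.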
