Instituto Politécnico de Viseu

Scientific Repository

 

Welcome to the Polytechnic Institute of Viseu Institutional Repository

The aim of this Repository is to give greater visibility to the scientific production of the academic community of the Polytechnic Institute of Viseu, by increasing the impact and use through Open Access, ensuring the storage and preservation of all research produced in its organic units and research centres.

Recent Submissions

Performance and Scalability of Data Cleaning and Preprocessing Tools: A Benchmark on Large Real-World Datasets
Publication . Martins, Pedro; Cardoso, Filipe; Vaz, Paulo; Silva, José; Abbasi, Maryam
Data cleaning remains one of the most time-consuming and critical steps in modern data science, directly influencing the reliability and accuracy of downstream analytics. In this paper, we present a comprehensive evaluation of five widely used data cleaning tools (OpenRefine, Dedupe, Great Expectations, TidyData (PyJanitor), and a baseline Pandas pipeline) applied to large-scale, messy datasets spanning three domains: healthcare, finance, and industrial telemetry. We benchmark each tool on dataset sizes ranging from 1 million to 100 million records, measuring execution time, memory usage, error detection accuracy, and scalability under increasing data volumes. Additionally, we assess qualitative aspects such as usability and ease of integration, reflecting real-world adoption concerns. We incorporate recent findings on parallelized data cleaning and highlight how domain-specific anomalies (e.g., negative amounts in finance, sensor corruption in industrial telemetry) can significantly impact tool choice. Our findings reveal that no single solution excels across all metrics: while Dedupe provides robust duplicate detection and Great Expectations offers in-depth rule-based validation, tools like TidyData and baseline Pandas pipelines demonstrate strong scalability and flexibility under chunk-based ingestion. The choice of tool ultimately depends on domain-specific requirements (e.g., approximate matching in finance and strict auditing in healthcare) and on the computational resources available. By highlighting each framework’s strengths and limitations, this study offers data practitioners clear, evidence-driven guidance for selecting and combining tools to tackle large-scale data cleaning challenges.
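As an illustration of the chunk-based ingestion and the domain-specific anomalies mentioned in this abstract, the following is a minimal sketch only, not the authors' benchmark code. It uses Python's standard csv module as a dependency-free stand-in for the Pandas baseline; the column names (id, amount), the function names, and the sample data are hypothetical.

```python
import csv
import io
from itertools import islice


def iter_chunks(reader, chunk_size):
    """Yield lists of at most chunk_size rows so the full file never sits in memory."""
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


def clean_in_chunks(source, chunk_size=2):
    """Drop negative amounts and exact-id duplicates, one chunk at a time.

    Assumes a CSV with hypothetical 'id' and 'amount' columns.
    """
    reader = csv.DictReader(source)
    seen_ids = set()  # ids already kept, shared across chunks
    cleaned = []
    for chunk in iter_chunks(reader, chunk_size):
        for row in chunk:
            amount = float(row["amount"])
            if amount < 0:
                continue  # finance-style anomaly: negative amount
            if row["id"] in seen_ids:
                continue  # duplicate record, possibly from an earlier chunk
            seen_ids.add(row["id"])
            cleaned.append({"id": row["id"], "amount": amount})
    return cleaned


# Hypothetical sample: one negative amount and two duplicated ids.
sample = io.StringIO("id,amount\n1,10.0\n2,-5.0\n1,10.0\n3,7.5\n3,7.5\n")
result = clean_in_chunks(sample)
```

Keeping the set of seen ids outside the chunk loop is what lets duplicates be caught across chunk boundaries. This exact-match check is far simpler than the approximate matching a tool such as Dedupe performs.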
Designing and implementing an inclusive peer mentoring program in higher education
Publication . Antunes, Sandra; Oliveira, Isabel; Guedes, Anabela; Marques dos Santos, Paula Alexandra
Competitividade
Publication . Marques dos Santos, Paula Alexandra
THE SUSTAINABILITY OF REGIONS AS A CHALLENGE FOR HIGHER EDUCATION - THE PROBLEM-SOLVING METHODOLOGY IN TEACHING CULTURAL MANAGEMENT. THE CASE OF THE 'PORTO COMERCIAL DE CAMBRES'
Publication . Marques dos Santos, Paula Alexandra; Daniela Santos; Mariana Lopes; Mónica Silva
Teaching in higher education is a constant challenge: it requires not only permanent updating but also the search for and implementation of the most appropriate methodologies for transmitting scientific and technical knowledge to students and for training them to apply abstract knowledge to concrete situations in their future profession. The project proposal presented in this paper was developed as part of the assessment for the Cultural Management unit. Its main objective was to challenge the students to apply the knowledge they had acquired and to develop feasible proposals for the sustainability of the territory and the profitability of all available resources. This paper therefore presents both the learning path followed throughout the semester and the results achieved. The cultural management project proposal for the Commercial Port of Cambres aims to transform this facility into a driving force for the development of the entire surrounding area. Through its action proposal, it is also intended to contribute to tourism sustainability by creating synergies between the built cultural heritage and the natural heritage of the municipality of Lamego, located in Portugal's Douro demarcated wine region, a world heritage site since 2001. We also intend to demonstrate how building the cultural management project helped consolidate the students' knowledge of the content covered and increased their confidence as they prepare for the labour market.