| Name: | Description: | Size: | Format: |
|---|---|---|---|
| | | 4.1 MB | Adobe PDF |
Abstract(s)
The increasing complexity of managing modern database systems, particularly in terms
of optimizing query performance for large datasets, presents significant challenges that traditional
methods often fail to address. This paper proposes a comprehensive framework for integrating
advanced machine learning (ML) models within the architecture of a database management system
(DBMS), with a specific focus on PostgreSQL. Our approach leverages a combination of supervised
and unsupervised learning techniques to predict query execution times, optimize performance, and
dynamically manage workloads. Unlike existing solutions that address specific optimization tasks
in isolation, our framework provides a unified platform that supports real-time model inference
and automatic database configuration adjustments based on workload patterns. A key contribution
of our work is the integration of ML capabilities directly into the DBMS engine, enabling seamless
interaction between the ML models and the query optimization process. This integration allows for
the automatic retraining of models and dynamic workload management, resulting in substantial
improvements in both query response times and overall system throughput. Our evaluations using
the Transaction Processing Performance Council Decision Support (TPC-DS) benchmark dataset at
scale factors of 100 GB, 1 TB, and 10 TB demonstrate a reduction of up to 42% in query execution
times and a 74% improvement in throughput compared with traditional approaches. Additionally,
we address challenges such as potential conflicts in tuning recommendations and the performance
overhead associated with ML integration, providing insights for future research directions. This
study is motivated by the need for autonomous tuning mechanisms to manage large-scale, heterogeneous
workloads while answering key research questions, such as the following: (1) How can
machine learning models be integrated into a DBMS to improve query optimization and workload
management? (2) What performance improvements can be achieved through dynamic configuration
tuning based on real-time workload patterns? Our results suggest that the proposed framework
significantly reduces the need for manual database administration while effectively adapting to
evolving workloads, offering a robust solution for modern large-scale data environments.
Keywords
machine learning integration; database optimization; query performance; dynamic workload management; PostgreSQL; real-time system tuning
Citation
Abbasi, M., Bernardo, M. V., Váz, P., Silva, J., & Martins, P. (2024). Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study. Information, 15(9), 574. https://doi.org/10.3390/info15090574