PATSQL: Efficient Synthesis of SQL Queries from Example Tables with Quick Inference of Projected Columns

Bibliographic Details
Title: PATSQL: Efficient Synthesis of SQL Queries from Example Tables with Quick Inference of Projected Columns
Authors: Keita Takenouchi, Takashi Ishio, Joji Okada, Yuji Sakata
Publication info: arXiv, 2020.
Publication year: 2020
Subject terms: FOS: Computer and information sciences, SQL, Theoretical computer science, Computer Science - Programming Languages, Semantics (computer science), Computer science, General Engineering, Inference, Databases (cs.DB), Relational algebra, Sketch, Software Engineering (cs.SE), Computer Science - Software Engineering, Transformation (function), Computer Science - Databases, Component (UML), Key (cryptography), computer, computer.programming_language, Programming Languages (cs.PL)
Description: SQL is one of the most popular tools for data analysis, and it is now used by an increasing number of users who lack expertise in databases. Several studies have proposed programming-by-example approaches to help such non-experts write correct SQL queries. While existing methods support a variety of SQL features such as aggregation and nested queries, they suffer a significant increase in computational cost as the scale of example tables increases. In this paper, we propose an efficient algorithm utilizing properties known in relational algebra to synthesize SQL queries from input and output tables. Our key insight is that a projection operator in a program sketch can be lifted above other operators by applying transformation rules in relational algebra, while preserving the semantics of the program. This enables a quick inference of appropriate columns in the projection operator, which is an essential component in synthesis but causes combinatorial explosions in prior work. We also introduce a novel form of constraints and its top-down propagation mechanism for efficient sketch completion. We implemented this algorithm in our tool PATSQL and evaluated it on 226 queries from prior benchmarks and Kaggle's tutorials. As a result, PATSQL solved 68% of the benchmarks and found 89% of the solutions within a second. Our tool is available at https://naist-se.github.io/patsql/.
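Illustration (not part of the original record): the lifting idea in the description can be sketched on a toy example. The snippet below is a minimal sketch, not PATSQL's implementation. It assumes tables are lists of dictionaries, and the helpers `select`, `project`, and `candidate_columns`, as well as the `employees` data, are hypothetical names introduced only for this illustration. It checks that a projection placed below a selection gives the same result as one lifted above it (valid when the predicate only reads projected columns), and then shows that once the projection is the outermost operator, candidate source columns can be found by comparing values with the example output.

```python
from collections import Counter

def select(rows, pred):
    """Relational selection: keep the rows that satisfy pred."""
    return [r for r in rows if pred(r)]

def project(rows, cols):
    """Relational projection with bag semantics (no deduplication)."""
    return [{c: r[c] for c in cols} for r in rows]

def candidate_columns(table, out_values):
    """Hypothetical helper: columns of a non-empty table whose values,
    counted as a multiset, cover every value in out_values."""
    need = Counter(out_values)
    result = []
    for col in table[0]:
        have = Counter(r[col] for r in table)
        if all(have[v] >= n for v, n in need.items()):
            result.append(col)
    return result

# Made-up example input table.
employees = [
    {"id": 1, "name": "Ann", "dept": "db", "age": 41},
    {"id": 2, "name": "Bob", "dept": "se", "age": 29},
    {"id": 3, "name": "Cy", "dept": "db", "age": 35},
]

def is_db(row):
    return row["dept"] == "db"

# Projection below the selection ...
below = select(project(employees, ["name", "dept"]), is_db)
# ... and the same projection lifted above it.
lifted = project(select(employees, is_db), ["name", "dept"])
same = below == lifted

# With the projection on top, an output column can be matched
# directly against the columns of the intermediate table.
name_sources = candidate_columns(select(employees, is_db), ["Ann", "Cy"])

print(same, name_sources)
```

In this sketch the column search runs once over the intermediate table instead of enumerating column subsets at every position of the sketch, which is the intuition behind the quick inference the description mentions. The actual algorithm and its constraint propagation are described in the paper itself.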
Comment: 13 pages, 11 figures, To be presented at the International Conference on Very Large Data Bases (VLDB) 2021
DOI: 10.48550/arxiv.2010.05807
Access URL: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b5a216406dab0b56d6ea2059d9ff238f
Rights: OPEN
Accession number: edsair.doi.dedup.....b5a216406dab0b56d6ea2059d9ff238f
Database: OpenAIRE