Abstract
Improving code quality without changing its functionality, e.g., by refactoring or optimization, is an everyday programming activity. Good programming practice requires that each such change should be followed by a check if the change really preserves the code behavior. If such a check is performed by testing, it can be time consuming and still cannot guarantee the absence of differences in behavior between two versions of the code. Hence, tools that could automatically verify code equivalence would be of great help. An area that we are focused on is embedded sql programming. There are a number of approaches for dealing with equivalence of either pairs of imperative code fragments or pairs of sql statements. However, in database-driven applications, simultaneous changes (changes that include both sql and a host language code) are also present and important. Such changes can preserve the overall equivalence without preserving equivalence of these two parts considered separately. In this paper, we propose an automated approach for dealing with equivalence of programs after such changes, a problem that is hardly tackled in literature. Our approach uses our custom first-order logic modeling of sql queries that corresponds to imperative semantics. The approach generates equivalence conditions that can be efficiently checked using smt solvers or first-order logic provers. We implemented the proposed approach as a framework sqlav, which is publicly available and open source.
Similar content being viewed by others
Notes
In presence of recursive functions, at most k calls are applicable.
Developer survey results for 2019, StackOverflow https://insights.stackoverflow.com/survey/2019This conclusion is based on answers given by 87,354 developers.
Survey results can be found here: https://scalegrid.io/blog/2019-database-trends-sql-vs-nosql-top-databases-single-vs-multiple-database-use/
Including both primary and forked repositories.
At this point we do not specify the type of numbers (apart from assumption that for each number n it holds n ≥ 0 or n < 0). However, in our implementation we treat this type as integer or fixed width bit-vector integer.
Unary plus corresponds to the unary plus in sql, but could be omitted.
If names of columns in different tables are not different, table names are concatenated to column names, making names of uninterpreted functions unique.
Equivalence is possible if, for example, one code has a part of the common filter implemented on the host language level. However, such situations are seldom, as filters are usually implemented as part of sql queries.
This should not be considered as the comparison of solvers Z3 and Vampire.
References
Abiteboul, S., Hull, R., & Vianu, V. (Eds.). (1995). Foundations of databases: the logical level. Boston: Addison-Wesley.
Altibase. (2020). Altibase enterprise grade open source database. http://altibase.com/.
Amtoft, T., Bandhakavi, S., & Banerjee, A. (2006). A logic for information flow in object-oriented programs. In POPL, pp. 91–102. ACM. https://doi.org/10.1145/1111037.1111046.
Auerbach, J. S., Hirzel, M., Mandel, L., Shinnar, A., & Siméon, J. (2017). Handling env. in a nested relational algebra with combinators and an implementation in a verified query compiler. In SIGMOD, pp. 1555–1569. ACM. https://doi.org/10.1145/3035918.3035961.
Babic, D., & Hu, A. J. (2008). Calysto: scalable and precise extended static checking. In ICSE, pp. 211–220. ACM. https://doi.org/10.1145/1368088.1368118.
Backes, J., Person, S., Rungta, N., & Tkachuk, O. (2013). Regression verification using impact summaries. In SPIN, pp. 99–116. https://doi.org/10.1007/978-3-642-39176-7_7.
Barnett, M., & Leino, R. (2005). Weakest-precondition of unstructured programs. In Proceedings of the 6th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pp. 82–87. ACM. https://doi.org/10.1145/1108792.1108813.
Barrett, C., Sebastiani, R., Seshia, S. A., & Tinelli, C. (2009). Satisfiability modulo theories. In Handbook of satisfiability, vol. 185, pp. 825–885. IOS press.
Barrett, C., Stump, A., & Tinelli, C. (2010). The SMT-LIB standard: Version 2.0. Tech. rep., Department of computer science, The University of Iowa. Available at www.SMT-LIB.org, retrieved September 22nd, (2019).
Barrett, C., & Tinelli, C. (2018). Satisfiability modulo theories. In Handbook of model checking, pp. 305–343. Springer.
Barthe, G., Grėgoire, B., Kunz, C., Lakhnech, Y., & Bėguelin, S. Z. (2012). Automation in computer-aided cryptography: Proofs, attacks and designs. In CPP, pp. 7–8. https://doi.org/10.1007/978-3-642-35308-6_3.
Beaulieu, A. (2009). Learning SQL, 2nd edn. Newton: O’Reilly Media, Inc.
Ben-Gan, I. (2012). Microsoft SQL server 2012 t-SQL fundamentals. Pearson Education.
Benedikt, M., Konstantinidis, G., Mecca, G., Motik, B., Papotti, P., Santoro, D., & Tsamoura, E. (2017). Benchmarking the chase. In PODS, pp. 37–52. ACM. https://doi.org/10.1145/3034786.3034796.
Benzaken, V., & Contejean, E. (2019). A coq mechanised formal semantics for realistic sql queries: formally reconciling sql and bag relational algebra. In CPP, pp. 249–261. ACM. https://doi.org/10.1145/3293880.3294107.
Benzaken, V., Contejean, É. , Keller, C., & Martins, E. (2018). A Coq formalisation of SQL’s execution engines. In Avigad, J., Mahboubi, A., & Interactive theorem proving (Eds.) https://doi.org/10.1007/978-3-319-94821-8_6 (pp. 88–107): Springer.
Bertot, Y., & Castran, P. (2010). Interactive theorem proving and program development: Coq’Art: the calculus of inductive constructions springer. https://doi.org/10.1007/978-3-662-07964-5.
Blome, M., Robertson, C., Cai, S., Jones, M., Sebolt, M., & Hogenson, G. (2020). Open database connectivity (ODBC), Microsoft Docs. https://docs.microsoft.com/en-us/cpp/data/odbc/open-database-connectivity-odbc?view=vs-2019 retrieved March 5th.
Brumm, B. (2019). Beginning oracle SQL for oracle database 18c. Springer.
Ceri, S., & Gottlob, G. (1985). Translating sql into relational algebra: optimization, semantics, and equivalence of sql queries. IEEE Transactions on Software Engineering, 11(4), 324–345. https://doi.org/10.1109/TSE.1985.232223.
Chandra, A. K., & Merlin, P. M. (1977). Optimal implementation of conjunctive queries in relational data bases. In STOC, pp. 77–90. ACM. https://doi.org/10.1145/800105.803397.
Chaudhuri, S., & Vardi, M. Y. (1993). Optimization of real conjunctive queries. In PODS, pp. 59–70. ACM. https://doi.org/10.1145/153850.153856.
Chirkova, R. (2016). Combined-semantics equivalence of conjunctive queries: decidability and tractability results. Journal of Computer and System Sciences, 82(3), 395–465. https://doi.org/10.1016/j.jcss.2015.11.001.
Chu, S., Murphy, B., Roesch, J., Cheung, A., & Suciu, D. (2018). Axiomatic foundations and algorithms for deciding semantic equivalences of SQL queries. Proc. VLDB Endow., 11(11), 1482–1495. https://doi.org/10.14778/3236187.3236200.
Chu, S., Weitz, K., Cheung, A., & Suciu, D. (2017). Hottsql: Proving query rewrites with univalent sql semantics. SIGPLAN Not., 52(6), 510–524. https://doi.org/10.1145/3140587.3062348.
Clarke, E., Kroening, D., & Lerda, F. (2004). A tool for checking ANSI-c programs. In TACAS, pp. 168–176. Springer. https://doi.org/10.1007/978-3-540-24730-2_15.
Clarke, E. M., Henzinger, T. A., Veith, H., & Bloem, R. (Eds.). (2018). Handbook of model checking. Berlin: Springer. https://doi.org/10.1007/978-3-319-10575-8.
Cohen, S. (2006). Equivalence of queries combining set and bag-set semantics. In PODS, pp. 70–79. ACM. https://doi.org/10.1145/1142351.1142362.
Cohen, S. (2009). Equivalence of queries that are sensitive to multiplicities. The VLDB Journal, 18(3), 765–785. https://doi.org/10.1007/s00778-008-0122-1.
Cohen, S., Nutt, W., & Serebrenik, A. (1999). Rewriting aggregate queries using views. In PODS, pp. 155–166. ACM. https://doi.org/10.1145/303976.303992.
Cordeiro, L., Fischer, B., & Marques-Silva, J. (2009). SMT-based bounded model checking for embedded ANSI-c software. ASE, 2009, 137–148. https://doi.org/10.1109/ASE.2009.63.
Cordeiro, L., Kroening, D., & Schrammel, P. Beyer, D., Huisman, M., Kordon, F., & Steffen, B. (Eds.). (2019). Jbmc: bounded model checking for java bytecode. Cham: Springer International Publishing.
Coronel, C., Morris, S., & Rob, P. (2009). Database systems: design, implementation, and management, 9th edn. Boston: Course Technology Press.
Cousot, P., & Cousot, R. (1977). Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of Fixpoints. In POPL, pp. 238–252. ACM Press. https://doi.org/10.1145/512950.512973.
Date, C. J. (2003). An introduction to database systems, 8th edn., Pearson, London.
De Moura, L., & Bjorner, N. (2008). Z3: an efficient SMT solver. In TACAS, pp. 337–340. https://doi.org/10.1007/978-3-540-78800-3_24.
Douglas, G., & Lawrence, R. (2016). Improving sql query performance on embedded devices using pre-compilation. In SAC, pp. 961–964. ACM. https://doi.org/10.1145/2851613.2851657.
Ebel, N. (2020). Manually trigger a GitHub actions workflow. https://goobar.io/2019/12/07/manually-trigger-a-github-actions-workflow/ retrieved March 10th.
Emmi, M., Majumdar, R., & Sen, K. (2007). Dynamic test input generation for database applications. In ISSTA, pp. 151–162. ACM. https://doi.org/10.1145/1273463.1273484.
Felsing, D., Grebing, S., Klebanov, V., Rümmer, P., & Ulbrich, M. (2014). Automating regression verification. In ASE, pp. 349–360. ACM. https://doi.org/10.1145/2642937.2642987.
Fisher, M., Ellis, J., & Bruce, J. C. (2003). JDBC API tutorial and reference, 3rd edn., Pearson Education, London.
Flanagan, C., Leino, K. R. M., Lillibridge, M., Nelson, G., Saxe, J. B., & Stata, R. (2002). Extended static checking for java. SIGPLAN Not., 37(5), 234–245. https://doi.org/10.1145/543552.512558.
Fowler, M. (2018). Refactoring: improving the design of existing code. Addison-Wesley Professional.
Garbus, J. (2015). SAP ASE 16/Sybase ASE administration. Sap Press.
Ghafoor, M. A., Mahmood, M. S., & Siddiqui, J. H. (2019). Extending symbolic execution for automated testing of stored procedures. Software Quality Journal. https://doi.org/10.1007/s11219-019-09453-6.
Godlin, B., & Strichman, O. (2013). Regression verification: proving the equivalence of similar programs. Softw. Test., Verif. Reliab., 23(3), 241–258. https://doi.org/10.1002/stvr.1472.
Görg, C., & Weiundefinedgerber, P. (2005). Error detection by refactoring reconstruction. In Proceedings of the 2005 International Workshop on Mining Software Repositories, MSR ’05. (pp. 1–5). New York: Association for Computing Machinery.
Grossman, S., Cohen, S., Itzhaky, S., Rinetzky, N., & Sagiv, M. (2017). Verifying equivalence of spark programs. In CAV, LNCS, pp. 282–300. Springer. https://doi.org/10.1007/978-3-319-63390-9_15.
Guagliardo, P., & Libkin, L. (2017). A formal semantics of sql queries, its validation, and applications. Proc. VLDB Endow., 11(1), 27–39. https://doi.org/10.14778/3151113.3151116.
Hoare, C. A. R. (1969). An axiomatic basis for computer programming. Communications of the ACM, 12(10), 576–580. https://doi.org/10.1145/363235.363259.
Huang, S., & Cheng, K. (1998). Formal equivalence checking and design debugging. Kluwer Academic Publishers. https://doi.org/10.1007/978-1-4615-5693-0.
IBM. (2012). Developing embedded SQL applications. ftp://ftp.software.ibm.com/ps/products/db2/info/vr101/pdf/en_US/DB2DevEmbeddedSQL-db2a1e1010.pdf, retrieved September 22nd, 2019.
Ioannidis, Y., & Ramakrishnan, R. (1995). Containment of conjunctive queries: beyond relations as sets. ACM Transaction Database System, 20(3), 288–324. https://doi.org/10.1145/211414.211419.
ISO. (2016). Iso/iec 9075-1:2016. Online at: https://www.iso.org/obp/ui/#iso:std:iso-iec:9075:-1:ed-5:v1:en, retrieved September 22nd, 2019.
ITTIA. (2016). Benefits of database for embedded system and iot device manufacturers. On-line at: http://www.ittia.com/files/Benefits_of_Database_for_Embedded_System_and_IoT_Device_Manufacturers.pdf, retrieved September 22nd, 2019.
Jayram, T. S., Kolaitis, P. G., & Vee, E. (2006). The containment problem for real conjunctive queries with inequalities. In PODS, pp. 80–89. ACM. https://doi.org/10.1145/1142351.1142363.
Kim, M., Zimmermann, T., & Nagappan, N. (2014). An empirical study of refactoring challenges and benefits at Microsoft. IEEE Trans. Softw. Eng., 40(7), 633–649. https://doi.org/10.1109/TSE.2014.2318734.
King, J. C. (1976). Symbolic execution and program testing. Communications of the ACM, 19(7), 385–394. https://doi.org/10.1145/360248.360252.
Lattner, C., & Adve, V. (2004). LLVM: A compilation framework for lifelong program analysis & transformation. In CGO, pp. 75–86. IEEE Computer Society. https://doi.org/10.1109/CGO.2004.1281665.
Mahmood, M. S., Ghafoor, M., & Siddiqui, J. H. (2016). Symbolic execution of stored procedures in database management systems. In ASE, pp. 519–530. ACM. https://doi.org/10.1145/2970276.2970318.
Malecha, G., Morrisett, G., Shinnar, A., & Wisnesky, R. (2010). Toward a verified relational DB management system. In POPL, pp. 237–248. ACM. https://doi.org/10.1145/1706299.1706329.
Mens, T., & Tourwé, T. (2004). A survey of software refactoring. IEEE Transactions on Software Engineering, 30(2), 126–139. https://doi.org/10.1109/TSE.2004.1265817.
Merz, F., Falke, S., & Sinz, C. (2012). LLBMC: Bounded model checking of C and C++ programs using a compiler IR. In VSTTE, LNCS, pp. 146–161. Springer. https://doi.org/10.1007/978-3-642-27705-4_12.
Microsoft. (2020). Microsoft SQL server. https://www.microsoft.com/en-us/sql-server/.
Marcozzi, M., Vanhoof, W., & Hainaut, J.I. (2015). Relational symbolic exec. of sql code for unit testing of database programs. Science of Comp. Program., 105, 44–72. https://doi.org/10.1016/j.scico.2015.03.005.
Murphy-Hill, E., Parnin, C., & Black, A.P. (2012). How we refactor, and how we know it. IEEE Transactions on Software Engineering, 38(1), 5–18. https://doi.org/10.1109/TSE.2011.41.
Negri, M., Pelagatti, G., & Sbattella, L. (1991). Formal semantics of sql queries. ACM Trans Database Syst., 16(3), 513–534. https://doi.org/10.1145/111197.111212.
Obe, R. O., & Hsu, L. S. (2017). PostgreSQL: Up and running: a practical guide to the advanced open source database. “O’Reilly Media Inc.”.
Oracle. (2020). Oracle database. https://www.oracle.com/database/.
Pan, K., Wu, X., & Xie, T. (2014). Guided test generation for database applications via synthesized database interactions. ACM Trans. Softw. Eng. Methodol., 23(2), 12:1–12:s27. https://doi.org/10.1145/2491529.
Post, H., & Sinz, C. (2009). Proving functional equivalence of two AES implementations using bounded model checking. In ICST 2009, pp. 31–40. https://doi.org/10.1109/ICST.2009.39.
PostgreSQL. (2020). PostgreSQL: The world’s most advanced open source relational database. https://www.postgresql.org/.
Ramos, D. A., & Engler, D. R. (2011). Practical, low-effort equivalence verification of real code. In CAV.
Riazanov, A., & Voronkov, A. (2002). The design and implementation of VAMPIRE. AI Communications, 15(2-3), 91–110.
Rodchenko, N., Prokopov, A., & Gaviar, A. (2019). U∘OS blockchain framework. https://github.com/UOSnetwork/uos.docs/blob/master/yellow_paper/uos_yellow_paper.eng.pdf.
Sagiv, Y., & Yannakakis, M. (1980). Equivalences among relational exp. with union and difference operators. Journal of the ACM, 27 (4), 633–655. https://doi.org/10.1145/322217.322221.
Scheben, C., & Schmitt, P. H. (2014). Efficient self-composition for weakest precondition calculi. In FM, LNCS, vol. 8442, pp. 579–594. Springer. https://doi.org/10.1007/978-3-319-06410-9_39.
Schlaipfer, M., Rajan, K., Lal, A., & Samak, M. (2017). Optimizing big-data queries using program synthesis. In SOSP, pp. 631–646. ACM. https://doi.org/10.1145/3132747.3132773.
Schmitt, D. (2019). Bug #5673: Optimizer creates strange execution plan leading to wrong results. https://www.postgresql.org/message-id/201009231503.o8NF3blt059661@wwwmaster.postgresql.org retrieved September 22nd.
Spasić, M., & Vujošević Janičić, M. (2018). First steps towards proving functional equivalence of embedded SQL. In TYPES, pp. 78–79. Univ. Minho.
Spasić, M., & Vujošević Janičić, M. (2020). GitHub repository: SQLC (2020). https://github.com/mirkospasic/sqlc retrieved March 10th.
Strichman, O., & Godlin, B. (2005). Regression verification - a practical way to verify programs. In VSTTE, LNCS, vol. 4171, pp. 496–501. Springer. https://doi.org/10.1007/978-3-540-69149-5_54.
Sulik, M. (2019). Bug #70038: Wrong select count distinct with a field included in two-column unique key (2013). https://bugs.mysql.com/bug.php?id=70038 retrieved September 22nd.
Sybase, S. (2020). Relational database server. https://www.sap.com/products/sybase-ase.html.
Tillmann, N., & Halleux, J. (2008). Pex – white box test generation for.NET. In TAP, LNCS, vol. 4966, pp. 134–153. Springer. https://doi.org/10.1007/978-3-540-79124-9_10.
Trakhtenbrot, B. A. (1950). Impossibility of an algorithm for the decision problem in finite classes. Doklady Akademii Nauk SSSR, 70, 569–572.
Veanes, M., Tillmann, N., & de Halleux, J. (2010). Qex: symbolic sql query explorer. In LPAR, pp. 425–446. Springer. https://doi.org/10.1007/978-3-642-17511-4_24.
Verdoolaege, S., Palkovic, M., Bruynooghe, M., Janssens, G., & Catthoor, F. (2010). Experience with widening based equivalence checking in realistic multimedia systems. J. Electronic Testing, 26(2), 279–292. https://doi.org/10.1007/s10836-009-5140-4.
Vujošević Janičić, M. (2014). Regression verification by system LAV. InfoM — Journal of Information Technology and Multimedia System, 49, 14–20.
Vujošević Janičić, M., & Kuncak, V. (2012). Development and evaluation of LAV: an SMT-based error finding platform. In VSTTE, LNCS, pp. 98–113. Springer. https://doi.org/10.1007/978-3-642-27705-4_9.
Vujošević Janičić, M., & Marić, F. (2020). Regression verification for automated evaluation of students programs. Computer Science and Information Systems, 17(1), 205–228. https://doi.org/10.2298/CSIS181220019V.
Vujošević Janičić, M., Nikolić, M., Tošić, D., & Kuncak, V. (2013). Software verification and graph similarity for automated evaluation of students’ assignments. Information and Softw Technology 55(6). https://doi.org/10.1016/j.infsof.2012.12.005.
Vujošević Janičić, M., & Spasić, M. (2020). Tools LAV and SQLAV (2020). http://argo.matf.bg.ac.rs/?content=lav retrieved March 10th.
Wang, Y., Dillig, I., Lahiri, S., & Cook, W. (2017). Verifying equivalence of database-driven apps. Proc. ACM Program Lang., 29, 1–56. https://doi.org/10.1145/3158144.
Web, L. (2020). Ten ways databases run your life. https://www.liquidweb.com/blog/ten-ways-databases-run-your-life/.
Weidenbach, C., Dimova, D., Fietzke, A., Kumar, R., Suda, M., & Wischnewski, P. (2009). SPASS Version 3.5. In CADE, pp. 140–145. https://doi.org/10.1007/978-3-642-02959-2_10.
Weißgerber, P., & Diehl, S. (2006). Are refactorings less error-prone than other changes?. In Proceedings of the 2006 International Workshop on Mining Software Repositories, pp. 112–118. https://doi.org/10.1145/1137983.1138011.
Weißgerber, P., & Diehl, S. (2006). Identifying refactorings from source-code changes. In Proceedings of the 21st IEEE/ACM International Conference on Automated Software Engineering, ASE ’06. https://doi.org/10.1109/ASE.2006.41 (pp. 231–240). USA: IEEE Computer Society.
Welsch, Y., & Poetzsch-heffter, A. (2014). A fully abstract trace-based semantics for reasoning about backward compatibility of class libraries. Sci. Comput. Program., 92, 129–161. https://doi.org/10.1016/j.scico.2013.10.002.
Yang, P., Jin, P., & Yue, L. (2014). Exploiting the performance-energy tradeoffs for mobile database apps. Journal of Universal Comp. Science, 20(10), 1488–1498.
Funding
This work was partially supported by the Serbian Ministry of Science grant 174021 and by COST action CA15123.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Spasić, M., Janičić, M.V. Verification supported refactoring of embedded sql. Software Qual J 29, 629–665 (2021). https://doi.org/10.1007/s11219-020-09517-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11219-020-09517-y