InQuery: Text2SQL App coded with LLMs
Why Dart?
Dart is a programming language which enables the same code to be compiled to multiple platforms, be it IOS, Android or a web platform. This means that you can build sort of “universal” applications accessible through any platform.
Why Text to SQL?
Not everyone is an expert in structured language. There is a general field of research on text to SQL dedicated to users unfamiliar with this syntax navigate a database. In our project, we grounded it in the context of querying a medical database. As a sample use case, we query a free, open-source demo of the MIMIC-IV Clinical Database.
Are LLMs enough to build this app?
While I acknowledge the use of LLMs in building this app, their role is better described as an assistant rather than a substitute. An app is way more complicated, in the process of building it I faced a lot of problems which requires a human’s debugging skills, and continuous iterative improvements (tons of breakpoints and trying to pinpoint the source of a problem).
For example, many of the problems which the LLM couldn’t solve:
-
I faced an API query error which required me to try different API connection points to identify which is the one which works and could bridge the flask backend with the front end interface.
-
There was a problem regarding querying the database whereby I didn’t know the database should be copied in the memory of the device, otherwise I would face unknown tables identified in the database. I needed a lot of breakpoints to debug to figure this out. Originally I wanted to query an external database, but as a proof-of-concept I started easy and queried a local database. The idea should generalize quiet smoothly.
-
There was a lot of trial and error before deciding to design the app as it is (something no LLM could anticipate at the beginning of the project).
3.1 Initially I wanted to use this BERT-based model, MedTS, for my app. However, it’s difficult to use and requires intricate input preprocessing before I can use it. In my app, I’m just assumming the user will provide a natural language input, not information such as a database schema intricately parsed into a json file, conveniently preprocessed in tree structures to be fed to MedTS. MedTS is lightweight and definitely would have been good, but it’d require extensive automatization regarding preprocessing of input.
3.2 Another approach to use a powerful text-to-SQL model is to use an open-source big model like one based on Llama, and compress it using tflite to deploy it on-device. This was not possible for me, because this powerful model is just too big to be compressed in <2 gb. After many trial and error attempts, I ended up (thanks to brainstorming with Perplexity.AI) to host the LLM in a Flask-based backend and call it through an API in the app. I’ll still explore quantization to reduce the memory footprint of the sqlcoder-7b-2 used.
The app’s name, InQuery, is meant to be a play of the words Inquiry and Query as part of SQL.
The code along with instructions on how to run it is found at https://github.com/awxlong/ai_sql_coder. Hope you find it helpful!
Enjoy Reading This Article?
Here are some more articles you might like to read next: