2024

Database Introspection with Bal Persist

Updated on 29th August 2024

A CLI tool to introspect databases and generate Bal Persist data models.

Introduction

Bal Persist

Ballerina is a programming language that is designed with integration and cloud in mind.

Bal Persist is the officially developed persistence library to bring out-of-the-box persistence capabilities to Ballerina. Bal Persist currently supports in-memory, MySQL, MSSQL, PostgreSQL, Google Sheets, and Redis as data stores, with support for more data stores such as MongoDB coming soon. At the time, Bal Persist worked seamlessly if you are starting a project from scratch, i.e., in a scenario where you don't have any database set up. However, it is rather difficult to use Bal Persist with existing databases due to the mismatches in how Bal Persist represents data and how an existing database would already have the schema defined. Several challenges were identified.

Writing a Ballerina data model that exactly maps to an existing database takes considerable amount of time
The mismatch between the names of tables and columns between Bal Persist and existing database implementation
The mismatch between data types used by Bal Persist and what's already used in the existing database
No way to quickly identify whether the existing implementation is compatible with the current design of Bal Persist

Due to these reasons, this project is aimed to provide introspection support to Bal Persist to seamlessly integrate with existing relational databases such as MySQL, MSSQL, and PostgreSQL. The main goals of this project were to,

Identify whether Bal Persist supports the existing database implementation
Handle any name and type mismatches between Ballerina naming conventions and types
Read and automatically generate the database model to a model.bal file
Expose this functionality through the bal persist pull command

As my internship project at WSO2, I worked on Bal Persist to add the capability to introspect SQL databases such as MySQL, PostgreSQL, and MSSQL databases, to allow users to generate Ballerina data models and client APIs for their existing databases without having to write the data model manually.

Moreover, I've also added support for advanced SQL annotations that can be used to customize the generated data models, such as specifying the table and column names, column types, indexes, foreign keys, and database-generated values.

Feature Overview

The short YouTube video below provides a brief tutorial on how to use the bal persist pull command to introspect a MySQL database and generate Ballerina data models and client APIs.

In the next section, I will provide an overview of the project architecture and the methodology used to implement the feature.

Project Architecture

Bal Persist Introspection Architecture

Step 1: The database credentials are read from the Command Line Options passed when executing the bal persist pull command.

Step 2: The Script Runner queries the database to retrieve the schema of the desired database and populates the Introspection Data Transfer Objects (DTO) with the raw data extracted from the database to be processed into Ballerina Entities.

Step 3: The Mapping Process maps the DTOs to Ballerina Entities by mapping entities, relationships, data types, Enums, etc., and converts all names to Ballerina conforming names along the way (record names to be PascalCase and record field names to be camelCase).

Step 4: The source generator reads the generated entities and creates a model.bal file. Here, any SQL annotataions that can indicate and map names, data types, indexes, relationships, and database-generated fields are also injected to the syntax tree as well.

Methodology

The project was implemented using Java and Ballerina. I got the opportunity to work with the Ballerina Syntax API, Compiler Diagnostics, and the Ballerina CLI tooling and I learned a lot about how programming languages work under the hood. Moreover, I got the opportunity to work with MySQL, PostgreSQL, and MSSQL databases and learned how to query databases and retrieve schema information. I also got to work with TestNG and Gradle to write and run unit tests and to use Docker to run integration tests.

Conclusion

I'm excited to share that I've successfully implemented introspection support for MySQL databases in the release of Ballerina Swan Lake Update 9, and support for PostgreSQL and MSSQL data stores are released with Ballerina Swan Lake Update 10. This also includes new advanced SQL annotations that can be used to customize the generated data models, such as specifying the table name, column name, custom column types, indexes, foreign keys, and database-generated values.

Overall, I had a great experience working on this project and interning at WSO2. I've learned a lot about Ballerina, databases, and how to work on a large-scale open-source project. I'm looking forward to contributing more to Ballerina in the future.

I would like to express my heartful gratitude for my mentors Danesh, Bhashinee, Sahan, and the awesome Ballerina team at WSO2 for guiding me throughout the project. Moreover, I am deeply grateful for leaders behind the Ballerina Team Sameera, Anupama, and Shafreen for giving me invaluable feedback during the project reviews.

Learn More

In the below URLs, you can find more information about how to use the Bal Persist CLI tool and the link to download the latest Ballerina Swan Lake release.