In this post, you’ll learn about the vital task of rewriting data warehouse applications when moving them from one database to another. Also, you’ll be able to answer two important questions: “Why is this necessary?” and “What makes it so challenging?”
The short answer to the first question is that the SQL standard and the many SQL dialects built on it have each evolved in response to the demands of their own user communities. As a result, when moving to a new database, your applications need to be adjusted or rewritten to match the new database’s SQL dialect.
The short answer to the second question is that the rewriting process is laborious and time-consuming, and it can easily introduce unintended errors into your applications, for example, when a translated expression returns a different data type.
Let’s assume you have been tasked with moving your current on-premises database to the cloud. The legacy application uses Teradata’s SQL dialect (let’s call it SQL-A) and connects to the database through a standard driver (native client, JDBC, or ODBC), as shown in the figure below. In the new architecture, the applications will connect to a cloud database, in this case AWS Redshift, which uses the SQL-B dialect. The SQL-A queries in the legacy application need to be changed to work in the new environment. As explained in Part 1 of this blog series, this happens because of database schema changes, missing functionality, or different behavior of functions between the database engines.
To understand why, let’s walk through a Python code snippet to demonstrate the work involved in translation.
There are four major steps in the code, as shown below: 1) connect to the target Redshift database; 2) execute the original SQL-A query; 3) translate the query to the SQL-B dialect; and 4) execute the translated query.
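As a minimal, self-contained sketch of those four steps, consider the snippet below. The `employees` query is a hypothetical example (not the post’s actual code), and the driver calls are shown as comments so the sketch runs without a live cluster:

```python
# Step 1: connect to the target Redshift database, e.g. with a
# PostgreSQL-compatible driver such as psycopg2 (shown as a comment
# so this sketch stays runnable without a live cluster):
#   conn = psycopg2.connect(host="<redshift-endpoint>", port=5439, ...)

# Step 2: execute the original SQL-A (Teradata) statement. Redshift
# rejects it with a syntax error: SEL, the Teradata-style cast, and
# the FORMAT phrase are not part of its dialect.
sql_a = "SEL emp_id (INTEGER), hire_date (FORMAT 'YYYY-MM-DD') FROM employees"

# Step 3: translate the statement to the SQL-B (Redshift) dialect.
sql_b = ("SELECT CAST(emp_id AS INTEGER), "
         "TO_CHAR(hire_date, 'YYYY-MM-DD') FROM employees")

# Step 4: execute the translated statement, which Redshift accepts:
#   cur.execute(sql_b)
#   rows = cur.fetchall()
print(sql_a)
print(sql_b)
```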
Let’s execute the SQL-A statement with no changes. There are clear differences between the Teradata and Redshift SQL dialects, and when the query runs against the AWS Redshift database, a syntax error is returned (see the output below). The first difference is the SEL keyword: Redshift does not recognize it, so it needs to be translated to SELECT. Two additional modifications are needed to execute the query successfully: 1) change the syntax of the data type casting function, and 2) translate the DATE and FORMAT functions to the Redshift SQL-B dialect.
To get the query running directly in AWS Redshift, you need to make the syntactic changes shown in the section below. In this case, we are not changing the structure (grammar) of the query. (In Part 3 of this blog series, I’ll show a case where a transformation is required due to semantic changes.)
We are already connected to Redshift, so let’s make some changes to the original query. In this case, we are translating keywords and function names. As mentioned before, these changes do not affect the structure of the query. The figure below depicts what needs to be changed.
As you can see, even a simple translation of the DATE and FORMAT functions requires you to be fluent in both dialects. The code below shows the changes:
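As a rough illustration of that keyword- and function-level rewriting, here is a minimal Python sketch. The query, the regular expressions, and the function name are illustrative assumptions rather than the post’s actual code; a production translator would use a full SQL parser, not regexes:

```python
import re

def translate_sql_a_to_b(sql: str) -> str:
    """Illustrative-only translation of a few Teradata (SQL-A) constructs
    to Redshift (SQL-B). Real dialect translation needs a full parser."""
    # Expand the Teradata-only SEL abbreviation to SELECT.
    sql = re.sub(r"\bSEL\b", "SELECT", sql)
    # Rewrite Teradata-style casts: col (INTEGER) -> CAST(col AS INTEGER).
    sql = re.sub(r"\b(\w+)\s*\((INTEGER|SMALLINT|DATE)\)",
                 r"CAST(\1 AS \2)", sql)
    # Rewrite the Teradata FORMAT phrase to Redshift's TO_CHAR function:
    # hire_date (FORMAT 'YYYY-MM-DD') -> TO_CHAR(hire_date, 'YYYY-MM-DD').
    sql = re.sub(r"\b(\w+)\s*\(FORMAT\s+('[^']*')\)",
                 r"TO_CHAR(\1, \2)", sql)
    return sql

original = "SEL emp_id (INTEGER), hire_date (FORMAT 'YYYY-MM-DD') FROM employees"
print(translate_sql_a_to_b(original))
# -> SELECT CAST(emp_id AS INTEGER), TO_CHAR(hire_date, 'YYYY-MM-DD') FROM employees
```

Even this toy version shows why fluency in both dialects matters: each rewrite rule encodes a fact about how the two engines differ.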
After executing the translated query, we are able to get a response from the database:
The Datometry adaptive data virtualization platform is a paradigm shift: it enables your applications to keep working in the original SQL-A dialect. No changes to the application are necessary; you simply point the application’s connection string at the cloud database.
Let’s take the original query in the SQL-A dialect and, instead of connecting directly to the target database, connect the application to the Datometry Hyper-Q endpoint. In turn, the Hyper-Q gateway connects to the target database, as shown in the diagram below. Hyper-Q captures the incoming query at the network level in real time and translates it to the SQL-B dialect. The target database takes the translated query and executes it through its query processing engine.
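From the application’s point of view, the redirection can be sketched as nothing more than a change of endpoint in the connection parameters. The host names below are hypothetical placeholders, and the driver call itself is left as a comment:

```python
# Hypothetical endpoints; substitute your own hosts and credentials.
REDSHIFT_ENDPOINT = "my-cluster.example.us-east-1.redshift.amazonaws.com"
HYPERQ_ENDPOINT = "hyperq-gateway.example.internal"  # Datometry Hyper-Q

def connection_params(host: str, dbname: str = "analytics") -> dict:
    """Assemble driver connection parameters for a given endpoint."""
    return {"host": host, "port": 5439, "dbname": dbname}

# Before: the application talks to Redshift directly and must send SQL-B.
direct = connection_params(REDSHIFT_ENDPOINT)

# After: the only change is the host. The application keeps sending its
# original SQL-A, and Hyper-Q translates it in flight:
#   conn = psycopg2.connect(**via_hyperq)
via_hyperq = connection_params(HYPERQ_ENDPOINT)

print(direct["host"], "->", via_hyperq["host"])
```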
Let’s see this in action in the Python code snippet below.
As you can see, the query did not change from the original SQL-A dialect and the result set is identical to the one produced by the translated query:
Translating a sentence from English to German is not difficult; translating a whole book, however, is laborious and time-consuming, and errors creep in easily. Similarly, rewriting or adjusting a few applications to use a different SQL dialect is not difficult; the challenge arises when millions of lines of code need to be rewritten for a new database.
Here are some of the key advantages of using Hyper-Q when migrating to a modern database:
- Applications keep working in their original SQL-A dialect, with no rewriting or adjustment.
- Queries are captured and translated to the target dialect in real time, at the network level.
- You avoid the labor, cost, and risk of errors that come with rewriting millions of lines of code.
Stay tuned for Part 3 of this blog series: it will discuss how Hyper-Q handles cases where the grammar (query structure) of a query changes.