Moving Database Applications to the Cloud: SQL Query Translation (Part 2)

10.10.18


In this post, you’ll learn about the vital task of rewriting data warehouse applications when moving them from one database to another. Also, you’ll be able to answer two important questions: “Why is this necessary?” and “What makes it so challenging?”

The short answer to the first question is that the SQL standard, and the numerous SQL dialects built on top of it, have evolved and diverged in response to requests from each database's user community. Therefore, when moving to a new database, your applications need to be adjusted or rewritten to take advantage of the new SQL dialect.

The short answer to the second question is that the rewriting process is laborious and time-consuming, and it can easily introduce unintended errors into your applications, for example, when the translated SQL returns different data types.

Connecting to a Cloud Database in Four Steps

Let’s assume you have been tasked with moving your current on-premises database to the cloud. The legacy application uses Teradata’s SQL dialect (let’s call it SQL-A) and connects to the database through a standard driver (native client, JDBC, or ODBC), as shown in the figure below. In the new architecture, the application will connect to a cloud database, in this case AWS Redshift, which uses a different dialect (SQL-B). The SQL-A queries in the legacy application need to be changed to work in the new environment. As explained in Part 1 of this blog series, this happens because of database schema changes, missing functionality, or different behavior of functions between the database engines.

Diagram of business applications connected to legacy database

To understand what this involves, let’s walk through a Python code snippet that demonstrates the work of translation.

There are four major steps in the code as shown below:

  1. Import the modules needed to connect to the database and display the query output.
  2. Pass the information required to connect to the target database (that is, the database, schema, and user ID).
  3. Assign the SQL statement to a variable, create a cursor, and execute the query.
  4. Display the result set in a Markdown cell.

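The snippet below is a minimal sketch of these four steps. It assumes the psycopg2 driver (Redshift speaks the PostgreSQL wire protocol); the connection values and the query are placeholders, not the original environment.

```python
# A minimal sketch of the four steps; driver choice and connection
# values are placeholders.

# Step 1: import the modules used to connect and to display the output.
import psycopg2                                # PostgreSQL-protocol driver
from IPython.display import Markdown, display  # renders the Markdown cell

# Step 2: pass the information needed to connect to the target database.
conn = psycopg2.connect(
    host="examplecluster.abc123.us-west-2.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

# Step 3: assign the SQL statement to a variable, create a cursor,
# and execute the query.
sql = "SELECT current_date;"  # stand-in for the real query
cur = conn.cursor()
cur.execute(sql)

# Step 4: render the result set as a Markdown table.
header = [col[0] for col in cur.description]
lines = ["| " + " | ".join(header) + " |",
         "| " + " | ".join("---" for _ in header) + " |"]
for row in cur.fetchall():
    lines.append("| " + " | ".join(str(v) for v in row) + " |")
display(Markdown("\n".join(lines)))
```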

Connecting to AWS Redshift

Let’s execute the SQL-A statement with no changes. There are clear differences between the Teradata and Redshift SQL dialects; when the query is executed against the AWS Redshift database, a syntax error is returned (see the output below). The first difference is the SEL keyword: Redshift does not recognize it, and it needs to be translated to SELECT. Two additional modifications are needed to execute the query successfully: 1) the syntax of the data type casting function must be changed, and 2) the DATE and FORMAT functions must be translated to the Redshift SQL-B dialect.
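To make these differences concrete, here is a hypothetical SQL-A statement exhibiting the constructs above; the table and column names are invented for illustration, and it reuses the cursor from the earlier snippet.

```python
# A hypothetical Teradata (SQL-A) query showing the constructs above:
# the SEL abbreviation, the bare DATE keyword, Teradata's implicit cast
# syntax, and its FORMAT phrase. Table and column names are invented.
sql_a = """
SEL DATE,
    order_id,
    order_total (DECIMAL(10,2)),
    order_date (FORMAT 'YYYY-MM-DD') (CHAR(10))
FROM sales.orders;
"""
cur.execute(sql_a)  # on Redshift this raises a syntax error at SEL
```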

Diagram of business application SQL attempting to move to target database

To get the query running directly in AWS Redshift, you need to make syntactic changes as shown in the section below. In this case, we are not changing the structure (grammar) of the query. (In part 3 of this blog series, I’ll show a case where a transformation is required due to semantic changes.)

Output of the SQL syntax error returned by AWS Redshift

Making Syntactic Changes to Execute the Query

We are already connected to Redshift, so let’s make some changes to the original query. In this case, we are translating keywords and function names. As mentioned before, these changes do not affect the structure of the query. The figure below depicts what needs to be changed.

As you can see, even a simple translation of the DATE and FORMAT functions requires you to be fluent in both dialects. The code below shows the changes:
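Under the same hypothetical schema as before, the translated SQL-B version of the query might look like this:

```python
# The same hypothetical query translated to the Redshift (SQL-B) dialect:
# SEL becomes SELECT, DATE becomes CURRENT_DATE, the implicit cast becomes
# CAST(... AS ...), and the FORMAT phrase becomes TO_CHAR.
sql_b = """
SELECT CURRENT_DATE,
       order_id,
       CAST(order_total AS DECIMAL(10,2)),
       TO_CHAR(order_date, 'YYYY-MM-DD')
FROM sales.orders;
"""
cur.execute(sql_b)  # accepted by Redshift
```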

After executing the translated query, we are able to get a response from the database:

Table of query results returned by the translated query

Virtualizing the Query: A New Approach

The Datometry adaptive data virtualization platform is a paradigm shift: it enables your applications to keep working in the original SQL-A dialect. No changes to the application are necessary; you simply point the connection string of the current application to the cloud database.

Let’s take the original query in the SQL-A dialect and, instead of connecting directly to the target database, connect the application to the Datometry Hyper-Q endpoint. In turn, the Hyper-Q gateway connects to the target database, as shown in the diagram below. Hyper-Q captures the incoming query at the network level in real time and translates it to the SQL-B dialect. The target database takes the translated query and executes it through its query processing engine.

Diagram of legacy applications using Datometry Hyper-Q to translate queries to target database

Let’s see this in action in the Python code snippet below.
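The sketch below illustrates the idea. It assumes the teradatasql driver (the application keeps speaking Teradata’s protocol) and a hypothetical Hyper-Q host; sql_a is the unmodified query from the earlier example.

```python
# The application keeps its Teradata driver; only the host in the
# connection string changes, pointing at a hypothetical Hyper-Q endpoint.
import teradatasql

conn = teradatasql.connect(
    host="hyperq.example.internal",  # Hyper-Q gateway (placeholder address)
    user="awsuser",
    password="********",
)
cur = conn.cursor()

# The query is the unmodified SQL-A statement; Hyper-Q translates it on
# the wire to SQL-B before it reaches the target database.
cur.execute(sql_a)
for row in cur.fetchall():
    print(row)
```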

As you can see, the query did not change from the original SQL-A dialect and the result set is identical to the one produced by the translated query:

Table of database query translation results using Datometry Hyper-Q

Why Virtualization of Applications is the Right Solution

Translating a sentence from English to German is not difficult; translating a whole book, however, is laborious and time-consuming, and errors can creep in if it is not done carefully. Similarly, rewriting or adjusting a few applications to work with a different SQL dialect is not difficult: the challenge arises when millions of lines of code need to be rewritten for a new database.

Here are some of the key advantages of using Hyper-Q when migrating to a modern database:

  1. Your applications become database, schema, and cloud agnostic; Hyper-Q converts from one SQL dialect to another in real time.
  2. The learning curve is easier, and adoption of cloud database technology is faster.
  3. You avoid introducing unintended errors due to poor or incorrect translation.

Stay tuned for part 3 of this blog series: it will discuss how Hyper-Q takes care of cases where the grammar (query structure) changes.


About Alex Infanzon

Alex Infanzon holds a Master of Science in New Generation Computing from the University of Exeter in the United Kingdom and a B.A. in Computer Science from the Anahuac University in Mexico. He has over 30 years of professional experience in consulting, pre-sales, and management roles at Westinghouse, Informix Software, Sun Microsystems, Composite Software, Dun and Bradstreet, EMC Greenplum, and most recently the SAS Institute. He has worked with Fortune 500 companies in different vertical industries, in both selling and consulting capacities. Alex has extensive experience in architecting and implementing Business Intelligence, Data Management, Enterprise Information Integration, Fuzzy Matching, and High-Performance Computing solutions.

