Why SQL Code Conversion Doesn’t Work11.29.19
So, you’ve come to the conclusion that you need to move on from Teradata. Your company cannot afford to miss out on the opportunity to generate new revenue by leveraging the vast array of data processing facilities in the public cloud? You’re tired of being told how IT only runs the business but doesn’t help grow it? You’ve had it. You’re joining the ranks of a rapidly growing movement. 2020 is going to be the year where you will go beyond Teradata, the epitome of a long-gone era.
Yet, replatforming instills fear in even the most hardened IT leader. For a perspective, consider this quote a major enterprise recently received for a rewrite-based migration: $26m for a project that was supposed to take 36 months. Take into account how practically every migration runs late and goes way over budget. Now, you’re looking easily at $40m to $50m, probably over 5 years. Horror stories about failed migrations abound.
Surprisingly, several vendors in this space recently suggested Teradata migrations were quite easy; a matter of just a few months. Especially their claims how tools could recompile application code automatically have struck a chord with increasingly desperate customers.
So, let’s analyze these claims, and debunk some of the myths.
Before we jump into the details, let’s define the operating theatre. Migrations consist of several distinct operations that need to be coordinated closely. They are: (1) schema transfer, (2) data transfer, and (3) rewriting of all dependent applications to make them work with the new target data warehouse. For our purposes, we assume additional tasks such as provisioning of the new destination data warehouse, general issues of connectivity between the data center and the cloud, etc. to be outside of the scope of this analysis.
First off, schema transfer is a solved problem: simply dump object definitions from Teradata, translate them individually, and replay the result into the destination data warehouse. Since schema definitions are typically devoid of complex SQL such as the use of Global Temporary Tables or, say, recursion, most schema translators do a decent job when transcribing. Less than 10% of the overall effort goes into schema transfer.
Similarly, the transfer of data poses few real problems. Although transferring 10’s or even 100’s of terabytes can be tricky especially if only a short window of time is available to complete the task. Data transfer accounts for about another 10% of the overall effort.
What puts migrations at risk every time, however, is making existing applications work with the new destination data warehouse. Approximately 80% of the overall effort of a conventional migration goes into application rewrite.
Most BI/ETL vendors support cloud data warehouses in the latest versions of their software. Emphasis is on latest version. If you have a well-established application ecosystem you are probably running older versions of these systems. This means you cannot just repoint them.
In order to make them work with your new destination data warehouse you need to upgrade first. The effort depends on the size and structure of the existing system. But upgrading is an operation that typically spans multiple quarters and quickly runs into the millions of dollars.
However, even an expensive upgrade doesn’t do the job just yet. All commercial BI/ETL tools support the injecting of customized SQL into reports and data pipelines. As a result, there is Teradata SQL embedded practically in all your daily processes. In a conventional rewrite, all that logic needs to be extracted, rewritten, re-inserted, re-tested, and re-deployed.
While it is tempting to believe the ecosystem is portable, in practice it requires significant effort which is highly labor-intensive.
All modern cloud data warehouses pride themselves to be general purpose data warehouses. Powerful query optimizers built into these system ensure the best possible execution of any query. No matter the query. They are capable of processing just about anything thrown at them.
Yet, there’s still that persistent misconception that Teradata queries need to be rewritten in certain, unspecified, ways. Supposedly that makes them perform better on cloud data warehouses. In all fairness, this was true a few years ago when the cloud was still in its infancy. But luckily, these days are long gone.
Today, not only do you not need these customizations—you really want to stay away from them. Any such optimization, no matter how well intended, is adding complexity and quickly turns into technical debt.
Remember, it is in the cloud vendors’ core interest to eliminate the need for customization or special formulation of queries. What might seem like a clever optimization today is nothing but an obstacle tomorrow. In short, “optimizations” require significant effort, yet, are almost always a waste of time.
At first, this one seems absurd, given how much effort the reimplementation of ETL requires. However it comes often on the heels of #1. (see above “Misconception #1: BI/ETL tools can just be repointed”) If repointing the existing system is not an option because of all the rewrites needed, one might just bite the bullet and rewrite the data pipelines entirely. Or so the logic goes.
As you guessed, it is grossly flawed. It is a great example why rewrite-based migrations are failing regularly. You went from adjusting ETL to a complete redesign? This will undo years of investment. Adding insult to injury, it’s quite likely you will end up redesigning and reinventing the exact same business logic of the existing system anyway.
Certainly, rewriting and evolving ETL can be an important design project for the company. There’s a time and place for everything. When the enterprise must replatform to the cloud as fast as possible, is not the time nor the place for experimental designs.
Most migrations fail because IT leaders underestimate the effort. Database migrations are an 80-20 problem if ever the was one. The first 80% of the journey require only 20% of effort. It’s treacherously easy to make good progress with a rewrite. Almost all the migration is surprisingly easy—at first.
It’s the remaining 20% that bring out all the problems. It’s the remaining 20% that take years and years. It’s the remaining 20% that kill migration projects.
Now consider that most automatic code conversion claim about 50-70% success and you quickly see why they are not a solution to this problem. They basically solve most of the easy issues that weren’t the problem to begin with. In the best case, automatic code converters reduce the overall effort by 10-15%. But in the greater scheme of things that’s just noise.
A solution that’s short of 95% completeness is not going to move the needle for you at all — and that’s just math.
The truth is, you can replatform from Teradata to a cloud data warehouse only if you do not rewrite. We get to that in a moment, below.
Provisioning, rerouting applications, and testing alone will take about a couple of months. Remember when you last upgraded your Teradata? Maybe from version 14 to 15? It took a couple of months to plan and execute. In this case, you did not rewrite a single application. You did not adjust any embedded SQL. And you did not replace loaders and utilities.
If migrating from Teradata to a modern cloud data warehouse involves rewriting, all bets are off. If you must rewrite, “a couple of months” quickly turns into a couple of quarters, then years. Testing alone is a bear: you must rewrite all tests, revalidate and re-deploy them. And validation is complex because you are operating two distinctly different test environments now. Then come the user acceptance tests which are even more onerous because of the differences of the systems.
In short, a couple of months is the bare minimum for any operation. If you must rewrite applications, you are quickly looking at a large multiple of that baseline.
Rewriting your way out of Teradata in a couple of months is a pipe dream that could cost you your career. There’s good news though: IT leaders who are not willing to let others decide their destiny for them have a better option now. The Datometry platform enables enterprises to replatform their applications that run on Teradata today to a modern cloud data warehouse at a fraction of cost, time and risk‑without changing SQL or APIs.
To learn more, contact your cloud vendor today and ask them about Datometry and see if you qualify for a free assessment. Here is to 2020—where a prosperous future for your enterprise in the cloud is well within reach!