by: Mike Waas
Why SQL Code Conversion Doesn’t Work
By now it isn’t exactly a secret anymore: data warehouse appliance customers worldwide are looking to replace their Teradata, and other systems, with modern cloud data warehouses. Enterprises everywhere are currently considering manual migration of these systems. This includes the rewriting or adjusting of all their applications that are using Teradata today.
The typical migration cost for an individual system and its applications is significant. Viewed globally across the entire Teradata installed base, this effort amounts to a whopping $10-15bn. And that’s just the estimate when everything goes well. Ironically, this number is about ten times larger than the annual Teradata license revenue.
In response to this challenge, automatic code conversion systems have been developed. They promise to convert all SQL code from Teradata SQL to that of the destination cloud data warehouse. It seems like a helpful approach at first. However, it is also one that is grossly flawed and generally unsuccessful in real-world application scenarios.
In this article, we examine the challenges and why rewrite, and automatic code conversion, in particular, doesn’t live up to its promise.
SQL-Only Rewriting Is Insufficient
Let’s look at underpinnings of rewriting first. Rewrite operates on the assumption that SQL from one dialect has an equivalent in the SQL of the other. In other words, there is a translation for every statement from the source into one or more statements on the destination. If the destination does not support certain constructs, however, this breaks down immediately.
Take for example recursive queries. Execution of one recursive query translates into the execution of a variable number of SQL statements. The exact number of which is not known a priori but is a result of the execution itself. If the new destination does not support recursion, rewrite is simply impossible. Rather, the applications that use recursion need to be modified—significantly.
Recursion illustrates the fundamental defect of rewrite quite well. And in practice, most IT professionals would be hard-pressed to rearchitect an application correctly to resolve it. But recursion is actually still one of the simpler ones. Teradata has many other frequently occurring concepts that derail rewriters. Examples include Global Temporary Tables and Stored Procedures to name but the most obvious ones.
By now you get the picture. Rewrite is never going to be scoped to SQL only. If the functional gap between source and destination isn’t trivial, rewrite will always bleed into the application code. Significant surgery on the application code level will ensue. And that’s where the cost for rewrite goes up exponentially.
Automatic Code Conversion Is Actually Counter-Productive
Automatic code conversion is the idea that rewrites can be executed by software. I think, you already see where this is going to break. It simply accelerates the rewrites that don’t require rearchitecting of applications. It offers, however, no solution for the many cases that go beyond SQL-only rewrites. By their own admission, automatic code converters claim therefore only 50-70% success rate.
But it gets worse. Machine-generated code is not what you would call maintainable. The resulting code of a conversion attempt is usually more difficult to grasp and analyze than the original SQL. The cost of manual modifications—that are inevitably needed, see above—just went up significantly. This technical debt will rear its head with any every future maintenance operation for years to come.
On top of all that, automatic code conversion comes with a significant execution issue. The process is simply not scalable. Every individual application must be located and taken apart; SQL is extracted, run through the code converter, and inserted into the application. And that’s just the 50-70% of cases where things go well. This is a highly labor-intensive process—even in the best-case scenario.
In short, automatic code conversion is a technically flawed approach that works only in the most trivial of examples. In practice, it will create more problems than it solves. The fallout from its application will tax the enterprise for years to come.
A 70% Solution to an 80-20 Problem —Is Not A Solution
Database migration is a classic 80-20 problem if ever there was one. The first 80% of the project will take only 20% of effort and cost. Not surprisingly, the last 20% is what gives these projects such a bad name. I believe we made it quite clear that the “easy” 80% and the “hard” 20% are orders of magnitude apart.
Now let’s simply run the numbers. A typical migration for a mid-range Teradata system of 30TB with 2,500,000 queries daily takes about 3-4 years and costs $25m. This assumes using a reputable SI and that no complications are encountered. Using a code converter with 60% success rate will make good headway with the easy problems but leave all of the hard ones unaddressed.
Surprisingly, automatic code converter will reduce time and cost only by a measly 15%. Remember 80% of the effort goes into solving problems that code conversion can’t handle. This brings the overall effort in the above example to 2.5 years and a cost of $22m. Given the uncertainty of these estimates in the first place, the anticipated savings are well below the noise level.
Taking into account the long-term repercussions that code conversion has on maintainability and support, the ROI for code conversion is certainly negative. This may seem counter-intuitive at first. But consider that in order to break even, any solution would have to demonstrate at least 90+% effectiveness. And that’s just math.
At Datometry we pioneered data warehouse virtualization. With Datometry, enterprises run their existing applications, written for Teradata, natively on a cloud data warehouse. Replatforming off Teradata is now an option, without putting career or enterprise at risk. The Hyper-Q platform transforms SQL on-the-fly and synthesizes optimized SQL for the destination cloud data warehouse.
Through its emulation of complex features, Datometry Hyper-Q gives applications a fully equivalent platform using the latest cloud data warehouse technology in Azure, AWS, and GCP. In addition, a complete suite of client-side tools, from drivers to loaders optimizes the deployment for cloud.
See the guidebook for a detailed analysis of challenges when rewriting and how Datometry lets you overcome these obstacles. To learn how to transition from Teradata to any modern cloud data warehouse at a fraction of time, cost and risk, please visit www.datometry.com