We recently benchmarked Datometry Hyper-Q against an industry-leading code converter. Unlike most database benchmarks, we used an actual customer scenario. We expected Hyper-Q to beat out any code converter. What was astonishing, however, was the wide margin by which Hyper-Q won.
The benchmark was not particularly exotic or challenging. The code converter we experimented with is widely regarded as state of the art. So what’s going on here?
Hyper-Q trounces code converters
Perhaps the most pressing question is: was this a fluke or are code converters intrinsically limited? After all, the industry has been using converters since the dawn of databases. SQL translators have long been available for every database.
In this article we look at the principles underlying code converters, what their limitations are, and—importantly—what it means for database migrations in practice.
We recently presented our findings from comparing Hyper-Q with a leading code converter. The results are striking: Hyper-Q trounces the code converter completely. It’s not even close.
Since there is no official benchmark for database migrations, we had to devise our own experiment. We chose a real-world customer scenario from a financial data warehouse, dumped the schema from the customer’s Teradata system, and ran it through both Hyper-Q and the converter.
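For context, Teradata can emit the full DDL of any object with its SHOW statements, one plausible way to dump a schema like this one; the object names below are made up for illustration:

```sql
-- Teradata prints the complete DDL of an object via SHOW statements.
-- One statement per object yields the DDL for all 2,906 objects.
SHOW TABLE finance.customer;          -- prints the CREATE TABLE statement
SHOW VIEW  finance.v_daily_revenue;   -- prints the CREATE VIEW statement
```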
Out of the 2,906 objects in the schema, Hyper-Q translated all but 6 correctly. Overall, Hyper-Q generated correct output for over 99.7% of the schema.
In contrast, the converter spewed over 4,600 warnings and even crashed 16 times in the process. For only 448 objects did it generate output without issues. That is a measly 15.4% of the entire schema.
Some of the warnings may be spurious, depending on how the object is used; others indicate wrong results. For example, dropping a uniqueness constraint may be admissible if one can guarantee that no duplicates will ever get inserted into the table. Doing it without that knowledge, however, is prone to corrupt business processes.
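Here is a sketch of what such a dropped constraint can look like in practice. Consider a Teradata table with a UNIQUE PRIMARY INDEX; because many cloud targets do not enforce uniqueness, a typical converter simply drops the guarantee. The table is made up for illustration:

```sql
-- Teradata source: the UNIQUE PRIMARY INDEX enforces uniqueness on every insert.
CREATE TABLE customer (
    customer_id INTEGER NOT NULL,
    name        VARCHAR(100)
) UNIQUE PRIMARY INDEX (customer_id);

-- Typical converter output: structurally equivalent, but the uniqueness
-- guarantee is silently gone. Any process that relied on the database
-- rejecting duplicates now inserts them without complaint.
CREATE TABLE customer (
    customer_id INTEGER NOT NULL,
    name        VARCHAR(100)
);
```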
During a migration, implementers need to check every single one of the 4,641 issues and determine whether it can safely be ignored and, when it cannot, come up with a solution or workaround for the failed translation.
Why are code converters not getting better?
The lackluster performance of the code converter we tested was not unique, and neither were the problems it encountered: all four code converters that we tested struggled with the same issues.
However, most converters simply swept their failures and inaccuracies under the carpet. They dropped constraints, altered semantics, and made it the implementer’s responsibility to find these issues.
When confronted with such low grades for a code converter, one may think: this is just a matter of programming. Why do converters not succeed at a higher rate?
The key insight to understanding what’s going on is this: manual migrations don’t just modify SQL, they modify the applications that emit SQL. Sometimes in subtle ways, sometimes substantially. Consultants constantly insert code or change business logic to make up for the discrepancies between the SQL dialects of the old and the new database system.
Code converters, however, can only change SQL. And to do so, they can only use the SQL dialect of the destination system. If there is no corresponding construct in the new SQL dialect, then there is no translation. No amount of engineering can make up for this discrepancy.
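A concrete example, simplified for illustration: Teradata SET tables silently discard duplicate rows on insert. Most modern cloud warehouses have no corresponding table property, so the closest static translation necessarily loses the semantics:

```sql
-- Teradata: a SET table rejects duplicate rows automatically on INSERT.
CREATE SET TABLE page_visits (
    visitor_id INTEGER,
    page_url   VARCHAR(500)
);

-- Closest static translation on a target without SET semantics: the table
-- definition survives, the implicit deduplication does not. Every application
-- that relied on it must now deduplicate explicitly, which is an application
-- change, not a SQL change.
CREATE TABLE page_visits (
    visitor_id INTEGER,
    page_url   VARCHAR(500)
);
```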
That also means code converters cannot get any better. We’d like to believe the converter we tested did everything right within the theoretical limitations of code conversion. However, aside from the crashes, there is little that can be done to improve the results.
In our experience, this fundamental truth, that code converters cannot be improved, has been the hardest to accept for practitioners and engineers alike.
Code converters are not scalable
Developers love code converters. Developers also love to rewrite and redesign just about any component of their systems. Code converters neatly support this approach. However, that approach is not scalable.
For every application that is to be converted, one must identify and extract the SQL that needs to be translated. Then, the SQL snippet is run through the converter, which may require additional work to provide enough context so the converter has a fighting chance of success. Lastly, the translated code is re-inserted into the application; again, a highly intrusive and risk-laden operation.
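Extraction alone is often harder than it sounds. As a made-up example, consider a Teradata stored procedure that assembles its SQL at run time; a static converter never even sees the statement it would have to translate:

```sql
-- Hypothetical Teradata stored procedure using dynamic SQL: the final
-- INSERT statement only exists at run time, concatenated from a parameter.
CREATE PROCEDURE refresh_summary (IN region_name VARCHAR(30))
BEGIN
    CALL DBC.SysExecSQL(
        'INSERT INTO summary_by_region ' ||
        'SELECT region, SUM(amount) FROM sales ' ||
        'WHERE region = ''' || :region_name || ''' GROUP BY region;');
END;
```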
Even without the inherent limitations of code converters detailed above, this process is excruciating. A typical Enterprise Data Warehouse easily has 50 to 150 applications that all need to be reworked in this labor-intensive way.
But it gets worse: once the code is in place, the testing begins. Testing the output of a converter is not so much about validating that it works; it’s about assessing just how far off the results are. Then comes the time-consuming process of fixing up individual shortcomings of the translation by modifying the applications.
Because of the frequent and time-consuming manual intervention that’s required, the approach is not scalable for any data warehouse of serious size or complexity.
How can AI fix code converters?
For anything that is difficult or tedious, it has become fashionable to simply proclaim that some kind of AI will solve it. AI is good at writing code when it’s closely supervised and directed. Granted, most of the code written by AI is apparently discarded, and only 25% or so makes it into production. Recently, AI has also been used successfully for online translation of human conversations.
So how does AI help here? This is certainly a difficult problem, so hoping AI will solve it fits the pattern. It’s tempting to equate SQL translation with natural language processing and the general translation of human languages, a discipline for which AI is quite qualified.
However, AI is not (yet) a terribly good match here. It can help with static translation, but code converters have that covered already. The comparison with human language breaks down quickly, too: SQL is a well-defined language with a compact, cleanly specified grammar and no deliberate ambiguity. It is hard to see how AI could do “better” than a converter.
And yet, applying AI like a static code converter not only suffers from the same scalability problems we saw above; it also calls for considerably more effort to check its work and to eliminate hallucinations and shortcuts. In short, as of today, AI isn’t any better than code converters.
How does Datometry Hyper-Q overcome these limitations?
Datometry Hyper-Q is middleware that makes existing applications written for a legacy system work instantly on a modern data warehouse. It does so by translating and emulating all requests from the application in real time.
In a nutshell, if it runs on, say, Teradata today, it will run on Snowflake or Databricks tomorrow. Practically no code changes are needed.
As opposed to code converters, Hyper-Q can insert the requisite compensating logic at run time, without the need to modify the business logic. Hyper-Q makes up for the discrepancies between the old and the new system on the fly.
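To get a feel for what compensating at run time means, here is a sketch, not Hyper-Q’s actual rewrite, of how an insert into a former SET table (see above) could be compensated on a target that lacks SET semantics; the values are made up:

```sql
-- What the unchanged application sends, still assuming SET-table dedup:
INSERT INTO page_visits VALUES (42, '/home');

-- A semantically equivalent statement a real-time translator could emit
-- instead: insert the row only if it is not already present.
INSERT INTO page_visits (visitor_id, page_url)
SELECT s.visitor_id, s.page_url
FROM (SELECT 42 AS visitor_id, '/home' AS page_url) AS s
WHERE NOT EXISTS (
    SELECT 1 FROM page_visits p
    WHERE p.visitor_id = s.visitor_id
      AND p.page_url   = s.page_url
);
```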
As we’ve seen in the benchmarks, Hyper-Q trounces code converters by a wide margin, even in areas that are arguably sweet spots for code converters. But it goes deeper: Hyper-Q not only eliminates the translation problem, it accelerates testing too. Business users can quickly validate the new system by comparing results side by side from the old and the new stack.
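Side-by-side validation can stay simple because both stacks accept the same queries: business users run the same fingerprint query against the old system and, through Hyper-Q, against the new one, then compare the output. The table and columns below are made up for illustration:

```sql
-- Run unchanged against both the legacy and the new stack; matching output
-- is strong evidence that the migrated data and logic line up.
SELECT COUNT(*)      AS row_cnt,
       SUM(amount)   AS total_amount,
       MIN(txn_date) AS first_txn,
       MAX(txn_date) AS last_txn
FROM   daily_revenue;
```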
And while we’ve seen that applying AI statically is not a win, infusing AI into a live system for workload acceleration, adaptive optimization, and advanced operations carries much promise. Going forward, Hyper-Q would be a great place to start using AI and unlocking the potential for holistic data management.
About Datometry
Datometry is the leader in database system virtualization. With Datometry Hyper-Q, enterprises can run their existing applications directly on next-generation databases without costly and risk-laden database migrations. Datometry counts leading Fortune 500 and Global 2000 enterprises worldwide among its customers. For more information, visit www.datometry.com.