How to Accelerate Existing ETL with Databricks 

More and more IT warriors want to accelerate the move of their existing ETL — the ingest, reconciliation, and cleansing of data from various sources across the organization — to Databricks. Wait, what? Isn’t ETL one of Databricks’ strong suits? So what is this about?

ETL is the lifeline of every data warehouse. ETL is the firewall between the chaos of the operational world and logical decision making. As such, ETL is incredibly precious. You might not love everything about your ETL, but the investment in it makes it indispensable. 

Migrating existing, and often sprawling, ETL installations is a formidable challenge. As it turns out, many IT professionals underestimate what it takes. They hope they can somehow repoint their existing systems, only to find out that’s not the case. Conventional migrations require significant rework of existing jobs and may put the entire migration at risk.

Practitioners are also wary of massive rewrites that are supposed to be “somehow better.” They know these pitches are often just used to mask an inability to make the current setup work. Not to mention, rewrites routinely run behind schedule, exceed the projected cost, and end up cutting corners to claim success.

Here, we’ll show how Datometry and database virtualization can overcome these issues by virtualizing your ETL onto Databricks. With Datometry, Databricks becomes a complete drop-in replacement for Teradata and Oracle. With that in mind, let’s look at some of the most common challenges when migrating ETL to Databricks.

If your ETL predates Databricks

Many ETL systems in use at major corporations predate Databricks. They are the workhorses of data engineering departments that have honed these systems over the years. For installations of this vintage, simply changing the connection of the ETL servers from, say, Teradata to Databricks is often not an option.

There are simply no connectors available. Or, if they are available at all, they require a costly and time-consuming upgrade to the latest version of the ETL platform — a move the enterprise might not be ready to make just now.

In contrast, with Datometry repointing becomes a smooth reality. Datometry converts all SQL on-the-fly from Teradata to Databricks SQL. Even feature gaps like global temporary tables are automatically reconciled and fully functional with Datometry. There is no need for new connectors, loaders, or drivers. Instead, your ETL can continue to work as-is. 
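To give a flavor of what on-the-fly translation means, here is a toy sketch of the kind of mechanical rewrite rules a virtualization layer applies to statements in flight. The two rules below are hypothetical simplifications invented for illustration; Datometry’s actual translation is far more comprehensive and operates at the wire-protocol level.

```python
import re

# Toy illustration only: two mechanical Teradata -> Databricks SQL rewrites.
# These rules are hypothetical simplifications, not Datometry's implementation.
REWRITES = [
    # Teradata accepts SEL as shorthand for SELECT
    (re.compile(r"^\s*SEL\b", re.IGNORECASE), "SELECT"),
    # Teradata's session-scoped VOLATILE tables map (very roughly) to
    # session-scoped temporary views on Databricks
    (re.compile(r"\bCREATE\s+VOLATILE\s+TABLE\b", re.IGNORECASE),
     "CREATE TEMPORARY VIEW"),
]

def translate(sql: str) -> str:
    """Apply each rewrite rule to the incoming statement."""
    for pattern, replacement in REWRITES:
        sql = pattern.sub(replacement, sql)
    return sql

print(translate("SEL * FROM sales"))  # SELECT * FROM sales
```

The point of the sketch is the architecture, not the rules: because translation happens per statement at runtime, the ETL server never knows it is no longer talking to Teradata.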

If your ETL is full of legacy SQL 

ETL naturally evolves with the specific behaviors of the underlying database. Teradata’s numeric formats are perhaps the standard example of non-portable features. Not surprisingly, over years of development, ETL code simply adapts to the platform it is given to work with.

Most prominently, there are non-standard extensions of the query language, like Teradata’s QUALIFY clause. Another category is keywords that collide when ported between databases. And then there is a plethora of semantic differences that may introduce data discrepancies that are notoriously difficult to find.
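To make the QUALIFY case concrete, here is a sketch of the classic portable rewrite: push the window function into a derived table and filter on its result. The example runs against SQLite (which supports window functions but not QUALIFY) purely for illustration; the table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders(customer_id INT, amount INT);
INSERT INTO orders VALUES (1, 10), (1, 30), (2, 20);
""")

# Teradata-style original (non-portable):
#   SELECT customer_id, amount FROM orders
#   QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id
#                              ORDER BY amount DESC) = 1
#
# Portable rewrite: compute the row number in a derived table,
# then filter on it in the outer query.
rows = conn.execute("""
SELECT customer_id, amount FROM (
  SELECT customer_id, amount,
         ROW_NUMBER() OVER (PARTITION BY customer_id
                            ORDER BY amount DESC) AS rn
  FROM orders
)
WHERE rn = 1
ORDER BY customer_id
""").fetchall()

print(rows)  # [(1, 30), (2, 20)] -- the top order per customer
```

Rewrites like this are simple in isolation; the risk lies in applying thousands of them by hand across a sprawling codebase without changing semantics anywhere.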

Even more intricate are the workarounds addressing specific limitations of the legacy system. They include physical database design and performance optimizations that are almost always irrelevant when moving to Databricks. 

Again, Datometry has your back. With Datometry, legacy SQL continues to work with the same fidelity as on the original system. Datometry eliminates the risk of inadvertently changing the semantics of complex SQL expressions.

If your ETL is custom-built

Your ETL may not use any third-party software at all. Instead, it may be the result of years of custom software development tailored to your organization’s needs. Along the way, your system has come to rely on tools optimized for use with the legacy database.

Existing tools are often highly optimized for data throughput and the specific needs of ETL. Replacing them is not only a major paradigm shift that may require adopting an entirely new toolchain; it may also pose the even bigger challenge of reskilling your workforce on short notice.

With Datometry, however, legacy tools continue to be your trusted workhorses. Existing scripts, programs, and applications can immediately interact with Databricks without any changes.

Datometry: Move now — modernize later 

Moving to a modern data platform like Databricks is a massive opportunity for the enterprise. However, conventional database migrations take years and produce uneven results at best. In the guise of modernization, they squander precious time and resources. 

Don’t fall into the trap of thinking you must rewrite your ETL to get started. Truth be told, most rewrites end up implementing the very same logic, just to make the deadlines.

With Datometry, you have a much more powerful option at your disposal: Move fast, get going on Databricks now, and outpace your competition. Instead of hastening a modernization and compromising on the vision of your future data platform, separate migration from modernization.

Once migrated with Datometry, let individual business units modernize any part of their business logic on their own terms. By separating the move from modernization, your outcomes are poised to be distinctly superior: massive savings, lower risk, and the opportunity to do modernization the right way.

About Datometry

Datometry is the leader in database system virtualization. With Datometry Hyper-Q, enterprises can run their existing applications directly on next-generation databases without needing costly and risk-laden database migrations. Datometry counts leading Fortune 500 and Global 2000 enterprises worldwide among its customers. For more information, visit www.datometry.com.