Case Study: Netezza & DataStage to Snowflake & Talend Code Conversion
Written by Jared Hillam
About The Client
The Client, who is the leading off-price apparel and home fashions retailer in the U.S. and worldwide, was ranked 85 in the 2019 Fortune 500 company listings. At the end of 2019, the Client had nearly $42 billion in annual sales, more than 4,500 stores in nine countries, four e-commerce sites, and approximately 286,000 associates.
Intricity is a team of specialized data management, data warehousing, and business intelligence experts. The team members at Intricity have been handpicked over the course of 20 years and represent the top talent globally in data-oriented disciplines.
Migration: Challenges & Wins
The Client had used DataStage extensively for many years. However, the adoption of the cloud created a landscape that DataStage was not originally designed for. Additionally, the cost of licensing the ETL environment was prohibitive for the organization. After studying the capabilities of the available ETL platforms, the Client selected Talend as the target platform. The primary barrier to its adoption was the massive footprint of the existing DataStage code; replacing all of that code by hand was going to be a massive effort. To determine the cost of the effort, the Client requested a quote from its trusted System Integration (SI) partner, which came to ~$20,000,000. The SI’s quote also came with a contingency that required a study of the source systems, which was quoted at an additional ~$300,000. The sheer cost of the endeavor led the CIO to tell the cloud providers that the project was on hold unless a more viable refactoring approach could be found.
The Client codebase was truly enormous with both DataStage and Netezza needing to be migrated to Talend and Snowflake respectively.
The conceptual differences between DataStage and Talend were no small issue; the architect of the project compared it to turning “Jupiter into a banana”.
The footprint of mainframe transformations was something that could be supported in Talend but did not have the same level of tuning as IBM had put into DataStage. This was not something that the Talend product teams had in their roadmap.
Win 1: Analyzer
The Snowflake representative reached out to Intricity after the CIO decided to put everything on hold. Intricity presented the BladeBridge code conversion product and showed how to size the effort with Analyzer. The Client team provided Intricity with the metadata from their DataStage environment and the SQL from Netezza, and Intricity ran Analyzer for free to generate a full inventory of all the jobs, with counts by complexity. This analysis also allowed Intricity to tie its services quote to empirical data. The quote was ~$3,000,000, a fraction of the original SI quote. Because the quote was connected to the empirical counts and complexity findings, both Intricity and the Client were confident in it. Intricity had provided the Analyzer results to the Client for free, whereas the competing SI had proposed a comparable study for $300,000.
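To illustrate the idea of an inventory with counts by complexity feeding an effort estimate, here is a minimal Python sketch. It does not show how Analyzer works. The `Job` fields, the bucket thresholds, and the hours-per-job figures are all hypothetical and were chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Mapping


@dataclass(frozen=True)
class Job:
    """One ETL job in the inventory (hypothetical shape)."""
    name: str
    stage_count: int  # assumed complexity proxy: number of stages in the job


def complexity_bucket(job: Job) -> str:
    """Classify a job by size. The thresholds are illustrative assumptions."""
    if job.stage_count <= 5:
        return "low"
    if job.stage_count <= 15:
        return "medium"
    return "high"


def build_inventory(jobs: Iterable[Job]) -> Counter:
    """Count how many jobs fall into each complexity bucket."""
    return Counter(complexity_bucket(job) for job in jobs)


def estimate_hours(inventory: Mapping[str, int],
                   hours_per_job: Mapping[str, float]) -> float:
    """Tie an effort estimate to the counts: sum of count * hours per bucket."""
    return sum(count * hours_per_job[bucket]
               for bucket, count in inventory.items())


if __name__ == "__main__":
    jobs = [Job("load_sales", 3), Job("merge_stores", 10), Job("mainframe_feed", 20)]
    inventory = build_inventory(jobs)
    # Hypothetical effort rates per complexity bucket.
    print(inventory, estimate_hours(inventory, {"low": 1, "medium": 4, "high": 16}))
```

The point of the sketch is that the estimate is derived from counted jobs rather than from a top-down guess, so the total can be traced back to the inventory.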
Win 2: Code Migration
In collaboration with the BladeBridge and Client teams, Intricity converted the enormous quantities of code from DataStage to Talend. The Client team handled the data testing tasks while Intricity and BladeBridge converted the code to unit-test readiness. The BladeBridge code converter allowed the Intricity team to migrate in pattern sets rather than converting individual code snippets by hand, which massively decreased the manual effort. With each iteration, the BladeBridge configurations for converting DataStage to Talend were refined further, ultimately automating ~80% of the code migration process. The remaining ~20% represented code whose patterns repeated too rarely to be worth adapting the BladeBridge converter for and that did not pass a unit test. These jobs usually required only some manual tweaks to fully convert, as the core had already been converted by BladeBridge. The Analyzer results acted as an effective inventory and tracking tool during the code conversion effort.
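The difference between converting by pattern set and converting snippet by snippet can be sketched in a few lines of Python. This is a toy illustration, not BladeBridge's configuration format or API. The rule table, the `Stage` type, and the stage-to-component pairings are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    """A source-side stage to be converted (hypothetical shape)."""
    name: str
    stage_type: str


# Hypothetical pattern rules: source stage type -> target component name.
# The pairings are illustrative only, not a verified mapping.
PATTERN_RULES: Dict[str, str] = {
    "Transformer": "tMap",
    "Aggregator": "tAggregateRow",
    "Sequential File": "tFileInputDelimited",
}


def convert(stages: List[Stage],
            rules: Dict[str, str]) -> Tuple[List[Tuple[str, str]], List[Stage]]:
    """Apply every rule to every matching stage; unmatched stages go to manual work."""
    converted: List[Tuple[str, str]] = []
    manual: List[Stage] = []
    for stage in stages:
        target = rules.get(stage.stage_type)
        if target is None:
            manual.append(stage)  # no pattern yet: a person handles it
        else:
            converted.append((stage.name, target))
    return converted, manual


def automation_rate(converted: list, manual: list) -> float:
    """Share of stages handled by rules rather than by hand."""
    total = len(converted) + len(manual)
    return len(converted) / total if total else 0.0


if __name__ == "__main__":
    stages = [Stage("s1", "Transformer"), Stage("s2", "Aggregator"),
              Stage("s3", "Custom Routine")]
    done, todo = convert(stages, PATTERN_RULES)
    print(automation_rate(done, todo), [s.name for s in todo])
```

Adding one rule to the table converts every stage of that type at once, which is why each configuration iteration raises the automated share, while rarely occurring types remain in the manual list.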
Win 3: Adaptation to Mainframe Files
About 4 months into the conversion project, it became clear that mainframe transformations were going to pose a performance problem in Talend, which caused considerable concern among all parties involved. Since the mainframes were an IBM product, IBM had made an outsized effort to tune DataStage for processing mainframe data. The Talend product team was unable to make comparable core changes to the Talend engine within the project timeline.
Intricity and BladeBridge came up with a workaround that leveraged the flexibility of the BladeBridge configuration platform. The jobs that carried these mainframe transformations were offloaded to a PySpark cluster and, once processing finished, their results were re-injected into the Talend workflow. This workaround provided the speed necessary to process the mainframe files, and the code for this maneuver was generated within the BladeBridge tooling. This was a testament to the range of code generation flexibility provided by BladeBridge.
Win 4: Converted Code in Snowflake
The Client was able to leverage the power of Snowflake’s compute layer and converted all their DataStage and Netezza assets to Talend and Snowflake. This gave the Client the power to roll out analytics that were no longer constrained by hard-wired compute and storage limitations.
Who is Intricity?
Intricity is a specialized selection of over 100 Data Management Professionals, with offices located across the USA and Headquarters in New York City. Our team of experts has implemented solutions in a variety of industries, including Healthcare, Insurance, Manufacturing, Financial Services, Media, Pharmaceutical, Retail, and others. Intricity is uniquely positioned as a partner to the business that deeply understands what makes the data tick. This joint knowledge and acumen has positioned Intricity to beat out its Big 4 competitors time and time again. Intricity’s area of expertise spans the entirety of the information lifecycle. This means when your problem involves data, Intricity will be a trusted partner. Intricity's services cover a broad range of data-to-information engineering needs:
What Makes Intricity Different?
While Intricity conducts highly intricate and complex data management projects, Intricity is first and foremost a Business User Centric consulting company. Our internal slogan is to Simplify Complexity. This means that we take complex data management challenges and not only make them understandable to the business but also make them easier to operate. Intricity does this by using tools and techniques that are familiar to business people but adapted for IT content.
Intricity authors a highly sought after Data Management Video Series targeted towards Business Stakeholders at https://www.intricity.com/videos. These videos are used in universities across the world. Here is a small set of universities leveraging Intricity’s videos as a teaching tool:
Talk With a Specialist
If you would like to talk with an Intricity Specialist about your particular scenario, don’t hesitate to reach out to us. You can write us an email: email@example.com
(C) 2020 by Intricity, LLC
This content is the sole property of Intricity LLC. No reproduction can be made without Intricity's explicit consent.
Intricity, LLC. 244 Fifth Avenue Suite 2026 New York, NY 10001 Phone: 212.461.1100 • Fax: 212.461.1110 • Website: www.intricity.com