The ETL Drag
A common complaint in the construction of a Business Intelligence (BI) solution is the length of time, and the associated cost, of building the Extract, Transform, Load (ETL) routines that populate the associated data repositories (i.e., Data Marts, Data Warehouses). Some estimates place this ETL development at up to 80% of the overall BI development effort.
There are a few reasons for this ETL conundrum, including:
- Consolidation of multiple source (operational) systems
- Efficiency of data pulls (full vs. incremental)
- Proper interpretation of each system’s business rules
- Maintaining historical perspectives
None of these is addressed in this article.
But perhaps an equally challenging problem in the construction of a Data Warehouse lies in coding the ETL logic itself, an issue inherent to the ETL developer’s toolkit. In short, most ETL tools are built around a graphical user interface (think very repetitive ‘drag and drop’ and ‘point and click’). These interfaces create a nice visual representation of data mapping, but this approach generally abandons decades of proven software development techniques, such as ‘code reuse’.
Microsoft recognized this issue several years ago and sought to address it with project Vulcan (the remnant of this now defunct project can be viewed here: https://vulcan.codeplex.com/). Fortunately, a key member of project Vulcan (Scott Currie) formed a company called Varigence to continue this effort. That work has given the BI community an important XML dialect called Biml, along with related products and technologies (Mist, BimlScript, BimlOnline, BimlExpress), which form a healthy part of the growing ‘Biml ecosystem’.
It should be mentioned that while most of the work leveraging Biml centers on the Microsoft product line, nothing in the Biml language precludes developing emitters for other platforms. The beauty of Biml is that it is a highly readable XML declaration of Business Intelligence assets. As a side note, while database and cube definitions are also a part of Biml, what appears to be of most interest in the BI community is its application to ETL processes.
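To make the ‘code reuse’ point concrete, here is a minimal sketch of what a declarative, text-based approach allows: one small template that emits a package declaration per table, rather than one drag-and-drop session per table. Python is used purely for illustration and is not part of the Biml toolchain. The table names and the `Load_` naming pattern are hypothetical, and the element names, attribute, and namespace follow publicly available Biml examples as best recalled, so treat them as assumptions rather than a validated Biml document.

```python
import xml.etree.ElementTree as ET

# Namespace as seen in public Biml examples (an assumption, not schema-validated).
BIML_NS = "http://schemas.varigence.com/biml.xsd"


def build_biml(table_names):
    """Return a Biml-style XML string declaring one package per table name."""
    root = ET.Element("Biml", {"xmlns": BIML_NS})
    packages = ET.SubElement(root, "Packages")
    for table in table_names:
        # The same template is reused for every table in the list.
        ET.SubElement(
            packages,
            "Package",
            {"Name": f"Load_{table}", "ConstraintMode": "Linear"},
        )
    if hasattr(ET, "indent"):  # pretty-printing is available from Python 3.9
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical source tables; in practice these would come from metadata.
biml_text = build_biml(["Customer", "Product", "Sales"])
print(biml_text)
```

Adding a fourth table here means adding one string to the list, which is the kind of reuse that a purely graphical designer makes difficult.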
To read the remainder of Jim's Biml article, which includes code snippets, click here: