Data Migration 4: Some Common ETL Methodologies & Tools

In last week’s blog, I touched briefly on some of the more common tools/methods you can use to move data from your legacy system to your new software. This week I provide some more detail on those tools. Let’s start with the most obvious!

MANUAL LOAD

The simplest way to go about loading data into your new system is to enter it manually. Unfortunately, this approach has two obvious – and significant – drawbacks: it’s a very slow and it is open to data entry errors.

Having said that, if you are dealing with a relatively small data set, and the data can be easily validated once loaded, it may make sense to adopt this approach. Also, there are applications on the market which can speed up this process, as they can mimic the work of many sets of human hands. These apps are generally called ‘scripting tools’.

SCRIPTING TOOLS

Scripting tools essentially record the steps that it takes to load a piece of data (i.e. a customer master record). The application then replays the recording over and over, populating the various required fields with data it pulls from some source, such as an Excel spreadsheet. (The spreadsheet would have been populated with data from your legacy system.)

Not only do scripting tools significantly speed up the ‘manual load’ process, but they also minimize the number of data entry errors. However, in my opinion, these applications are most appropriate for use with relatively small amounts of data that do not have a lot of complexity associated with their load into a new system

ETL SOFTWARE

The market abounds with applications which specialize in helping companies with their Extract- Transform-Load process. These applications – including several open-source options – cover a broad range of cost and sophistication, which means there needs to be significant investigation in order to establish which tool is most appropriate for your company.

However, if you are looking to move vast amounts of data, purchasing an ETL software is almost essential. This is especially true if you plan on updating your data on a regular basis, which is often the case.

SOFTWARE-SPECIFIC APPLICATION

One other class of ETL tools that I should mention are those that are specific to a particular software. Many of the software packages out there come with a ‘built-in’ ETL tool or process. Therefore, you should be able to use this tool to manage the data-load into that particular software.

CUSTOM PROGRAMS

The last – and least preferred – option when it comes to the tools you can use to transfer data from your legacy system into your new software is a custom program. If all else fails, you may be forced into a situation where you have to develop your own program specifically to load a particular data-set. I say this is the last resort because developing these types of programs tends to be very time-consuming (think not only in terms of development time, but also all the testing that would be required), and therefore expensive.

Custom programs are typically utilized for large, industry-specific data sets that may prove to be too complex for the other ETL tools to handle. Hopefully you will be able to execute your ETL Plan without being forced to develop a custom program (or – God forbid – programs!), but you may not have any choice.

SUMMARY

The table below provides a snapshot of each of the tools listed above and in what situations they should best be used.

Method/Tool Description Pros Cons
Manual Load Manually entering data into your new system Simple Slow; error-prone
Scripting Tool Records the steps in the process associated with loading a particular data set, and plays it back over and over Relatively inexpensive; more efficient than a manual load Not useful for large, or particularly complex, data sets
ETL Software External application purchased specifically to manage the bulk of ETL activities Robust apps which can be used with many types/large volumes of data (might be the only ETL tool you’ll need!) The software can be expensive; will probably require hiring/retaining resources required to support the tool
Software-Specific An ETL application which comes built-in to your new software No incremental cost to purchase; designed to work seamlessly with your new software Can only be used when loading data into that specific software
Custom Program Internally-built script used to load a unique, or particularly complex, data set Can manage very complex data-load scenarios; designed specifically to work for your company Labour intensive/expensive to build, test and maintain

Leave a Reply

Your email address will not be published. Required fields are marked *