Data Migration: Component 1 – Data Clean-Up

The activity which has the greatest potential to both derail your project and to also have lasting negative impacts on your organization is Data Migration. Unfortunately, it is extremely difficult to avoid this risk, as the vast majority of software/ERP implementations have a Data Migration component attached to them.

So, how can something which seems so conceptually simple – moving a piece of data from System A to System B – carry so much risk? Well, the question contains the answer: the Data Migration process only ‘seems’ simple. Companies regularly under-estimate both the complexity of the under-taking and the amount of time required to successfully execute their data migration program.

In my next several blog posts, I will review each of the critical steps in the data migration process and, by doing so, will hopefully illustrate the intricacies of that process.

Let me start this series by focusing on perhaps the most painful and time-consuming part of the entire process: Data Clean-Up.

Firstly, Data Migration is often referred to by another term: ETL. ETL stands for Extract-Transform-Load, which are the three primary steps Data Migration. (In simple terms, ETL refers to the process by which we pull data out of the old (or ‘legacy’) system, change it so that it fits into the new system, and then enter it into that system.) However, before we even begin the first of these steps (“Extract)”, a significant amount of work will be spent on data clean-up.

THE DATA CLEAN-UP PROCESS

It should be taken as an article of faith that when you are moving to a new system, you do NOT want to bring all the data from your legacy system with you. There are many reasons for this, but perhaps the two most common are:

  1. You don’t want to place huge ‘memory’ and processing demands on your new system right ‘out-of-the-box’
  2. There are certainly pieces of data in your legacy system that will not never ‘see the light of day’ in your new system, so why import them?

So, how do you define what data to move to your new system? Simply put, this is largely up to you. There are no ‘hard-and-fast’ rules. (Even the rules around retaining financial data for 7 years does not mean that all that data needs to be transported to your new system. You can leave this data in your old system, and ‘leave the lights on’ in that system, or you can move the older data to a cheaper storage medium (i.e. tape). However, here are some examples of questions that you might ask yourself when defining which data to move:

  • Do you want to migrate customers with whom you haven’t done business in 1 year? 2 years? 5 years?
  • Do you want to bring over closed POs (for historical purposes)?
  • Do you want to import GL balances into the new system? (I would suggest ‘yes’!)
  • What are the cut-off criteria for the inventory/warehouse transactions that you will move?

In short, for every ‘data element’ (i.e. Goods Receipts, Asset Records, Vendor Master Records, Equipment Inspection Records, Payments, etc.) which you want to migrate to your new system, you must define the parameters which limit the number of records of each element that will be moved.

DUPLICATE RECORDS

Besides defining the ‘migration parameters’, for the data to be migrated, one other critical step in the data clean-up process is to eliminate ‘duplicate’ records.

Duplicate records are – as you would suspect – records which are exact copies of each other. Obviously, having copies of the same record in your system is going to cause confusion. Furthermore, it is a guarantee that there will be records in your legacy system which are duplicates of each other, and from experience I can tell you that identifying duplicates is a huge PITA!

Again, it may be hard to understand why identifying duplicates can be so difficult, but to illustrate, I’ll create an example using an imaginary customer: Acme Inc. (and, yes, I was – and perhaps still am – a huge fan of Wile E. Coyote!)

Let’s assume that Acme is an international company, and your organization does business with many of their offices around the globe. Therefore, you need to make sure that all the different Acme offices are discrete entities.

A more complex scenario is where you have two Acme offices with almost identical addresses. For instance, is the Acme Office at 123 Main Street, Acmeville, the same as the Acme Office at 123A Acmeville?

As well, abbreviations can cause any number of problems when it comes to identifying duplicates: Is Acme, 123 Main Street, Acmeville, the same as Acme, 123 Main St., Acmeville?

There is software which you can use to help identify these duplicates, but if your company is serious about doing a great job of cleaning up your data, the exercise often comes down to some unfortunate having to manually review the lists of customers and vendors.

SUMMARY

I hope this post has given you some idea about the complexity of the data clean-up process. If you are still un-clear about the amount of effort involved, let me close by saying that I generally advise my clients to allow 4 months for their data cleansing activities, and this is before the vendor arrives on site and your project actually begins. (You need to allow this kind of time, as it should be recognized that this activity will be an ‘add-on’ to everyone’s typical daily activities, and typically involves quite a few resources.)

Leave a Reply

Your email address will not be published. Required fields are marked *