Are you thinking about moving data (e.g., patient files, images, telemedicine survey data) from that clunky old Yugo computer or storage system to a brand new shiny Ferrari system? Are those happy thoughts or are they clouded by dread and visions of digital monsters eating up all your data and laughing in your face? You are not alone.
Data migration can be painful but there are ways to lessen that pain with careful preparation. First let’s start with a few basic terms. Legacy data is what you have now and want to transfer. Data migration is the process of importing that data into the new system. Data cleansing is getting the old data ready to move – making it compatible with what the new system requires or expects. It’s this last part that is often the most difficult.
One of the first steps in this process is to decide on what your goals are. Here are some questions you should be thinking about.
Next consider what type of data you have: spreadsheet entries, images, text (pdf, doc, etc.), dedicated program formatted. You should also consider whether the data adhere to some sort of standard in its present form or will it have to in the future (for example SQL). The problem is that there are lots of data formats and some databases contain multiple formats (e.g., CSV, Dbase2, DIF, Excel, HTML, SQL, SYLK, Text, WKS, Windows metafiles) increasing the complexity of the transfer. Thus it is key to start out knowing what you actually have before figuring out what can be done with it.
What are your options? Well the first one certainly has some appeal – avoid it! Simply leave the old data where it is under the expectation it will likely not be needed or needed only rarely and thus it can stay where it is until that system dies and hope it’s never needed again. Attractive but not always feasible. The next option is to work incrementally. Maybe start moving the most recent (thus most likely to be requested) data over then progressively work back in time until everything has been transitioned over. This can take a while but if time is not that pressing of an issue it has its benefits. Finally there is the option to use existing tools or services – there are companies that do this! As with any service you need to shop around, get the best prices, guarantees, verify the time frame, what happens if data are lost, etc. If you have the resources this is often your best and easiest option.
What are some of the more common problems people encounter?
All of these typically can be solved in the data cleansing phase, but they need to be identified before they can be dealt and this is the difficult and often tedious part. The problem is often confounded by the fact that a lot of older databases were set up and used by one or two people but they are often no longer available or have themselves forgotten all the rules and shortcuts they developed!
There is an interesting option for researchers called REDCap (Research Electronic Data Capture) that is available at http://project-redcap.org/. This is a mature, secure web-based tool for building and managing online surveys and databases for research studies. It is available free of charge but is not open-source. You can join the consortium to get the source code with a valid end-user license agreement through Vanderbilt University. It may be necessary to contact your Tech Transfer or General Counsel Office but it is a fairly straightforward process. REDCap can be installed on a variety of environments (e.g., Mac, Windows, Linux, Unix) and is compliant with HIPAA, 21 CFR Part II, FISMA and other international standards.
So don’t let the data migration demons haunt you! It can be challenging but if you utilize available resources, be prepared to invest the necessary preparation time and resources (yes financial too), strategize, plan, and think about how you will ensure that all the data you wanted to migrate made it over (i.e., validation) half of the challenge is already met.