With all the buzz around cloud technology, it’s not uncommon to come across a team or organization that is planning, or already in the process of, migrating data from one or more legacy systems to a cloud infrastructure. And for good reason: moving to the cloud opens the data up to new uses by new consumers (e.g., data scientists, analytics teams). This may be exactly what you’re working on — trying to determine which cloud to migrate to, what resources the migration will require, and how long it will take, among many other factors.
While these are all important considerations, it is critical to think about how you will clean and curate your data before any migration. It’s all too common for organizations to lift and shift their data without first giving thought to improving its quality. The data may now reside in a new location (the cloud), but it is just as unusable as before, and its consumers still lack an intimate knowledge of the data or how it got there.
Why is this? Principally because the data still sits in several places (albeit now in the cloud) and has not been organized into logical entities that the business can easily digest.
Think of this issue like moving into a new apartment. Would you pack up the messes in your old place and simply ship those messes to your new apartment? Of course not. More likely, you would neatly organize and pack the things you plan to bring so that when everything arrives at your new place, it’s easy to unpack, and you’ve given yourself a fresh and clean start. The same logic applies to data migrations: figure out what is most important to shift, logically organize, master, and curate it, and then migrate to the new infrastructure. Taking the time to organize and curate data into logical entities gives you a known baseline to plug into existing applications today and new ones tomorrow.
ETL alone cannot solve cloud migration issues
A typical migration process leans heavily on an ETL tool to move data from multiple source systems. ETL jobs are also used to build the required logical entities (customers, accounts, etc.) for reporting and for any further processing of the data before it is stored in a downstream data warehouse for consumption. Additional reporting logic may be needed to power downstream analytics and reporting systems, as well as any future migration efforts. This strategy, however, comes with some common pitfalls.
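To make the pitfalls concrete, here is a minimal, hypothetical sketch of the kind of hand-coded mapping and rule-based de-duplication such ETL pipelines tend to accumulate. Every field name and rule below is invented for illustration; this is not any real system’s logic:

```python
# Hypothetical sketch: hand-coded ETL logic that maps two source schemas
# to one "customer" entity and de-duplicates with brittle exact-match rules.

def map_crm_record(rec):
    # Each new source system needs its own hand-written mapping.
    return {"name": rec["cust_name"].strip().lower(),
            "email": rec["email_addr"].strip().lower()}

def map_billing_record(rec):
    return {"name": rec["full_name"].strip().lower(),
            "email": rec["contact_email"].strip().lower()}

def dedupe(records):
    # Rule-based matching: records are "the same" only if emails match exactly.
    # Variant spellings, typos, or missing emails silently create duplicates.
    seen = {}
    for r in records:
        seen.setdefault(r["email"], r)
    return list(seen.values())

crm = [{"cust_name": "Ada Lovelace ", "email_addr": "ada@example.com"}]
billing = [{"full_name": "ADA LOVELACE", "contact_email": "Ada@Example.com"},
           {"full_name": "Alan Turing", "contact_email": "alan@example.com"}]

unified = dedupe([map_crm_record(r) for r in crm] +
                 [map_billing_record(r) for r in billing])
print(len(unified))  # 2: the Ada records collapse only because the emails normalize
```

Every rule here must be written, tested, and maintained by hand, and each additional source system multiplies that effort.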
The Tamr Advantage
Tamr can be used to generate the logical entities for reporting through accelerated schema mapping and data mastering. This provides a trusted view of each entity, combining data from multiple sources for reporting from day one. Reporting systems can consume this data alongside any other data that may not need to go through Tamr (e.g., transactions).
Because Tamr’s capabilities are machine learning-based, the need for complex additional ETL for reporting and migration is greatly reduced.
In this workflow, multiple disparate internal and external data sources are brought together into a landing zone. From here, Tamr’s Schema Mapping provides accelerated entity mapping to reduce the time it takes to align common attributes and build target data models. Once the logical entities are defined, Tamr’s Record Mastering and Golden Records capabilities match and de-dupe the data to provide a new curated layer for the logical entities. These cleansed and mastered datasets are then sent to the new cloud infrastructure, eventually feeding into downstream reporting and analytics applications.
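The stages of that workflow can be sketched in outline. The function names and the first-non-null survivorship rule below are illustrative assumptions, not Tamr’s actual API:

```python
# Hypothetical outline of the workflow: landing zone -> schema mapping ->
# record mastering -> golden records. All names here are made up for illustration.

def land(sources):
    # Landing zone: collect raw records from disparate systems.
    return [rec for src in sources for rec in src]

def map_schema(record, mapping):
    # Schema mapping: align source attributes to the target entity model.
    return {target: record.get(source) for source, target in mapping.items()}

def master(records, key):
    # Record mastering: cluster records believed to refer to the same entity
    # (here naively by key; real mastering uses learned matching).
    clusters = {}
    for r in records:
        clusters.setdefault(r[key], []).append(r)
    return clusters

def golden_record(cluster):
    # Golden records: consolidate each cluster into one survivorship record,
    # here by keeping the first non-null value seen for each attribute.
    golden = {}
    for rec in cluster:
        for k, v in rec.items():
            if golden.get(k) is None and v is not None:
                golden[k] = v
    return golden

sources = [[{"cust_id": "C1", "cust_name": "Ada"}],
           [{"cust_id": "C1", "cust_name": None}]]
mapping = {"cust_id": "id", "cust_name": "name"}
mapped = [map_schema(r, mapping) for r in land(sources)]
curated = [golden_record(c) for c in master(mapped, "id").values()]
print(curated)  # [{'id': 'C1', 'name': 'Ada'}]
```

The curated output is the layer that would then be shipped to the new cloud infrastructure for downstream reporting.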
Under the hood, Tamr can map, enrich, match, classify, and consolidate data at scale thanks to its patented human-guided machine learning technology. Business data experts, who are often the people most familiar with the data, contribute directly to the mastering model by answering simple match-or-no-match questions about the data. This iterative process shortens the time needed to develop accurate, curated datasets as part of the migration, because no traditional rule-development process is required.
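A rough sketch of the match/no-match feedback idea, using an invented string-similarity score and invented thresholds (not Tamr’s actual model): the model decides confident pairs itself and routes only the uncertain ones to a human expert.

```python
# Hypothetical sketch of human-guided matching: a model scores record pairs,
# and a data expert labels only the uncertain ones as match / no-match.
from difflib import SequenceMatcher

def similarity(a, b):
    # Illustrative similarity score on the "name" field only.
    return SequenceMatcher(None, a["name"], b["name"]).ratio()

def pair_key(pair):
    return tuple(sorted(r["name"] for r in pair))

def classify(pair, expert_labels, low=0.4, high=0.9):
    score = similarity(*pair)
    if score >= high:
        return "match"       # confident: no human needed
    if score <= low:
        return "no-match"    # confident: no human needed
    # Uncertain band: defer to the expert's simple match/no-match answer.
    return expert_labels.get(pair_key(pair), "needs-review")

pairs = [({"name": "Acme Corp"}, {"name": "Acme Corporation"}),
         ({"name": "Acme Corp"}, {"name": "Zenith Ltd"})]

# Answer collected from a business data expert for the uncertain pair:
expert_labels = {pair_key(pairs[0]): "match"}

results = [classify(p, expert_labels) for p in pairs]
print(results)  # ['match', 'no-match']
```

Each expert answer refines which pairs the model can resolve on its own, which is what makes the process iterative.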
Getting started with Tamr
The target cloud infrastructure will have data that is ready to use from the get-go, allowing you to benefit immediately from the new capabilities that drove your decision to migrate in the first place. Tamr meets this requirement with best-of-breed machine learning technology that cleanses and organizes your data at scale during the migration.
To learn more about Tamr’s data migration services, download our full white paper, Migration, Unified.
The original article by Fabien Vaucheret, sales engineer at Tamr, is here.
The views and opinions expressed in this article are those of the author and do not necessarily reflect those of CDOTrends. Image credit: iStockphoto/Dilok Klaisataporn; Charts and tables: Tamr