Understanding the Data Migration Tool

This document provides practical guidance on populating the data conversion canonical schema so that data is loaded successfully.

DATA CONVERSION TOOL

sharedo provides a dedicated data on-boarding framework. It is designed to overcome the shortcomings of a pure ETL-based approach, which, with complex schemas such as sharedo's, is prone to leaving data in an inconsistent state and is not easily extended to meet different client processing scenarios.

The data on-boarding framework is extensible to accommodate different data domains. The initial release supports the following data domains:

  • Sharedos of the following types: Statements of Work, Tasks, Matters, Proceedings, Offers, Key Dates, and their constituent parts
  • Documents related to cases
  • Operational Data Source (ODS) Entities e.g. People, Organisations, Teams

For each of these items the framework provides:

  • A Canonical SQL schema for the import of these entities
    • It is important to note that the framework does NOT provide facilities for transforming data sources into this schema; this is best served by other (ETL) tools
    • It does, however, provide a framework for implementing client-specific adapters if required
  • A Configuration utility – enabling users to specify
    • The validation rules that should be run against the information
    • The data defaults or other pre and post load processing rules that should be applied to this information
    • The behaviour, in terms of smart plans, that should be triggered when information is imported
  • A Loader utility that
    • Can be operated manually, in which case it provides visual feedback for a particular run to the user
    • Can be called as part of a specific API, e.g. operating behind a B2B API, and must therefore be capable of running to completion without human interaction
    • Enables information to be loaded for validation purposes only without affecting the live data set
    • Can be run as part of a stand-alone data migration or client on-boarding exercise, or as part of an incremental load of information, say, into a client contract
    • Provides actionable details on validation errors
  • A set of reporting and analysis tools to assist with the on-boarding of significant volumes of data

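The loader's validation-only mode can be pictured as a load wrapped in a transaction that is always rolled back: every record is validated and inserted as normal, but the live data set is never affected. The sketch below is purely illustrative; the `import_matter` table, its columns, and the validation rule are hypothetical stand-ins, not the actual sharedo canonical schema or rules.

```python
import sqlite3

def load(rows, validate_only=True):
    """Validate a batch of import rows and optionally commit them.

    Illustrative only: 'import_matter' and its columns are hypothetical
    stand-ins for the real sharedo canonical schema.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE import_matter (reference TEXT, title TEXT)")
    conn.commit()  # keep the table itself outside the load transaction
    errors = []
    for i, row in enumerate(rows):
        # Example validation rule: both fields are mandatory.
        if not row.get("reference") or not row.get("title"):
            errors.append((i, "reference and title are mandatory"))
            continue
        conn.execute("INSERT INTO import_matter VALUES (?, ?)",
                     (row["reference"], row["title"]))
    if errors or validate_only:
        conn.rollback()  # validation-only run: live data set is untouched
    else:
        conn.commit()
    loaded = conn.execute("SELECT COUNT(*) FROM import_matter").fetchone()[0]
    conn.close()
    return errors, loaded
```

Running `load(rows)` reports actionable errors without persisting anything; `load(rows, validate_only=False)` commits only if every row passes, which is one simple way to avoid leaving data in an inconsistent state.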

DATA FLOW


Your Data Sources

We recommend extracting data from your data sources into an intermediary schema so that you can apply the transformations necessary to load into the sharedo import schema.


Staging ETL

Your Staging ETL Process will be responsible for populating the Sharedo_Import database with the records that are required to be loaded into sharedo.

This process is typically implemented by the client, mapping data from the source locations according to the mapping provided in the Master Data Dictionary.
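As a sketch of what such a staging ETL step might look like (all table and column names here are hypothetical, not the actual Sharedo_Import schema; the real source-to-canonical mappings come from the Master Data Dictionary):

```python
import sqlite3

# Hypothetical column mapping of the kind the Master Data Dictionary
# provides: source column -> canonical import column.
MAPPING = {"case_ref": "reference", "case_name": "title"}

def stage(source_conn, import_conn):
    """Copy mapped rows from a source table into the import schema (illustrative)."""
    src_cols = ", ".join(MAPPING.keys())
    dst_cols = ", ".join(MAPPING.values())
    placeholders = ", ".join("?" for _ in MAPPING)
    for row in source_conn.execute(f"SELECT {src_cols} FROM legacy_cases"):
        import_conn.execute(
            f"INSERT INTO import_matter ({dst_cols}) VALUES ({placeholders})", row)
    import_conn.commit()

# Demo with in-memory databases standing in for the real source system
# and the Sharedo_Import database.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE legacy_cases (case_ref TEXT, case_name TEXT)")
src.execute("INSERT INTO legacy_cases VALUES ('L-001', 'Smith v Jones')")
dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE import_matter (reference TEXT, title TEXT)")
stage(src, dst)
```

Any ETL tool (SSIS, Azure Data Factory, plain SQL scripts, etc.) can play this role; the only contract is that the output conforms to the canonical import schema.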


Canonical Staging

The canonical staging database provides a location and structure understood by the sharedo data load framework. All data required for the migration is uploaded to this schema.

The conceptual structure of this database is shown below.

The canonical schema is broken down into a number of distinct “top level” areas:

  • Sharedos – this is where you populate your matters, tasks and so on
  • ODS – this is where you specify people or organisations
  • Documents – this is where you specify documents to be imported.
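The split can be pictured as three groups of tables, with sharedo records referencing ODS entities and documents by key. The table and column names below are hypothetical illustrations of the concept, not the actual canonical schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Sharedos area: the work items themselves (hypothetical names).
    CREATE TABLE matter (reference TEXT PRIMARY KEY, title TEXT);
    -- ODS area: people and organisations, shared across cases.
    CREATE TABLE ods_person (id INTEGER PRIMARY KEY, name TEXT);
    -- Documents area: files to be imported against a case.
    CREATE TABLE document (id INTEGER PRIMARY KEY,
                           matter_ref TEXT REFERENCES matter(reference),
                           path TEXT);
    -- A participant row ties an ODS entity to a matter in a given role.
    CREATE TABLE participant (matter_ref TEXT REFERENCES matter(reference),
                              person_id  INTEGER REFERENCES ods_person(id),
                              role TEXT);
""")
conn.execute("INSERT INTO matter VALUES ('M-100', 'Example matter')")
conn.execute("INSERT INTO ods_person VALUES (1, 'Jane Smith')")
conn.execute("INSERT INTO participant VALUES ('M-100', 1, 'client')")
row = conn.execute("""
    SELECT m.title, p.name, pt.role
    FROM participant pt
    JOIN matter m ON m.reference = pt.matter_ref
    JOIN ods_person p ON p.id = pt.person_id
""").fetchone()
```

The point of the split is reuse: a person or organisation is staged once in the ODS area and then linked to as many sharedos as needed, rather than being duplicated per case.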

This guide presents practical advice on populating the data migration canonical schema for these types.


Terminology

The primary work container is a case – this in turn has various datasets associated with it. For example, a case has a list of participants, key dates, tasks, etc. The top-level entity that relates to a case is called a matter in the database. These terms are interchangeable in most scenarios, though a matter more correctly describes the top-level attributes held in the import.matter table, whereas the case is the wider collection of datasets.