Introduction
In ShareDo, there are three methods of ingesting data:
- Using public REST APIs: Suitable for real-time integrations or to keep two systems that are both masters of data synchronised with each other.
- Excel/CSV data table upload: For small volumes, where data is processed via a workflow.
- Bulk data loading: For large volumes, using SQL, as part of an Extract, Transform, and Load (ETL) project.
The Data Table Upload feature released in ShareDo version 7.8.0 replaces the legacy CSV and Excel data upload feature, and the Data Loader v2 feature replaces the legacy Bulk Data Upload feature.
This article details the tooling to support bulk data loading, specifically the Load step of an ETL process.
Typically, data to be loaded includes active and inactive cases (also known as matters) along with their associated data, such as instructions, tasks, comments, key dates, bespoke data fields, etc. In addition to the case data, a set of organisations and people will exist and be reused across cases. In turn, the organisation and people data will likely include contact details, address information and their own bespoke data fields.
The data load framework within ShareDo supports the bulk loading of all of this data via low-ceremony interim database tables. The framework is not limited to the known work types; it can be extended to enable the loading of any data that can be configured as a work type in Modeller, and the Matching and Validation logic can also be extended. Please contact us for details on the extensibility framework.
Clients or partners will extract the data from the source systems and, using knowledge of the solution configuration, transform it into the import-ready structures. This data is then brought into the ShareDo environment for validation and loading. Data loading is an iterative activity, partly because the configuration may be refined throughout the development cycle, and partly because business validation of the migrated data uncovers further data mapping requirements. The framework supports this iterative approach: once loaded, the data can be safely and easily unloaded in a performant manner, leaving the system in a valid state. Please contact us for guidance on planning build and test cycles around data migration.
Please refer to the Data Loading v2 article for a detailed description of the user interface and the operations that can be performed. Further articles (here) describe in detail how to populate the ODS, work item, and document metadata interim (import) tables.
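As a purely illustrative sketch of what an "import-ready" record staged in the source schema might look like, consider a people (ODS) table. The schema, table, and column names below are invented for this example; the real definitions come from the schema scripts described later in this article.

```sql
-- Hypothetical sketch only: real table and column definitions come from the
-- schema scripts downloaded from the ShareDo Data Loading admin page.
CREATE TABLE src.Person
(
    SourceReference NVARCHAR(100) NOT NULL PRIMARY KEY, -- stable key from the source system
    FirstName       NVARCHAR(200) NULL,
    Surname         NVARCHAR(200) NULL,
    DateOfBirth     DATE          NULL,
    EmailAddress    NVARCHAR(320) NULL
);

-- Transformed data, staged and ready for the framework to extract into the import tables.
INSERT INTO src.Person (SourceReference, FirstName, Surname, DateOfBirth, EmailAddress)
VALUES (N'LEGACY-000123', N'Jane', N'Smith', '1980-04-12', N'jane.smith@example.com');
```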
The remainder of this article offers advice on various points of the process. Please contact us or a partner for assistance at any stage.
This article applies to clients using ShareDo version 7.9 or later, or those targeting the use of 7.9 at the point of go-live. If you are using earlier versions of ShareDo, please contact us.
Glossary of Terms
| Term | Meaning |
| --- | --- |
| Extract | The movement of data between systems. |
| Transform | The reshaping of data into a different structure, often following business rules. |
| Load | The controlled ingestion of data into an environment. |
| ODS | Operational Data Store - long-lived, cross-case data including organisations, people, addresses, contact methods, etc. |
| Staging database | A database populated with data that has been extracted and had transformation rules applied to it. Within the data loading framework, this is typically a small Azure SQL database within the client's control. |
| Source schema | The database schema that contains the source tables within the staging database. |
| Import tables | The database tables within a ShareDo environment that hold the data ready for validation against the environment and subsequent loading into it. |
| Data Migration Environment (DM) | The non-production ShareDo environment dedicated to data loading; this is distinct from a development or UAT environment and has restricted access (see notes below). |
Data Flow
Data integrity and security are important. A principle of data import is that data should never exist at rest in a backup file outside of the client's network at any point in an ETL project. For the data loading framework, this means ShareDo connects to a client-controlled staging database containing the prepared data (the source tables) and extracts it directly into ShareDo's import tables in the data migration environment.
A key motivation for the staging database is that access remains under the client's control, not ShareDo's, and the access ShareDo requires is read-only. This database holds the source schema and can exist in any Microsoft SQL Server database: on-premises (e.g. in a DMZ), Azure SQL, a VM in Azure, AWS, etc. The data load framework connects via managed identity (for Azure SQL) or using a named SQL account, and any port can be used.
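As a hedged example of the read-only access model when using a named SQL account, the account might be provisioned as below. All names, the schema, and the password are placeholders; adapt them to your own staging database.

```sql
-- Placeholder names throughout: adapt to your own staging database and source schema.
-- Run in the master database (Azure SQL) or at server level (SQL Server):
CREATE LOGIN sharedo_dataload WITH PASSWORD = 'Use-A-Strong-Generated-Password-1!';

-- Run in the staging database itself:
CREATE USER sharedo_dataload FOR LOGIN sharedo_dataload;

-- Grant read-only access to the source schema only.
GRANT SELECT ON SCHEMA::src TO sharedo_dataload;
```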
When using Azure SQL, the smallest configuration that accommodates the data size is sufficient. As of May 2025, a single database using the vCore serverless model, with minimal redundancy and 32 GB of storage, costs less than £10 per month. A further advantage of Azure SQL is that IP firewall restrictions are in place by default, and it is easy to allow only the single IP address that the data migration or production environment requires.
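If the staging database is Azure SQL, a database-level firewall rule restricted to a single address might look like the following sketch. The rule name and IP address are placeholders; substitute the outbound address of the data migration or production environment.

```sql
-- Placeholder name and IP: substitute the outbound address of the DM (or production) environment.
-- Database-level rules are evaluated in addition to any server-level rules.
EXECUTE sp_set_database_firewall_rule
    @name             = N'sharedo-dm-environment',
    @start_ip_address = '203.0.113.10',
    @end_ip_address   = '203.0.113.10';
```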
Supporting Assets
Schema Scripts
The schema for the staging database can be downloaded from any ShareDo environment via the Data Loading page within the Admin area. For a given version of ShareDo these schema scripts are identical, and (in principle) there are no breaking changes between versions. The (source) schema can co-exist with other schemas in the database; the data load framework will only extract data from the specific tables it requires.
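After running the downloaded schema scripts against the staging database, a quick sanity check is to list the tables they created. The sketch below assumes the scripts place their tables in a schema named `src`; substitute whatever schema name the downloaded scripts actually use.

```sql
-- Assumes the downloaded scripts create their tables in a schema named 'src';
-- substitute the schema name the scripts actually use.
SELECT TABLE_SCHEMA, TABLE_NAME
FROM   INFORMATION_SCHEMA.TABLES
WHERE  TABLE_SCHEMA = 'src'
ORDER BY TABLE_NAME;
```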
Configuration Scripts
Solution Architects within the ShareDo, client, or partner teams create and curate the solution. This may include work types, roles, key dates, quality rules, workflows, bespoke data capture, etc. The ‘source of truth’ for this activity is the Modeller area of ShareDo; however, for data mapping purposes, it is beneficial to have this configuration in low-ceremony tables that fit in with the Extract and Transform activities. These are referred to as the ‘Import Config Scripts’ and are best taken from the environment that holds the ‘gold configuration’, as the DM environment may lag behind.
The import configuration scripts can be downloaded from the Data Loading page within the Admin area. The download is a Zip file containing SQL insert statements; it is generated at run-time, so it reflects the current configuration. Please contact us or a partner with queries or for assistance; if a configuration type is not included in the extract, we can extend the extract without requiring a product enhancement.
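One hedged way to use these scripts is to run the insert statements into a schema in the staging database and reference them during data mapping. The sketch below, with table and column names invented purely for illustration, flags staged matters whose mapped work type does not appear in the exported configuration; the real names come from the downloaded scripts and your own source schema.

```sql
-- Hypothetical sketch: 'cfg.WorkType', 'src.Matter', and their columns are invented
-- for illustration; the real names come from the downloaded import config scripts
-- and the staging source schema.
SELECT m.SourceReference, m.ShareDoWorkType
FROM   src.Matter AS m
LEFT JOIN cfg.WorkType AS wt
       ON wt.SystemName = m.ShareDoWorkType
WHERE  wt.SystemName IS NULL;   -- mapping gaps to resolve before validation and loading
```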
Additional Considerations
Data Migration Lifecycle
Data migration is typically an iterative activity that aligns with project iterations. As an example pattern:
1. At the start of the iteration, the migrated data from the previous iteration is unloaded (using the Data Loading UI).
2. The data load can then be removed, and ideally is, to maintain a clean separation between data load runs.
3. The configuration is exported from the gold environment into the DM environment.
4. The data migration team refines the extract and transformation logic and stages fresh data.
5. The data is extracted, validated, and loaded into ShareDo using the data loading page within ShareDo.
6. The data is reviewed in the environment for correctness and completeness.
7. At the end of the review, the data is unloaded as in step 1.
Underpinning this is the restriction that only a single data load can be in flight at a time. Whilst the data load is open, the scope, configuration, validation, and load messages are available. When the data load is closed, all extracted data and tracking data is removed. If step 2 above is skipped and a new data load is created, the previous data is treated as pre-existing and cannot be removed. For a production environment this is clearly necessary; however, if step 2 is skipped in the data migration environment, other destructive tools will be required. Contact us for assistance.
Event Generation
During data loading, normal event generation is suppressed. This is primarily for performance, though it has the benefit that closed cases do not trigger unexpected automation. In addition, the case data is loaded ‘at rest’; there is no support for loading case data and instantiating workflows mid-flow.
We support the triggering of a specific event type against loaded data, which can be used to handle scenarios involving in-flight workflows. This is an advanced technique; if it is required, please discuss it with the ShareDo (or partner) Solution Architects.
Extensibility Framework
In addition to the run-time configuration and validation, it is possible to extend the framework to support alternative loaders for existing data types or new loaders for new data types. The code repository https://github.com/sharedo-tech/data-loading-extensibility contains examples for these two scenarios. The developer guide contains further general information on development principles. For extending the data loading framework, please contact us.
Performance and Volumes
In non-production environments, and as part of an iterative project approach, small test sample sizes work well, perhaps 1K matters with related data; this enables a quick load-review cycle. Larger volumes are supported; however, the data migration environment is not typically scaled to match production, so performance metrics from the DM environment are of limited use. As part of cut-over planning, consideration of data volumes and time to load often feeds into the critical path of a go-live event. A common approach is to partition the data so that, for example, closed and aged cases are loaded first, ahead of the critical path, with open and recent cases loaded as part of the cut-over. Please contact us or a partner for assistance with planning this stage of the project.
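As a hedged sketch of that tranche approach, with table and column names invented for illustration, the staged matters could be split so the closed book loads ahead of cut-over and the open book loads during it:

```sql
-- Hypothetical table and column names: adjust to the actual source schema.
-- Tranche 1: closed and aged matters, loaded ahead of the cut-over critical path.
SELECT SourceReference
FROM   src.Matter
WHERE  Status = 'Closed' AND ClosedDate < DATEADD(YEAR, -1, GETDATE());

-- Tranche 2: open and recently closed matters, loaded as part of the cut-over.
SELECT SourceReference
FROM   src.Matter
WHERE  Status <> 'Closed' OR ClosedDate >= DATEADD(YEAR, -1, GETDATE());
```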
Data Migration Environment
While a client's development or UAT environment usually has access rules that allow many ShareDo and client users to access it, the data migration (DM) environment, by default, has restricted user access. The access rules are typically managed by the client or partner data migration team. A slight variation on this is to use the production environment for data loading up until the point of the first go-live. Please contact us to discuss environment requirements, costs, timescales, acceptable use, performance considerations, etc.