Cloud projects: adopting the correct data lineage approach

By Peter Ferguson
on Mar 17, 2022

Effective data governance has become a vital requirement for a successful enterprise in an increasingly data-driven world.

But alongside the traditional challenges of maintaining high-quality, fit-for-purpose data while complying with ever-growing legislative and regulatory requirements, the data world has added a new imperative: gaining the competitive advantages made possible by moving some or all of your data to the cloud.

With such changes to the data environment, it is immediately apparent that traditional, manual, spreadsheet-driven data governance processes aren’t up to the task. How can organizations that want to modernize their data governance in step with their move to the cloud achieve this?

CDMC review

An initiative of the EDM Council (EDMC), the CDMC (Cloud Data Management Capabilities) framework is a thorough review of best practices for managing data in the cloud.

It comprises six major categories:

  • governance and accountability
  • cataloguing
  • accessibility
  • protection
  • data lifecycle
  • data and technical architecture.

Through these, it sets out how data management in the cloud can best be implemented and maintained.

A holistic approach to the many and varied challenges posed by cloud adoption had not been attempted before, which makes this one of the most significant and comprehensive analyses of cloud data management ever compiled.

The ‘skeleton’ of the CDMC framework was initially laid out as six Components containing 14 major Capabilities, further divided into 37 Sub-Capabilities. As a practitioner in cloud and data management, data architecture, lineage and cataloging, I contributed to these. They formed the basic ‘building blocks’, each assigned to the relevant subject-matter experts in the subgroups that took ownership of them. In addition, specialist working groups were set up to deal with areas of particular importance, such as data lineage, in which I also participated.

Once a draft of a topic (Upper Matter, Capability or Sub-Capability) was ready, members of the working group were notified. They could then read, review and comment on it in the EDMC Dropbox before the full Review Committee reviewed it in one of the scheduled sessions.

If you’re an EDM Council member, you can view the new models that emerged from this initiative here. Feel free to contact us via this form or at info@solidatus.com if you need any help.

Transformation with the cloud

Taking part in the review brought home to me how rapidly and radically cloud technology has transformed good data management, and how unclear the consequences and ramifications of its adoption had remained until now. For example, apart from global multinationals, on-premises installations would not typically have had to worry about data sovereignty and cross-border data movement.

Once a company decides to put its data into the cloud, though, the question of where that data may end up being held becomes pertinent, and the company must establish, understand and observe the relevant jurisdictional requirements. This is just one example of why a comprehensive data catalog is vital to successful cloud adoption. Without one, there is nowhere to store and use the metadata needed to comply with the myriad regulations and data privacy obligations that data holders are subject to, and so to avoid regulatory infractions.

Understanding the data

Time and again in these review sessions, practitioners stressed how important it is to understand the data you are putting into the cloud at the time you put it there, and that without catalogs containing metadata that fully describe those data assets, controlling and managing the data becomes impossible.

Multi-cloud and hybrid-cloud environments present additional challenges for keeping data lineage accurately recorded and up to date. Unsurprisingly, automation is a major theme throughout. For example, the framework states: "The data lineage in cloud environments must be captured automatically, and changes to lineage must be tracked and managed. Visualization and reporting of lineage must be implemented to meet the needs of both business and technical users."

The framework sets out five objectives for data lineage:

  • Implement automated functionality that identifies processes that move data.
  • Record data lineage metadata for data movement processes that are discovered automatically.
  • Ensure lineage auto-discovery identifies processes that move data across jurisdictions, availability zones and physical boundaries.
  • Ensure lineage auto-discovery is enabled in hybrid and multiple cloud environments, and identifies data movement between those environments.
  • Define and implement processes for the review of auto-discovered lineage information.
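As a rough illustration of what these objectives ask a lineage tool to record, here is a minimal Python sketch. It assumes automated discovery has already produced a list of data movements; the `Dataset` and `LineageEdge` types, the environment and region labels, and the `edges_needing_review` and `lineage_changes` helpers are all hypothetical, and are not part of CDMC or of any Solidatus API. The sketch shows how recorded lineage edges could be flagged when they cross a jurisdiction or a cloud environment, and how two lineage snapshots could be compared so that changes can be routed for review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """A catalogued data asset (hypothetical model, not a CDMC or Solidatus type)."""
    name: str
    environment: str  # hosting environment, e.g. "on-prem" or "aws" (illustrative labels)
    region: str       # jurisdiction the data is held in, e.g. "EU" or "US"


@dataclass(frozen=True)
class LineageEdge:
    """One recorded data movement: a process moving data from source to target."""
    process: str
    source: Dataset
    target: Dataset

    @property
    def crosses_jurisdiction(self) -> bool:
        return self.source.region != self.target.region

    @property
    def crosses_environment(self) -> bool:
        return self.source.environment != self.target.environment


def edges_needing_review(edges: list[LineageEdge]) -> list[LineageEdge]:
    """Keep edges that move data across a jurisdiction or environment boundary."""
    return [e for e in edges if e.crosses_jurisdiction or e.crosses_environment]


def lineage_changes(previous, current):
    """Compare two lineage snapshots and return (added, removed) edge sets."""
    prev, curr = set(previous), set(current)
    return curr - prev, prev - curr


# Example: two snapshots of discovered lineage (illustrative data only).
crm = Dataset("crm_customers", "on-prem", "EU")
lake = Dataset("customer_lake", "aws", "EU")
curated = Dataset("curated_customers", "aws", "EU")
mart = Dataset("analytics_mart", "aws", "US")

v1 = [LineageEdge("nightly_ingest", crm, lake)]
v2 = v1 + [
    LineageEdge("cleanse", lake, curated),
    LineageEdge("daily_export", curated, mart),
]

flagged = edges_needing_review(v2)
added, removed = lineage_changes(v1, v2)

for edge in flagged:
    print(f"review: {edge.process} ({edge.source.name} -> {edge.target.name})")
print("added:", sorted(e.process for e in added))
print("removed:", sorted(e.process for e in removed))
```

Here `nightly_ingest` is flagged because it moves data from on-premises into a cloud environment, and `daily_export` because it moves data from an EU region to a US one, while `cleanse` stays within one environment and region. The snapshot comparison is just one simple way to feed a review process; a real tool would derive the edges from automated discovery rather than hand-written lists.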

Data governance platforms

The best contemporary data management solutions, including those that help with data governance, are designed to provide exactly these capabilities, in radical contrast to cumbersome, manual, text-based methods.

Organizations that use DCAM (the EDM Council’s Data Management Capability Assessment Model) to assess and monitor their data governance maturity can also use Solidatus to map to CDMC. This supports a holistic, enterprise-wide data governance approach across both on-premises and cloud data, one that can be maintained automatically, giving management confidence that its data governance solutions are optimized for the future rather than stuck in the past.

A note on Solidatus for EDMC members

Solidatus is a powerful tool for efficient data management, visualization and discovery. Because Solidatus is the EDM Council’s selected knowledge modeling solution, EDM Council members are given read-only access to the modeled frameworks, showing how they interact with each other and how they map onto further privacy and financial regulations.

Using Solidatus, you create living blueprints that map how your data flows, and how it’s affected, as it moves around your systems, both now and at other points in time.

By revealing hidden opportunities and threats, and showing you the impact of change, a Solidatus data lineage blueprint delivers clarity and an enriched understanding of your ecosystem, so you can optimize your infrastructure, operate more efficiently, and minimize risk.

If you are not an EDM Council member but would like to view the models, please contact us via this form or email info@solidatus.com.



Author: Peter Ferguson

As Principal Data Architect, Peter has produced models that demonstrate the capabilities of Solidatus, including a MiFID II Data Dictionary, the General Data Protection Regulation (GDPR) and the FIX Trade Capture Report mapped to RTS 22. He also models clients’ data and writes documentation and publicity material.