Data Lineage

Definition

The documented lifecycle of data as it moves through an organisation's systems, showing its origin, transformations, dependencies, and destinations. Data lineage provides visibility into how data is created, processed, and consumed, enabling organisations to ensure data quality, meet regulatory obligations (for example, GDPR's accountability and record-keeping requirements, and responding to data subject access requests), debug data pipeline issues, and assess the impact of system changes. Robust data lineage is a key component of data governance maturity.
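
As a rough illustration of how lineage supports impact assessment, the sketch below models lineage as a directed graph and walks it in both directions. The class name, method names, and dataset names are all hypothetical, and real lineage tools capture far more metadata (columns, jobs, run times) than this.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Directed graph where an edge source -> target means target is derived from source."""

    def __init__(self):
        self._downstream = defaultdict(set)
        self._upstream = defaultdict(set)
        self.transformations = {}  # (source, target) -> free-text label

    def add_edge(self, source, target, transformation=""):
        self._downstream[source].add(target)
        self._upstream[target].add(source)
        self.transformations[(source, target)] = transformation

    @staticmethod
    def _walk(start, neighbours):
        # Breadth-first traversal collecting every node reachable from start.
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def impacted_by(self, dataset):
        """Everything downstream: what could break if this dataset changes."""
        return self._walk(dataset, self._downstream)

    def origins_of(self, dataset):
        """Upstream datasets that have no upstream of their own (the original sources)."""
        ancestors = self._walk(dataset, self._upstream)
        return {a for a in ancestors if not self._upstream.get(a)}


# Hypothetical pipeline, for illustration only.
graph = LineageGraph()
graph.add_edge("crm.customers", "staging.customers", "deduplicate")
graph.add_edge("web.events", "staging.sessions", "sessionise")
graph.add_edge("staging.customers", "marts.customer_360", "join")
graph.add_edge("staging.sessions", "marts.customer_360", "join")
graph.add_edge("marts.customer_360", "dashboards.churn", "aggregate")

impacted = graph.impacted_by("staging.customers")
origins = graph.origins_of("dashboards.churn")
```

Here `impacted` holds the datasets that a change to `staging.customers` could affect, and `origins` traces the churn dashboard back to its source systems.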

Complementary Terms

Concepts that frequently appear alongside Data Lineage in practice.

Master Data Management (MDM)

The processes, governance, policies, and technology used to ensure that an organisation's critical shared data entities — such as customers, products, suppliers, and accounts — are accurate, consistent, and controlled across all systems and business units. MDM creates a single trusted source of master data, reducing duplication, resolving conflicts, and enabling reliable reporting and analytics.

First-Party Data

Data collected directly by an organisation from its own customers, users, or audience through owned channels such as websites, apps, CRM systems, transactions, and surveys. First-party data is often considered the most valuable data category because it is collected directly (typically with the individual's knowledge or consent), is unique to the organisation, and provides direct insight into customer behaviour and preferences.

Customer Data Platform (CDP)

A software system that creates a unified, persistent customer database accessible to other systems by collecting and integrating customer data from multiple sources — including CRM, website analytics, email, social media, transactions, and customer service interactions. CDPs resolve customer identities across channels and devices to build comprehensive individual profiles, enabling personalised marketing, customer journey orchestration, and advanced segmentation.
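
One simple way to picture identity resolution is deterministic matching: records that share any identifier are merged into one profile. The sketch below does this with a union-find structure. The record layout, field names, and function name are assumptions for illustration, and commercial CDPs typically add probabilistic matching and confidence rules on top.

```python
def resolve_identities(records):
    """Group record ids that are linked, directly or transitively, by a shared identifier."""
    parent = {}

    def find(x):
        # Find the representative of x's group, compressing the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for rec in records:
        # The "record" tag is an assumed reserved key; identifier types must not reuse it.
        record_node = ("record", rec["id"])
        find(record_node)
        for kind, value in rec["identifiers"].items():
            union(record_node, (kind, value))

    profiles = {}
    for rec in records:
        root = find(("record", rec["id"]))
        profiles.setdefault(root, []).append(rec["id"])
    return sorted(sorted(group) for group in profiles.values())


# Hypothetical source records from three channels.
records = [
    {"id": "crm-1", "identifiers": {"email": "ana@example.com"}},
    {"id": "web-7", "identifiers": {"email": "ana@example.com", "device": "d-42"}},
    {"id": "app-3", "identifiers": {"device": "d-42"}},
    {"id": "crm-2", "identifiers": {"email": "ben@example.com"}},
]
profiles = resolve_identities(records)
```

In this example `crm-1` and `app-3` share no identifier directly but end up in the same profile because `web-7` links the email to the device.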

Third-Party Data

Data collected by entities that do not have a direct relationship with the individuals whose data is being gathered, typically aggregated from multiple sources and sold to other organisations for marketing, analytics, or enrichment purposes. The value and availability of third-party data have declined sharply due to privacy regulations (GDPR, CCPA), browser restrictions on third-party cookies, and growing consumer demand for data transparency.

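Data Protection Impact Assessment (DPIA)

A structured process required under GDPR Article 35 to identify, assess, and mitigate privacy risks arising from data processing activities that are likely to result in a high risk to individuals. A DPIA must be completed before such processing begins. Article 35 specifically requires one for systematic and extensive profiling that significantly affects individuals, large-scale processing of special category data, and large-scale systematic monitoring of publicly accessible areas; the use of new technologies is a factor in judging whether processing is high risk. The assessment must document the necessity, proportionality, and safeguards of the proposed processing.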

Zero-Party Data

Data that a customer intentionally and proactively shares with a business, including preferences, purchase intentions, communication choices, and personal context. Unlike most first-party data, which is observed or inferred from behaviour, zero-party data is explicitly volunteered through mechanisms such as preference centres, surveys, quizzes, and account settings.

Data Sovereignty

The principle that data is subject to the laws and governance structures of the country in which it is collected or stored. Data sovereignty requirements affect cloud computing architecture, cross-border data transfers, and vendor selection, particularly in light of GDPR restrictions on transfers to countries without adequate data protection standards.

Synthetic Data

Artificially generated data that mimics the statistical properties of real-world datasets, used to train machine learning models when actual data is scarce, sensitive, or expensive to obtain. Synthetic data enables AI development in privacy-constrained domains such as healthcare and finance, while reducing data acquisition costs and regulatory exposure.
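
To make "mimics the statistical properties" concrete, the sketch below fits a mean and standard deviation to each numeric column of a small real dataset and samples new rows from those distributions. It is a deliberately naive approach: it assumes each column is roughly normal and independent of the others, so it does not preserve correlations, and it offers no formal privacy guarantee. The function names and sample values are hypothetical.

```python
import random
import statistics


def fit_gaussian(values):
    """Estimate the mean and sample standard deviation of one numeric column."""
    return statistics.mean(values), statistics.stdev(values)


def sample_synthetic(real_rows, n, seed=0):
    """Draw n synthetic rows, sampling each column independently from its fitted Gaussian."""
    rng = random.Random(seed)  # seeded for reproducibility
    columns = list(real_rows[0].keys())
    params = {c: fit_gaussian([row[c] for row in real_rows]) for c in columns}
    return [
        {c: rng.gauss(mu, sigma) for c, (mu, sigma) in params.items()}
        for _ in range(n)
    ]


# Hypothetical "real" data: patient age and systolic blood pressure.
real_rows = [
    {"age": 34, "systolic": 118},
    {"age": 45, "systolic": 127},
    {"age": 29, "systolic": 112},
    {"age": 52, "systolic": 135},
    {"age": 41, "systolic": 124},
    {"age": 38, "systolic": 121},
    {"age": 47, "systolic": 130},
    {"age": 31, "systolic": 115},
]
synthetic = sample_synthetic(real_rows, n=5000)
synthetic_mean_age = statistics.mean(row["age"] for row in synthetic)
```

The synthetic ages should average close to the real mean of 39.625, while no synthetic row corresponds to an actual individual. Production approaches generally model the joint distribution rather than each column in isolation.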
