Data lineage
Data lineage (or data traceability) provides full visibility into the data lifecycle, both inside and outside the organization, including any changes made to the data.
What is data lineage?
As organizations become increasingly data-driven, they need to be able to trust the data they are working with. Data lineage (also known as data traceability) builds this trust by providing a full picture of where particular data has come from, how it has been changed, processed or enriched, where and by whom it has been used, and where it will go in the future.
To ensure quality, good governance and regulatory compliance, companies need to be able to trace data upstream to its original source and downstream to everywhere it is used, all the way to the end of its lifecycle. This shows them how data is being reused, both inside and outside the organization.
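To make upstream and downstream tracing more concrete, the sketch below models lineage as a directed graph and walks it in either direction. It is a minimal illustration under stated assumptions, not how any particular lineage product works: the dataset names and the helper functions `build_index`, `trace` and `original_sources` are hypothetical, and in practice the edges would come from pipeline or system metadata rather than a hard-coded list.

```python
from collections import deque

# Hypothetical lineage graph: each edge runs from a source dataset to a
# dataset derived from it. All dataset names are illustrative.
EDGES = [
    ("crm.customers", "staging.customers_clean"),
    ("web.signups", "staging.customers_clean"),
    ("staging.customers_clean", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "bi.churn_dashboard"),
    ("warehouse.dim_customer", "partner.export_feed"),
]


def build_index(edges):
    """Index the edges in both directions so the graph can be walked either way."""
    downstream, upstream = {}, {}
    for src, dst in edges:
        downstream.setdefault(src, set()).add(dst)
        upstream.setdefault(dst, set()).add(src)
    return downstream, upstream


def trace(start, index):
    """Breadth-first walk returning every dataset reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in index.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def original_sources(start, upstream):
    """Upstream datasets that have no sources of their own."""
    return {n for n in trace(start, upstream) if n not in upstream}


downstream, upstream = build_index(EDGES)
# Upstream: where did the customer dimension's data originally come from?
print(sorted(original_sources("warehouse.dim_customer", upstream)))
# ['crm.customers', 'web.signups']
# Downstream: what could be affected if the CRM data changes?
print(sorted(trace("crm.customers", downstream)))
# ['bi.churn_dashboard', 'partner.export_feed', 'staging.customers_clean', 'warehouse.dim_customer']
```

Walking upstream answers "where did this come from?", which supports quality checks; walking downstream answers "what would a change here affect?", which is also the question behind impact analysis for system changes.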
Data lineage covers the full data lifecycle:
- The origins of the data, and whether it is internal or external
- The level of sensitivity of the data (such as if it contains personal customer information)
- The systems it has flowed through
- Any changes that have been made, including enrichment and standardization to meet governance requirements
- Who it is shared with (internally and externally) and how it is used (such as for business intelligence or within operational systems)
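The lifecycle elements listed above can be captured as a simple lineage record. The sketch below is a hypothetical data model, not a standard schema: the class names `LineageRecord`, `Transformation` and `Share`, the sensitivity levels and the example values are all assumptions for illustration. It also shows one governance-style query such a record makes possible: finding personal data that is shared outside the organization.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    PERSONAL = "personal"  # e.g. personal customer information


@dataclass
class Transformation:
    description: str  # e.g. an enrichment or standardization step
    system: str       # system in which the change was made


@dataclass
class Share:
    consumer: str     # who receives the data
    purpose: str      # e.g. "business intelligence", "operational system"
    external: bool    # True if the consumer is outside the organization


@dataclass
class LineageRecord:
    dataset: str
    origin: str              # where the data came from
    external_origin: bool    # internal vs external source
    sensitivity: Sensitivity
    systems: list[str] = field(default_factory=list)  # systems it flowed through
    transformations: list[Transformation] = field(default_factory=list)
    shared_with: list[Share] = field(default_factory=list)


def external_personal_data_shares(records):
    """List (dataset, consumer) pairs where personal data leaves the organization."""
    return [
        (r.dataset, s.consumer)
        for r in records
        if r.sensitivity is Sensitivity.PERSONAL
        for s in r.shared_with
        if s.external
    ]


# Hypothetical example record.
record = LineageRecord(
    dataset="warehouse.dim_customer",
    origin="crm.customers",
    external_origin=False,
    sensitivity=Sensitivity.PERSONAL,
    systems=["CRM", "staging area", "data warehouse"],
    transformations=[Transformation("standardize country codes", "staging area")],
    shared_with=[
        Share("sales team", "business intelligence", external=False),
        Share("marketing agency", "campaign targeting", external=True),
    ],
)
print(external_personal_data_shares([record]))
# [('warehouse.dim_customer', 'marketing agency')]
```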
Data lineage solutions provide a visual representation of the data lifecycle, enabling data administrators to drill down into how data was created and then transformed, moved and used across the organization and its wider external ecosystem.
What is the difference between data lineage and data traceability?
The terms data lineage and data traceability are often used interchangeably as there is no real difference between them. They both describe the same process of understanding the data lifecycle and providing full visibility across it.
A third term – data provenance – refers to the origin of the data, i.e. how and where it was created.
Data lineage/data traceability can be broken down into two areas:
- Business lineage: Looks at how data has been changed from a business perspective. It provides a simplified view of where data comes from, the policies, processes and standards that were applied to it, and how it has been used. This gives business users confidence in the data when using it for decision making, for example.
- Technical lineage: A more in-depth view of how data moves and transforms between systems, tables and columns, which is normally only understandable by technical/IT users. It covers areas such as the applications data flows through, technical transformations, lookups and staging tables. While too complex for business users, it is vital for ensuring technical data quality and debugging errors in the data sharing process.
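One way to picture the relationship between the two views is to start from column-level technical lineage and hide the purely technical steps. The sketch below treats staging tables (prefix `stg.`) and lookup tables (prefix `ref.`) as technical plumbing and collapses paths through them, leaving a simplified source-to-report view closer to business lineage. The table and column names, the prefix convention and the `business_view` function are hypothetical and used for illustration only.

```python
# Hypothetical column-level (technical) lineage: (source_column, target_column).
TECHNICAL_EDGES = [
    ("crm.customers.country", "stg.customers.country_raw"),
    ("stg.customers.country_raw", "stg.customers.country_iso"),
    ("ref.country_codes.iso_code", "stg.customers.country_iso"),  # lookup table
    ("stg.customers.country_iso", "dw.dim_customer.country"),
    ("dw.dim_customer.country", "bi.revenue_report.country"),
]

# Assumption for this sketch: staging and lookup layers are technical detail.
HIDDEN_PREFIXES = ("stg.", "ref.")


def business_view(edges, hidden_prefixes):
    """Collapse paths through hidden columns, keeping only visible endpoints."""
    children = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    def hidden(column):
        return column.startswith(hidden_prefixes)

    collapsed = set()
    for src, targets in children.items():
        if hidden(src):
            continue  # hidden columns are never shown as a starting point
        stack, seen = list(targets), set()
        while stack:
            column = stack.pop()
            if column in seen:
                continue
            seen.add(column)
            if hidden(column):
                stack.extend(children.get(column, []))  # walk through it
            else:
                collapsed.add((src, column))  # first visible column on this path
    return sorted(collapsed)


for src, dst in business_view(TECHNICAL_EDGES, HIDDEN_PREFIXES):
    print(f"{src} -> {dst}")
# crm.customers.country -> dw.dim_customer.country
# dw.dim_customer.country -> bi.revenue_report.country
```

In this sketch the lookup table drops out of the simplified view entirely. That is the trade-off between the two views: the business view is easier to read, while the technical view keeps the detail needed for debugging.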
Why is data lineage important?
Data lineage is vital to delivering confidence in the data that is used to power a business. Strong data lineage allows organizations to:
- Trust that the data used for business operations is accurate and high quality, so that any decisions based on it are valid. As companies increasingly introduce advanced analytics and AI that automate decision-making, traceability becomes even more critical.
- Ensure data governance by tracking and monitoring how data is used (and by whom).
- Support compliance by being able to prove that data meets both organizational policies and external privacy regulations, such as GDPR. This makes data lineage a key part of risk management when it comes to data.
- Protect data by understanding the systems it flows through and who has access to it.
- Enable debugging by highlighting errors that potentially impact data use and flow.
- Manage technical migrations, such as to the cloud, by modeling data flows and the impact of any technology/system changes on downstream solutions.
What are the challenges to data lineage?
Organizations generate enormous amounts of data, and increasingly add to this with information from partners and their wider ecosystems. This brings five key challenges to data lineage:
- Volume and range: the number of different data sources continues to grow as organizations digitize and add more and more data-producing devices (such as IoT sensors) to their infrastructure. This means that the amount of data an organization has to manage is growing exponentially, and all of it needs to be fully traceable across its lifecycle.
- Speed: data now moves at a much greater velocity within organizations. Whereas in the past weekly or monthly reporting was standard, users now need access to trusted data on a real-time basis.
- Compliance: regulators (and consumers) are increasingly focused on ensuring that information, particularly personal data, is used and protected in ways that meet legislation such as the CCPA and GDPR. This makes traceability even more important, as it provides the audit trail that regulators may require.
- Complexity: all of these factors mean that organizations have a much more complex data environment to manage, again making traceability essential.
- Collaboration: monitoring data across the organization and, more importantly, with external partners requires open collaboration between departments and organizations to break down silos.