All You Need to Know About Metadata
Data without context is meaningless. When presented with data, most people immediately ask a variety of questions, such as: Where did it come from? When was it last updated? Who is responsible for maintaining it? Metadata can answer these questions and many more.
Good metadata practices help us get more value from data by providing the context we need. However, metadata is a big topic, and it can get very technical very quickly. Entire national and global communities are dedicated to doing metadata well, and keeping up to speed can be difficult.
This post will walk through a few key topics that will help you improve your metadata no matter what your current practices are. Starting from basic definitions and moving all the way through technical implementation, this blog is designed to help you understand metadata and know where to look to get better.
Foundations – What is metadata and why do we need it?
Metadata is often simply described as data about data. A variety of definitions exist, and a simple Google search reveals many resources out there which describe in detail the nuts and bolts of metadata in different fields. As Jason notes in a previous blog post, in almost all cases metadata is answering the basic questions of who, what, where, when, and why for data with the goal of providing a summary view of the data in question.
Fundamentally, metadata provides the context we need to put data to use. This context makes it possible for data to be findable, comparable, and verifiable. Metadata can also provide a standard format that enables interoperability, which increases the quality of data and facilitates greater use of it. Metadata allows us to do the things we want with data, like cut costs, encourage collaboration, and increase our understanding of the problems we face. Without it, responsible data use becomes almost impossible.
Components – What does metadata include?
Good metadata includes a few critical elements to provide context. These elements are usually present in datasets no matter the specific data type or field you are working in and provide the answers to the who, what, where, when, and why questions about the data. Take a look at the screenshot below from one dataset hosted on the City of Salinas Open Data portal for an example of metadata in action.
The metadata for this dataset is found on the “Information” tab and provides all the basics about the dataset. This includes key elements like dataset title, publisher, license, last update, language, and keywords. In this case, the metadata also provides the helpful element “Geographic Area” to indicate that this dataset covers more than just the City of Salinas.
Most of these elements are included in all types of metadata. Ensuring that things like title, date, publisher, and keywords are captured can serve as a quick checklist for your metadata practices. Many industries also have their own specific metadata elements. In the next section, we will walk through technical considerations for implementing metadata in your organization.
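The quick checklist described above could be automated in a few lines of Python. This is only a minimal sketch: the element names follow the ones listed above, while the function name `missing_elements` and the sample record are hypothetical and used purely for illustration.

```python
# Core elements named in the checklist above; adjust to your own practices.
CHECKLIST = ("title", "date", "publisher", "keywords")


def missing_elements(record: dict) -> list[str]:
    """Return checklist elements that are absent or empty in a metadata record."""
    return [name for name in CHECKLIST if not record.get(name)]


# Hypothetical record, for illustration only.
record = {
    "title": "Example dataset",
    "publisher": "Example City",
    "keywords": [],  # present but empty, so it is reported as missing
}

print(missing_elements(record))  # → ['date', 'keywords']
```

Treating empty values as missing is a design choice here; a blank keyword list gives a reader no more context than no keyword list at all.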
Specifications – How can we ensure metadata is implemented consistently?
Providing data context in repeatable and systematic ways is critical to implementing good metadata practices. Essentially, organizations need to describe and define metadata according to standardized rules. For example, the dataset from Salinas described above conforms to DCAT, which is a vocabulary for publishing data catalogs on the web. DCAT provides the elements, rules, and structures to follow when publishing data on the web and the individual dataset metadata includes elements that follow this structure.
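To make this more concrete, here is a rough sketch of how the elements listed earlier (title, publisher, license, last update, language, keywords, and geographic area) might map onto DCAT and Dublin Core terms in a JSON-LD style record. All values are placeholders rather than the actual Salinas metadata, and the record is simplified: in a full DCAT description, the publisher, license, and spatial coverage are usually expressed as resources rather than plain strings.

```python
import json

# Hypothetical dataset description using DCAT and Dublin Core terms.
# Every value below is a placeholder, not real portal metadata.
dataset = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@type": "dcat:Dataset",
    "dct:title": "Example dataset",
    "dct:publisher": "Example City",
    "dct:license": "https://example.org/license",
    "dct:modified": "2024-01-01",
    "dct:language": "en",
    "dcat:keyword": ["example", "open data"],
    "dct:spatial": "Example County",
}

# Serialize the record so it can be published alongside the data.
print(json.dumps(dataset, indent=2))
```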
Often organizations publish technical specifications to create consistent, internal rules for metadata. This screenshot below shows a specification from the Environmental Protection Agency (EPA) which details core metadata elements.
The EPA specification does an excellent job of detailing what is needed for each metadata element, including name, description, whether the element is required, and whether it applies only to geospatial data. In addition, more detailed guidance for each element appears lower on the page, with examples of the element in practice, data type details, and acceptable values. Finally, the technical specification refers to several external “standards” that the EPA is following, which is critical for aligning EPA data with other datasets from around the world.
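A specification like this can also be captured in a machine-readable form and used to check records. The sketch below assumes a simple structure with the same kinds of attributes described above (name, description, required, geospatial-only, acceptable values). The class `ElementSpec`, the function `check`, and the three sample entries are hypothetical and are not copied from the EPA specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSpec:
    name: str
    description: str
    required: bool
    geospatial_only: bool = False
    allowed_values: tuple[str, ...] = ()  # empty means any value is accepted


# Hypothetical entries; these are not taken from the EPA specification.
SPEC = [
    ElementSpec("title", "Human-readable name of the dataset", required=True),
    ElementSpec("access_level", "Who may access the data", required=True,
                allowed_values=("public", "restricted")),
    ElementSpec("bounding_box", "Geographic extent", required=True,
                geospatial_only=True),
]


def check(record: dict, is_geospatial: bool) -> list[str]:
    """Return the problems found when checking a record against SPEC."""
    problems = []
    for spec in SPEC:
        if spec.geospatial_only and not is_geospatial:
            continue  # element does not apply to non-geospatial data
        value = record.get(spec.name)
        if value is None:
            if spec.required:
                problems.append(f"missing required element: {spec.name}")
        elif spec.allowed_values and value not in spec.allowed_values:
            problems.append(f"invalid value for {spec.name}: {value!r}")
    return problems


print(check({"title": "Example", "access_level": "secret"}, is_geospatial=True))
```

Keeping the rules as data rather than scattering them through code means the same list can drive both the written documentation and the automated checks.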
Standards – How do we ensure our metadata aligns with external practices?
The EPA specification detailed above is designed for internal data providers to ensure their metadata conforms to the rules. Basically, it attempts to create internal standards for metadata in the EPA. External metadata standards complement this by enabling comparison across time, geographies, and industries.
Metadata standards create rules for specific fields to ensure consistency and interoperability. Many industries have their own metadata standards designed to provide guidance to organizations around the world as they catalog their data. A few examples are included below but these are just a small sample of the metadata standards and guidance that exist.
- Content Standard for Digital Geospatial Metadata (CSDGM) – geographic data
- Dublin Core – web publishing
- Data Documentation Initiative – survey data
- RLG Descriptive Metadata – cultural data
- Ocean.data.gov – geographic data related to oceans
Schemas and application profiles – How do we put standards into practice?
Metadata standards lay out principles to follow and issues to look out for in metadata initiatives. But to really put them into practice, two more terms are needed: schema and application profile. I’ll describe the basics and some examples below but if you want to dig into the details and differences in schemas, profiles, and related terms, check out this excellent guide from the International Standards Organization.
A schema is a set of detailed guidelines that describe explicit rules and relationships for each metadata element in a standard. Schemas are much more technical in nature than standards and provide definitions and guidance focusing on semantics and syntax. Schemas also cover what elements in metadata are required (similar to the technical specification from the EPA noted above) in order to make sure key metadata is always captured.
One layer below a schema is an application profile. An application profile is a set of rules and policies that describe how a schema will be implemented in a particular situation or organization. Application profiles go further than schemas by adding specific business rules, examples, and comments on how to implement the schema. Critically, application profiles may include metadata elements from more than one schema. For example, if you were publishing geospatial data on the web, you might create an application profile with elements from the DCAT or Dublin Core (web data) and CSDGM (geospatial data) schemas.
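One way to picture an application profile is as a table that picks elements from several schemas and attaches local rules to each. The sketch below is an assumption-heavy illustration: the element names are simplified stand-ins rather than the real DCAT, Dublin Core, or CSDGM element names, and the notes are invented business rules.

```python
# Hypothetical element lists; real schemas define many more elements
# and use different names.
WEB_ELEMENTS = {"title", "publisher", "license", "keyword"}  # DCAT / Dublin Core style
GEO_ELEMENTS = {"bounding_box", "coordinate_system"}         # CSDGM style

SOURCES = {"web": WEB_ELEMENTS, "geo": GEO_ELEMENTS}

# The profile selects elements from both sources and adds local rules.
PROFILE = {
    "title": {"source": "web", "required": True, "note": "Use the public-facing name"},
    "publisher": {"source": "web", "required": True, "note": "Department, not individual"},
    "keyword": {"source": "web", "required": False, "note": "Pick from the shared list"},
    "bounding_box": {"source": "geo", "required": True, "note": "Decimal degrees"},
}


def unknown_elements(profile: dict) -> list[str]:
    """Return profile elements that are not defined in the schema they cite."""
    return sorted(name for name, rule in profile.items()
                  if name not in SOURCES[rule["source"]])


def required_elements(profile: dict) -> list[str]:
    """Return the elements this profile makes mandatory, in alphabetical order."""
    return sorted(name for name, rule in profile.items() if rule["required"])


print(unknown_elements(PROFILE))   # → []
print(required_elements(PROFILE))  # → ['bounding_box', 'publisher', 'title']
```

Checking that every profile element really exists in its source schema is a small safeguard against a profile drifting away from the standards it is meant to implement.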
A great example of an application profile in action comes from the State of North Carolina, focusing on GIS data. Essentially, application profiles encompass everything an organization needs to do in fine detail related to metadata in one instance. Learning to put schemas and application profiles into use takes time and practice but they are critical pieces of getting to better metadata in your organization.
Tips – What should we keep in mind as we work with metadata?
We covered a lot of ground in this blog, so to keep us focused, here are a few tips to keep in mind as you implement better metadata in your organization.
- Talk to others – Engaging other people always makes metadata better. Talk to people inside your organization and external experts, work with existing data-focused groups to get a variety of perspectives, and ask for help if you need it. Talking to others will make your metadata practices stronger in the long-term.
- Beg and borrow – With metadata, it’s always better to borrow from someone else. A huge amount of work has been done around the world to create metadata guidance, so use it liberally (this blog has a lot of links as well). Don’t reinvent the wheel; use the great resources out there.
- Document in detail – Documentation is the basis of strong metadata. Much of this post covered how items are logged, defined, and described. Keeping consistent documentation (and eventually a data dictionary) will help you improve your metadata practices over time and learn from how your organization is putting metadata into practice.
Well-structured metadata is crucial to enabling data to be found and used with confidence within organizations and ecosystems. It is central to effective data sharing and reuse. At Opendatasoft our mission is to accelerate data democratization, ensuring that everyone has access to easily understandable information in their working and home lives. Our data portal solution enables data democratization by centralizing all of an organization’s data assets and making them available to all internal and external users in a seamless, intuitive way, without requiring specialist skills or support.