[Webinar] Collaboration and Monetization of Data Products: The Role of the Data Marketplace

Watch the replay
Glossary

Data Normalization

Data normalization ensures that data from different sources is organized and structured in a uniform, consistent and logical way, removing anomalies.

What is data normalization?

Data normalization ensures that data from different sources is organized and structured in a uniform, consistent and logical way, removing anomalies that may lead to errors. It can be applied to databases (also known as database normalization), where it organizes fields and tables or within data analysis, where it is part of data pre-processing, adjusting the scale of data to ensure uniformity.

Applying data normalization rules can involve changing the data structure, format, type, value, or scale. At a basic level it could mean converting dates to a common format, standardizing units of measurement, or removing outliers.

Essentially data normalization avoids data redundancy (saving storage space and improving performance), while ensuring that data dependencies are logical. It allows data to be queried and analyzed more easily which can lead to better business decisions.

What are data anomalies?

Data anomalies are inconsistencies or errors. For databases they commonly fit into three categories

  • Insertion anomalies, the inability to add data due to the absence of other data
  • Update anomalies, caused by data redundancy and partial updates
  • Deletion anomalies, the unintended loss of data due to deletion of other data

For data analysis, anomalies can be:

  • Missing values
  • Incorrect data types
  • Unrealistic values

What is the difference between data normalization and data standardization?

While both processes are essential to data management, they address different aspects of organization and quality.

  • Data standardization brings information into a consistent format or structure
  • Data normalization organizes and transforms data to eliminate redundancies and improve integrity

How does data normalization work?

Essentially data normalization involves creating standard formats for all data throughout a company, such as around how names and addresses are formatted and how numbers are standardized.

Beyond basic formatting, there are six general rules or “normal forms” (6NF) to performing data normalization. Each rule builds on the one before — so you can only apply the next rule if your data meets the criteria of the preceding one. The first four rules are the primary stages of normalization:

  1. First Normal Form (1NF) or primary key, which focuses on removing duplicate data and ensuring that each record is unique and formatted correctly.
  2. Second Normal Form (2NF), which moves all subsets of data that can exist in multiple rows into separate tables. Once this is done relationships between these new tables and new key labels can be created.
  3. Third Normal Form (3NF), building on 1NF and 2NF, this removes transitive dependencies for non-primary attributes (or columns) and means that they solely depend on the primary key.
  4. Boyce and Codd Normal Form (3.5NF), a developed version of the 3NF, it doesn’t have candidate keys that overlap.

What are the benefits and drawbacks of data normalization?

What are the benefits of data normalization?

Data normalization is crucial to the confident use of data, delivering benefits around

  • Data integrity
  • Data consistency
  • Reduced data redundancy and storage requirements
  • Improved/easier data access and analysis
  • Faster query response times
  • Better decision making through more accurate results
  • Greater efficiency in data management, saving time and resources

What are the disadvantages of data normalization?

Data normalization was first introduced in the 1970s, and technology/data management has evolved since then. This means that some of the advantages of data normalization (such as reducing costly disk storage) are no longer so important. Additionally, new structures such as data warehouses and NoSQL databases do not rely so heavily on data normalization.

Additionally, data normalization can lead to these issues:

  • Slower query response times on more complex queries
  • A requirement for specialists trained in applying data normal forms
  • Increased complexity in database design, reducing flexibility

 

En savoir plus
3 key collaborative features to engage data consumers with your data portal Product
3 key collaborative features to engage data consumers with your data portal

How can you break down silos and make data available to everyone within your organization, not just data specialists? How do you get employees to use data effectively in their everyday working lives? This article explains the key features you need on your data portal to engage users and maximize data sharing and reuse.

Accelerating statistical data sharing with SDMX and intuitive data portals Public Sector
Accelerating statistical data sharing with SDMX and intuitive data portals

Access to accurate statistical information is key to the successful functioning of the global economy and for policymakers and businesses to make informed decisions around subjects that impact us all. How can institutions effectively and efficiently share their statistical data in an interoperable, scalable way to democratize access and build trust?

Centralize all your data assets through Opendatasoft’s unlimited connectivity Product
Centralize all your data assets through Opendatasoft’s unlimited connectivity

In this article, explore how Opendatasoft’s extensive range of connectors enable customers to successfully complete all their connectivity projects and seamlessly industrialize the collection, centralization and availability of all their data assets.

Start creating the best data experiences