Glossary
Snowflake schema
A snowflake schema is a type of database/data warehouse schema used to store data through a multi-dimensional structure.
The snowflake schema is a more complex version of a star schema and consists of a central ‘fact table’ that contains information about events or facts, and a number of ‘dimension tables’ (or lookup tables) that contain information about the dimensions of these facts or events.
Unlike in a star schema, dimension tables can be divided into more than one table, creating the complex, hierarchical snowflake pattern. For example, a table with address dimensions could be split into multiple dimension tables covering parts of the address, such as country, state and city.
What are the characteristics of the Snowflake Schema?
The snowflake schema has these features:
- Normalization: It is a normalized design, which means that data is organized into multiple related tables.
- Hierarchical structure: It has a hierarchical structure with multiple dimension tables containing linked attributes.
- Multiple levels: There are multiple levels of dimension tables, each related to the central fact table.
- Joins: Due to the larger number of hierarchical dimension tables, SQL queries require multiple table joins.
What are the advantages and disadvantages of the Snowflake Schema?
What are the advantages of the Snowflake Schema?
By splitting dimension tables into additional tables, the snowflake schema normalizes tables, splitting them by moving commonly repeating groups of data into new tables to avoid duplication/redundancy. This reduces required disk space, helps with updates and allows for more granular analysis of data as users can drill down into specific subsets of data. As data is highly structured it reduces the problem of data integrity.
It works best when data is detailed, with multiple relationship levels, providing a structure that is optimized for efficiently querying large amounts of data.
What are the disadvantages of the Snowflake Schema?
As it uses multiple hierarchical dimension tables, more table joins are required when executing a query. This negatively impacts system performance, seen through slower query response times and higher resource usage, especially when processing cube data.
It can also be more difficult to understand relationships and maintain snowflake schemas because of the increased complexity of the schema design.
What are the uses of the Snowflake Schema?
Snowflake schemas are most commonly found in dimensional data warehouses and data marts. They are especially used when organizations and data scientists want to be able to carry out more granular analysis and require adaptability to meet changing data requirements.
What is the difference between the Snowflake Schema and the Star Schema?
The snowflake schema is a development of the star schema. There are seven main differences between them:
- Normalization: The snowflake schema has a fully normalized data structure, with dimensional hierarchies stored in separate dimensional tables. Star schemas are denormalized, with values repeated within a table.
- Data redundancy: As it normalizes dimension tables, the snowflake schema does not store redundant/repeated values.
- Disk space: By not storing redundant data, the snowflake schema uses less disk space.
- Query complexity: The snowflake scheme requires a more complex query design, with multiple joins. As it involves only one level of dimension tables, the star schema does not require queries with multiple joins.
- Query performance: Due to the greater number of joins, running queries in snowflake schemas are slower than in star schema.
- Data integrity: In the star schema, multiple copies of the same data exist in different dimensional tables. This means new inserts, updates, or deletes can compromise the integrity of data. In contrast the snowflake schema stores dimension data once, improving data integrity.
- Set up and maintenance: As they are more complex, snowflake schemas are harder to design and set up than snowflake schema. However, due to potential data integrity issues in star schemas, snowflake schemas are easier to maintain when new data is added to a data warehouse.
Learn more
Data access
How to break down organizational silos to engage everyone in your data project
Organizational silos prevent data sharing and collaboration, increasing risk and reducing efficiency and innovation. How can companies remove them and ensure that data flows seamlessly around the organization so that it can be used by every employee?