
Is It Possible to Anonymize Open Data?

Data is highly useful, but how do you protect the people whose personally identifying information it contains? Read on to learn whether it is possible to anonymize data so that you can reap the value of open data.

Brand content manager, Opendatasoft

While the benefits of open data have been clear for some time, there’s a major concern preventing more municipalities and agencies from utilizing it. That big worry is privacy.

Data anonymization isn’t a new development. You’ll see it in use in any country that has a census. Take US census data as an example. Every ten years, the US government collects information from millions of residents. It never reveals personally identifying data about individual respondents; rather, the government publishes its information in aggregate.

However, data gathered from individuals is quite valuable. Researchers have spent years trying to figure out how to preserve anonymity (and by extension, privacy) while still utilizing data gathered at the micro level.

There are a few data anonymization techniques in use today that are effective at protecting people’s identities. One technique is noise addition. The term “noise” refers to deliberately making published figures imprecise. For example, a census taker might personally interview Jane Smith, who is 65 years old. When the data is published, her age might be shifted by a few years at random or, in the closely related technique of generalization, appear only as a range from 60 to 69 years.
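As a rough illustration of the Jane Smith example, the sketch below shows both ways of blurring an age: shifting it by a small random amount and reporting it only as a ten-year range. The function names `add_noise` and `age_band`, the three-year shift, and the ten-year band width are assumptions made for this example, not part of any particular census methodology.

```python
import random


def add_noise(age, max_shift=3, rng=None):
    """Shift an age by a random whole number of years in [-max_shift, max_shift].

    The size of the shift is an illustrative assumption.
    """
    rng = rng or random.Random()
    return age + rng.randint(-max_shift, max_shift)


def age_band(age, width=10):
    """Report an age only as the range it falls into, e.g. 65 -> '60-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


jane_age = 65
print(add_noise(jane_age))  # some value between 62 and 68
print(age_band(jane_age))   # 60-69
```

A larger shift or a wider band hides individuals better but makes the published figures less precise, so the choice is a trade-off between privacy and usefulness.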

A second method is substitution. As its name implies, substitution works by exchanging one identifying value for another. To go back to the example of Jane Smith: instead of listing her age as 65 years, the data set would show a stand-in value such as the color red (as it would for anyone else of the same age).
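To make the idea concrete, here is a minimal sketch of substitution in which every distinct age is swapped for a stand-in token, so that everyone of the same age receives the same token. The helper name `build_substitution` and the color tokens are hypothetical choices for this example.

```python
def build_substitution(values, tokens):
    """Assign each distinct value the next unused token, in order of first appearance."""
    mapping = {}
    unused = iter(tokens)
    for value in values:
        if value not in mapping:
            mapping[value] = next(unused)
    return mapping


ages = [65, 42, 65, 30]
colors = ["red", "blue", "green"]

mapping = build_substitution(ages, colors)
published = [mapping[age] for age in ages]
print(published)  # ['red', 'blue', 'red', 'green']
```

The mapping itself has to stay with the organization that holds the original data, because anyone who obtains it can reverse the substitution.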

The third approach to data anonymization is differential privacy. In this approach, the organization that gathered the information keeps the original data set and gives third parties access only to anonymized results. Carefully calibrated noise is added so that those results change very little whether or not any one person’s record is included, which limits what can be learned about an individual.
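One common way to apply differential privacy to a simple counting question is the Laplace mechanism, which adds random noise scaled to a privacy parameter usually called epsilon. The sketch below is a simplified illustration under that assumption; the function names, the sample records, and the epsilon value are invented for the example, and a production system would rely on a vetted library instead.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential samples with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Adding or removing one person changes a count by at most 1,
    so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1 / epsilon, rng)


# Hypothetical records held by the data owner; only the noisy count is shared.
people = [{"age": 65}, {"age": 42}, {"age": 71}, {"age": 30}]
noisy = private_count(people, lambda p: p["age"] >= 65, epsilon=0.5)
print(noisy)  # close to 2, but different on each run
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon means answers closer to the true count.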

A fourth means of de-identifying data is known as geomasking. It builds on geocoding, in which street addresses are matched to map coordinates, by deliberately shifting those coordinates before publication. There’s more than one method of geomasking. In “donut” geomasking, each geocoded address is relocated in a random direction by at least a minimum distance from its original location, but no more than a certain maximum distance. Researchers have reported that donut geomasking provides a higher level of privacy protection in comparison to other techniques.
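The donut rule described above can be sketched in a few lines: pick a random direction, pick a distance between the minimum and the maximum, and move the point. The function name `donut_mask`, the 100 m and 500 m radii, the sample coordinates, and the flat-earth approximation (reasonable only for short distances away from the poles) are all assumptions made for this illustration.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def donut_mask(lat, lon, min_m, max_m, rng=None):
    """Move a point in a random direction by a distance in [min_m, max_m] meters."""
    rng = rng or random.Random()
    bearing = rng.uniform(0, 2 * math.pi)
    # Sampling the squared distance spreads points evenly over the ring's area.
    distance = math.sqrt(rng.uniform(min_m ** 2, max_m ** 2))
    north_m = distance * math.cos(bearing)
    east_m = distance * math.sin(bearing)
    # Convert the metric offsets to degrees of latitude and longitude.
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return lat + dlat, lon + dlon


masked_lat, masked_lon = donut_mask(48.8566, 2.3522, min_m=100, max_m=500)
print(masked_lat, masked_lon)  # a point 100 to 500 meters from the original
```

The minimum distance keeps a masked point from landing on or next to the real address, while the maximum distance keeps it close enough for neighborhood-level analysis to remain meaningful.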

When it comes to turning people’s personal information into open data, concerns about privacy are perfectly valid. Putting precautions in place, such as donut geomasking, helps keep identifying data safe and secure.

You can use open data responsibly as well as derive the greatest possible value from it while still protecting individual privacy. To learn more about how that’s possible, contact us.
