Tales from Two Data Hunters
Introducing two of our data hunters: Audrey and Cécile. They’ll tell us all about their jobs and even describe one of their current tasks: hunting for data on Covid-19. Here we go!
When I joined ODS last October, one of the first things I did was flip through the company’s employee directory (extremely useful for making sure you don’t screw up a first name when running into a coworker next to the coffee machine). Product team, Sales team, CSM team, HR…the directory seemed to cover all the usual bases. But then I came across something unusual: “data hunter.” I thought, “OK, I’m not really sure what that means, but it sounds pretty awesome!” And lo and behold (spoiler alert), I was right!
If you’re anything like me, then you’re probably dying to know more about this profession! In this article, I’ll introduce two of our data hunters: Audrey and Cécile. They’ll tell us all about their jobs and even describe one of their current tasks: hunting for data on Covid-19. Here we go!
The Primary Objective: Expand the Data Network
As part of the R&D team, Audrey and Cécile are mainly in charge of hunting for cross-sectional data to expand Opendatasoft’s Data Network. The Data Network contains all the public data of our customers, as well as the data that is added by our hunters. Customers can use the Data Network to cross-reference the data sets that interest them and to enhance their own content. (By the way, the Data Network is a veritable goldmine of information. It can even help you make the most of your summer…)
Step 1: Chasing the Target
The world of data is vast…How do our hunters choose their prey? “We choose according to the needs of our customers and the ODS teams,” explains Audrey. Customers are the first to request their services. They ask the team to search for data that can help them solve specific problems.
But sometimes it’s Audrey and Cécile’s coworkers who need the data: “We often receive requests from the Sales and Production teams. These requests are, of course, designed to help the company achieve its strategic goals,” adds Audrey.
Step 2: Locking Down the Prey
But where could data possibly be hiding? “Everywhere!” exclaims Cécile, with a big smile on her face. According to the hunters, the sources are quite varied, and searches sometimes feel like police investigations. “The web is vast. It’s all about finding the right keywords and checking the accuracy of the source,” Cécile explains. “The most reliable platforms are government websites. Data Gouv, the official platform of the French government, is one of our main resources. We also get data from organizations that supply their own platforms, such as INSEE (France’s National Institute of Statistics and Economic Studies).”
But working with reliable sources doesn’t necessarily make our hunters immune to pitfalls… First obstacle: “Publication habits vary from country to country,” explains Cécile. “Data is often extremely local by nature,” adds Audrey. “There’s a cultural aspect that must not be overlooked.” Mexicans, for example, use a different format for their documents than the French, while Americans tend to work more with APIs than data sets.
Second obstacle: “The data we are hunting is incredibly diverse!” Audrey explains: “On the same day, we can cover topics ranging from demographics and mobility to global shark attacks. It’s not always easy, as we have to delve into subjects which we often know very little about. But this is what makes our job so exciting!”
Portrait of a perfect data hunter, according to Audrey:
To be a data hunter, you must:
- Have good analytical skills (to immerse yourself in the data world and discern what the data can be used for)
- Be adaptable and eager to explore a variety of diverse topics (which makes this job interesting)
- Love to clean (lots of cleaning to do – slackers beware!)
- Be rigorous (to keep misinterpretations to a minimum)
- Be curious and have an eye for detail
- Have a sense of humor (datasets are full of surprises!)
Step 3: Taming the Wild
Once the hunt is over, the team is in possession of several interesting and reliable data sets. The data is then imported into the Data Network. What happens next? The team then considers the best way to clean and promote its data. “At this stage, our goal is to present the data in a way that will maximize its chances of being reused,” explains Audrey. When data is presented in the right way, it is easy to add filters, cross-reference with other data, convert into graphs…or simply use. “Merely displaying data is not enough. Our job serves little purpose if no one makes use of our tamed data.”
Our hunters are thrilled when they receive questions about data sets from the Support team. “This proves that the data sets are alive and have made an impression!” exclaims Cécile.
Step 4: Continuing the Work
The job of our data hunters doesn’t end with the publication of a new data set. Each data set must be carefully monitored and updated over time. In fact, Audrey and Cécile work hard to continuously improve the quality of the data sets in the Network. “We strive to share high-quality data, even if that means sharing less,” explains Audrey.
The perfect dataset, according to Cécile:
- Is friendly. It updates itself.
- Allows for discussions.
- Can be used and reused!
A Concrete Example: Covid-19 Observatories
Cécile and Audrey just completed their biggest hunt to date: data linked to the Covid-19 crisis.
Last March, ODS decided to create Covid-19 Observatories for France, Belgium, Switzerland, Canada, and the United States. The goal was to provide and present the data in a simple way so our customers could quickly use it to their advantage by incorporating the data into their portals and communication. Audrey and Cécile played a crucial role in implementing this large-scale project. Let’s hear how it went!
Can you tell us about this project?
Audrey: As usual, we went hunting… but it was rough going because we couldn’t get a grasp on our sources. New indicators were emerging every day, while other sources were appearing and then quickly disappearing again. It was intense in the beginning: we were constantly redoing what we had already done the day before. Usually, the data we work with is stable. But in this case, the hunt lacked structure, as everyone else was working at the same time.
How did you decide to present the data as an observatory?
Cécile: We went with an observatory because this format is more effective than a table. An observatory provides an overview of the current situation. It also displays clear and decisive diagrams that keep misinterpretations to a minimum. It’s important to remember that even “objective” data can be interpreted in many different ways. A field labeled “number of patients” does not have the same meaning everywhere. Are we talking about the total number of patients? The number of tested or self-diagnosed patients? Or the number of patients at the hospital? We wanted these nuances to be clear.
Audrey: We also didn’t want to create widespread worry. We did a lot of research to find out how to present Covid-related data in a non-sensationalized manner. After all, our objective was clear: to allow our customers to make use of our data to provide a rapid response to their citizens. Nothing more, nothing less.
How has the feedback been from your customers?
Audrey: Extremely positive! I’m happy about that, because after all, this hunt was for them. We were able to anticipate their demands.
What did you take away from this experience?
Audrey: A certain amount of pride, because this project allowed us to make a difference. We may not have produced any masks (that wouldn’t have made much sense), but by setting up our observatories and our pro-bono service, we were able to use our know-how to serve the greater good.
These observatories also prove that open data can be useful to everyone. By working with an open data platform, you can quickly create tables to highlight a topic that affects us all.
Cécile: This project also allowed us to raise awareness of the data hunter profession within the company. Since several teams were involved in the creation of the observatories, we were able to explain exactly what we do and the problems we encounter on a daily basis. Our coworkers discovered the challenges that are specific to “data culture.” Data is alive, and sometimes a bit hectic!
This work on Covid-19 data also reminded me of the need to instill better practices in the data sector. How do we manage our licenses? How do we manage our metadata? How do we make our data usable? We can now provide some answers to these questions.
As you can see, our data hunters are extremely busy. They are required to work on a variety of different subjects and handle many types of requests. Their strongest weapon? Insatiable curiosity combined with relentless determination!