An Open Response to the Civic Analytics Network
The Civic Analytics Network (CAN), a consortium of Chief Data Officers (CDOs) and analytics principals in large US cities, recently published an open letter to the open data community. The purpose of the letter is to open a dialogue between open data vendors, open data producers, and open data consumers.
In its letter, the Civic Analytics Network expresses several core values and goals central to its mission, including the desire to help governments “use data to be more efficient, innovative, and transparent.”
Opendatasoft shares this philosophy and belief in open data. When CAN opened this discussion, we were very pleased to have the opportunity to add our perspective as a vendor and to work together to expand open data’s audience, grow its community, and ensure its long-term viability.
Here is our response to each of the points raised in the letter:
1. Improve Accessibility and Usability to Engage a Wider Audience
In an article on open data and accessibility, we highlighted the issues that many users face when accessing and navigating web content that does not take accessibility and universal design into account, principles often referred to as “A11Y.” We are convinced that accessibility should be baked into the DNA of open data design. Our role as a vendor is precisely to provide the best possible accessibility and usability for our clients’ open data portals. Because of our “universal design first” philosophy, Opendatasoft puts considerable effort into ensuring that all of our user interfaces, widgets, and interactive charts and maps adhere to accessibility standards.
In addition, we know that engaging a wide audience is not only the main objective of any data producer but, most of the time, the justification for an open data project’s budget. We would love to see more dialogue between producers and users (citizens, developers) about data and how it is used. To this end, we have published articles describing the wide range of audiences open data should serve, as well as articles on how popularity is distributed across datasets. Our goal in writing on such subjects is to help everyone better understand these questions. This is also why we let data producers collect usage data themselves, through a dataset that compiles every user interaction with the platform. Data producers know their target users and audience segments better than anybody, so we provide tools to measure as much as possible and give them full autonomy in doing so.
2. Move Away from a Single Dataset Centric View
This is indeed a real weakness of the open data world. Some vendors have chosen to link data rather than adopt a dataset-centric view, but this requires work on the data that remains costly and painful. The other option, keeping a dataset-centric view with tabular data, best mirrors the kind of data producers want to publish, though we know how painful it is to break apart a relational database in order to open it. The approach we have chosen is to allow datasets to be joined within the platform. The advantage is that Chief Data Officers can join their existing data with open datasets from around the world, which makes them more efficient and enriches their own catalogs. We believe that thinking in terms of networks, rather than siloed datasets, will make open data viable in the long run.
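To make the idea of joining datasets more concrete, here is a minimal Python sketch of a key-based left join between a city’s own tabular dataset and an open dataset. It is a hypothetical illustration only, not Opendatasoft’s join feature or API; the function name `left_join`, the field names (`district`, `requests`, `population`), and the sample values are all invented for the example.

```python
# Hypothetical sketch of a key-based left join between two tabular datasets.
# Not Opendatasoft's API: names and data below are invented for illustration.
from typing import Dict, List


def left_join(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Enrich each left record with fields from the right record sharing `key`.

    Left records with no match are kept unchanged (left join semantics).
    If `right` contains duplicate keys, the last matching row wins.
    """
    # Index the right-hand dataset by the join key for constant-time lookups.
    index = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = index.get(row[key], {})
        merged = dict(row)
        # On field-name conflicts, keep the value from the left dataset.
        for field, value in match.items():
            merged.setdefault(field, value)
        joined.append(merged)
    return joined


# Hypothetical internal dataset: service requests per district.
requests = [
    {"district": "D1", "requests": 120},
    {"district": "D2", "requests": 85},
    {"district": "D3", "requests": 40},
]
# Hypothetical open dataset: population per district.
population = [
    {"district": "D1", "population": 24000},
    {"district": "D2", "population": 17000},
]

for row in left_join(requests, population, "district"):
    print(row)
```

A left join keeps every record from the producer’s own dataset even when the open dataset has no match (here, district `D3`), which is useful when the external data covers only part of the territory.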
3. Treat Geospatial Data as a First Class Data Type
Geospatial data must indeed be a top priority for the open data community, simply because most of the things that matter to citizens are brick and mortar. On the producer side, this means being able to publish a wide range of Geographic Information System (GIS) formats. We try to go a bit further by allowing producers to connect their GIS directly to the portal to keep the data in sync. On the consumption side, every open data user should be able to easily create their own maps from the data, including maps that combine different datasets or sources. Once again, a network of geospatial data will be key to letting smart city topics emerge. We also think this addresses a key accessibility issue with geospatial data: GIS tools can be very complex for many users, and by opening these data through a more user-friendly interface, more people can become reusers of this extremely valuable information.
4. Improve Management and Usability of Metadata
This point is tricky, and we have to admit that we do not yet have a settled vision on the topic. A compromise must be found between giving data producers full autonomy in defining their metadata schemas and allowing open data consumers to search and analyze data consistently across portals. Confronted with that problem ourselves, we now allow the creation of ‘admin metadata’ in addition to the basic set, at the very least to help administrators manage their catalogs. We believe this is a good way to let best practices emerge through use. Once the ecosystem is mature enough, it would make sense to allow full customization of the schema. Since the U.S. open data community is further ahead, perhaps it is time for us to reconsider the issue, but full customization still seems at odds with the development of a useful data network.
5. Decrease the Cost and Work Required to Publish Data
We do see automation as the critical challenge for open data, because manpower will not grow at the same pace as the demand for open datasets. These automation tools must be built directly into the open data platform, covering everything from data sourcing to publishing. We don’t think that someone on a CDO’s team should have to do the same work several times, as is still the case today. We build tools such as harvesters and a complete Extract, Transform and Load (ETL) engine directly into the portal. This helps avoid doing the same thing twice, but we believe that in the future the network of people in charge of open data portals will take this on itself. Indeed, we could see the emergence of a marketplace for data processing pipelines. Such a marketplace would help smaller cities, or any newcomer, get started and become efficient as quickly as possible. That would make open data more viable and broaden its audience.
6. Introduce Revision History
With the rapid development of open data, we have seen growing concern about data quality and traceability. As any CDO knows, the battle for data quality is never-ending. We acknowledge that, until recently, most open data vendors, including us, have emphasized quantity. We have taken note of the shift in focus that is now happening. Traceability is key if we want to see more citizen involvement through crowdsourced data. Let’s build these tools!
7. Improve Management of Large Datasets
We are seeing more and more large datasets being opened, and the development of Internet of Things (IoT) and smart city projects will only accelerate this trend. In the data world, the value often lies in the freshness of the data. Our platform can handle hundreds of millions of records at a time, as our origins are in the smart city and sensor ecosystems. However, we believe that developing the capability to process more records than this may prove challenging for anyone working purely in the open data sphere. This is especially the case when displaying these data on a searchable, interactive map available to any kind of user in a browser. We are making progress alongside the wider web development world. Once again, future solutions may lie in a more efficient data network.
8. Set Clear Transparent Pricing Based on Memory, Not Number of Datasets
We fully agree on this point, and it led us to update our pricing a few months ago so that it now depends only on data size and usage. We have found this to be the best way to share the same incentives as Chief Data Officers. When the data are used by a wide audience, everybody wins; if not, there are no surprises.
Thank You
Thank you all once again for your letter. We wanted to take the time to add our voice to the conversation. As a vendor, it is our primary responsibility to provide open data producers and re-users with the tools they need to make open data happen. We also must take steps to engage, broaden, and empower the growing open data community. We are excited to take a proactive role in this open data ecosystem and to create the conditions for an open discussion, whether through workshops, web calls, or face-to-face meetings. We know that the letter from the CAN was originally addressed to the open data community in the United States, but we hope the dialogue continues and extends worldwide. We think that including everyone in the discussion will help take the movement to the next step and allow everybody to focus on real impact.