How to accelerate the reuse of data thanks to deep search features
Searching for data shouldn’t be the equivalent of looking for a needle in a haystack. Our blog explains why you need natural language search within your data platform if you are to increase usage and drive data democratization.
Without effective search engines, the web would simply be an enormous mass of disorganized information. People would only be able to visit sites that they already knew about, severely limiting their ability to discover new information. Organizations would find it hard to attract new customers or visitors who had not heard about them before.
Thankfully, today’s web is much more user-friendly. All people need to do is type or speak their query into a search engine, phone or smart assistant to get fast, accurate results. And because search companies such as Google are continuously working to improve their natural language algorithms so they better reflect the intent behind a search, results keep getting better.
The importance of search on data portals
When it comes to internal and external data portals, users have the same need to find relevant datasets or information as when they are searching on the web. If they can’t quickly find what they are looking for, they are likely to give up on their query or look for it elsewhere. The consequences depend on the type of portal:
- On public open data portals, poor search undermines trust and stops people from coming back to use them in the future.
- On internal self-service data portals, employees waste valuable time searching, which reduces efficiency. Alternatively, they won’t bother incorporating new, relevant datasets into decision-making or their daily work, undermining data cultures and preventing organizations from becoming data-driven.
- On data services platforms, users miss out on relevant data services, potentially hitting providers’ revenues.
All of this makes it imperative that data portals help visitors find information successfully. Doing so requires overcoming multiple challenges:
- They have a wide range of users, particularly on open data portals, with differing levels of knowledge about data. For example, an energy company’s portal might have visitors from local government, developers, generators, researchers and the general public. Search has to understand the queries of all these audiences and deliver the right results to each of them.
- People use different words to describe datasets. This is particularly true on public open data portals, where citizens might search using completely different words from those that internal data administrators use. Some portals even include data in multiple languages, further complicating navigation. Just like a web search engine, portals need to find the right results whatever terms (or language) are used.
- How data is described, and the metadata used, can be complex and technical. Non-specialists can therefore find it hard to understand exactly what a dataset represents or what information it contains. If they are unsure whether it is what they are looking for, they may not access, download or reuse it.
- While data portals clearly contain less information than the web itself, they hold a growing amount of data. For example, Log In to North Carolina, the open data portal of the North Carolina Office of State Budget Management (OSBM), has datasets that go back to 1969, covering areas as diverse as population (including census data), labor force, education and agriculture, supplied by 20 state departments. On many portals, simply scrolling through the available datasets is too time-consuming for users to attempt.
The advantages of Natural Language Search
All of this makes navigation and search a key part of deploying a successful data portal and driving data democratization. Making it easy for users to find information is critical, and that requires natural language search that uses the same techniques as commercial search engines such as Google.
What is Natural Language Search?
As the term implies, natural language search is a search carried out in everyday language, just as if you were having a conversation with someone. The search engine interprets the whole sentence, rather than isolated words, and uses it to find the best answer to the query. Its algorithms can also learn from how satisfied users are with the results, so the answers they provide improve over time.
The alternative is keyword-based search. This breaks the query down into its most important terms, removing common words such as “how” and “the”, and then matches those terms against the contents of its database or knowledge base. However, it may find no exact match (or thousands of results), and it cannot cope with different terms that refer to the same concept unless it has been specially programmed to do so.
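To make the contrast concrete, here is a minimal sketch of keyword-based search in Python. The catalog entries, the stop-word list and the function names are hypothetical, chosen only for illustration; real keyword engines add stemming, ranking and far larger stop-word lists. The sketch reproduces the two behaviors described above: the query is reduced to its terms, and a synonym that never appears in the metadata returns nothing.

```python
import re

# Hypothetical stop-word list: common words that keyword search discards.
STOP_WORDS = {"how", "the", "a", "an", "of", "in", "for", "is", "what"}

# Hypothetical catalog: dataset id -> title used as the searchable text.
CATALOG = {
    "population-census": "Population census by county",
    "ev-charging": "Electric vehicle charging stations",
    "labor-force": "Labor force statistics",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_terms(query: str) -> set[str]:
    """Reduce a query to its keywords by dropping stop words."""
    return {word for word in tokenize(query) if word not in STOP_WORDS}


def keyword_search(query: str) -> list[str]:
    """Return ids of datasets whose title contains every query keyword."""
    terms = keyword_terms(query)
    if not terms:
        return []
    return [
        dataset_id
        for dataset_id, title in CATALOG.items()
        if terms <= set(tokenize(title))  # all keywords must appear literally
    ]


if __name__ == "__main__":
    # Literal terms match: "population" and "county" both appear in a title.
    print(keyword_search("What is the population of the county?"))
    # A synonym fails: "workforce" never appears, although "labor force" does.
    print(keyword_search("workforce statistics"))
```

The second query illustrates the limitation: unless someone explicitly maps “workforce” to “labor force”, the literal matcher cannot connect the two.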
On a data portal, natural language search enables organizations to:
- Deliver relevant results to users faster, saving them time and returning results that match their intent
- Improve the user experience, as searchers get seamless access to the data they need
- Give users greater confidence that they are getting the right results
This all makes it more likely people will return to data portals and incorporate datasets in their working and private lives. It therefore boosts data democratization, enabling greater transparency, performance and innovation.
Opendatasoft and Natural Language Search
At Opendatasoft, we are committed to making it as easy as possible for everyone to benefit from improved access to data and to experience data in their daily lives. That’s why our platform incorporates:
- Straightforward navigation that enables users to easily move through a data catalog manually to find the datasets they want. You can even build portals with pages and subpages covering specific themes to aid navigation outside of the data catalog itself.
- The ability to automatically add detailed metadata that helps users find specific datasets.
- In-built filtering that lets users narrow down their searches. For example, French energy company Enedis enables visitors to filter geographically and by theme (such as mobility, energy and operations), with the ability to combine multiple filters.
- Powerful natural language search capabilities that understand the entire query and look for matches within the metadata of all available datasets (title, description, keywords, etc.). Importantly, the natural language search engine also gives insight into what users are looking for, including flagging searches that returned no matching dataset. This helps you plan which new datasets to publish to meet user requirements.
- Advanced search using our query language. For technical users, the Opendatasoft query language makes it possible to express complex boolean conditions as a filtering context. These conditions can use full-text search, boolean operators or per-field filtering.
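As an illustration of three ideas from the list above (searching across several metadata fields, narrowing results with per-field filters, and recording searches that find no dataset), here is a minimal Python sketch. It is not Opendatasoft’s implementation or query syntax: the `Dataset` and `Catalog` classes, the `search` method, its `filters` parameter and the sample datasets are all hypothetical, and the matching is simple term overlap rather than natural language understanding.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Hypothetical catalog entry holding the metadata fields that are searched."""
    dataset_id: str
    title: str
    description: str
    keywords: list[str]
    theme: str


@dataclass
class Catalog:
    datasets: list[Dataset]
    # Counts queries that matched no dataset, to guide what to publish next.
    unmatched_queries: Counter = field(default_factory=Counter)

    @staticmethod
    def _words(text: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def search(self, query: str, filters: Optional[dict[str, str]] = None) -> list[str]:
        """Return ids of datasets whose metadata shares a term with the query.

        Results are ranked by the number of shared terms. Optional per-field
        filters (e.g. {"theme": "mobility"}) must all match exactly.
        """
        terms = self._words(query)
        scored = []
        for ds in self.datasets:
            if filters and any(getattr(ds, name) != value for name, value in filters.items()):
                continue  # excluded by a per-field filter
            text = " ".join([ds.title, ds.description, " ".join(ds.keywords)])
            score = len(terms & self._words(text))
            if score > 0:
                scored.append((score, ds.dataset_id))
        if not scored:
            self.unmatched_queries[query.strip().lower()] += 1
        # Highest score first; ties broken by dataset id for a stable order.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [dataset_id for _, dataset_id in scored]


if __name__ == "__main__":
    catalog = Catalog([
        Dataset("ev-charging", "EV charging points",
                "Public charging stations for electric vehicles",
                ["mobility", "energy"], "mobility"),
        Dataset("substations", "Electricity substations",
                "Location of grid substations", ["energy", "grid"], "operations"),
    ])
    print(catalog.search("electric vehicle charging"))
    print(catalog.search("energy", filters={"theme": "operations"}))
    print(catalog.search("bike lanes"))
    print(catalog.unmatched_queries)
```

The `unmatched_queries` counter is the piece that turns search into a planning signal: a query such as “bike lanes” that repeatedly returns nothing points to a dataset users want but the portal does not yet publish.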
Users expect finding the right dataset to be as simple as performing a Google search. That’s why the platform supporting your data sharing portal has to deliver seamless, immediate results using natural language search if it is to drive usage and data democratization.