How is Data Changing the Way We Research?
Access to data is changing the way we work. Discover the fourth interview in our monthly series capturing the power of data at work. We interviewed urban sociologist Tommaso Vitale, tenured professor at Sciences Po, to find out more.
Data quality and data sources have changed dramatically in the last few decades, and the impact of this transformation extends far beyond the data science field: academic researchers who rely on data for their studies also have to adapt to the new generation of data. What are the new challenges and opportunities?
We interviewed Tommaso Vitale, associate professor at Sciences Po, permanent researcher at the Center for European Studies (CEE), Scientific Head of the Master's program “Governing the Large Metropolis” (Sciences Po Urban School), and urban sociologist, to find out how data has transformed the way he works.
What is your profession?
I’m a sociologist specializing in urban sociology. I look at the relationship between social policies, service delivery, and the position of ethnic minorities at the urban and metropolitan level. I try to uncover the relevant relationships between social citizenship and services (such as fresh food delivery, housing, and people’s welfare) and political citizenship. I study empowerment and the capacity to participate in policy-making processes. I am also interested in the capacity of ethnic minorities and the urban poor to represent their interests in cities.
You can read about my work on urban poverty and shantytowns here, which was produced in collaboration with computational sociologists, or on the scientometrics of fast-growing empirical research on minorities.
How do you incorporate data in your day-to-day professional activities?
For me, what is really important is to connect different processes at the micro, meso, and macro levels. I need comprehensive data that let me avoid generalizing from single cases. Rather than doing case study research, I try to look at the big picture. I may lose some details, but I want to see the general dynamics.
Data has changed enormously in the last few decades. How has this transformation impacted your research?
During my professional life, the data sources I have used, as well as the types of data, have changed enormously. I started out working with administrative data, which was often extremely poor and produced in irregular formats by public authorities at the urban level. Then my research moved towards survey data, where I designed representative samples or helped define questions for surveys run by national statistical institutes. Later on came the panel data revolution, so to speak: data from surveys repeated with the same samples over time, giving a much richer picture of how people’s life histories and major life events evolve. I have seen data quality improve dramatically, and I have seen how that transformed the way we do research.
Then, in the last 10 years, we started to have a new generation of administrative data: higher in quality, automatically updated, and driven by Web 2.0. Through automation, improved data production, and smart programs, we are able to better understand basic dimensions of service delivery in cities: where services are located, how they evolve over their life cycles, and whether they counter or reinforce inequalities. In research on urban poverty and on policies to combat it, we have been able to add data sources on fundamental human development issues such as energy consumption, public transportation, the location of ethnic spaces, people’s sociability, homophily and segregation, and how these relate to mobility.
In many ways, we are able to map forms of severe deprivation among the very poor and to understand inequalities in service delivery and provision. This whole new generation of data was extremely exciting and productive, but difficult to manage.
What are the biggest challenges in using the new generation of data?
The data can depend heavily on the willingness of private companies and sometimes public organizations. Some include certain variables, others do not, which makes the data difficult to use for comparisons across time and space. Many databases may introduce new rules, new protocols, new procedures, and new scripts every six months.
Service delivery and utility companies produce lots of data, which I love to work and play with, but it is not very user-friendly for researchers. It is fantastic that they provide so much new, automatically updated data, but the variables are sometimes poor and constantly changing, with different names, scripts, and practical procedures. That was a level of complexity I was not used to.
What I am used to is large numbers of variables and thousands of cases, but with exactly the same labels and exactly the same scripts, and with universal access to the data for researchers. Now, with new ways of producing data, there are new constraints, but also new opportunities.
What kind of skills do you use to incorporate data in your work?
It requires many connections with new disciplines. Many colleagues have started working with people from information science. It does not necessarily require extremely advanced computational methods, but it does require smart use of data sources. Knowing how to manage data and data sources is an important skill.
Because of this, I had to change not only the way I do research, but also the way I teach. My teaching became more and more about data management: how to clean data and how to cope with a changing data-source environment. It is much less about programming and data analysis and much more about teaching students how to find and manage data, and how to spend less time coping with data sources.
Looking into the future, what kind of impact do you think data can have on government innovation and public policy?
Before data, there was a great divide between researchers and policymakers. Policymakers designed what they wanted and implemented what they had designed; then researchers arrived to measure outcomes, and advocacy coalitions used those evaluations to mobilize for policy change or continuity. Sometimes policymakers redesigned policies; sometimes inertia prevailed. This evolutionary approach framed researchers as part of the political process. It was a platonic ideal, but it never worked in reality, because today many more of the challenges relate not to policy design but to policy implementation. The old focus on policy design became problematic because implementation was the real source of policy problems, and the solution was to design implementation better.
What data allows us to do is move away from the pragmatist, evolutionary approach to learning of the 1970s and towards co-production between policymakers and researchers. It allows us to detect where there is a problem in implementation, then dedicate resources to monitoring it more closely and understanding what is happening there.
However, the new solution is not perfect, because in-depth monitoring of implementation processes can produce many perverse effects! It can be too time-consuming, with a huge impact on the daily lives of civil servants in many sensitive policy sectors. I invite you to learn about the work of my colleague Patrick Le Galès, who studied how much the monitoring of health policy reform affected the British public health system. He found that it greatly affected the working time of nurses and physicians. Professionals were caught up in endless monitoring, filling out form after form to prove what they were doing, and this heavily reduced the time they could spend as medical staff. New data sources need to help improve knowledge of policy implementation without distracting human resources from their main tasks, and to reduce the perverse effects of time-consuming “form-filling” nightmares.
From Professor Vitale’s experience as a researcher and a teacher, we can clearly see the positive impact of improved data quality and data sources. At the same time, the world of data production has a long way to go: without standardized procedures and stable environments, the sheer abundance of data can create many data management challenges.
Private companies and public authorities can take advantage of Opendatasoft’s one-stop shop to create their open data portal, which provides a user-friendly interface for the public, from data novices to data scientists alike.