What G20 Governments Should Know About Big Data
November 14, 2014
Big Data, the analysis of huge datasets using sophisticated algorithms, holds huge promise. One of the most-cited examples, Google Flu Trends, predicts flu outbreaks based on the search terms of users in a given region. However, data quality has been criticized as a key weakness of Big Data. The idea currently underpinning Big Data is that the datasets are so massive that the quality of the data does not matter. That assumption is now being challenged.
Since nations collect a lot of high-quality data, for example through censuses, and are increasingly opening up their databases (see, for example, the U.S. portal at www.data.gov), one could ask whether governments could, and indeed should, make use of Big Data. After all, if companies can increase their efficiency by analysing huge datasets, why should governments, with high-quality data at their disposal, not make use of the same technology? The thought is tempting.
However, states should not focus on using Big Data, for at least two reasons: 1) Big Data in its current form is of limited value, and in fact potentially dangerous, for policy making; 2) Big Data as used in the private sector will create a need for regulatory reform, so states should instead think about the “rules of the game”.
The limited value of Big Data for policy making stems from its guiding principle that correlation, not causation, is what matters when approaching a problem. Amazon, for example, suggests products to users based on other users’ similar purchases. Google’s algorithms produce different search results depending on a user’s location and previous queries. In the private sector, this might be a sensible approach. Companies can use these tools to sell more products or services, though they run the risk of annoying their customers (advertisements on Facebook, for example, might genuinely interest me, but they might just as well be a nuisance). And since Big Data aims to produce new insights, potentially everything can be correlated with everything else. In the words of Manfred Schneider, a finding could arise from the correlation between beer consumption in Europe and the frequency of vehicle accidents in Bangladesh.
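How easily such spurious "findings" emerge can be shown with a minimal, purely illustrative sketch (written in Python with synthetic random data; the variables and numbers are hypothetical and not drawn from this article): scan enough completely unrelated data series and a strong correlation will appear somewhere by chance alone.

```python
# Minimal sketch with hypothetical data: with enough unrelated series,
# some pair will correlate strongly purely by chance.
import numpy as np

rng = np.random.default_rng(42)
n_series, n_years = 1000, 20

# 1,000 unrelated indicators observed over 20 years (pure noise).
data = rng.normal(size=(n_series, n_years))

# Pairwise correlations between all series.
corr = np.corrcoef(data)
np.fill_diagonal(corr, 0.0)  # ignore each series' correlation with itself

# The strongest correlation found between two entirely unrelated series.
i, j = np.unravel_index(np.abs(corr).argmax(), corr.shape)
print(f"Series {i} and {j} correlate at r = {corr[i, j]:.2f}")
# With this many noise series, the best pair typically shows |r| near 0.8,
# despite there being no causal link whatsoever.
```

The more variables such an analysis scans, the more coincidental correlations it will surface, and nothing in the data itself distinguishes them from causal relationships.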
Such an approach is inadequate for public policy making. First, the stakes of “being wrong” are much higher than in the private sector. Second, public policy should be based on causation, not mere correlation, because policy makers should understand what they are doing and why. In other words, they should worry less about the “what” and more about the “why”. After all, governments can be held accountable for their decisions by the people. Imagine that a state agency in charge of public health analyses a government-wide dataset, looking for potential correlations, and finds that people who are unemployed are often in worse health than average. What does this tell the agency? How does this finding help it draft public health policy?
The only way Big Data can contribute to public policy making is by raising questions, not answering them. As for the dangers of jumping to conclusions from a Big Data finding, consider this example: another agency finds that a certain area of a city has high crime rates and therefore drives up the cost of policing. Based on this finding, the agency proposes to tax the inhabitants of that area at a higher rate to cover the extra cost. Would you not agree that something that makes sense for a health insurer, such as incentivizing healthy living by raising premiums for smokers, is not a good idea for public policy?
While Big Data may be of limited use for public policy, governments will still have to deal with it for other reasons. In light of major IT scandals, users and citizens are increasingly “data-aware”. As mentioned in the introduction, Big Data relies on huge datasets that today are often provided by users for free and without clear knowledge of how the data will be used. This raises concerns among civil society that states will eventually have to address: who collects what data about me, for what purpose, and for how long? Given the trans-border nature of the internet, one of the main data-gathering media, international fora such as the G7 and G20 seem best suited to address these regulatory questions. States should, however, keep in mind that Big Data governance cannot be settled by one stakeholder alone; the discussion must include the private sector and civil society as well.
Nicolas Zahn is currently a Master's candidate in International Affairs at the Graduate Institute of International and Development Studies in Geneva. He holds a B.A. in Social Sciences from the University of Zurich. His main research interests include security policy and Internet governance.
This article was originally published in the Diplomatic Courier's November/December 2014 print edition.