The data commons: Taking big data global

In March 2018, members of the International Monetary Fund’s (IMF’s) executive board gave their blessing to a dramatic overhaul of the way the organisation gathers, governs and uses data. The Overarching strategy on data and statistics, the first of its kind, lays out how the fund plans to improve the quality of data, boost the ease with which it can be shared, and start making greater use of innovations in big data and artificial intelligence (AI).

Key to the strategy is the “global data commons” – an ambitious, cloud‑based platform for gathering large quantities of data from IMF members. The aim is to bring all of the data together in one place in a readily comparable format, making use of common data standards and methodologies. Researchers, journalists and members of the public should no longer have to trawl through an array of often-labyrinthine websites belonging to national statistics offices; instead, they should be able to access all of the data through a single portal.

There is a long way to go, however, before that stage is reached. Louis‑Marc Ducharme, director of the IMF’s statistics department, says the data strategy is expected to take around five years to implement, by which time the fund may well find it has to draw up a new one. “This is a very dynamic issue, and in five years the landscape may be completely different,” he says.

The new strategy is designed to respond to that volatility, and is built on six priorities: the fund should be more agile in how it determines data needs, make use of innovative methods such as big data, streamline data‑sharing internally, improve the comparability of statistics, tackle weaknesses in official data, and construct the global data commons.

The strategy owes its genesis to a 2016 report by the IMF’s Independent Evaluation Office (IEO). “In general, the IMF has been able to rely on a large amount of data of acceptable quality,” the report notes. “Nonetheless, problems with data or data practices have, at times, adversely affected the IMF’s surveillance and lending activities. In the aftermath of crises, data has often been put at the forefront, prompting important changes in global initiatives and in the fund’s approach to data. Yet, once these crises subside, data issues are usually viewed as mere support activities to the fund’s strategic operations.”

The IMF seems to have taken the IEO’s criticisms on board. “We are going to go from a very fragmented universe of policy frameworks to something much more holistic,” says Ducharme. “The new data governance will focus on standardisation, harmonisation and automation of data, so that the burden on the country teams will be less, and there will be an easier way to have comparable data so that policy can be better made.”

The fund hopes both to build the capacity of its members to make use of big data and to develop its own capabilities, though Ducharme says it sees itself more as a “broker” or “enabler” of big data than a producer in its own right. He says the fund has traditionally not been “very active” in the field of big data, though it is “catching up”. More traditional data has been a core part of the IMF’s mandate since it was founded at the 1944 Bretton Woods conference.

“At the inception of the fund, [John Maynard] Keynes stated that data would be an important tool for policymaking,” says Ducharme. Keynes’s stance seems far from radical in the data‑saturated world of the 21st century, but in 1945 the use of statistics such as GDP was still in its infancy, and Keynes – for better or worse – was one of the driving forces behind GDP’s establishment as a core component of how economists think about the macroeconomy.

However, even GDP could be due for an update. Ducharme envisages big data playing a key role in how IMF members gather data, and suggests it could come to be an important part of the process of compiling official statistics, particularly where countries have more modest resources.

In some fragile states, tourism or agriculture is key to the economy, so mismeasuring these parts of output can lead to weak GDP statistics. But it can also be expensive and time-consuming to properly survey agricultural output – a country must assemble a team, design questionnaires, hire interviewers, and compile, edit and validate the data. In such cases, new forms of data could cut costs considerably, Ducharme suggests. He notes, for example, that it is possible to estimate agricultural production “quite accurately” using satellite imaging. Furthermore, construction activity in a country can be estimated through crowdsourcing, by asking people to take pictures on their mobile phones.

Similarly, the fund has already signed a handful of memoranda of understanding to tap into various sources of big data. Ducharme mentions scanner data, a way of collecting data on prices directly from the point of sale.

Progress on big data has largely been concentrated in more advanced economies so far, and Ducharme thinks the IMF can be a mechanism to help expertise filter out more broadly. “We want to capitalise on this experience across the more advanced member countries, to share this experience with our less advanced member countries so we can create a situation where they would leapfrog the technological paradigm,” he says.

In the fund’s strategy document, executive directors say they “see merit” in big data as a tool for detecting risks and complementing the compilation of official statistics. “Directors noted that, in the absence of internationally accepted standards, exercising quality assurance will be necessary to ensure the sound use of big data, taking into account reliability and privacy concerns. In this connection, they recognised that the fund could play a key role in facilitating peer learning across the membership.”

 

Internal operations

The fund is also considering how it can use big data in its own operations. An area that seems ripe for a machine-learning algorithm is the vast collection of documents the IMF maintains, both on paper and on its servers as PDFs. Every mission produces new information, and each report is rich with policy recommendations and information on the economic situation.

Ducharme says text mining seems a promising option for dealing with the sheer volume of information. “Here you can use natural language processing to build text-mining algorithms, to drill down in those technical assistance reports and compile information.” The alternative – trawling through stacks of documents in a library – would be almost “humanly impossible”, he adds. Text mining might allow researchers to quickly identify similar situations, helping them identify what went well, or badly, in past cases.
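
As a rough illustration of the idea, and not a description of the fund's own tooling, the sketch below ranks a handful of documents by how closely their wording matches a query, using a simple bag-of-words cosine similarity. The report names and texts are hypothetical, and a real system would work on extracted PDF text with far richer language models.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, reports):
    """Return the name of the report whose wording is closest to the query."""
    q = Counter(tokenize(query))
    scores = {
        name: cosine_similarity(q, Counter(tokenize(text)))
        for name, text in reports.items()
    }
    return max(scores, key=scores.get)


# Hypothetical report snippets, invented purely for illustration.
reports = {
    "report_a": "banking sector stress and currency depreciation after capital outflows",
    "report_b": "agricultural output survey and price index compilation",
}
query = "capital outflows and currency pressure"
print(most_similar(query, reports))  # prints "report_a"
```

Word counts alone ignore meaning and context, so this only shows the shape of the task: turn documents into vectors, then compare them.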

Furthermore, as with any large data-gathering organisation, IMF staffers spend a lot of time cleaning and validating data. Ducharme believes AI could play a role here as well. “Data quality has always been a very important priority for the fund. Here we can use machine-learning algorithms to develop validation checks that are independent and that could easily pick up the outliers of data series and correct them automatically. That means greater potential to explore more series independently.”
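
To make the notion of an automated validation check concrete, here is a minimal sketch that flags suspicious points in a series. It uses a plain statistical rule, the modified z-score based on the median absolute deviation, rather than machine learning, and it only flags values instead of correcting them. The threshold of 3.5 is a common rule of thumb, and the series is hypothetical; none of this is drawn from the fund's own systems.

```python
from statistics import median


def flag_outliers(series, threshold=3.5):
    """Return the indices of points whose modified z-score exceeds threshold."""
    med = median(series)
    abs_dev = [abs(x - med) for x in series]
    mad = median(abs_dev)  # median absolute deviation
    if mad == 0:
        # No spread around the median, so the score is undefined; flag nothing.
        return []
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical quarterly growth rates (per cent); 25.0 stands in for a keying error.
growth = [2.1, 2.3, 1.9, 2.2, 25.0, 2.0, 2.4]
print(flag_outliers(growth))  # prints [4]
```

A median-based rule is used here because a single extreme value distorts the mean and standard deviation, which can hide the very outlier being sought.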

When it comes to data quality, the first step is the upstream stage, in which countries gathering the required data ensure their methods are as good as they can be. The IMF provides technical assistance to those that sign up to its various data standards and, in return, countries commit to providing data of sufficient quality at set intervals. “We teach our member countries what are the right definitions, what are the right concepts, and provide technical assistance to gather data to construct indicators like GDP,” says Ducharme.

The second step is cleaning and validation. As Ducharme says, AI should go some way towards helping with that. So too should the global data commons.

 

Common goals

“We are building a global data commons where we have a lot of series from our membership gathered together using cloud technology,” says Ducharme. “We will be able to confront the data – the best way to improve data is to confront it. It is like having a mirror of yourself, you see where the things are that need to be corrected.”

The data commons will link national statistics websites in the cloud, translating everything into the common Statistical Data and Metadata eXchange (SDMX) standard, which allows straightforward machine‑to‑machine data sharing. From there, anyone accessing the data can download it in whichever format is most convenient.

As the fund’s strategy document notes, the data commons should benefit members and staff in conducting surveillance. “While all members will benefit from the global data commons, there will be even greater gains for low‑income countries, and small and fragile states where streamlining data requests would ease capacity constraints while also allowing the fund to conduct real‑time surveillance beyond the time of the Article IV consultation,” it says. With sizeable potential benefits in mind, the document calls for a redoubling of efforts to sign more members up to the fund’s data standards.

The IMF is working with the African Development Bank (AfDB) to build the platform, along the lines of its “Africa Information Highway”, which links all countries on the continent along with 16 regional organisations. The AfDB is providing funding to support its lower‑income members in establishing the platform, although Ducharme says the costs are relatively low. The IMF is providing technical support, for instance, with converting existing data‑handling methods such as Excel spreadsheets into SDMX. “We have already started doing this with our general data dissemination system, which is our lower standard, and we intend to extend this to all our data standards, which will cover more than 100 countries,” Ducharme says.

At the same time, the IMF is upgrading its internal systems in an effort to make data sharing within the organisation more straightforward. It is investing in a new data platform that will make it easier for economists to search for different data series, while simultaneously streamlining and updating its databases to ensure data is comparable.

 

Sharpening skills

The IMF already has the expertise to help countries transfer to the global data commons, but “for developing our capacity in enabling countries to use big data, we will need different skill sets,” says Ducharme. A first step in that direction is boosting the skills of one of the divisions in the statistics department so the team is able to handle the unstructured, messy data that falls under the heading of big data. The fund will also be hiring externally.

“We are going to need more data scientists – people who have the skill to dive into very large unstructured data,” says Ducharme. “In the past, our people dealt with very big, but very structured, datasets.”

He notes that data scientists are in high demand but, as a data broker, the IMF will need a relatively small number of specialists and the “upskilling” process should also help ensure there are enough experts at the fund’s disposal. “We’re confident we’re going to be able to build a team,” he says.

The overall strategy is designed to be cost-neutral in the medium term, and relatively low cost in the short term, with many of the necessary resources capable of being reallocated from elsewhere. The IMF says it expects “transitional” costs of $3.9–6.5 million over the next three years, and “structural” costs of $2.6–4.3 million. By way of comparison, total operating expenses were around 1.1 billion SDR ($1.6 billion) in the 2016/17 fiscal year, the latest for which annual accounts are available.

“Over the medium term, the decline in time spent on data collection is expected to generate savings to support the expansion of the global data commons,” the strategy document says. “Eventual actions to address the broader challenges of the digital world, beyond the scope of the strategy document, will emerge as thinking in this area is further developed.”

Clearly there is a lot of work to do in the years ahead and, as Ducharme notes, the problems inherent in the digital age will most likely have shifted by then anyway, amid rapid advances in computing power, AI and big data techniques. “It is very challenging, I can tell you, but very exciting,” he says.

Louis Marc Ducharme – IMF

Louis Marc Ducharme was appointed director of the International Monetary Fund’s (IMF’s) statistics department in June 2013, prior to which he spent 30 years at Statistics Canada where he held various positions in the areas of economic statistics, including as assistant chief statistician. During his career, he has provided extensive technical assistance to a number of Latin American countries on price indexes and the operational organisation of statistical activities. Between 2004 and 2010, Ducharme was chair of the UN Voorburg Group on Service Statistics and contributed to the development of the vision and work plan for the development of international standards for statistics and services. He also taught macroeconomics at the Graduate School of Public and International Affairs at the University of Ottawa. Ducharme has a D.Phil in economics and science policy from the University of Sussex in the UK, and master’s and bachelor’s degrees in economics from the University of Montreal in Canada.
