High-resolution gridded population estimates are increasingly used in research and in applications that require fine spatial detail and frequent updates. Since the first globally available dataset, the Gridded Population of the World (GPW), was introduced in 1995, several global or regional gridded population datasets have been produced, mirroring the growing role of remotely sensed data in socio-economic research. These data have proven valuable in many contexts, such as disease mapping, disaster assessment, and survey sampling when census data are outdated or unavailable. However, in our new Population and Development Review article, “How Well Do Gridded Population Estimates Proxy for Actual Population Changes? Evidence From China,” evidence from four of the most popular gridded datasets shows that gridded estimates are much less accurate at capturing population change over time than at capturing where people live, especially at local scales.

China provides a challenging test for the predictive accuracy of gridded population data. Its massive internal migration has been concentrated in time because of earlier restrictions on internal migration under the hukou household registration system. In China’s most recent census, in 2020, 376 million people lived in a different prefecture than their place of registration, with this ‘floating’ population comprising 125 million who moved inter-provincially and 251 million who moved intra-provincially. In contrast, in the 2010 census there were only 155 million inter-prefectural migrants. Thus, the (internal) migrant stock is growing almost 15 times faster than China’s rate of natural population increase. Moreover, the hotspots for this migration have changed sharply over the last decade, with less migration-driven population growth in the biggest cities and more in smaller cities.
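To make the pace of this increase concrete, the short sketch below computes the compound annual growth rate implied by the two census counts of inter-prefectural migrants quoted above. The helper name `annual_growth_rate` is hypothetical, and the rate of natural increase behind the "15 times" comparison is not stated here, so that part of the comparison is deliberately left out.

```python
def annual_growth_rate(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two stock values."""
    return (end / start) ** (1.0 / years) - 1.0


# Inter-prefectural migrant stock from the 2010 and 2020 censuses (persons).
migrants_2010 = 155e6
migrants_2020 = 376e6

rate = annual_growth_rate(migrants_2010, migrants_2020, years=10)
print(f"Implied growth of the migrant stock: {rate:.1%} per year")
```

The migrant stock rose by a factor of about 2.4 over the decade, which works out to roughly 9% per year when compounded.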

Using China’s census data from 2000, 2010, and 2020 as our benchmark, we studied the predictive accuracy of four sets of gridded estimates (GPW, Global Human Settlement Layer – GHS-POP, LandScan, and WorldPop) at three spatial scales: counties/districts (n=2814), prefectural cities (n=297), and provinces (n=31). While the gridded estimates understated spatial inequality in the distribution of population, especially in 2020, they still provided excellent cross-sectional accuracy: they predicted over 95% of the variation in the location of census populations, whether we looked at the provincial level or at the more local levels. But for changes over time, accuracy was far lower, and lower still for changes in the most recent decade, from 2010 to 2020.

For example, at the county level, gridded estimates predicted less than 20% of inter-census change when results were weighted by population, and R² values averaged just over 0.1 without population weighting. At the prefectural city level, which aggregates about ten counties or districts per prefecture, the R² values were about 0.2 (unweighted) or 0.4 (weighted), and they were somewhat higher, at about 0.6, at the province level. So predictive power is weakest at the local level, where there are few alternative data sources (e.g., sample surveys are rarely representative for counties). In contrast, provincial population growth between censuses can still be estimated from surveys, so the better predictions from gridded estimates at this more aggregated level come where they are least needed.
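The difference between accuracy in levels and accuracy in changes can be illustrated with a minimal sketch on synthetic data. This is not the code or data from the article: the log-population specification, the noise levels, the random seed, and the helper `r_squared` are all assumptions chosen for illustration. The function returns the R² from a least-squares regression of the census value on the gridded value, optionally weighted by population.

```python
import numpy as np


def r_squared(y, x, weights=None):
    """R² from a (weighted) least-squares regression of y on x with an intercept."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.ones_like(y) if weights is None else np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns OLS into WLS
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    resid = y - X @ beta
    y_bar = np.average(y, weights=w)
    return 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_bar) ** 2)


# Synthetic "counties" (assumed values, not real data).
rng = np.random.default_rng(0)
n = 2814
census_2010 = rng.normal(12.5, 1.0, n)      # log population in 2010
true_change = rng.normal(0.0, 0.15, n)      # true change in log population
census_2020 = census_2010 + true_change

# Gridded estimates = census value plus independent error in each year.
noise_sd = 0.2
grid_2010 = census_2010 + rng.normal(0.0, noise_sd, n)
grid_2020 = census_2020 + rng.normal(0.0, noise_sd, n)

r2_levels = r_squared(census_2020, grid_2020)
r2_changes = r_squared(true_change, grid_2020 - grid_2010)
r2_changes_w = r_squared(true_change, grid_2020 - grid_2010,
                         weights=np.exp(census_2010))

print(f"levels: {r2_levels:.2f}")
print(f"changes (unweighted): {r2_changes:.2f}")
print(f"changes (population-weighted): {r2_changes_w:.2f}")
```

In this synthetic setup, errors that are small relative to the cross-sectional spread of population are large relative to the decade-to-decade change, so the R² for changes falls well below the R² for levels. This is one mechanism that could produce such a gap, not necessarily the one at work in the real datasets.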

We also examined the spatial dimensions of economic inequality to show how this temporal inaccuracy could lead an analysis astray when changes in gridded population estimates are used as a proxy for changes in actual population. Combining census population data from 2000, 2010, and 2020 with county-level GDP data shows a rise in spatial inequality from 2000 to 2010 that reversed between 2010 and 2020, with the Gini index returning to its initial value. But if the GPW gridded estimates are used instead (GPW had the best predictive performance of the four datasets we studied), spatial inequality appears to rise continuously, with the Gini index higher in each census year than in the previous one. The reason for the gap seems to be that the gridded estimates missed the recent migration-driven shift in population growth toward smaller, less affluent cities.
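How the choice of population denominator feeds into such an inequality measure can be shown with a small sketch. It assumes a population-weighted Gini index of GDP per capita across spatial units, which may differ from the exact definition used in the article. The function `weighted_gini` and the three-county numbers are hypothetical.

```python
import numpy as np


def weighted_gini(values, weights):
    """Population-weighted Gini index from the area under the Lorenz curve."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)                  # sort units from poorest to richest
    v, w = values[order], weights[order]
    pop_share = np.concatenate([[0.0], np.cumsum(w) / w.sum()])
    inc_share = np.concatenate([[0.0], np.cumsum(v * w) / np.sum(v * w)])
    # Trapezoid rule for the area under the Lorenz curve.
    area = np.sum(np.diff(pop_share) * (inc_share[1:] + inc_share[:-1]) / 2.0)
    return 1.0 - 2.0 * area


# Hypothetical three-county example: same GDP, two population sources.
gdp = np.array([100.0, 200.0, 600.0])
pop_census = np.array([10.0, 10.0, 10.0])
pop_grid = np.array([12.0, 10.0, 8.0])   # gridded estimate misallocates people

gini_census = weighted_gini(gdp / pop_census, pop_census)
gini_grid = weighted_gini(gdp / pop_grid, pop_grid)
print(f"Gini with census population:  {gini_census:.3f}")
print(f"Gini with gridded population: {gini_grid:.3f}")
```

With GDP held fixed, the two population sources give different Gini values, so any error in how population change is allocated across places carries straight through to the measured trend in spatial inequality. The direction of the difference in this toy example is an artifact of the invented numbers and should not be read as matching the article's result.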

Research beyond China is needed to see whether the pattern of weak predictive performance for time-series changes holds elsewhere. Many gridded population estimates use remote sensing inputs, such as satellite-detected nighttime lights, and a similar gap between predictive performance for time-series changes and for cross-sectional spatial patterns has been observed in the literature that evaluates nighttime lights as a proxy for local economic activity. Thus, it is possible that prediction errors when studying temporal changes have a common source for gridded population estimates and for nighttime lights data.

For now, gridded datasets remain best for mapping population spatial distributions, not for tracking rapid local demographic change. Researchers and policymakers should validate them carefully before using them as stand-ins for actual population changes.

About the authors

Xiaoxuan Zhang, College of Geographical Sciences, Henan University, China

John Gibson, Department of Economics, University of Waikato, New Zealand