Geography and population density: modeling order considerations

Learn how modeling order between geography and regional variables affects interpretation and model performance

Written by Priscille De Mascarel
Updated over a week ago

When modeling with both geographic smoothing and regional variables (e.g., population density, precipitation history, or number of schools), which should come first? While both orders are statistically valid, they lead to different interpretations and can impact model performance.

For simplicity, we use population density, a common regional variable that captures urban versus rural patterns, as the running example in this section. The same considerations apply to other highly predictive regional variables.

Geography first (Akur8's standard recommendation)

Akur8's standard recommendation follows this order:

  1. Model without geography or external regional variables

  2. Enrich with geography

  3. Add external regional variables

In this approach, the geographic smoother captures spatial signal first: geography acts as a bucket for everything spatial, known or unknown. When population density is then added to the geographic model, the model extracts from it only the part of the signal that is specific to density, so the variable describes local effects rather than global effects.

Advantages: This approach captures all spatial signals upfront, ensuring no local variation is initially ignored and that the model captures even subtle spatial risk patterns. Because geography is fitted first, population density then focuses on local effects rather than broad, global patterns.

Limitations: When population density is highly predictive and geographic smoothing is applied first, the smoother has to learn urban versus rural patterns that the density variable already provides, essentially rediscovering known information. This can be statistically inefficient.
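
The geography-first order described in this section can be illustrated with a minimal sketch. It is not Akur8's algorithm: plain zone averages stand in for the geographic smoother, a one-variable least-squares fit stands in for the density effect, and the data, zone names, and helper `ols_slope` are all hypothetical.

```python
from statistics import mean

# Hypothetical toy data: (zone, density, observed loss cost).
# Built so that loss = zone base level + 10 * density.
records = [
    ("north", 1.0, 90.0), ("north", 2.0, 100.0),
    ("south", 4.0, 140.0), ("south", 5.0, 150.0),
]

def ols_slope(xs, ys):
    """Slope of a one-variable least-squares fit with intercept."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Stage 1: geography. Zone averages are an assumed, crude stand-in
# for a geographic smoother.
zone_names = sorted({z for z, _, _ in records})
zone_level = {
    z: mean([y for zz, _, y in records if zz == z]) for z in zone_names
}

# Stage 2: density is fitted on what geography left unexplained.
density = [d for _, d, _ in records]
residual = [y - zone_level[z] for z, _, y in records]
slope_after_geo = ols_slope(density, residual)

# For contrast: density fitted alone on the raw observations.
slope_alone = ols_slope(density, [y for _, _, y in records])

print(slope_after_geo, slope_alone)
```

On this toy data the density slope fitted after geography (1.0) is far smaller than the slope fitted alone (16.0), because the zone averages have already absorbed the contrast between the sparse and the dense zone. The absolute numbers are artifacts of the toy setup; only the direction of the effect is the point.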

Population density first (alternative approach)

An alternative order that can be effective:

  1. Include population density in the initial model (without geography or other external regional variables)

  2. Enrich with geography

  3. Add additional external regional variables

Advantages: This order aligns with standard GLM practice: include known predictive factors explicitly, then add a geographic term to capture the remaining spatial structure. Population density absorbs the broad urban-versus-rural component of the geographic signal, so the geographic smoothing step can identify localized hotspots and coolspots while controlling for this effect. This provides better interpretability: you can identify areas that are unusually good or bad risks after accounting for their density level.

Limitations: Population density may absorb spatial variation that reflects meaningful place-specific risk differences rather than urbanization alone, preventing geography from capturing these patterns. The density coefficient itself can also become confounded with spatially correlated factors (e.g., coastal proximity), reducing interpretability.

Note: Even after controlling for density, the geographic component still captures unmodeled factors, meaning the seemingly “clean” geography may include spurious patterns alongside genuine location effects.
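
As a rough illustration of the density-first order, the sketch below fits density first and then looks at what is left per zone. Everything in it is an assumption made for the example: the zone-level data, the least-squares fit standing in for the density effect, and the arbitrary `THRESHOLD` used to flag hotspots and coolspots.

```python
from statistics import mean

# Hypothetical zone-level data: (zone, density, observed loss cost).
zones = [
    ("a", 1.0, 100.0), ("b", 2.0, 110.0), ("c", 3.0, 135.0),
    ("d", 4.0, 130.0), ("e", 5.0, 140.0),
]

xs = [d for _, d, _ in zones]
ys = [y for _, _, y in zones]

# Stage 1: density. One-variable least squares with intercept.
mx, my = mean(xs), mean(ys)
slope = (
    sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    / sum((x - mx) ** 2 for x in xs)
)
intercept = my - slope * mx

# Stage 2: geography. The leftover per zone is what a geographic
# term would have to explain after controlling for density.
geo_residual = {z: y - (intercept + slope * d) for z, d, y in zones}

# Zones that are unusually bad (or good) for their density level.
THRESHOLD = 5.0  # arbitrary cut-off, chosen only for this illustration
hotspots = [z for z, r in geo_residual.items() if r > THRESHOLD]
coolspots = [z for z, r in geo_residual.items() if r < -THRESHOLD]

print(hotspots, coolspots)
```

Here zone "c" is flagged as a hotspot: its loss cost is well above what its density level alone would suggest. The residual does not say why, so it may reflect a genuine location effect or any other unmodeled factor.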

Market-specific considerations

The optimal approach often depends on regional rating practices:

North American markets typically use broad rating territories. The geography-first approach is generally preferable to ensure comprehensive spatial signal capture, producing a map of variation you understand well.

European markets work with highly granular geographic rating, differentiating risk at the postal code level. In these contexts, the density-first approach can be effective: a variable you understand well is fitted first, and the remaining "clean" geographical signal picks up factors like weather patterns.

We recommend starting with the standard geography-first approach, then testing the reverse order if:

  • Population density is highly predictive in your dataset

  • You need to clearly distinguish between urbanization effects and location-specific risk patterns

  • You're working in a market with highly granular geographic rating factors
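
One way to test both orders is to fit each on the same training data and compare errors on held-out records. The sketch below does this under strong simplifying assumptions: zone averages stand in for geographic smoothing, a least-squares line stands in for the density effect, and the data and the helpers `fit_geo_first`, `fit_density_first`, and `mse` are hypothetical.

```python
from statistics import mean

def ols(xs, ys):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    mx, my = mean(xs), mean(ys)
    slope = (
        sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / sum((x - mx) ** 2 for x in xs)
    )
    return my - slope * mx, slope

def zone_means(rows, values):
    """Average of `values` per zone; rows are (zone, density, loss)."""
    grouped = {}
    for (zone, _, _), v in zip(rows, values):
        grouped.setdefault(zone, []).append(v)
    return {zone: mean(vs) for zone, vs in grouped.items()}

def fit_geo_first(rows):
    # Geography (zone averages) first, then density on the residuals.
    geo = zone_means(rows, [y for _, _, y in rows])
    a, b = ols([d for _, d, _ in rows], [y - geo[z] for z, _, y in rows])
    return lambda z, d: geo[z] + a + b * d

def fit_density_first(rows):
    # Density first, then geography (zone averages) on the residuals.
    a, b = ols([d for _, d, _ in rows], [y for _, _, y in rows])
    geo = zone_means(rows, [y - (a + b * d) for _, d, y in rows])
    return lambda z, d: a + b * d + geo[z]

def mse(predict, rows):
    """Mean squared prediction error on (zone, density, loss) rows."""
    return mean([(y - predict(z, d)) ** 2 for z, d, y in rows])

# Hypothetical data; holdout zones must also appear in the training set.
train = [
    ("north", 1.0, 90.0), ("north", 2.0, 100.0),
    ("south", 4.0, 140.0), ("south", 5.0, 150.0),
]
holdout = [("north", 1.5, 96.0), ("south", 4.5, 144.0)]

scores = {
    "geography first": mse(fit_geo_first(train), holdout),
    "density first": mse(fit_density_first(train), holdout),
}
best_order = min(scores, key=scores.get)
print(scores, best_order)
```

Which order scores better on this toy data has no general meaning. The point is the workflow: same training data, same holdout, one metric, and the two orders compared side by side.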
