Modelling hierarchies

In a discipline otherwise built upon the tenets of objectivity, mathematical modellers are a bit of an anomaly. Most, including myself, will admit that mathematical modelling is as much of an art as it is a science. Art, however, still relies on a certain order. Whether or not they do so consciously, mathematical modellers often follow some hierarchy of reasoning to aid in the development of their models. I’d like to highlight two of those hierarchies here. 

The first is a hierarchy of data analysis by Jeffrey Leek and Roger Peng, published in the 20 March 2015 issue of Science. It’s worth reading the full article (it’s brief), or at least glancing at the flow chart that offers a dichotomous key for analysing data. For those who cannot access the article, I’ll summarise the approach here (with a bit of interpretation, so that makes this an Exploratory Analysis!):

  • A summary of data without interpretation is a descriptive analysis.
  • Add in some interpretation, and you get an exploratory analysis.
  • If the interpretation quantifies how the data generalises to new samples, it’s an inferential analysis.
  • Speculation on how the data might apply to new individuals makes it a predictive analysis. 
  • Quantifying how a change in one measurement affects another, without regard to precisely how the change takes place, gives a causal analysis.
  • And, quantifying the process by which a change in one measurement affects another constitutes a mechanistic analysis. 

These categories aren’t totally disjoint, nor do they necessarily flow linearly one from the other. While this list places analytical strategies loosely in order of level of inference, it’s not a ‘ladder’ that should be climbed at every opportunity. The available data, and the central question to be answered, will indicate which type of analysis is most appropriate.
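For readers who think in code, the bullet-point summary above can be written as a small decision function. This is only a sketch of that summary, not a transcription of Leek and Peng's flow chart: the function name and the yes/no questions are hypothetical, and treating the categories as a strict key glosses over the overlap just mentioned.

```python
def classify_analysis(interprets=False, generalises=False,
                      predicts_individuals=False,
                      quantifies_effect=False,
                      quantifies_mechanism=False):
    """Return the deepest analysis type whose (hypothetical) question is answered 'yes'."""
    # Checked from the strongest claim down to the weakest.
    key = [
        (quantifies_mechanism, "mechanistic"),
        (quantifies_effect, "causal"),
        (predicts_individuals, "predictive"),
        (generalises, "inferential"),
        (interprets, "exploratory"),
    ]
    for answer, label in key:
        if answer:
            return label
    # No interpretation at all: a plain summary of the data.
    return "descriptive"


print(classify_analysis(interprets=True, generalises=True))  # inferential
```

Returning the deepest matching level is a design choice made for brevity; a real analysis may reasonably sit at more than one level at once.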

The second hierarchy relates specifically to spatial data analysis. It nests within the mechanistic analysis step from above. These ideas are from Prof. Aaron King, drawn from a presentation he gave in September 2015 during a workshop on human mobility models. All (mis-)interpretations of his words are my own. Given data of the format

y1(t1) y2(t1) … yn(t1)
y1(t2) y2(t2) … yn(t2)
⋮
y1(tk) y2(tk) … yn(tk)

at n spatial locations and k time points, how do we characterise it? He proposed building successive models according to the following criteria, and stopping once a step up in the hierarchy yields no better description of the data:

  • independent dynamics, distinct parameters (each location follows its own rules)
  • independent dynamics, shared parameters (locations share the same rules, but otherwise don’t communicate)
  • independent dynamics, shared parameters, spatial covariates (the rules a location follows depend on its location in space)
  • independent dynamics, random effects (same as above, but we account for unobserved factors by allowing some random variation in those rules)
  • full spatio-temporal model (locations directly influence one another)

The guiding principle behind this hierarchy is embeddedness in real space. At the first level, the locations might as well be completely disconnected. At the second, we recognise that they live in the same world, but claim that their spatial position doesn’t otherwise matter. At the third, we begin to account for spatial variations. At the fourth, we reluctantly admit that we don’t know everything about the space in which the locations live, so we allow the model to vary some, and therefore to clean up the messes where our powers of explanation have fallen short (did I mention that this is an art?). At the fifth, we assert not only that position matters, but also that dynamics in one location propagate to the next, again with some random variation.
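To make the idea of stepping through the hierarchy concrete, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration and is not taken from Prof. King's presentation: the dynamics are a toy linear rule (each value is a multiple of the previous one plus Gaussian noise), models are compared by AIC, and the function names are hypothetical. It covers only the first, second and last levels; spatial covariates and random effects are left out to keep it short.

```python
import numpy as np


def simulate(A, k, rng):
    """Simulate k time points of y(t+1) = y(t) @ A + unit Gaussian noise."""
    n = A.shape[0]
    Y = np.zeros((k, n))
    for t in range(1, k):
        Y[t] = Y[t - 1] @ A + rng.normal(size=n)
    return Y


def gaussian_aic(residuals, n_params):
    """AIC of a least-squares fit, assuming Gaussian noise with one unknown variance."""
    m = residuals.size
    sigma2 = np.mean(residuals ** 2)
    log_lik = -0.5 * m * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (n_params + 1) - 2 * log_lik  # +1 for the noise variance


def fit_hierarchy(Y):
    """Fit three toy models to Y (shape: k time points by n locations)."""
    X, Z = Y[:-1], Y[1:]  # state at time t, state at time t+1
    n = Y.shape[1]

    # Independent dynamics, distinct parameters: one coefficient per location.
    a_distinct = np.sum(X * Z, axis=0) / np.sum(X * X, axis=0)
    res_distinct = Z - X * a_distinct

    # Independent dynamics, shared parameters: one coefficient for all locations.
    a_shared = np.sum(X * Z) / np.sum(X * X)
    res_shared = Z - X * a_shared

    # Coupled locations: every location may influence every other one.
    A, *_ = np.linalg.lstsq(X, Z, rcond=None)
    res_coupled = Z - X @ A

    return {
        "distinct parameters": gaussian_aic(res_distinct, n),
        "shared parameters": gaussian_aic(res_shared, 1),
        "coupled locations": gaussian_aic(res_coupled, n * n),
    }


# Three uncoupled locations that each follow their own rule.
rng = np.random.default_rng(0)
Y = simulate(np.diag([0.1, 0.5, 0.9]), k=500, rng=rng)
for name, aic in fit_hierarchy(Y).items():
    print(f"{name}: AIC = {aic:.1f}")
```

Lower AIC indicates a better trade-off between fit and number of parameters. The coupled model can never fit worse than the independent ones, because it contains them as special cases, so some penalty on complexity is needed before "no better description of the data" means anything.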

Hierarchies like this one help to avoid the temptation to connect things that aren’t necessarily linked, a temptation that is especially strong in spatial analysis. It’s second nature to assume that two events that take place near one another, in space and/or time, must somehow relate. However, if we can explain the phenomenon without recourse to such a link, then we ought to accept the simpler explanation.