TL;DR: Status statistics, such as the coronavirus-tracking daily counts (e.g. of patients ventilated or deceased), are commonly used but are inherently insufficient without additional transition statistics.
Audience: data scientists and their managers, experiment designers, data journalists, and anyone with a critical approach to statistics.
Read time: 7 minutes.
Status statistics are commonly reported in a variety of contexts. These statistics report on the status of particular subjects. Well-known examples of status statistics include:
- number of unemployed, reporting on employment status of citizens;
- number of tourists, reporting on the intent status of travelers;
- amount invested in funds, reporting on the status of money;
- and the aforementioned coronavirus-tracking statistics, such as number of ventilated patients, reporting on health status of patients.
Because status statistics are so ubiquitous, it can come as a surprise that they present an inherently incomplete picture of their subjects. This post aims to show why this is so and how it can be fixed, whatever the context of the status statistics.
The Nature of Status
Before considering the problem with status statistics, let’s first consider the nature of status. The concept of status is related to a few other concepts:
- Subject: this is what the status describes.
- Group: this is the set of subjects that share a common status.
- Population: this is the set of subjects that have any status.
- Status-set: this is the set of possible statuses.
To keep things simple, we consider the case where exactly one status is associated with each subject. The case of several concurrent statuses per subject can be reduced to this case by treating each subset of concurrent statuses as a single status; we won’t consider this further here.
A couple of properties of statuses are important to our discussion:
- Partitioning: statuses partition the population into groups, i.e. the groups are disjoint and their union is the population.
- Dynamic: statuses change over time.
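The partitioning property can be made concrete with a small sketch. The subject names and statuses below are made up for illustration; the point is only that assigning one status per subject yields disjoint groups whose union is the population, and that a status statistic is the size of a group.

```python
from collections import defaultdict

# Hypothetical subjects, each with exactly one status (illustrative data only).
status_of = {
    "alice": "employed",
    "bob": "unemployed",
    "carol": "employed",
    "dave": "retired",
}

def groups_by_status(status_of):
    """Group subjects by their status: status -> set of subjects."""
    groups = defaultdict(set)
    for subject, status in status_of.items():
        groups[status].add(subject)
    return dict(groups)

groups = groups_by_status(status_of)
population = set(status_of)

# Partitioning: the union of the groups is the population, and the group
# sizes add up to the population size, so no subject is in two groups.
assert set().union(*groups.values()) == population
assert sum(len(g) for g in groups.values()) == len(population)

# A status statistic is simply the size of each group at a point in time.
status_counts = {status: len(g) for status, g in groups.items()}
print(status_counts)
```

The dynamic property is what this snapshot cannot show: rerunning the count tomorrow gives new group sizes, but not who moved where.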
Status Statistics are Incomplete
We now consider the coronavirus-tracking daily counts to demonstrate that status statistics are incomplete. The following table shows such statistics for Israel on April 29th and 30th, taken from the Israel Ministry of Health:
Date | Tested | Positive | In hospital | Serious condition | Ventilated | Deceased |
---|---|---|---|---|---|---|
April 29th | 370,505 | 15,839 | 370 | 118 | 93 | 217 |
April 30th | 380,339 | 15,983 | 371 | 110 | 87 | 223 |
With these statistics alone, we can only guess what the story of the patients behind them is. Many questions are impossible to answer, for example:
- Why are there fewer patients in serious condition? Did they recover? Or are they still in hospital, just in a less-than-serious condition?
- In what condition were the 6 patients who died? Did all 6 patients who are no longer ventilated pass away?
- What happened to the 144 additional patients who tested positive? Did they get hospitalized?
This informational incompleteness is not a coincidence specific to this example or to this type of data. It is inherent to status statistics, which by their nature do not give us sufficient information on dynamics. In general, when a change in a status statistic is reported, we are left to guess what happened to the subjects behind the change.
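To make the ambiguity concrete, here is a minimal sketch that computes the day-over-day changes from the table above and then checks two invented transition scenarios against the change in the ventilated count. Both scenarios, and the helper `net_change`, are hypothetical and exist only for illustration; the published figures cannot tell such scenarios apart.

```python
# Status counts copied from the table above.
april_29 = {"tested": 370_505, "positive": 15_839, "in_hospital": 370,
            "serious": 118, "ventilated": 93, "deceased": 217}
april_30 = {"tested": 380_339, "positive": 15_983, "in_hospital": 371,
            "serious": 110, "ventilated": 87, "deceased": 223}

# All that status statistics reveal about dynamics: the net changes.
delta = {k: april_30[k] - april_29[k] for k in april_29}
print(delta)

# Two hypothetical scenarios, as (from_status, to_status) -> count.
# Both are invented for illustration and are not real data.
scenario_a = {("ventilated", "deceased"): 6}
scenario_b = {("ventilated", "deceased"): 2,
              ("ventilated", "serious"): 9,
              ("serious", "ventilated"): 5}

def net_change(transitions, status):
    """Inflow minus outflow for one status."""
    inflow = sum(n for (_, dst), n in transitions.items() if dst == status)
    outflow = sum(n for (src, _), n in transitions.items() if src == status)
    return inflow - outflow

# Very different stories, identical net change in the ventilated count.
assert net_change(scenario_a, "ventilated") == delta["ventilated"]
assert net_change(scenario_b, "ventilated") == delta["ventilated"]
```

In the first scenario every patient taken off a ventilator died; in the second, most of them improved while new patients were ventilated. The status counts are the same either way.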
Transition Statistics to the Rescue
We can fix the inherent incompleteness of status statistics by introducing transition statistics. These are statistics about a transition from one status to another. They tell a story of what happened to subjects at the time of transition.
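As a rough sketch of how transition statistics could be derived, suppose we had per-subject statuses on two consecutive days. The patient identifiers, the statuses, and the function name `transition_counts` below are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical per-patient statuses on two consecutive days (illustrative only).
day1 = {"p1": "serious", "p2": "ventilated", "p3": "ventilated", "p4": "mild"}
day2 = {"p1": "ventilated", "p2": "deceased", "p3": "ventilated", "p4": "serious"}

def transition_counts(before, after):
    """Count (from_status, to_status) pairs for subjects whose status changed."""
    return Counter(
        (before[s], after[s])
        for s in before
        if s in after and before[s] != after[s]
    )

transitions = transition_counts(day1, day2)
print(transitions)
```

Comparing two snapshots like this misses any intermediate changes within the day, so a provider that records each transition as it happens can report more than this sketch recovers.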
Let’s add some hypothetical transition statistics to the above example. Each column shows how many patients transitioned from one status to another.
Date | Non-Diagnosed → Positive | Positive → Negative | Negative → Positive | Serious condition → Ventilated | Ventilated → Deceased | Ventilated → Serious condition | … |
---|---|---|---|---|---|---|---|
April 30th | 52 | 2 | 94 | 0 | 5 | 1 | … |
Isn’t that better? Much more detailed. We can now clearly see what happened to patients between April 29th and April 30th.
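One sanity check on transition statistics is that they should add up to the observed status changes: the net change in a status equals its inflow minus its outflow. The sketch below applies this to the hypothetical transition counts above, using only the columns shown (the table is truncated, so the elided columns are not filled in). The status keys are shorthand names chosen here for illustration.

```python
from collections import Counter

# Hypothetical transition counts from the table above, (from, to) -> count.
# Only the columns shown are included; the elided ones ("…") are unknown.
transitions = {
    ("non_diagnosed", "positive"): 52,
    ("positive", "negative"): 2,
    ("negative", "positive"): 94,
    ("serious", "ventilated"): 0,
    ("ventilated", "deceased"): 5,
    ("ventilated", "serious"): 1,
}

# Net flow per status: add inflow, subtract outflow.
net = Counter()
for (src, dst), n in transitions.items():
    net[dst] += n
    net[src] -= n

# Observed changes in the status table between April 29th and April 30th.
observed = {"positive": 15_983 - 15_839, "ventilated": 87 - 93}

assert net["positive"] == observed["positive"]      # 52 + 94 - 2
assert net["ventilated"] == observed["ventilated"]  # 0 - 5 - 1
print(dict(net))
```

The shown columns account for 5 of the 6 reported deaths; under this reading, the remaining one would have to come from a transition in the elided columns.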
A couple of points to note about transition statistics:
- The above-mentioned properties of statuses, which are partitioning and dynamic, imply similar properties for transitions.
- Some transitions are infeasible, e.g. a deceased status cannot change. It can be helpful to display a diagram showing each possible status as a block and each possible transition as an arrow from one block to another.
- Some statuses are independent of each other, e.g. both a positive and a negative diagnosis status could co-occur with a serious condition status. It is important to model such statuses accordingly.
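The block-and-arrow diagram mentioned above can also be written down as data and used to flag infeasible transitions. The statuses and arrows below are an assumption made for illustration, not a medical model, and they cover only the clinical-condition dimension; an independent dimension such as diagnosis would get its own status-set and its own diagram.

```python
# Hypothetical transition diagram: each status maps to the statuses that
# can be reached from it in one step. Illustrative only.
ALLOWED = {
    "mild": {"serious", "recovered"},
    "serious": {"mild", "ventilated", "deceased"},
    "ventilated": {"serious", "deceased"},
    "recovered": set(),
    "deceased": set(),  # terminal: a deceased status cannot change
}

def is_feasible(src, dst):
    """True if the diagram has an arrow from src to dst."""
    return dst in ALLOWED.get(src, set())

def infeasible(transitions):
    """Return the reported transitions that the diagram does not allow."""
    return [pair for pair in transitions if not is_feasible(*pair)]

# Reported counts, (from, to) -> count; the last entry is a deliberate data error.
reported = {
    ("serious", "ventilated"): 0,
    ("ventilated", "deceased"): 5,
    ("deceased", "ventilated"): 1,
}
print(infeasible(reported))
```

A check like this can catch data-entry errors before transition statistics are published.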
Adding transition statistics can provide lots of benefits over using just status statistics. For example:
- For data journalists, the stories told using transition statistics can be much more detailed and vivid.
- For data scientists, the models built using transition statistics can be much more refined and accurate in capturing status dynamics.
- For experiment designers, the experiments designed to capture transition statistics can produce a richer set of observations.
Finally, the alert reader may have recognized that some domains already use transition statistics as a matter of routine. One example is in the financial domain, where spending statistics tell us in which categories consumers spend their money, i.e. the transition of money from consumer liquidity status to a specific spending category status.
Conclusions
We argued that status statistics are inherently insufficient. The nature of statuses generally leads to status statistics failing to capture the dynamics of subjects. Transition statistics are able to capture these dynamics and tell a much more detailed story of subjects. The understanding and use of transition statistics can benefit data journalists, data scientists, and experiment designers.
A major goal of this post is to advocate the use of transition statistics, in addition to status statistics, in many domains. Ask your health authorities, and any status statistics provider, to also collect and provide transition statistics!