
Will innumeracy cause this study to be retracted? Don’t count on it…

I used to be concerned about bad science. These days, what gets me going is wrong science: blatant error somehow surviving peer review and ending up published as if it were fact. It seems that is where we have got to with modern academic publishing. Standards have slipped so badly that even outright innumeracy doesn’t matter anymore.

Publish or perish. There is a journal for everything. If at first you don’t succeed… Such are the mantras that motivate modern scholarship. The attempt at meritocracy has all but dissipated, and in its place mediocrity has been normalised.

Is it a guild? Is it an echo chamber? Or is it just a racket?

Whatever it is, it is truly broken.

* * *

Here is a brief arithmetic challenge for those of you with the back of an envelope to hand.

Consider a research study in which 316 patients undergo a given therapy. The researchers then examine the patients’ employment status before and after treatment. The data turn out as follows:

| Category | Number of patients |
| --- | --- |
| Patients who WERE WORKING BOTH BEFORE AND AFTER treatment | 167 |
| Patients who WERE WORKING BEFORE treatment, but NOT AFTER | 18 |
| Patients who WERE NOT WORKING EITHER BEFORE OR AFTER treatment | 104 |
| Patients who WERE NOT WORKING BEFORE treatment, but WERE WORKING AFTER | 27 |
| Total | 316 |

And here is a question:

  • Of the patients who were working BEFORE treatment, what percentage were still working AFTER treatment?

Go on, it isn’t that difficult. In total, 185 patients were working at baseline (i.e., 167 + 18), of whom 167 were working at follow-up. 167 is 90% of 185. So the answer is 90%.

As I said, not difficult!
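For anyone who would rather use a keyboard than an envelope, the same sum can be sketched in a few lines of Python. The counts are the ones from the table; the variable and function names are illustrative assumptions, not anything taken from the paper.

```python
# Counts from the table above (names are illustrative, not from the paper).
worked_before_and_after = 167
worked_before_not_after = 18


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100 * part / whole


# The denominator is everyone who was working at baseline,
# not everyone in the study.
working_at_baseline = worked_before_and_after + worked_before_not_after

still_working = percentage(worked_before_and_after, working_at_baseline)
print(f"{working_at_baseline} working at baseline, "
      f"{still_working:.0f}% still working at follow-up")
# prints: 185 working at baseline, 90% still working at follow-up
```

The only decision that matters is the choice of denominator: the question asks about patients who were working before treatment, so the percentage has to be taken out of that group.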

As you’ve probably guessed, these figures are from a real paper that has just been published in a real journal. But here’s how the paper’s real authors attempted to describe this very same finding:

Patients were followed up for an average of 285 days and over this period 53% of patients who were working remained in employment.

What now? 53%? That’s not correct at all! 167 is not 53% of 185.

Sorry, it just isn’t.

* * *

Let’s have another go. Here is a different question:

  • Of the patients who were NOT working BEFORE treatment, what percentage WERE working AFTER treatment?

Again, the required computation is not hard. Have another look at the table: a total of 131 patients were recorded as not working at baseline (i.e., 104 + 27), of whom 27 were recorded as working at follow-up. So the answer here is 21% (i.e., 27 as a percentage of 131).

But here’s what the authors said it was:

Of the patients who were not working at baseline, 9% had returned to work at follow-up.

Huh? 9%? Again, that is totally incorrect…

* * *

So let’s have one last try. Here is the final question:

  • Of the patients who WERE working BEFORE treatment, what percentage were NOT working AFTER treatment?

As mentioned above, 185 patients were working at baseline, and as shown in the table, 18 patients were working at baseline but not at follow-up. So the answer to this question is 9.7% — i.e., 18 as a percentage of 185.

But instead of 9.7%, the authors came up with this:

However, of those working at baseline, 6% were unable to continue to work at follow-up.

What the? Again, this is just wrong! Wrong, wrong, wrong! Somehow they managed to get all their percentages wrong.

It is almost unbelievable.

Call me old-fashioned, but this is basic arithmetic we’re talking about here…
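There is also a quick sanity check that would have flagged the problem. Every patient who was working at baseline was either still working at follow-up or was not, so the two percentages for that subgroup have to add up to 100. A minimal sketch, with variable names that are purely illustrative:

```python
# Patients working at baseline, from the table (names are illustrative).
still_working = 167
no_longer_working = 18
working_at_baseline = still_working + no_longer_working  # 185

# With the correct denominator, the two shares cover the whole subgroup.
correct_total = 100 * (still_working + no_longer_working) / working_at_baseline

# The published figures for the same subgroup: 53% remained, 6% did not.
published_total = 53 + 6

print(correct_total)    # 100.0
print(published_total)  # 59
```

On the published figures, 41% of the patients who were working at baseline are unaccounted for.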

* * *

So what happened? Well, the authors seem to have systematically garbled all their percentages by computing them against the wrong category quantities.

Whenever they tried to look at the subgroups “patients who were working at baseline” or “patients who were not working at baseline,” they didn’t use the actual numbers in these categories (i.e., 185 and 131). Instead they used the total number of patients for whom they had follow-up data (namely, 316). As such, all their percentages were calculated, wrongly, out of 316. All of them!
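The diagnosis is easy to check by computing each figure both ways. The sketch below uses only the counts from the table; the labels and variable names are illustrative assumptions, not the paper's wording.

```python
# Counts from the table (labels are illustrative, not the paper's wording).
total = 316
working_at_baseline = 167 + 18      # 185
not_working_at_baseline = 104 + 27  # 131

# Each entry: (description, count, correct subgroup denominator)
findings = [
    ("worked before, still working after", 167, working_at_baseline),
    ("not working before, working after", 27, not_working_at_baseline),
    ("worked before, not working after", 18, working_at_baseline),
]

for label, count, subgroup in findings:
    of_subgroup = 100 * count / subgroup  # what the sentence claims to report
    of_everyone = 100 * count / total     # what dividing by 316 gives
    print(f"{label}: {of_subgroup:.1f}% of subgroup, "
          f"{of_everyone:.1f}% of all {total}")
```

Rounded to whole numbers, the figures computed out of 316 come to 53%, 9% and 6%, which are exactly the percentages quoted in the paper.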

It’s as though the concept of a-fraction-of-a-fraction just blew their minds. The whole exercise exceeded their arithmetic abilities. Or maybe they just didn’t have any envelopes.

But worst of all, none of the paper’s five authors, none of the peer reviewers, and none of the journal editors or assistant editors seem to have noticed. All were happy to sign off on the publication of numerical nonsense. It reminds me of those hoax articles that conscientious academics sometimes submit to journals just to make a point about publication standards.

Except this one isn’t a hoax.

Then again, maybe those who were involved in producing this mess just don’t care about publication standards. After all, in modern academia paper published equals job done. You get the credit even when you’re wrong, so why waste time getting things right?

When it comes to academic CVs, it’s a case of Never Mind the Quality, Feel the Width.

* * *

The rogue paper appeared last week in Occupational Medicine, a journal published by Oxford University Press. By way of a response, David Tuller and I have submitted a letter to the journal’s editor for their urgent attention. You can read the full content of our letter in this preprint and over at David’s blog. Here is a flavour of what we’ve said:

In several sections of the paper, the authors’ description of their own statistical findings is incorrect. They make a recurring elementary error in their presentation of percentages. The authors repeatedly use the construction “X% of patients who did Y at baseline” when they should have used the construction “X% of all 316 patients (i.e., those who provided follow-up data)”. This recurring error involving the core findings undermines the merit and integrity of the entire paper.

For example, in the Abstract, the authors state that “53% of patients who were working [at baseline] remained in employment [at follow-up].” This is not accurate. Their own data (Table 2) show that 185 patients (i.e., 167 + 18) were working at baseline, and that 167 patients were working at both time points. In other words, the proportion working continuously was in fact 90% (i.e., 167 out of 185). The “53%” that the authors refer to is the percentage of the sample who were employed at both time points (i.e., 167 out of 316), which is an entirely different subset. They have either misunderstood the percentage they were writing about, or they have misstated their own finding by linking it to the wrong percentage.

…the technical errors that undermine this paper’s reporting of percentages render its key conclusions meaningless. The sentences used to describe the findings are simply incorrect, and the entire thrust of the paper’s narrative is thereby contaminated. We believe that allowing the authors to publish a correction to these sentences would create only further confusion.

We therefore call on the journal to retract the paper.

Each of the core conclusions presented in this paper is worthless, because the authors’ inferences are based on statistical statements that are computationally incorrect.

Let’s see what the journal does next.