A spirometry quality grading system. Or is it?
A set of guidelines for grading spirometry quality was included with the recently published ATS recommendations for a standardized pulmonary function report. These guidelines are similar to others published previously, so they weren’t a great surprise, but as much as I may respect the authors of the standard my first thought was “when was the last time any of these people performed routine spirometry?” The authors acknowledge that the source for these guidelines is epidemiological, and if I were conducting a research study that required spirometry they would be useful for deciding which results to keep and which to toss, but for routine clinical spirometry they’re pretty useless.
I put these thoughts aside because I had other projects I was working on, but I was reminded of them when I recently performed spirometry on an individual who wasn’t able to complete a single effort without a major error. The person in question was an otherwise intelligent and mature individual but became more frustrated and angry with each effort because they couldn’t manage to perform the test right. I did my best to explain and demonstrate what they were supposed to do each time, but after the third try they refused to do any more. About the only thing that was reportable was the FEV1 from a single effort.
This may be a somewhat extreme case but it’s something that those of us who perform PFTs are faced with every day. There are many individuals who have no problem performing spirometry, but for many others we’re fortunate to get even a single test effort that meets all of the ATS/ERS criteria. The presence or absence of test quality usually isn’t apparent in the final report, however, and for this reason I do understand the value of some kind of quality grading system. But that also assumes the grading system serves the purpose for which it is intended.
To quantify this I reviewed the spirometry performed by 200 patients in my lab to determine how many acceptable and reproducible results there were. To be honest, as bad as I thought the quality problem was, when I looked at the numbers it was worse than I had imagined.
The spirometry quality grading system is:
| Grade | Criteria |
| --- | --- |
| A | ≥3 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L), or 10% of highest value, whichever is greater |
| B | ≥2 acceptable tests with repeatability within 0.150 L (for age 2–6, 0.100 L), or 10% of highest value, whichever is greater |
| C | ≥2 acceptable tests with repeatability within 0.200 L (for age 2–6, 0.150 L), or 10% of highest value, whichever is greater |
| D | ≥2 acceptable tests with repeatability within 0.250 L (for age 2–6, 0.200 L), or 10% of highest value, whichever is greater |
| E | 1 acceptable test |
| F | No acceptable tests |
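To make the table concrete, here is a minimal sketch of how the grading logic might be implemented. The function names are my own, the statement itself doesn’t define an algorithm, and I’m assuming here that FVC and FEV1 repeatability are assessed together (adult thresholds shown):

```python
# Hypothetical sketch of the grading table above (adult thresholds).
# Assumption: FVC and FEV1 repeatability are assessed together; the
# statement doesn't spell out an algorithm.

def repeatable(values, abs_limit):
    """True if the two largest values agree within abs_limit (litres)
    or 10% of the highest value, whichever is greater."""
    best, second = sorted(values, reverse=True)[:2]
    return (best - second) <= max(abs_limit, 0.10 * best)

def spirometry_grade(fvc_values, fev1_values):
    """Grade a session from the FVC and FEV1 values (litres) of the
    *acceptable* efforts only."""
    n = min(len(fvc_values), len(fev1_values))
    if n == 0:
        return "F"
    if n == 1:
        return "E"
    for grade, limit in (("A", 0.150), ("B", 0.150), ("C", 0.200), ("D", 0.250)):
        if grade == "A" and n < 3:
            continue  # an 'A' requires at least three acceptable tests
        if repeatable(fvc_values, limit) and repeatable(fev1_values, limit):
            return grade
    # The table doesn't cover >=2 acceptable tests with repeatability worse
    # than 0.250 L; treating that as 'E' is my own assumption.
    return "E"
```

For example, `spirometry_grade([4.10, 4.02, 3.95], [3.20, 3.18, 3.10])` would return ‘A’.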
It’s important to note that this grading system is based primarily on the reproducibility of acceptable tests. The criteria for an acceptable test are:
- A good start of exhalation, with an extrapolated volume <5% of FVC or 0.150 L, whichever is greater
- Free from artifacts
- No cough during the first second of exhalation (for FEV1)
- No glottis closure or abrupt termination (for FVC)
- No early termination or cutoff (for FVC)
- Maximal effort provided throughout the maneuver
- No obstructed mouthpiece
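As a purely illustrative sketch (not an algorithm from the standards), a single effort’s acceptability could be screened along these lines. The `Effort` fields are my own invention, and criteria such as maximal effort or an obstructed mouthpiece are technician judgments, so they enter as flags rather than being computed from the raw signal:

```python
# Hypothetical acceptability screen for a single effort. Field names are
# assumptions; judgment-based criteria are supplied as flags.
from dataclasses import dataclass

@dataclass
class Effort:
    fvc: float                   # litres
    extrapolated_volume: float   # litres, from back-extrapolation
    cough_in_first_second: bool  # affects FEV1
    glottis_closure: bool        # affects FVC
    early_termination: bool      # affects FVC
    maximal_effort: bool         # technician judgment
    obstructed_mouthpiece: bool

def acceptable(e: Effort) -> bool:
    # Good start: extrapolated volume < 5% of FVC or 0.150 L, whichever is greater
    good_start = e.extrapolated_volume < max(0.05 * e.fvc, 0.150)
    return (good_start
            and not e.cough_in_first_second
            and not e.glottis_closure
            and not e.early_termination
            and e.maximal_effort
            and not e.obstructed_mouthpiece)
```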
There were 703 spirometry efforts from the 200 patients, an average of 3.5 per patient; the fewest performed by any patient was 3 and the most was 7. Out of 200 patients, 50 (25%) were unable to perform a single acceptable test and would have received an ‘F’ quality grade. Another 51 patients (26%) were able to perform one acceptable test and would have received an ‘E’ quality grade. Only 38 patients (19%) were able to perform three (or more) acceptable tests and would have received an ‘A’ quality grade. The remaining 61 patients (30%) would have gotten a ‘B’, ‘C’ or ‘D’ quality grade.
The distribution of errors was as follows (some efforts had more than one error):
| Error | Count |
| --- | --- |
| Expiratory time < 6 seconds | 314 |
| End-of-test criteria not met | 268 |
| FVC repeatability > 0.15 L or 10% | 201 |
| FEV1 repeatability > 0.15 L or 10% | 126 |
| PEF < 20% of maximum | 117 |
| Back-extrapolation | 45 |
| Pauses that affected FVC or FEV1 | 43 |
| FIVC > FVC | 6 |
It’s apparent from this that the biggest problem most patients have is with the length of their exhalation (the EOT criteria, expiratory time and FIVC > FVC) and that this primarily impacts the FVC rather than the FEV1. Errors that affect the FEV1 (back-extrapolation, peak flow, pauses) were far less common. To some extent this doesn’t surprise me, since I’ve always felt that in spirometry testing the FEV1 was more reliable than the FVC.
There is an additional point the quality grading system does not address: composite results. Specifically, reporting the highest FVC (regardless of which effort it came from) along with the highest FEV1 is allowed and even encouraged by the ATS/ERS spirometry standards. Composite results were reported for 69 of the 200 patients (35%). I did not try to analyze these closely, but I can say that 22 of the 69 (32%) had no acceptable test efforts. Some fraction of these, however, combined an effort with an acceptable FEV1 and an effort with an acceptable FVC, yet the grading system would still have given them an ‘F’.
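For illustration only, composite reporting amounts to something like the following sketch (my own construction, not code from the standards); it shows how an effort with an acceptable FEV1 and a different effort with an acceptable FVC can end up on the same report:

```python
# Hypothetical composite report: the reported FVC and FEV1 are the largest
# values across all efforts, even if they come from different curves.
def composite_result(efforts):
    """efforts: list of (fvc, fev1) tuples in litres."""
    best_fvc = max(fvc for fvc, _ in efforts)
    best_fev1 = max(fev1 for _, fev1 in efforts)
    return {"FVC": best_fvc, "FEV1": best_fev1,
            "FEV1/FVC": best_fev1 / best_fvc}
```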
Note: I didn’t try to correlate the number or type of spirometry errors with the technicians who performed the tests. Partly because I wasn’t interested, partly because which patients you get is usually the luck of the draw, and partly because in the past, when I was the lab manager, I always took the toughest patients and probably would have had one of the highest error rates, so there isn’t necessarily any correlation to be found.
I can’t prove it but I think these statistics are reasonably representative of the experience of most PFT labs. Some labs are going to be better, some are going to be worse. I like to think that my lab is better than most, but that’s purely subjective, and regardless of how good (or bad) a lab’s staff are, in the final analysis it comes down to the patient’s ability to perform spirometry, and that really isn’t as good as you might think it ought to be. To (badly) paraphrase Clausewitz: “even though spirometry is simple, when testing humans even the simple is very difficult.”
In the ICU there’s something called alarm fatigue: alarms go off more or less continuously because a patient moved, because of bad connections, or because the alarm limits are set too stringently (or whatever). Medical staff often become deaf to these alarms and stop paying attention to them, sometimes with adverse consequences for their patients.
So the problem is that over 50% of my lab’s patients would have gotten an ‘E’ or an ‘F’ grade. If you were interpreting reports, how quickly would you develop ‘alarm fatigue’ if those were the most common quality grades you saw? For that matter, how long would it take you to conclude that your PFT lab was mostly staffed with incompetents?
I’m sure the authors of the quality grading system would argue that the results should be used as part of a quality improvement plan, and although I agree with the sentiment, the reasons for suboptimal test quality (probably partly psychological, partly physiological and partly medical) are not easily quantifiable. In addition, what’s labeled a spirometry quality grading system is really a reproducibility grading system for tests of ‘acceptable’ quality. I’m not going to say this doesn’t serve a useful purpose, but it should be labeled for what it is.
A problem that everyone who interprets pulmonary function results faces (with varying degrees of success, since the skill is usually only acquired through experience) is assessing suboptimal-quality tests to determine which parts are meaningful and informative and which aren’t. Given that over half of our patients would have gotten only an ‘E’ or an ‘F’ grade, what would have been far more useful than a grading system is official guidance for determining the information content of suboptimal-quality tests. A spirometry effort that doesn’t meet acceptability criteria may still have something useful to say about expiratory volumes or flow rates. This in turn could be used to say something useful about the probable presence or absence of airway obstruction and restriction, and would let us salvage at least something from suboptimal spirometry test quality.
References:
Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Standardisation of spirometry. Eur Respir J 2005; 26(2): 319-338.
Brusasco V, Crapo R, Viegi G. ATS/ERS task force: Standardisation of lung function testing. Interpretative strategies for lung function tests. Eur Respir J 2005; 26(6): 948-968.
Graham BL, Coates AL, Wanger J, et al. Recommendations for a standardized pulmonary function report. An official American Thoracic Society technical statement. Am J Respir Crit Care Med 2017; 196(11): 1463-1472.