Because our lab database goes back 24 years, we fairly often see patients who were last seen ten or even twenty years ago. For this reason I’ve been thinking about what counts as a clinically significant change over that long a time period. The guidelines my lab uses for interpreting change in test results came from a consensus among the department’s pulmonary physicians close to twenty years ago. As usual there are some discrepancies between our guidelines and those the ATS-ERS have published. Our guidelines are:
| Test | % Change | Minimum Change |
|---|---|---|
| FVC | >= 10% | >= 200 mL |
| FEV1 | >= 10% | >= 200 mL |
| TLC | >= 10% | ? |
| DLCO | >= 10% | >= 2 mL/min/mmHg |
Our criteria came primarily from the standards for repeatability in test results. The ATS-ERS guidelines for interpretation take repeatability into consideration but also what appears to be the minimum statistically significant clinical change. For year-to-year changes these are:
| Test | % Change |
|---|---|
| FVC | >= 15% |
| FEV1 | >= 15% |
| DLCO | >= 10% |
Interestingly, a significant change in TLC is not discussed. Since I’ve searched the literature and been unable to find any longitudinal studies of lung volumes in normal subjects, I am not surprised that it was not included.
I think that both of these standards assume that patients are seen on a relatively regular basis and that when results are compared they usually come from the more-or-less recent past, not from a decade or more ago. A change of 10 or 15 percent in FVC, FEV1 and DLCO is not unusual over a 10 or 20 year period, however (I’m less certain about TLC). When comparing results over a long time period there are at least a couple of issues that should be addressed, the first of which is the normal age-associated decline in test results.
The ATS recently released their standards for Occupational Spirometry, which included a discussion of assessing changes in FEV1 over time. The basic idea presented is to compare the percent predicted values, not the actual test results. Doing this adjusts the results for age, and the recommendation was that a decline in FEV1 of 15% was significant. This threshold had been suggested in the prior ATS/ERS statement on interpretation but was formalized in the Occupational Spirometry standard with algorithms for calculating the change. This is an important guideline for assessing longitudinal changes in an individual, but it was notably limited solely to FEV1 because “it is less affected by technical factors than the FVC”.
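The percent predicted comparison can be sketched roughly as follows. This is only an illustration, not the algorithm from the standard: it assumes the 15% is a relative decline in percent predicted from the baseline visit and that a decline of exactly 15% counts as significant. The function names and the example values are made up, and the predicted FEV1 values would come from whatever reference equations a lab already uses.

```python
def percent_predicted(measured, predicted):
    """Measured value as a percentage of the predicted value."""
    return 100.0 * measured / predicted


def relative_decline(baseline_pct, followup_pct):
    """Decline in percent predicted, as a percentage of the baseline percent predicted."""
    return 100.0 * (baseline_pct - followup_pct) / baseline_pct


def is_significant_decline(baseline_pct, followup_pct, threshold=15.0):
    """Assumption: a relative decline at or above the threshold is flagged."""
    return relative_decline(baseline_pct, followup_pct) >= threshold


# Made-up example: FEV1 in liters, with the predicted value for the
# patient's age and height at the time of each test.
baseline = percent_predicted(measured=3.60, predicted=4.00)
followup = percent_predicted(measured=2.60, predicted=3.50)

print(f"baseline {baseline:.1f}%, follow-up {followup:.1f}% of predicted")
print(f"relative decline {relative_decline(baseline, followup):.1f}%")
print("significant:", is_significant_decline(baseline, followup))
```

Because each predicted value is calculated for the patient's age at that visit, the expected age-related loss is already removed before the two visits are compared.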
In light of this standard my lab has decided to add a comment on an age-adjusted 15% decrease in FEV1 from any prior test whenever it is seen to occur, but only when there has been no significant decline since the last time spirometry was performed, because otherwise the comment would be somewhat redundant. Despite the specific exclusion of FVC, it would seem that this standard is at least a starting point for assessing significant decreases in FVC, TLC and DLCO.
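A minimal sketch of that comment rule might look like the following. It is not our reporting software: the function names and the wording of the comment are hypothetical, and it assumes that the same 15% relative decline in percent predicted is used to decide whether the change since the most recent visit is already significant.

```python
def relative_decline(prior_pct, current_pct):
    """Decline in percent predicted, as a percentage of the prior percent predicted."""
    return 100.0 * (prior_pct - current_pct) / prior_pct


def long_term_decline_comment(prior_pcts, current_pct, threshold=15.0):
    """Return a comment string, or None when no long-term comment is needed.

    prior_pcts: FEV1 percent predicted from earlier visits, oldest first.
    """
    if not prior_pcts:
        return None
    # A significant decline since the most recent visit is assumed to be
    # reported already, so a second comment would be redundant.
    if relative_decline(prior_pcts[-1], current_pct) >= threshold:
        return None
    earlier = prior_pcts[:-1]
    if not earlier:
        return None
    # Largest decline relative to any visit before the most recent one.
    worst = max(relative_decline(p, current_pct) for p in earlier)
    if worst >= threshold:
        return f"FEV1 percent predicted has decreased {worst:.0f}% from a prior test."
    return None


# Made-up history: a slow decline that never crosses the threshold visit to visit.
print(long_term_decline_comment([98.0, 92.0, 84.0], 80.0))
```

In the made-up history the change since the last visit is small, but the change from the oldest visit is above the threshold, which is the situation the comment is meant to catch.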
An important aspect of aging was not addressed in either ATS-ERS standard, however, and that was decreases in height. My lab is fairly obsessive-compulsive about measuring patient height at every visit. When a patient hasn’t been seen in a while it is common to see a significant change in their height (which I can relate to since I’ve lost an inch and a half since I was twenty). The height decrease is occasionally enough, particularly over a long time period, that if their height had not been re-measured, the age-adjusted decrease in FVC, FEV1, TLC or DLCO would have been significant whereas with the change in height it was not.
Should decreases in height be ignored when making comparisons over long periods of time? There are relatively few longitudinal studies of changes in lung function over time. This is not surprising given the difficulties involved in following a group of people over a prolonged period of time but it does leave a significant gap in our knowledge. I was able to find and review about a half dozen longitudinal studies of spirometry and DLCO but in half of them only the original height was reported and in the others changes in height were noted, but not included in any statistical analysis of change.
A pulmonary physician I worked with at one time said that a patient’s height should not be updated because the percent predicted should always be compared to their original height. My counter argument was that reference equations are developed using a subject’s current height, not their height at some time in the past, and that if the patient had not been seen previously their report would also be interpreted in light of their current height, not their prior height. For these reasons I am going to say that height should be updated and that comparisons, even over long periods of time, should be based on the percent predicted calculated from the patient’s height and age at the time of each test.
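To show how much the choice of height can matter, here is a small sketch. The equation in it is a made-up linear formula used only for illustration and is not a published reference equation; the patient, the measured values and the function names are hypothetical as well.

```python
def toy_predicted_fev1(height_cm, age_years):
    """Made-up linear equation for illustration only; NOT a published reference equation."""
    return 0.04 * height_cm - 0.03 * age_years - 2.0


def percent_predicted(measured, height_cm, age_years, predicted=toy_predicted_fev1):
    """Percent predicted using the height and age supplied for that test."""
    return 100.0 * measured / predicted(height_cm, age_years)


def relative_decline(baseline_pct, followup_pct):
    """Decline in percent predicted, as a percentage of the baseline percent predicted."""
    return 100.0 * (baseline_pct - followup_pct) / baseline_pct


# Hypothetical patient: 175 cm at age 40, re-measured at 171 cm at age 65.
baseline = percent_predicted(3.70, height_cm=175, age_years=40)

with_current_height = percent_predicted(2.40, height_cm=171, age_years=65)
with_original_height = percent_predicted(2.40, height_cm=175, age_years=65)

decline_current = relative_decline(baseline, with_current_height)
decline_original = relative_decline(baseline, with_original_height)

print(f"current height:  {with_current_height:.1f}% predicted, decline {decline_current:.1f}%")
print(f"original height: {with_original_height:.1f}% predicted, decline {decline_original:.1f}%")
```

With these made-up numbers the decline falls just under 15% when the re-measured height is used and above 15% when the original height is carried forward, so the same FEV1 would or would not be flagged depending only on which height went into the predicted value.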
Since I think that comparing age- and height-adjusted percent predicted values is the best approach for assessing changes over long periods of time, the question then becomes whether a 15% threshold for FVC, TLC and DLCO is too high, too low, or just right. As I mentioned previously, 10% is roughly the threshold for the repeatability of these tests, but repeatability really applies to within-session testing, not between sessions.
In addition, a critical component when comparing results over time has to be test quality. Over the years I’ve become all too aware of the myriad of problems involved in obtaining accurate FVC, TLC and DLCO test results. These tests cannot be interpreted accurately in the first place without an assessment of their quality, and comparing test quality is difficult, particularly for tests that occurred years apart and were performed with different (and often long-since replaced) test equipment and technicians. Having said that, ongoing calibrations and quality control should have kept these differences to a minimum (otherwise what’s the point of trending results in the first place?), so a threshold of 15% is probably reasonable. I would be concerned that a threshold lower than 15% would produce too many false positives and that a higher one would produce too many false negatives, but this is admittedly a guess. Nevertheless, when a significant change is detected in any of these results, at least some attempt to assess test quality should also be made.
Computerized pulmonary function testing has been around for decades and lab databases have the potential to extend well into the past. Even if a lab decides to limit the size of its database, hospital information systems now collect patient records over extended periods of time, and at some point guidelines for assessing changes over a prolonged period need to be developed. The relatively small number of longitudinal studies limits our ability to accurately assess clinically significant changes, but an age- and height-adjusted threshold of 15% seems to be a good starting point.
References:
Brusasco V, Crapo R, Viegi G, et al. ATS/ERS Task Force: Standardisation of Lung Function Testing. Interpretive strategies for lung function testing. Eur Respir J 2005; 26: 948-968.
Redlich CA, Tarlo SM, Hankinson JL, Townsend MC, Eschenbacher WL, Von Essen SG, Sigsgaard T, Weissman DN. Official American Thoracic Society Technical Standards: Spirometry in the occupational setting. Am J Respir Crit Care Med 2014; 189: 984-994.

PFT Blog by Richard Johnston is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
