When should progression-free survival be used as a surrogate for overall survival in oncology? A recent article in the International Journal of Technology Assessment in Health Care, co-authored by OHE Visiting Fellow Alastair Fischer, shows that progress on this question is hampered by uneven reporting of results, differing definitions and methods of analysis, and a lack of rigour in applying methodology.
Most drug therapies in oncology to date cause solid tumours to shrink for a time, after which they begin to grow again. A way of understanding progression-free survival (PFS) is to consider the time from initiation of treatment until the tumour returns to its size at initiation. A PFS of a certain length of time for a particular patient could be expected (if other things remain unchanged) to lead to a similar increase in the length of their overall survival (OS), one of the main outcomes in oncology. However, OS may take many months or years to measure accurately, as it requires most or all patients to have died.
If PFS and OS were perfectly correlated, PFS could be substituted for OS, leading to quicker results and a smaller sample size from trials, with no reduction in accuracy for the estimate of OS. Given that perfect correlation will not be attained, use of PFS to measure OS will introduce error.
How low can the PFS-OS correlation be before the error nullifies the benefits that a perfect surrogate for OS would bring?
This is difficult to answer because, in a randomised controlled trial (RCT) of a new drug versus current care, PFS can be measured for each individual in the treatment group, but the change in OS cannot, because it is not known when the patient would have died without treatment. Thus, the only possible measure of the increase or decrease in OS because of treatment is to subtract the average length of subsequent survival in the control group from the average in the treatment group. Together with the average PFS from the treatment group, this forms a single observational pair from the trial. For a particular type of cancer, each of a number of trials contributes a single observational pair (average change in OS, average PFS) and from the distribution of such points the correlation between OS and PFS is estimated. To gain more observations, a different drug may be used as a treatment, and for some observations, the drug used in the control arm may also be changed. There is thus a trade-off between the lower random error from increasing the number of observations and an increase in heterogeneity, which is likely to bias the estimated correlation downwards.
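The construction of trial-level pairs described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the paper: the trial figures are invented for the example, and the field names (`mean_pfs_treat`, `mean_os_treat`, `mean_os_control`) and helper functions are hypothetical.

```python
from math import sqrt

# Hypothetical trial-level summaries in months.
# Illustrative numbers only; they are NOT taken from any real study.
trials = [
    {"mean_pfs_treat": 6.0, "mean_os_treat": 14.0, "mean_os_control": 11.0},
    {"mean_pfs_treat": 4.5, "mean_os_treat": 12.0, "mean_os_control": 10.5},
    {"mean_pfs_treat": 9.0, "mean_os_treat": 20.0, "mean_os_control": 15.0},
    {"mean_pfs_treat": 7.5, "mean_os_treat": 16.0, "mean_os_control": 13.5},
    {"mean_pfs_treat": 3.0, "mean_os_treat": 9.5, "mean_os_control": 9.0},
]


def observational_pair(trial):
    """Return (average PFS in the treatment arm, difference in average OS between arms)."""
    delta_os = trial["mean_os_treat"] - trial["mean_os_control"]
    return trial["mean_pfs_treat"], delta_os


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Each trial contributes exactly one point, whatever its number of patients.
pairs = [observational_pair(t) for t in trials]
pfs, delta_os = zip(*pairs)
r = pearson(pfs, delta_os)
print(f"{len(pairs)} trials, trial-level correlation r = {r:.2f}")
```

Note that the number of observations here is the number of trials, not the number of patients, which is why adding trials of different drugs is tempting despite the extra heterogeneity.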
We updated the work of Davis et al. (2012) to cover the period 2012 to 2016, using the same methodology so that the results are comparable. We found little or no change in the factors that determine the circumstances under which PFS is an adequate surrogate. The study confirmed the finding of Davis et al. that the adequacy of the surrogate depends on
- the stage of the tumour,
- the line of treatment (first, second or subsequent),
- whether crossovers are allowed, and
- whether the residual effect of first-line treatment can be distinguished from the effects of subsequent treatments.
The paper also describes how the results of the reviewed studies have been analysed and reported. Although studies usually adhere to reporting standards, the standard of scholarship in the literature sometimes appears questionable and the reporting of results haphazard. Criteria for what makes a good surrogate also differ from study to study. Researchers who analyse the PFS-OS relationship commonly complain that the definition of PFS appears to differ between trials. Treatment of outliers, a cause of unstable results, has often not been undertaken or has not been reported. Use of the main statistical methodology for assessing the adequacy of a surrogate (developed by Buyse and colleagues in 2000) has increased since 2012 but is still found in far fewer than 50% of studies. This method requires estimation at both the trial level (the single data-point pair per trial) and the individual patient level, using individual patient data (IPD) within trials. Yet the proportion of studies reporting the use of IPD remains at its pre-2012 level of 33%.
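To see why unreported handling of outliers can make results unstable, consider a simple leave-one-out check on a set of trial-level pairs. This is only a sketch of one possible sensitivity analysis, not a procedure taken from the paper or from Buyse and colleagues; the data points are invented and the function names are hypothetical.

```python
from math import sqrt

# Hypothetical (average PFS, average change in OS) pairs in months, one per trial.
# Illustrative numbers only; the last point is a deliberately planted outlier.
pairs = [(4.0, 1.0), (6.0, 2.0), (8.0, 3.5), (10.0, 4.0), (5.0, 9.0)]


def pearson(points):
    """Plain Pearson correlation coefficient for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)


def leave_one_out(points):
    """Correlation recomputed with each trial dropped in turn."""
    return [pearson(points[:i] + points[i + 1:]) for i in range(len(points))]


r_all = pearson(pairs)
loo = leave_one_out(pairs)

print(f"all trials: r = {r_all:.2f}")
for i, r_i in enumerate(loo):
    print(f"without trial {i}: r = {r_i:.2f}")
```

With so few observations, a single trial can move the estimated correlation substantially, so a study that omits any description of how outliers were handled leaves readers unable to judge how robust its conclusion is.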
Ciani (2016) reports on efforts to improve reporting standards. Standardisation, in the form of adhering to common definitions, statistical techniques and a checklist of necessary items in reporting results, would often be virtually costless. Despite such advances, the use of surrogacy is likely to remain controversial because, at the heart of the problem, lies an uncomfortable truth: only a single summary data-point pair (average change in OS, average PFS) can be collected per trial, whatever the trial’s size.
Hernandez-Villafuerte, K., Fischer, A., and Latimer, N. 2018. Challenges and Methodologies in Using Progression Free Survival as a Surrogate for Overall Survival in Oncology. International Journal of Technology Assessment in Health Care 34(3), pp.300-316.