Performance of seven criteria to assess CA125 increments among ovarian cancer patients monitored during first-line chemotherapy and the post-therapy follow-up period

Aim: To investigate seven CA125 criteria to monitor progressive ovarian cancer among patients with stage IC–IV disease. Materials & methods: Four criteria were used to assess CA125 increments starting from concentrations ≥35 U/ml and three criteria to assess increments starting from concentrations <35 U/ml. Results: A total of 231 patients were allocated to CA125 monitoring. The performances of the CA125 criteria were similar with sensitivities of 30–55%, negative predictive values of 28–46%, positive predictive values of 90–100% and median lead times of 26–87 days. Conclusion: The criteria showed low sensitivity and inability to exclude progressive ovarian cancer. The study suggests that CA125 information cannot stand alone but should be used in conjunction with other investigative procedures.

A change in tumor size is routinely measured by radiological imaging according to the Response Evaluation Criteria in Solid Tumors (RECIST 1.1) [1]. However, this may be difficult among patients with ovarian cancer, as they often have no macroscopically detectable disease after initial surgery or they present with widespread diffuse peritoneal metastases [2–4]. The serological tumor marker cancer antigen 125 (CA125) is frequently added as a biochemical monitor of patients with epithelial ovarian/fallopian tube or primary serous peritoneal cancer [5]. However, it is a challenge to define increments of CA125 concentrations that reliably correlate with increasing tumor burden, that is, recurrence and progressive disease. In the last three decades, a number of evaluation criteria have been proposed to interpret serially increasing CA125 concentrations [5–20].
Recently, a systematic review [21] identified seven criteria to assess CA125 increments from below to above the applied cut-off concentration and from above the applied cut-off to higher levels, as proposed by Rustin et al. [6,8] and by Tuxen et al. [7,20]. The criteria by Rustin et al. were generated from epithelial ovarian cancer (EOC) patients during follow-up after first-line chemotherapy and incorporated into the RECIST 1.1 by the Gynecological Cancer Intergroup [1]. The criteria suggested by Tuxen et al. were generated from ovarian cancer patients monitored during first-line chemotherapy as well as the subsequent control period [7,20]. The seven criteria identified in the review were further compared in a Phase I monitoring trial according to the design recommendations from the European Group on Tumor Markers (EGTM) [22]. They were compared under standardized conditions, and their individual ability to detect early tumor growth was evaluated in a preclinical model system based on computer-simulated CA125 concentrations [23].
The current Phase II monitoring study estimated whether the criteria that performed the best [24] in the simulation model also performed the best when applied to serial CA125 concentrations obtained from ovarian cancer patients monitored during first-line chemotherapy and the post-chemotherapy follow-up period [23]. The study also estimated whether the criteria introduced by Rustin et al. were useful during first-line chemotherapy even though their criteria were generated from patients during follow-up after first-line chemotherapy. Overall, the current study was performed to challenge a previous report by Rustin et al., which suggested that CA125 should not be used as a standard test for monitoring patients with ovarian cancer.

Design
The study complied with the general recommendations for study design as specified by the Standards for Reporting Diagnostic Accuracy Studies (STARD) [25]. However, it was not a cross-sectional diagnostic study but a longitudinal monitoring trial based on serial measurements of CA125 among individual patients. Additionally, the study followed a phased approach as proposed by the EGTM and was designed as a prospective Phase II biomarker monitoring trial embedded into clinical drug trials [26]. [...]

Legend (fragment): CA125 increments starting below the cut-off. CD denotes the required critical difference. RCV denotes the reference change value. † Criterion 1B showed the best monitoring performance for increments starting above the applied cut-off in the simulation models [23]. ‡ Criterion 2A showed the best monitoring performance for increments starting below the applied cut-off in the simulation models [23].

Quality assurance
To ensure a stable analytical quality throughout the study, three control samples with different analyte concentrations were included in each assay run. A Westgard multirule combination was used to accept or reject runs [28]. The analytical imprecision comprised both the intra- and the inter-assay variation because each sample from an individual subject was analyzed consecutively in different assay runs.

Criteria to interpret CA125 increments
Seven criteria were tested during first-line chemotherapy and the subsequent follow-up period (Figure 1). Four criteria were tested for increments starting from baseline concentrations above the cut-off to higher levels (Figure 1A), and three criteria were tested for increments from below to above the cut-off (Figure 1B). In both situations, Rustin et al. used an approach by which the increment between two concentrations had to exceed a defined arbitrary percentage of change before being considered indicative of progression [6,8]. Tuxen et al. [20] used two approaches to generate their CA125 assessment criteria. Their first approach involved a statistical [...]

[...] The lead time was zero (= 0 days) if the clinical progression and the marker progression were obtained simultaneously. The lead time was negative (<0 days) when the clinical progression preceded the marker progression. When a clinical evaluation and a marker assessment differed, the data were registered as discordant; identical data were registered as concordant. Thus, the true-positive (TP) results denoted concordant information in terms of progressive disease with lead time ≥0 days. True-negative (TN) results denoted concordant information in terms of nonprogression. False-negative (FN) results denoted discordant information when there was clinical progression without CA125 progression. False-positive (FP) results denoted discordant information when CA125 progression was not followed by clinical progression.
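The concordance rules above can be sketched in a few lines of Python. This is a minimal illustration, not the study's software: the `PatientOutcome` type, the field names and the `negative_lead_time` label are hypothetical. In particular, the text defines TP only for lead times ≥0 days, so the sketch assumes that events with a negative lead time are reported under a separate label rather than counted in any of the four categories.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PatientOutcome:
    # Date on which a CA125 criterion first signalled progression
    # (None means no marker progression was registered).
    marker_progression: Optional[date]
    # Date of clinically confirmed progression
    # (None means no clinical progression was registered).
    clinical_progression: Optional[date]


def lead_time_days(p: PatientOutcome) -> Optional[int]:
    """Days by which the marker signal preceded clinical progression.

    Positive: marker first; 0: simultaneous; negative: clinical first.
    Returns None unless both events occurred.
    """
    if p.marker_progression is None or p.clinical_progression is None:
        return None
    return (p.clinical_progression - p.marker_progression).days


def classify(p: PatientOutcome) -> str:
    """Label one patient as TP, TN, FN or FP following the definitions above."""
    lead = lead_time_days(p)
    if lead is not None:
        # Assumption: negative lead times get their own label, because the
        # text only defines TP for lead times of 0 days or more.
        return "TP" if lead >= 0 else "negative_lead_time"
    if p.marker_progression is None and p.clinical_progression is None:
        return "TN"  # concordant nonprogression
    if p.clinical_progression is not None:
        return "FN"  # clinical progression without CA125 progression
    return "FP"      # CA125 progression not followed by clinical progression
```

For example, `classify(PatientOutcome(date(2000, 1, 1), date(2000, 2, 1)))` yields `"TP"` with a lead time of 31 days.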

Statistics
A power calculation of the sample size was performed prior to opening the Phase II trial database. For a criterion to be valid for detecting CA125 increments, it was assumed that the criterion should provide 70 TP signals in terms of progressive disease. With the type 1 error fixed at 0.05 and the type 2 error at 0.10, the power calculation showed that a total of 156 patients should be included for each criterion in order to detect a difference in their performance. The number of TP, FN, FP and TN results was counted. The sensitivities (the percentage of patients with tumor growth detected by CA125 increments), the specificities (the percentage of patients without new tumor growth confirmed by unchanged CA125 concentrations), the positive predictive values (the probability of clinical progression following CA125 progression), the negative predictive values (the probability of clinical nonprogression, given a marker nonprogression), the FP rates (the percentage of CA125 increments among patients without new tumor growth) and the FN rates (the percentage without CA125 increments among patients with new tumor growth) were calculated. The 95% CIs were estimated according to Geigy formulas 771 and 772 [29].

Results
[...] Based on data cumulated from first-line chemotherapy, the subsequent follow-up and all histological tumor types, the accuracies of the criteria to detect CA125 increments starting from above and below the applied cut-off were similar with overlapping 95% CIs (Table 3A). Criteria 1A–1D did not provide FP increments, but the proportions of FN events were high (53–70%). Criteria 2A–2C provided 3–4 FP increments (same patients except one); all FP increments were registered during the post-therapy follow-up period at different time points depending on the individual criterion. The proportions of FN events were also high (49–59%) (Table 3A). Inclusion of serous tumor types only did not improve the accuracy (Table 3B). Thus, the proportion of FN events remained high among patients with concentrations both above and below the cut-off (45–60 and 47–55%, respectively) (Table 3B).

Figure 2 illustrates a new format to present detailed information on lead times in terms of progression among individual patients. Events with positive lead times (>0 days) are marked below the solid line; events with no lead time (=0 days) are marked on the solid line; and events with negative lead times (<0 days) are marked above the solid line.

Table 4. CA125 lead times among progressive ovarian cancer patients provided by the applied assessment criteria cumulated from first-line chemotherapy and follow-up.
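The accuracy measures defined under Statistics follow directly from the four counts, as the sketch below shows. The function names are hypothetical, and the example counts in the usage note are invented for illustration rather than taken from Table 3. The study estimated its 95% CIs with Geigy formulas 771 and 772 [29]; those are not reproduced here, and the sketch substitutes the Wilson score interval as a commonly used stand-in, so its limits may differ from the published ones.

```python
import math
from typing import Dict, Tuple


def accuracy_measures(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Proportions as defined in the Statistics section (0..1, not percent)."""
    def ratio(numerator: int, denominator: int) -> float:
        # NaN marks a measure that is undefined for the given counts.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # progression detected by CA125
        "specificity": ratio(tn, tn + fp),  # nonprogression with unchanged CA125
        "ppv": ratio(tp, tp + fp),          # clinical progression given marker progression
        "npv": ratio(tn, tn + fn),          # nonprogression given marker nonprogression
        "fp_rate": ratio(fp, fp + tn),      # increments among patients without tumor growth
        "fn_rate": ratio(fn, fn + tp),      # no increment among patients with tumor growth
    }


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed substitute for Geigy)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to guard against tiny floating-point overshoot at p = 0 or p = 1.
    return max(0.0, centre - half), min(1.0, centre + half)
```

With hypothetical counts of 30 TP, 40 FN, 0 FP and 20 TN, `accuracy_measures(30, 40, 0, 20)` gives a sensitivity of 30/70, a specificity and positive predictive value of 1.0, and a negative predictive value of 20/60. High positive predictive value combined with low sensitivity and low negative predictive value is the same qualitative pattern the study reports.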

Discussion
Tumor marker monitoring studies investigating the accuracy of criteria to detect increasing concentrations have frequently been based on heterogeneous designs, making interpretation of results difficult [31]. The current study followed the proposals of the EGTM and was designed as a prospective Phase II biomarker monitoring trial embedded into clinical drug trials where the tumor marker investigation was a secondary objective in relation to the clinical drug trial [26]. The current Phase II monitoring trial investigated the best performing criteria previously identified in a Phase I biomarker simulation study. The criteria were investigated among a cohort of ovarian cancer patients with disease stages IC–IV receiving first-line chemotherapy and during the subsequent post-therapy follow-up period. A distinction was made when interpreting CA125 increments starting from baseline concentrations above and below the cut-off, respectively, because published criteria are focused on the nadir concentration of the increment in relation to the applied cut-off concentration. It appears from Table 3A & B that approximately 80% of the patients had baseline CA125 concentrations below the applied cut-off. This was due to the inclusion procedure where 21% of the patients had early-stage disease (FIGO IA–IIC) with a low tumor burden (Table 1). For increments starting from concentrations above the cut-off, none of the criteria 1A–1D provided FP signals when all histological ovarian tumor types were considered (Table 3A). For baseline concentrations starting below the cut-off, criteria 2A–2C each provided one temporary FP increment to above the cut-off at different time points among the same three patients during post-therapy follow-up (Table 3A). Criterion 2C provided an additional asynchronous FP increment. 
Three patients were under surveillance for an additional 2–4 years without developing clinical progression, and one patient was followed for 6 months before the 5-year routine surveillance was completed and the patient was discharged from further follow-up without evidence of disease. This condition illustrates one of the most difficult situations in monitoring, where patients have a rising CA125, no evidence of progressive disease on imaging and no clinical symptoms of disease progression. It may be suggested that the patient should have been offered an additional year of surveillance. Bias is a less likely cause of the observed FP events because all measurements were performed at the same laboratory with close monitoring of the analytical quality. Additionally, the FP increments did not occur at the time point when the method to measure CA125 was changed. However, it may be speculated that undetected temporary benign disease may have caused the FP increments [22,32].
Criteria 1A and 2A were developed to monitor patients during follow-up after primary therapy and have not been validated during first-line chemotherapy; consequently, it is difficult to compare the current data with former studies. As regards criteria 1B–1D and 2B–2C, this is the first clinical study to report on their individual accuracy; their combined accuracy has been reported [7,20]. Overall, the accuracies of the criteria to interpret increments from above the cut-off to higher levels among all histological tumor types were similar, as were the accuracies of the criteria interpreting increments from below to above the cut-off (Table 3A). However, a closer examination indicates that the accuracies of the criteria in the current Phase II monitoring trial may support the accuracies reported in a previous Phase I simulation study [23]. Based on simulated data stratified for the lowest number of FP increments, criterion 1B performed best among increments starting from above the cut-off, followed by criteria 1A, 1D and 1C, respectively; and criterion 2A performed best among increments starting from below the cut-off, followed by criteria 2B and 2C. Owing to the nature of quantitative biochemical tests, there is an inverse relationship between the number of FP and FN events; consequently, the lower the number of FP events, the higher the number of FN events [33]. Accordingly, the criteria with the lowest number of FP events in the Phase I simulation tests (criteria 1B and 2A) provided the highest number of FN events in the current Phase II monitoring trial (Table 3A).
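The inverse relationship between FP and FN events can be made concrete with a toy example. The sketch below assumes a single decision rule of the percentage-change kind, in which a rise at or above a threshold is called marker progression. The function name, the thresholds and all values are invented for illustration; none of them are study data or the published criteria.

```python
from typing import Sequence, Tuple


def count_errors(progressors: Sequence[float],
                 non_progressors: Sequence[float],
                 threshold: float) -> Tuple[int, int]:
    """Return (FN, FP) when a rise >= threshold is called marker progression."""
    fn = sum(1 for rise in progressors if rise < threshold)
    fp = sum(1 for rise in non_progressors if rise >= threshold)
    return fn, fp


# Hypothetical CA125 rises in percent, made up for this illustration.
progressors = [20, 45, 60, 110, 150, 240]   # patients with clinical progression
non_progressors = [5, 10, 15, 30, 55, 70]   # patients without clinical progression

for threshold in (25, 50, 100):
    fn, fp = count_errors(progressors, non_progressors, threshold)
    print(f"threshold {threshold}%: FN={fn}, FP={fp}")
# threshold 25%: FN=1, FP=3
# threshold 50%: FN=2, FP=2
# threshold 100%: FN=3, FP=0
```

Raising the threshold removes FP signals only at the price of more FN events, which mirrors the pattern described for criteria 1B and 2A.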
It is often stated that CA125 is mainly expressed by ovarian tumors of the serous type. We therefore investigated the accuracy among patients whose tumors had this histological classification (Table 3B). However, the accuracy among serous tumors did not improve as compared with the accuracy when all histological types were included (Table 3B & A, respectively). In both situations, the observed numbers of FN events were relatively high, indicating that the validated CA125 criteria are unreliable to exclude clinical progression independent of the baseline concentrations. Again, the criteria with the lowest number of FP events in the Phase I simulation tests (criteria 1B and 2A) provided the highest number of FN events in the current Phase II monitoring trial (Table 3B). Other issues should also be considered in association with the reported FN events. Since the study period 1995–2001, the histological classification system has been changed. Recent evidence has identified EOC as a heterogeneous disease with five distinct subtypes: high-grade serous, low-grade serous, clear cell, endometrioid and mucinous. Each subtype is associated with different biological characteristics, clinical behavior and prognosis [34,35]. There is now persuasive evidence to classify these five types of ovarian carcinoma as different diseases [34,36,37]. Reclassification of the patients according to current standards may provide an alternative distribution in the different subtypes. Situations with a slow rate of CA125 increase due to low production in the tumor represent a challenge in terms of FN events. In an effort to elucidate whether the rate of FN events can be reduced without numerous FP signals, it seems relevant to validate criteria specially designed to assess increments within the normal range and from below to slightly above the applied cut-off [38]. The reliability and the length of the positive lead time are important parameters for monitoring of cancer patients. 
A positive lead time enables early supplementary investigations, that is, imaging, and/or institution or change of therapy. The lead time provided by the criteria during clinical monitoring was in accordance with the lead-time potential obtained in the simulation studies. The criteria with the longest lead times in the current clinical Phase II monitoring trial provided the shortest time interval needed to detect 100% of TP CA125 increments in the simulation study [23]. A multicenter study by Rustin et al. reported on women with histologically confirmed epithelial ovarian, fallopian tube or serous primary peritoneal cancer in complete remission after first-line chemotherapy and baseline CA125 concentrations <35 U/ml [24]. When CA125 concentrations rose to ≥70 U/ml during the follow-up, the patients were randomized to early second-line chemotherapy or initiation of therapy at clinical or symptomatic relapse. They demonstrated a median lead time of 4.8 months; however, early treatment based on CA125 increments led to more chemotherapy, no difference in survival and worse quality of life. The median lead time in the current study among patients with CA125 increments starting from baseline CA125 concentrations <35 U/ml cumulated from first-line chemotherapy and follow-up was 1.0–1.7 months (Table 4A & B). It is difficult to argue that the short lead times observed in the current investigation would benefit the patients in terms of prolonged survival following early CA125-guided therapy.

Research Article. Abu Hassan, Nielsen, Tuxen, Petersen & Sölétormos. future science group
One of the limitations of the current study is that the investigation was not blinded; the CA125 data were available throughout the study period together with the results of clinical examinations and imaging [25]. This may have influenced the length of the positive lead time because the clinicians had the opportunity to request earlier imaging based on CA125 increments and thereby shorten the potential lead time. Another weakness is the changed histological classification of EOC, which is now considered a heterogeneous disease with five different subtypes [35]. Most likely, the new classification system will provide a different distribution of patients among the two groups, all tumors and serous tumors only, respectively, introducing some uncertainty into the presented results.
A further weakness relates to the clinical response evaluation, which was based on the WHO criteria in use at the time of the present study [27]. In 2000, the WHO standards were replaced by a new set of response evaluation guidelines, the RECIST [39]. Reevaluation of clinical response among the investigated patients according to the new standards may have some impact on the obtained results, but would hardly influence the overall impression of the validity of CA125 as a monitor of patients with EOC. Further limitations of the study are that first-line chemotherapy and the subsequent follow-up period were not investigated individually, because too few patients entered the follow-up period to allow a meaningful statistical analysis, and that the performance of each criterion was not investigated individually for each stage of disease, due to a low number of patients within the individual subgroups.
Overall, the study supports a previous multicenter investigation suggesting that CA125 information cannot stand alone but should be used in conjunction with other investigative procedures [24]. Performing scheduled CA125 testing to follow patients has a significant cost. Therefore, it should be considered whether sustaining this cost is relevant without substantial benefit for the individual patient. In conclusion, the applied CA125 assessment criteria showed low sensitivity (30–55%), low negative predictive value (28–46%), high positive predictive value (90–100%) and short median lead time (26–87 days) among several patients.

Conclusion & future perspective
There is a need for supplementary markers and alternative assessment criteria for patient surveillance.
The monitoring performance of the promising biomarker HE4 in combination with CA125 needs to be further investigated among EOC patients undergoing first-line chemotherapy and during the subsequent follow-up period.
Evidently, identification of new biomarkers is important, and the area is developing fast, for example, circulating tumor cells, DNA and RNA fragments as well as epigenetic alterations [40,41].
Guidelines for conducting monitoring studies provided by the EGTM may be helpful when designing investigations of new serological markers for ovarian cancer [26,42].

[...] conflict with the subject matter or materials discussed in the manuscript apart from those disclosed. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.

Ethical conduct of research
The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations. In addition, for investigations involving human subjects, informed consent has been obtained from the participants involved.

Background
• The current Phase II study investigated the performance of seven CA125 criteria to monitor progressive ovarian cancer.

Material & method
• Four criteria were used to assess CA125 increments starting from concentrations ≥35 U/ml and three criteria to assess increments starting from concentrations <35 U/ml.

Results
• The performances of the CA125 criteria were similar with sensitivities of 30–55%, negative predictive values of 28–46%, positive predictive values of 90–100% and median lead times of 26–87 days.

Discussion
• The current study supports a previous multicenter investigation suggesting that CA125 information cannot stand alone but should be used in conjunction with other investigative procedures.