Abstract

We are writing in regard to an article published in the March 2007 issue of Health Physics by Eduard Hofer (Hofer 2007) on the interesting and important subject of dealing with subjective uncertainty in radiation dosimetry when dose estimates are applied to epidemiological studies, especially when a sequence of alternative dose estimates rather than a single "best estimate" of dose is provided for the epidemiological application. We have described several approaches to this problem (Stram and Kopecky 2003; Kopecky et al. 2004), and while these may not provide the last word on this problem, we have an important concern regarding the proposed analysis of Hofer, namely the type I error (or "false positive" error) properties of the proposed analysis. Our approach (Stram and Kopecky 2003) and that of Hofer are similar in that both start with the assumption (often requiring a considerable leap of faith) that the dosimetry system used to estimate dose for each individual in the study can be regarded as providing estimates from a distribution of true dose conditional upon what is known about the determinants of the actual exposure. Specifically, both Hofer and we assume that the dosimetry system generates m independent sequences (or "replications") of dose estimates {xi,j} (where i = 1, ..., n indexes individuals in the study and j = 1, ..., m indexes the sequence number) for the n subjects from the conditional distribution of true dose given all that is known about the parameters, source terms, individual data (excluding outcome data), etc., determining the true exposures. In our 2003 paper, we described some of the operating characteristics of treating the sequences in a manner analogous to what is done in the so-called "Berkson error" problem (Thomas et al. 1993).
Specifically, we described some statistical implications of using the mean, {zi}, of true dose given "all that is known" about true dose as the dose variable in a linear regression analysis relating disease to exposure. (We may estimate {zi} by averaging the m sequences, {xi,j}, over j, assuming that m is large enough that this mean is estimated very accurately.) Thus, for example, we reject the null hypothesis of "no exposure effect" in this analysis only if standard statistical tests (ignoring dosimetry error) conclude that there is an association between the mean doses, zi, and the outcome of interest, Yi, with the appropriate degree of confidence. Hofer (2007) suggests a different test of the null hypothesis (this is most clear from the simulation experiment performed to compute power given in the Appendix of that paper), namely to use each replication {xi,j}, j = 1, ..., m, in turn in m separate regression analyses (regressing Yi on each sequence separately), so that a total of m tests are performed at a specified type I error rate (denoted α). Hofer then suggests (point 2 on pages 233-234) rejecting the null hypothesis if more than 100α percent of these m separate regressions give a significant p-value (p < α). The problem with the proposed procedure is that it does not properly control the false positive rate, i.e., the type I error, α, of the test. That is, the new procedure will reject a true null hypothesis more often than 100α percent of the time. To see this, consider the special case in which the null hypothesis is true, i.e., disease, Yi, and true dose are independent of each other, and in which xi,j and xi,j' are also independent over the replications j. (The second assumption would hold when there is no information at all about true individual dose in the output from the dosimetry system.)
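In this special case the number of significant results among the m regressions is simply a binomial count with success probability α, so the actual rejection rate of the more-than-100α-percent rule can be computed exactly. The short Python sketch below (our own illustration, not code from either paper; the function names are ours) performs this computation in log space to avoid numerical underflow at large m:

```python
from math import lgamma, log, exp

def log_binom_pmf(k, m, p):
    # Log of the Binomial(m, p) probability mass at k, via log-gamma
    # so that large m does not overflow.
    return (lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)
            + k * log(p) + (m - k) * log(1 - p))

def false_positive_rate(m, alpha=0.05):
    # Under the null with independent replications, the count Rm of
    # significant regressions is Binomial(m, alpha), and the proposed
    # rule rejects whenever Rm > alpha * m.
    cutoff = int(alpha * m)  # largest count that does NOT trigger rejection
    p_accept = sum(exp(log_binom_pmf(k, m, alpha)) for k in range(cutoff + 1))
    return 1.0 - p_accept

for m in (20, 100, 1000, 10000):
    print(m, round(false_positive_rate(m), 3))  # m = 100 gives 0.384
```

The computed rate is far above the nominal α = 0.05 for every m and climbs toward ½ as m grows.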
In this case, a count of the number of times that the p-value is less than α (Rm, say) will be distributed as a binomial random variable with success probability α and m as the number of trials (since each sequence of x is independent and related only by chance to disease). As m increases to infinity, the false positive rate of the procedure will therefore approach ½. (To see this, note that a false positive result from Hofer's proposed test corresponds to Rm > α × m, and that for sufficiently large m, Rm is approximately normally distributed with mean α × m.) For smaller m the actual false positive rate will still be considerably greater than the desired rate α. (For example, with α = 0.05 and m = 100 the expected false positive rate is 38.4 percent.) Dropping the assumption that xi,j and xi,j' are independent over the replications j decreases the type I error of the proposed procedure, but it will remain inflated so long as these variables are not perfectly correlated (perfect correlation corresponding to the case of no dosimetry error). Indeed, in the simulation experiment presented in the Appendix of Hofer (2007) we see that the false positive rate using α = 0.05, while not 38.4%, was 13%, far greater than the 5% required for a test to be valid, while the error rate using the approach of Kopecky et al. (2004) was 3% (not statistically different from the nominal 5%, given the number of simulations performed). The conclusion made at the end of the Appendix, that the proposed procedure is more powerful than that of Kopecky et al., cannot be trusted, because it ignores the overwhelming evidence that the proposed procedure is anti-conservative under the null. In Stram and Kopecky (2003) we described several other candidate approaches to dealing with dose uncertainty, specifically when (as with the Hanford dosimetry) there are errors that are "shared" over many subjects. The method that is nearest in spirit to Hofer's proposal is Monte Carlo maximum likelihood (MCML).
In this method the likelihood function itself is averaged over a large number of replications, m, and then maximized (with respect to its parameters) to find estimates and confidence intervals for the dose-response parameters of interest. This procedure, while computationally intensive, does show promise in dealing with errors in dosimetry systems that include both shared and unshared components. Further statistical work on this interesting and challenging problem is encouraged.
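For concreteness, the sketch below illustrates the MCML idea under a deliberately simple hypothetical model: a normal linear outcome model with simulated true doses and m dose replications. All parameter values, the generative model, and the grid search are our own illustrative choices, not the implementation of Stram and Kopecky (2003):

```python
import random
from math import log, exp, fsum

random.seed(2)
n, m = 100, 50
beta_true = 0.5

# Hypothetical generative model: true doses d_i, replications x_{i,j}
# drawn around them, and outcomes y_i depending linearly on true dose.
d = [random.gauss(2, 1) for _ in range(n)]
y = [beta_true * di + random.gauss(0, 1) for di in d]
x = [[di + random.gauss(0, 0.5) for _ in range(m)] for di in d]

def log_lik(beta, sigma=1.0):
    # MCML: average the full-data likelihood over the m replications of
    # the entire dose vector, then take the log (log-mean-exp for
    # numerical stability).  Normalizing constants are omitted since
    # they do not affect the maximization over beta.
    logs = []
    for j in range(m):
        ll = fsum(-0.5 * ((y[i] - beta * x[i][j]) / sigma) ** 2
                  for i in range(n))
        logs.append(ll)
    mx = max(logs)
    return mx + log(fsum(exp(l - mx) for l in logs) / m)

# Maximize the averaged likelihood over a grid of candidate slopes.
grid = [b / 100 for b in range(-100, 151)]
beta_hat = max(grid, key=log_lik)
print("MCML slope estimate:", beta_hat)
```

In practice the maximization would use a proper optimizer and the dose-response model of actual interest, with confidence intervals derived from the averaged likelihood.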