Nathalie Støer: Reuse of controls from nested case-control studies in cancer research
To investigate how potential risk factors or protective factors influence an event of interest, for instance incidence of, or death from, a disease, researchers often study large groups of people (cohorts). However, collecting all the information needed for everyone in a large cohort can be difficult or expensive, and an alternative is to use only a subset. Intuitively, the relatively few subjects who experience the event of interest (the cases) are the most informative, and usually all of them are included. We also need healthy subjects to compare them with (controls), so each time a subject experiences the event of interest, one or a few subjects who are still healthy at that time are sampled. Once the data collection is complete, the statistical analysis is carried out on all the cases and their sampled controls. This type of study is referred to as a nested case-control (NCC) study.
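The sampling scheme described above (keep every case, and at each case's event time draw controls among the subjects who are still event-free and under follow-up) can be sketched in a few lines of Python. This is a minimal illustration, assuming no tied event times; the function `sample_ncc` and its arguments are hypothetical and not taken from any software used in the project.

```python
import random


def sample_ncc(times, events, m=1, seed=None):
    """Hypothetical helper: draw a nested case-control sample from a cohort.

    times[i]  -- follow-up time of subject i (event time or censoring time)
    events[i] -- True if subject i experienced the event at times[i]
    m         -- number of controls drawn for each case
    Returns a list of (case_index, [control_indices]) in order of event time.
    Assumes no two cases share the same event time.
    """
    rng = random.Random(seed)
    n = len(times)
    sampled_sets = []
    cases = sorted((i for i in range(n) if events[i]), key=lambda i: times[i])
    for case in cases:
        t = times[case]
        # Risk set: everyone still under follow-up and event-free at time t,
        # excluding the case itself. Later cases are eligible as controls here.
        at_risk = [j for j in range(n) if j != case and times[j] >= t]
        controls = rng.sample(at_risk, min(m, len(at_risk)))
        sampled_sets.append((case, controls))
    return sampled_sets


times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [True, False, True, False, False]
for case, controls in sample_ncc(times, events, m=1, seed=1):
    print(case, controls)
```

Note that a subject who becomes a case later can still be drawn as a control for an earlier case, since it was healthy at that earlier time; this is what matching on time means in the design.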
As an example, we wanted to investigate how the amount of vitamin D in the blood (a potential risk or protective factor) affects the risk of prostate cancer (the event of interest). Collecting and analyzing blood samples is both time-consuming and expensive, so a nested case-control study with one healthy control per prostate cancer case was carried out.
We say that the cases and controls are matched on time in the NCC design, since the controls are sampled at the event time of the case. Traditionally, a control has been considered usable only for the case it was sampled for and nobody else, i.e. a case can only be compared with its own controls. However, one is sometimes interested in more than one type of event, for instance both incidence of and death from prostate cancer. When studying death from prostate cancer, only the controls of the cases who actually died can then be used, even though all the information needed is also available for the controls of the cases who were diagnosed with prostate cancer but did not die from it.
My Ph.D. project is about evaluating an alternative way of analyzing this type of data that allows the controls to be used for all types of events. The method involves calculating each subject's probability of being sampled as a control. The evaluation includes both simulations and analyses of real data. In simulations, we use the computer to generate a number of data sets that mimic real data in a simplified way. Because we generate these data sets ourselves, we know the true answer, i.e. the true association between vitamin D and prostate cancer, and we can compare it with the answers from the alternative method and from the traditional method. The two most important things to compare are bias and uncertainty. Bias is the difference between the truth and the average result from the estimation method, and uncertainty is a measure of how certain we are about the result of the analysis. The best method is then the one with the smallest bias and uncertainty. We also evaluate the methods on real data sets, for instance the data on prostate cancer and vitamin D. In that case we do not know the “truth”, but we can still compare the results with those from the traditional method.
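The probability of being sampled can be computed directly from the sampling design. The sketch below uses one commonly used Kaplan-Meier-type formula (often attributed to Samuelsen): a subject i who is not a case has probability p_i = 1 - ∏_j (1 - m / (n(t_j) - 1)) of ever being drawn, where the product runs over the case times t_j at which i is still at risk and n(t_j) is the number of subjects at risk at t_j; cases get probability 1. Whether the project uses exactly this formula is an assumption here. The inverse probabilities 1/p_i could then serve as weights in a weighted analysis of any endpoint (not shown). A small helper for the two simulation criteria is included as well, with uncertainty measured here, as one possible choice, by the empirical standard deviation of the estimates. All function names are hypothetical.

```python
import math


def inclusion_probabilities(times, events, m=1):
    """Hypothetical helper: probability that each subject is ever in the NCC sample.

    Assumes all cases are included, m controls are drawn at each case time
    among the subjects still at risk (times[j] >= t), and no tied event times.
    """
    n = len(times)
    case_times = sorted(times[i] for i in range(n) if events[i])
    probs = []
    for i in range(n):
        if events[i]:
            probs.append(1.0)  # every case is included in an NCC study
            continue
        not_sampled = 1.0
        for t in case_times:
            if t > times[i]:
                break  # subject i has left follow-up before this case time
            n_at_risk = sum(1 for tj in times if tj >= t)
            # i is one of n_at_risk - 1 eligible controls, of which m are drawn
            not_sampled *= 1.0 - min(1.0, m / (n_at_risk - 1))
        probs.append(1.0 - not_sampled)
    return probs


def bias_and_sd(estimates, truth):
    """Bias: mean estimate minus the true value. sd: empirical standard deviation."""
    k = len(estimates)
    mean = sum(estimates) / k
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (k - 1))
    return mean - truth, sd


times = [1.0, 2.0, 3.0, 4.0, 5.0]
events = [True, False, True, False, False]
p = inclusion_probabilities(times, events, m=1)
weights = [1.0 / pi for pi in p]
print(p)  # → [1.0, 0.25, 1.0, 0.625, 0.625]
```

In this toy cohort, subject 1 (censored at time 2) is at risk only at the first case time, where one control is drawn among 4 eligible subjects, so its probability is 1/4. Subjects 3 and 4 are also at risk at time 3, where one control is drawn among 2, giving 1 - (3/4)(1/2) = 0.625.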