tsesimp Issue With Missing base2_cov Values: Subjects Are Dropped From the Outcome Cox Model Data

Mar 13, 2025 by ADMIN

Introduction

The tsesimp function in the trtswitch package analyzes time-to-event data in the presence of treatment switching using the simple two-stage estimation method. A potential bug has been identified in the function: subjects with missing values for the secondary baseline variables (base2_cov) are excluded from the outcome Cox model, even though those variables are used only in the AFT model. This shrinks the analysis set and can bias the results. In this article, we describe the problem and discuss how to address it.

The Issue

The tsesimp function uses a two-stage approach to analyze time-to-event data. In the first stage, an AFT model is fitted to post-progression survival data to estimate the effect of switching, and this estimate is used to derive counterfactual survival times. In the second stage, the outcome Cox model is fitted to the counterfactual data. The base_cov argument specifies the baseline variables for the outcome Cox model, while the base2_cov argument specifies the secondary baseline variables for the AFT model. However, if a subject has a missing value for any base2_cov variable, that subject is also excluded from the outcome Cox model, even when all of the base_cov variables are observed.
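The difference between the two covariate sets can be illustrated with a small base R sketch. The data frame and the names toy, age, and tran are made up for illustration and are not taken from trtswitch; the code only counts which rows each stage would need, and it assumes that the AFT model uses progressed subjects (pd == 1) only.

```r
# Toy data: 'age' stands in for a base_cov variable, 'tran' for a variable
# that appears only in base2_cov. All names and values are illustrative.
toy <- data.frame(
  id   = 1:5,
  pd   = c(1, 1, 0, 0, 1),
  age  = c(61, 58, 70, 66, 49),
  tran = c(0, 1, NA, 0, 1)
)
base_cov  <- "age"
base2_cov <- c("age", "tran")

# Behaviour described in this article: any row with a missing base2_cov
# value is dropped before both models are fitted
n_global_filter <- sum(complete.cases(toy[, base2_cov, drop = FALSE]))

# Expected behaviour: the outcome Cox model needs complete base_cov only
n_outcome <- sum(complete.cases(toy[, base_cov, drop = FALSE]))

# The AFT model needs complete base2_cov, but only among progressed subjects
n_aft <- sum(toy$pd == 1 & complete.cases(toy[, base2_cov, drop = FALSE]))

c(global = n_global_filter, outcome = n_outcome, aft = n_aft)
```

Subject 3 never progressed and has a missing tran value, so the global filter removes it even though neither model, as assumed here, needs that value.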

Example

Let's consider an example to illustrate this issue. The shilong dataset, available once trtswitch is loaded, contains one record per subject and time interval, including the time-dependent covariates ps, ttc, and tran. We want to estimate the treatment effect on the time-to-event outcome after adjusting for treatment switching, so we use the tsesimp function to fit the AFT models and the outcome Cox model.

library(trtswitch)
library(dplyr)

# the eventual survival time
shilong1 <- shilong %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(-c("ps", "ttc", "tran"))

# the last value of time-dependent covariates before pd
shilong2 <- shilong %>%
  filter(pd == 0 | tstart <= dpd) %>%
  arrange(bras.f, id, tstop) %>%
  group_by(bras.f, id) %>%
  slice(n()) %>%
  select(bras.f, id, ps, ttc, tran)

# combine baseline and time-dependent covariates
shilong3 <- shilong1 %>%
  left_join(shilong2, by = c("bras.f", "id"))

# apply the two-stage method
fit1 <- tsesimp(
  data = shilong3, time = "tstop", event = "event",
  treat = "bras.f", censor_time = "dcut", pd = "pd",
  pd_time = "dpd", swtrt = "co", swtrt_time = "dco",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
               "pathway.f"),
  base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f", "ps", "ttc", "tran"),
  aft_dist = "weibull", alpha = 0.05,
  recensor = TRUE, swtrt_control_only = FALSE, offset = 1,
  boot = FALSE)

In this example, the outcome Cox model uses 193 subjects. Now suppose we set the tran value to missing for one subject who did not cross over and re-run the analysis:

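# pick one subject without disease progression, who did not cross over
id_na <- shilong3 %>%
  ungroup() %>%
  filter(pd == 0) %>%
  pull(id) %>%
  head(1)

# set that subject's tran value to missing; replace() keeps the column type
shilong4 <- shilong3 %>%
  mutate(tran = replace(tran, id %in% id_na, NA))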

fit2 <- tsesimp(
  data = shilong4, time = "tstop", event = "event",
  treat = "bras.f", censor_time = "dcut", pd = "pd",
  pd_time = "dpd", swtrt = "co", swtrt_time = "dco",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
               "pathway.f"),
  base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f", "ps", "ttc", "tran"),
  aft_dist = "weibull", alpha = 0.05,
  recensor = TRUE, swtrt_control_only = FALSE, offset = 1,
  boot = FALSE)

The outcome Cox model now uses only 192 subjects, even though tran is needed only for fitting the AFT models and the affected subject has complete data for every base_cov variable.

Solution

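The fix belongs inside the tsesimp function: rows with missing base2_cov values should be excluded only when fitting the AFT models, and the outcome Cox model should keep every subject with complete base_cov data. As far as the package documentation shows, tsesimp has no na.omit argument, and applying na.omit to the data beforehand would drop the same subjects rather than retain them.

Until the function handles this itself, one possible workaround is to fill the AFT-only covariates with a placeholder value for subjects who never progressed. The sketch below assumes that the AFT models are fitted only to subjects with disease progression, so the placeholder never enters an AFT fit. Verify this for your version of the package before relying on it.

# covariates that appear in base2_cov but not in base_cov
aft_only <- c("ps", "ttc", "tran")

# workaround sketch: for subjects without progression (pd == 0), replace a
# missing AFT-only covariate with the first observed value of that column
shilong5 <- shilong4 %>%
  ungroup() %>%
  mutate(across(all_of(aft_only),
                ~ replace(.x, is.na(.x) & pd == 0, first(na.omit(.x)))))

fit3 <- tsesimp(
  data = shilong5, time = "tstop", event = "event",
  treat = "bras.f", censor_time = "dcut", pd = "pd",
  pd_time = "dpd", swtrt = "co", swtrt_time = "dco",
  base_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
               "pathway.f"),
  base2_cov = c("agerand", "sex.f", "tt_Lnum", "rmh_alea.c",
                "pathway.f", "ps", "ttc", "tran"),
  aft_dist = "weibull", alpha = 0.05,
  recensor = TRUE, swtrt_control_only = FALSE, offset = 1,
  boot = FALSE)

If the assumption holds, the outcome Cox model in fit3 should again include all 193 subjects. Missing base2_cov values for subjects who did progress are a different matter: those values are needed by the AFT models and cannot be replaced by a placeholder.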

Frequently Asked Questions

Q: What is the tsesimp issue with missing base2_cov values?

A: Subjects with missing values for the secondary baseline variables (base2_cov) are excluded from the outcome Cox model, even when they have complete data for every variable that the Cox model uses (base_cov).

Q: What are the consequences of this issue?

A: Subjects are removed from the outcome Cox model because of covariates that the model does not use. This reduces the sample size and statistical power and, if the missingness is related to prognosis, can bias the estimated treatment effect.

Q: How can I identify if I have this issue in my data?

A: Check the fit_outcome$sumstat output from the tsesimp function. If the number of subjects used in the outcome model is lower than the number of subjects with complete base_cov data, subjects with missing values for secondary baseline variables are probably being excluded.
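The affected subjects can also be located directly in the input data. The following is a minimal base R sketch; find_at_risk_rows is a hypothetical helper written for this article, not a trtswitch function, and the toy data are invented for illustration.

```r
# Hypothetical helper (not part of trtswitch): return the rows that have
# complete base_cov data but at least one missing base2_cov value.
find_at_risk_rows <- function(data, base_cov, base2_cov) {
  data <- as.data.frame(data)
  ok_outcome <- complete.cases(data[, base_cov, drop = FALSE])
  ok_aft     <- complete.cases(data[, base2_cov, drop = FALSE])
  data[ok_outcome & !ok_aft, , drop = FALSE]
}

# Illustrative data: subject 2 lacks tran, subject 3 lacks age,
# subject 4 lacks ps
toy <- data.frame(
  id   = 1:4,
  age  = c(61, 58, NA, 66),
  ps   = c(0, 1, 1, NA),
  tran = c(0, NA, 1, 0)
)

at_risk <- find_at_risk_rows(toy, base_cov = "age",
                             base2_cov = c("age", "ps", "tran"))
at_risk$id
```

Subject 3 is not returned because it would be excluded from the outcome model in any case. With the data from this article, the same helper could be called on shilong4 with the base_cov and base2_cov vectors passed to tsesimp.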

Q: How can I solve this issue?

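A: The fix belongs in the tsesimp function itself: rows with missing base2_cov values should be excluded only from the AFT fits, not from the outcome Cox model. Simply applying na.omit to the data does not help, because it drops the same subjects. As a possible workaround, you can fill the AFT-only covariates with a placeholder value for subjects who never progressed, provided that the AFT models in your version of the package use only subjects with disease progression.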

Q: What are the benefits of solving this issue?

A: The outcome Cox model then uses every subject with complete base_cov data. This preserves the sample size and avoids a selection effect driven by missingness in covariates that the model does not use.

Q: Can I use other methods to solve this issue?

A: Yes, you can use other methods to solve this issue. For example, you can use multiple imputation to impute missing values in the base2_cov variables, or you can use a different analysis approach that does not rely on the tsesimp function.

Q: How can I prevent this issue in the future?

A: To prevent this issue in the future, you can ensure that your data is complete and accurate, and that you have a plan in place to handle missing values. You can also use data quality checks and validation to identify and address any issues with your data.

Q: What are the implications of this issue for clinical trials?

A: Treatment-switching analyses often support decisions about the benefit of a therapy. If subjects are silently dropped from the outcome model, the adjusted hazard ratio is estimated from a smaller and possibly unrepresentative set of patients, so the reported effect may mislead patients and healthcare providers. The number of subjects in the outcome model should therefore be checked and reported.

Q: How can I get help with solving this issue?

A: If you need help with solving this issue, you can contact the trtswitch package developers or seek assistance from a statistical consultant. You can also search online for resources and tutorials on how to handle missing values in the base2_cov variables.