Biostatistics Department Theses and Dissertations


Recent Submissions

Now showing 1 - 10 of 50
  • Item
    Innovative Bayesian Designs for Clinical Trials
    (2022-10) He, Tian; Zang, Yong; Liu, Hao; Bakoyannis, Giorgos; Zhao, Yi; Hasan, Mohammad
    Traditional clinical trial designs are generally based on the doctrine of studying one drug for one disease at a time, which can be slow and inefficient. Given the high failure rate in drug development, there is a great need to speed up the process and minimize its cost. Novel trial designs have been proposed, such as the master protocol approach, which has expanded the trial design horizon to umbrella, basket, and platform trials. Compared with traditional clinical protocols, a master protocol enables investigators to evaluate multiple drugs and diverse disease populations simultaneously in a single protocol, with the capacity to modify the protocol based on the observed trial data and new drugs. While many statistical methods have been proposed in the literature for umbrella, basket, and platform trials, most of these designs are based on a binary or continuous endpoint. In oncology trials, however, there is a great need for novel methods for survival endpoints. In this dissertation, we propose three novel Bayesian methods for three distinct trial design problems: 1) an optimal Bayesian design for platform trials with multiple endpoints; 2) a novel Bayesian design for basket trials with survival outcomes; and 3) an adaptive Bayesian design for seamless phase II/III platform trials with survival endpoints. Extensive simulation studies are performed to evaluate the operating characteristics of the proposed designs under various scenarios.
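To make the survival-endpoint setting concrete, here is a minimal sketch. It is not one of the proposed designs. It assumes an exponential survival model with independent Gamma(a, b) priors (shape, rate) on each arm's hazard. Under that conjugate pair, the posterior for each hazard is Gamma(a + events, b + total follow-up time). The function estimates by Monte Carlo the posterior probability that the treatment hazard is lower than the control hazard. Bayesian platform and basket designs commonly compare a quantity like this against a go/no-go threshold. The function name, prior values, and threshold are hypothetical.

```python
import random


def posterior_prob_lower_hazard(d_trt, t_trt, d_ctl, t_ctl,
                                a=0.1, b=0.1, n_draws=20000, seed=2022):
    """Monte Carlo estimate of P(lambda_trt < lambda_ctl | data).

    Assumes exponential survival in each arm with independent Gamma(a, b)
    (shape, rate) priors on the hazards. d_* = number of events,
    t_* = total follow-up time in the arm.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Conjugate update: Gamma(a + events, b + follow-up time).
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        lam_trt = rng.gammavariate(a + d_trt, 1.0 / (b + t_trt))
        lam_ctl = rng.gammavariate(a + d_ctl, 1.0 / (b + t_ctl))
        hits += lam_trt < lam_ctl
    return hits / n_draws
```

For example, with 5 events over 100 person-months on treatment and 40 events over 100 person-months on control, the estimated probability is close to 1. A hypothetical rule might declare the arm promising when this probability exceeds 0.9.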
  • Item
    Evaluation of a Participant Co-designed Lifestyle Change Program for Youth
    (2022-05) Alharbi, Basmah Saleh; Perkins, Susan M.; Hannon, Tamara S.; Daggy, Joanne K.
    Introduction: Increasing obesity in children leads to an increased risk of type 2 diabetes (T2D). It is therefore important to promote healthier lifestyles in youths and to encourage their caregiver(s) to provide a healthy lifestyle environment. The PowerHouse program focuses on improving food choices, increasing physical activity, and adopting behavior changes to reduce obesity and prevent T2D. Method: The aim of this study was to assess the effects of implementing the PowerHouse program on both clinical and quality-of-life outcomes in high-risk, low-income youth and their caregivers. Primary outcomes were BMI standard deviation and BMI percentile in youths. Secondary outcomes included the physical activity of youths and the quality of life of both youths and their caregivers. Attendance rates were also calculated. Linear mixed-effects models were used to test for time effects for all outcomes. Results: Clinical outcomes did not improve over time, except for youth HbA1c (p-value = 0.0447). Some improvements in youth quality-of-life outcomes were noted: specifically, the Sports Index score of the Fels Physical Activity Questionnaire for Children (adjusted p-value = 0.0213) and the Physical Summary (p-value = 0.0407), Psychosocial Summary (p-value = 0.0167), and Total score (p-value = 0.0094) of the youth-reported Pediatric Quality of Life Inventory. Quality of life did not change over time for caregivers. Attendance improved after the intervention was modified to improve access to fresh produce (p-value = 0.0002). Conclusion: HbA1c and quality of life improved over time for youth; however, caregiver outcomes did not improve over time. The data suggest that more time may be needed to see the full effects of the intervention and/or that a booster intervention may be needed.
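The conversion from BMI to a z-score and percentile can be sketched as follows. This assumes the "BMI standard deviation" outcome refers to an age- and sex-specific BMI z-score derived with the LMS method, as in the CDC growth charts. The L, M, and S values in the tests are hypothetical placeholders, not reference values for any age or sex.

```python
import math
from statistics import NormalDist


def lms_z_score(value, L, M, S):
    """LMS (Box-Cox) z-score of a measurement given reference L, M, S values."""
    if L == 0:
        # Limiting case of the Box-Cox transform when L -> 0.
        return math.log(value / M) / S
    return ((value / M) ** L - 1.0) / (L * S)


def bmi_percentile(z):
    """Percentile of a z-score under the standard normal distribution."""
    return 100.0 * NormalDist().cdf(z)
```

A child whose BMI equals the reference median M gets z = 0, which is the 50th percentile. A z-score of 1 corresponds to roughly the 84th percentile.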
  • Item
    Spatial Transcriptomics Analysis Reveals Transcriptomic and Cellular Topology Associations in Breast and Prostate Cancers
    (2022-05) Alsaleh, Lujain; Johnson, Travis S.; Fadel, William; Tu, Wanzhu
    Background: Cancer is one of the leading causes of death worldwide and, as a result, one of the most studied topics in public health. Breast cancer and prostate cancer are the most common cancers among women and men, respectively. Gene expression and image features are each independently prognostic of patient survival. However, it is sometimes difficult to discern how the molecular profile (e.g., gene expression) of given cells relates to their spatial layout (i.e., topology) in the tumor microenvironment (TME). With the advent of spatial transcriptomics (ST) and integrative bioinformatics techniques, we are now able to better understand the TME of common cancers. Method: In this paper, we aim to determine the genes that are correlated with image topology features (ITFs) in common cancers, which we denote topology associated genes (TAGs). To achieve this objective, we compute correlation coefficients between genes and image features after identifying the optimal number of clusters for each. The resulting correlation matrix is visualized as a heatmap using the R package pheatmap. We then seek common themes among the genes correlated with ITFs using functional enrichment analysis. We also assess the similarity between gene clusters and image feature clusters using the ranking of correlation coefficients, in order to identify, compare, and contrast TAGs across breast and prostate cancer ST slides. Result: The analysis shows that there are groups of gene ontology terms that are common within breast cancer, within prostate cancer, and across both cancers. Notably, extracellular matrix (ECM) related terms appeared regularly in all ST slides. Conclusion: We identified TAGs in every ST slide regardless of cancer type. These TAGs were enriched for ontology terms that add context to the ITFs generated from ST cancer slides.
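The gene-by-image-feature correlation step can be sketched in simplified form. A correlation is computed across ST spots for every (gene, feature) pair. Genes are then ranked by absolute correlation for each feature. This sketch makes three simplifications:

- It omits the clustering step.
- It assumes Pearson correlation, since the abstract does not name the coefficient.
- It uses hypothetical toy data: genes A, B, C and feature F1.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gene_feature_correlations(genes, features):
    """genes, features: dict name -> values measured at the same ST spots."""
    return {(g, f): pearson(gv, fv)
            for g, gv in genes.items()
            for f, fv in features.items()}


def top_genes_for_feature(corr, feature, k=2):
    """Rank genes by |r| with the given image feature; return the top k."""
    pairs = [(g, r) for (g, f), r in corr.items() if f == feature]
    pairs.sort(key=lambda gr: abs(gr[1]), reverse=True)
    return pairs[:k]
```

The matrix of r values is the kind of input that a heatmap function such as pheatmap would display. The top-ranked genes per feature would then feed an enrichment analysis.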
  • Item
    Association Between Tobacco Related Diagnoses and Alzheimer Disease: A Population Study
    (2022-05) Almalki, Amwaj Ghazi; Zhang, Pengyue; Johnson, Travis; Fadel, William
    Background: Tobacco use is associated with an increased risk of developing Alzheimer's disease (AD); 14% of AD incidence is associated with various types of tobacco exposure. Additional real-world evidence is warranted to reveal the association between tobacco use and AD in age- and gender-specific subpopulations. Method: In this thesis, the relationships between diagnoses related to tobacco use and diagnoses of AD in gender- and age-specific subgroups were investigated using health information exchange data. The non-parametric Kaplan-Meier method was used to estimate the incidence of AD, and the log-rank test was used to compare incidence between individuals with and without tobacco-related diagnoses. In addition, semi-parametric Cox models were used to examine the association between tobacco-related diagnoses and diagnoses of AD while adjusting for covariates. Results: Compared with no tobacco-related diagnosis, a tobacco-related diagnosis was associated with an increased risk of developing AD among individuals aged 60-74 years (female hazard ratio [HR] = 1.26, 95% confidence interval [CI]: 1.07-1.48, p-value = 0.005; male HR = 1.33, 95% CI: 1.10-1.62, p-value = 0.004) and with a decreased risk among individuals aged 75-100 years (female HR = 0.79, 95% CI: 0.70-0.89, p-value = 0.001; male HR = 0.90, 95% CI: 0.82-0.99, p-value = 0.023). Conclusion: Among older adults aged 60-74 years, tobacco-related diagnoses were associated with an increased risk of developing AD. Among older adults aged 75-100 years, tobacco-related diagnoses were associated with a decreased risk of developing AD.
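The Kaplan-Meier estimator named in the Method section works as follows. At each distinct event time t_j, it multiplies the running survival estimate by 1 - d_j/n_j. Here d_j is the number of events at t_j and n_j is the number still at risk just before t_j. Below is a minimal sketch on hypothetical toy data. The cumulative incidence of AD would then be read off as 1 - S(t). That reading ignores competing risks such as death, which is an assumption of this sketch.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate S(t) at each distinct event time.

    times: follow-up times; events: 1 if the event occurred, 0 if censored.
    Returns a list of (t, S(t)) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Subjects censored at t count as at risk at t, then leave the risk set.
        n_at_risk -= removed
    return curve
```

For times [1, 2, 2, 3, 4] with event indicators [1, 1, 0, 1, 0], the estimate steps to 0.8, 0.6, and 0.3 at times 1, 2, and 3.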
  • Item
    Applications of Time to Event Analysis in Clinical Data
    (2021-12) Xu, Chenjia; Gao, Sujuan; Liu, Hao; Zang, Yong; Zhang, Jianjun; Zhao, Yi
    Survival analysis has broad applications in diverse research areas. In this dissertation, we consider innovative applications of survival analysis to phase I dose-finding design and to the modeling of multivariate survival data. In the first part of the dissertation, we apply time-to-event analysis in an innovative dose-finding design. To account for the unique features of a new class of oncology drugs, T-cell engagers, we propose a phase I dose-finding method incorporating systematic intra-subject dose escalation. We use a survival analysis approach to analyze intra-subject dose-escalation data and to identify the maximum tolerated dose. We evaluate the operating characteristics of the proposed design through simulation studies and compare it with existing methodologies. The second part of the dissertation focuses on multivariate survival data with semi-competing risks. Time-to-event data from the same subject are often correlated. In addition, semi-competing risks are sometimes present among correlated events: a terminal event can censor non-terminal events, but not vice versa. We use a semiparametric frailty model to account for the dependence between correlated survival events and semi-competing risks, and we adopt a penalized partial likelihood (PPL) approach for parameter estimation. In addition, we investigate methods for variable selection in semiparametric frailty models and propose a double penalized partial likelihood (DPPL) procedure for selecting fixed effects in frailty models. We consider two penalty functions: the least absolute shrinkage and selection operator (LASSO) and the smoothly clipped absolute deviation (SCAD) penalty. The proposed methods are evaluated in simulation studies and illustrated using data from the Indianapolis-Ibadan Dementia Project.
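For reference, the two penalties named above can be written as functions of a single coefficient θ with tuning parameter λ > 0. The SCAD form follows the standard piecewise definition of Fan and Li (2001) with the conventional a = 3.7. How the dissertation tunes λ and a, and how the penalties enter the DPPL objective, is not specified here. This is only a sketch of the penalty functions themselves.

```python
def lasso_penalty(theta, lam):
    """LASSO penalty lam * |theta|: grows without bound in |theta|."""
    return lam * abs(theta)


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty (Fan and Li, 2001); requires lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        # Same as LASSO near zero, so small coefficients are shrunk to 0.
        return lam * t
    if t <= a * lam:
        # Quadratic transition joining the two outer pieces continuously.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant beyond a * lam: large effects receive no further shrinkage.
    return lam * lam * (a + 1) / 2
```

The design difference shows up at large |θ|. LASSO keeps penalizing linearly, which biases large coefficients toward zero. SCAD levels off at λ²(a + 1)/2.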
  • Item
    An Analysis of Survival Data when Hazards are not Proportional: Application to a Cancer Treatment Study
    (2021-12) White, John Benjamin; Yiannoutsos, Constantin; Bakoyannis, Giorgos; Fadel, William
    The crossing of Kaplan-Meier survival curves presents a challenge in survival analysis, making it unclear whether the study groups differ significantly in survival. An approach based on the maximum vertical distance between the curves is considered here as a method to assess whether a survival advantage exists between different groups of patients. The method is illustrated on a dataset containing survival times of patients treated with one of two cancer treatment regimens: chemotherapy alone, or chemotherapy combined with radiotherapy.
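The maximum vertical distance between two Kaplan-Meier curves can be computed directly. Both estimates are right-continuous step functions that change only at observed event times. The maximum of |S_1(t) - S_2(t)| is therefore attained on the union of those times. The sketch below computes that distance, and the time at which it occurs, on hypothetical toy data. It produces only the statistic. Assessing significance (for example, by permutation or resampling) is not shown, because the abstract does not say which reference distribution was used.

```python
def km_curve(times, events):
    """Kaplan-Meier (t, S(t)) pairs at distinct event times; events: 1 = event, 0 = censored."""
    data = sorted(zip(times, events))
    n_at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def survival_at(curve, t):
    """Evaluate the right-continuous KM step function at time t."""
    s = 1.0
    for ti, si in curve:
        if ti > t:
            break
        s = si
    return s


def max_vertical_distance(t1, e1, t2, e2):
    """Return (max |S1(t) - S2(t)|, time where it occurs) over all event times."""
    c1, c2 = km_curve(t1, e1), km_curve(t2, e2)
    grid = sorted({t for t, _ in c1} | {t for t, _ in c2})
    best_d, best_t = 0.0, None
    for t in grid:
        d = abs(survival_at(c1, t) - survival_at(c2, t))
        if d > best_d:
            best_d, best_t = d, t
    return best_d, best_t
```

Unlike the log-rank test, this statistic does not let early and late differences of opposite sign cancel out. That is why it is attractive when the curves cross.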
  • Item
    New Applications of Spline-Based Learning Algorithms
    (2021-10) Zhou, Junyi; Tu, Wanzhu; Zhang, Ying; Cao, Sha; Zhang, Chi; Bakoyannis, Giorgos
    Statistical learning methods are affecting human society and our daily lives in unprecedented ways. Most of these learning methods are motivated by practical applications, and they in turn are being used to solve real-world problems. Although generally accepted principles exist for the development of learning methods, new models and algorithms tend to emerge not as theoretical extensions but in response to the scientific, technological, and societal needs of the world. In view of application-motivated method development, two classes of statistical learning methods are described: one addressing the needs of precision medicine, and the other exploring the underlying structure of longitudinal data in an unsupervised manner. A common thread in the two methods is the combination of spline-based models with learning algorithms to improve analytical accuracy. The challenges in optimizing treatment for individual patients are addressed first. Therapeutic optimization must be based on a good causal understanding of the treatment effects. Furthermore, given the multiple treatment options available, recommendations must be consistent regardless of the reference treatment. To address the issue of inconsistent recommendations in the newer R-learner method, a simplex R-learning algorithm is presented to help select the best treatment for individual patients. The algorithm was tested, and analytical results from the Systolic Blood Pressure Intervention Trial (SPRINT) data are presented. The proposed method provided recommendations consistent with current clinical guidelines for hypertension treatment. The second part of this dissertation addresses the clustering of longitudinal data with sparse and irregular observations. Simulation studies demonstrate that the algorithm has superior clustering accuracy and numerical efficiency compared with existing methods. In addition, the algorithm can be easily extended to multiple-outcome longitudinal data with little additional computational cost, and it is capable of detecting the correct number of clusters even when cluster sizes are extremely unbalanced. The algorithm was applied to a 12-year multi-site observational study (PREDICT-HD) to investigate the disease progression patterns of Huntington's disease (HD). Finally, an R package, ClusterLong, was developed to provide a tool for public use of this algorithm. The tool was incorporated into an R Shiny application to allow users unfamiliar with R to access the method.
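Spline-based models of the kind combined with learning algorithms above typically start from a basis expansion of a covariate or of time. The sketch below evaluates a B-spline basis with the Cox-de Boor recursion. The cubic degree and the knot vector in the tests are hypothetical choices. Nothing here reproduces the simplex R-learning or ClusterLong algorithms.

```python
def bspline_basis(x, knots, degree):
    """Evaluate all B-spline basis functions of the given degree at x.

    knots: non-decreasing full knot vector, with boundary knots repeated
    degree + 1 times. Returns len(knots) - degree - 1 basis values.
    """
    # Degree 0: indicator of the knot interval [knots[i], knots[i+1]) containing x.
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0
         for i in range(len(knots) - 1)]
    # Close the last non-empty interval on the right so x == right boundary is covered.
    if x == knots[-1]:
        for i in range(len(knots) - 2, -1, -1):
            if knots[i] < knots[i + 1]:
                b[i] = 1.0
                break
    # Cox-de Boor recursion: raise the degree one step at a time.
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - d - 1):
            left = right = 0.0
            denom = knots[i + d] - knots[i]
            if denom > 0:
                left = (x - knots[i]) / denom * b[i]
            denom = knots[i + d + 1] - knots[i + 1]
            if denom > 0:
                right = (knots[i + d + 1] - x) / denom * b[i + 1]
            nb.append(left + right)
        b = nb
    return b
```

Each vector of basis values can serve as one row of a design matrix in a penalized regression or clustering model. On the knot range, the values are non-negative and sum to 1.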
  • Item
    Bayesian Adaptive Dose-Finding Clinical Trial Designs with Late-Onset Outcomes
    (2021-07) Zhang, Yifei; Zhang, Yong; Song, Yiqing; Liu, Hao; Bakoyannis, Giorgos
    The late-onset outcome issue is common in early-phase dose-finding clinical trials. The problem becomes more intractable in phase I/II clinical trials because both toxicity and efficacy responses are subject to late onset. Existing methods for phase I trials cannot be applied directly to phase I/II trials because they cannot model the joint toxicity-efficacy distribution. We propose a conditional weighted likelihood (CWL) method to circumvent this issue. The key idea of the CWL method is to decompose the joint probability into the product of marginal and conditional probabilities and then weight each probability based on each patient's actual follow-up time. We further extend the proposed method to handle more complex situations in which the late-onset outcomes are competing risks or semi-competing risks outcomes. We treat the late-onset competing risks/semi-competing risks outcomes as missing data and develop a series of Bayesian data-augmentation methods to efficiently impute the missing data and draw posterior samples of the parameters of interest. We also propose adaptive dose-finding algorithms to allocate patients and identify the optimal biological dose during the trial. Simulation studies show that the proposed methods yield desirable operating characteristics and outperform existing methods.
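The follow-up-time weighting idea can be illustrated in its simplest form: a single binary toxicity outcome with a linear weight w = min(u / T, 1). Here u is a patient's current follow-up and T is the assessment window, in the spirit of TITE-CRM weighting. This is a simplified stand-in, not the CWL method. CWL weights both a marginal and a conditional probability for the joint toxicity-efficacy outcome, whereas this sketch weights only one marginal probability. The linear weight, the 30-day window in the tests, and the function names are assumptions.

```python
import math


def follow_up_weight(u, window):
    """Hypothetical linear weight: fraction of the assessment window observed, capped at 1."""
    return min(u / window, 1.0)


def weighted_log_lik(p, outcomes, follow_up, window):
    """Weighted log-likelihood for toxicity probability p.

    An observed toxicity contributes p. A patient without toxicity so far
    contributes 1 - w * p, so short follow-up counts as weaker evidence.
    """
    ll = 0.0
    for y, u in zip(outcomes, follow_up):
        if y == 1:
            ll += math.log(p)
        else:
            ll += math.log(1.0 - follow_up_weight(u, window) * p)
    return ll


def grid_mle(outcomes, follow_up, window, steps=1000):
    """Maximize the weighted log-likelihood over a grid of p in (0, 1)."""
    grid = [i / steps for i in range(1, steps)]
    return max(grid, key=lambda p: weighted_log_lik(p, outcomes, follow_up, window))
```

With full follow-up for everyone, this reduces to the ordinary proportion. Consider one toxicity among four patients, two of whom have only 5 of 30 days observed. The estimate then exceeds the naive 1/4, because the pending patients are not yet counted as full non-toxicities.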
  • Item
    Treatment Effect Estimation and Therapeutic Optimization Using Observational Data
    (2021-05) Li, Ruohong; Tu, Wanzhu; Wang, Honglang; Zhao, Yi; Huang, Kun; Hasan, Mohammad Al
    In this dissertation, I address two essential questions of modern therapeutics: (1) how to quantify the effects of pharmacological agents as functions of patients' clinical characteristics; and (2) how to optimize individual treatment regimens in the presence of multiple treatment options. To address the first question, I proposed a unified framework for estimating the heterogeneous treatment effect, expressed as a function of the patient characteristics x. The proposed framework not only covers most of the existing advantage-learning methods in the literature, but also enhances the robustness of different learning methods against outliers by allowing the selection of appropriate loss functions. To cope with high-dimensional x, I incorporated modern machine learning algorithms, including random forests, gradient boosting machines, and neural networks, for a more scalable implementation. To facilitate wider use of the developed methods, I developed an R package, RCATE, which is now posted on GitHub for public access. For therapeutic optimization, I developed a treatment recommendation system using offline reinforcement learning. Offline reinforcement learning is a type of machine learning that enables an agent to learn an optimal policy in the absence of an interactive environment, as is the case in the analysis of therapeutics data. The recommendation system optimizes long-term reward while accounting for the safety of treatment regimens. I tested the method using data from the Systolic Blood Pressure Intervention Trial (SPRINT), which included multiple years of follow-up data from thousands of patients on many different antihypertensive drugs. Using the SPRINT data, I developed a treatment recommendation system for antihypertensive therapies.
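One simple way to estimate a heterogeneous treatment effect as a function of x is the inverse-propensity transformed outcome Z = Y(A - e)/(e(1 - e)). Its conditional mean given x equals the treatment effect at x when two conditions hold: the propensity score e is known, and there is no unmeasured confounding. The sketch below averages Z within hypothetical patient strata. It assumes randomized treatment with e = 0.5. It is a simple estimator of the same target quantity, not the proposed framework or the RCATE package.

```python
from statistics import mean


def transformed_outcome(y, a, e):
    """IPW-transformed outcome; E[Z | x] equals the treatment effect at x under the stated assumptions.

    y: outcome, a: treatment indicator (1 treated, 0 control), e: propensity P(A = 1 | x).
    """
    return y * (a - e) / (e * (1.0 - e))


def stratum_cate(rows, e=0.5):
    """rows: (stratum, y, a) tuples. Returns the mean transformed outcome per stratum."""
    by_stratum = {}
    for s, y, a in rows:
        by_stratum.setdefault(s, []).append(transformed_outcome(y, a, e))
    return {s: mean(zs) for s, zs in by_stratum.items()}
```

In practice, the stratum means would be replaced by a flexible regression of Z on x, for example a random forest or boosting model. The transformed outcome itself is noisy, which is one motivation for the robust loss functions mentioned above.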
  • Item
    Modern Monte Carlo Methods and Their Application in Semiparametric Regression
    (2021-05) Thomas, Samuel Joseph; Tu, Wanzhu; Boukai, Ben; Li, Xiaochen; Song, Fengguang
    The essence of Bayesian data analysis is to ascertain posterior distributions. In practical applications, posteriors generally do not have closed-form expressions for direct computation. Analysts therefore resort to Markov chain Monte Carlo (MCMC) methods to generate sample observations that approximate the desired posterior distribution. Standard MCMC methods simulate sample values from the desired posterior distribution via random proposals. As a result, the mechanism used to generate the proposals inevitably determines the efficiency of the algorithm. One of the modern MCMC techniques designed to explore high-dimensional spaces more efficiently is Hamiltonian Monte Carlo (HMC), which is based on Hamilton's differential equations. Inspired by classical mechanics, these equations incorporate a latent variable to generate MCMC proposals that are likely to be accepted. This dissertation discusses how such a powerful computational approach can be used to implement statistical models. Along this line, I created a unified computational procedure for using HMC to fit various types of statistical models. The proposed procedure can be applied to a broad class of models, including linear models, generalized linear models, mixed-effects models, and various types of semiparametric regression models. To facilitate the fitting of a diverse set of models, I incorporated new parameterization and decomposition schemes to ensure the numerical performance of Bayesian model fitting without sacrificing the procedure's general applicability. As a concrete application, I demonstrate how to use the proposed procedure to fit a multivariate generalized additive model (GAM), a nonstandard statistical model with a complex covariance structure and numerous parameters. Byproducts of the research include two software packages that allow practical data analysts to use the proposed computational method to fit their own models. The research's main methodological contribution is the unified computational approach it presents for Bayesian model fitting, which can be used for both standard and nonstandard statistical models. The availability of such a procedure greatly enhances the statistical modeler's toolbox for implementing new and nonstandard statistical models.
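The HMC mechanism described above can be sketched for a one-dimensional standard normal target. The latent momentum drives leapfrog integration of Hamilton's equations, followed by a Metropolis accept/reject step. This is a generic textbook HMC update, not the dissertation's unified procedure. The target density, step size, and number of leapfrog steps are illustrative assumptions.

```python
import math
import random


def log_density(q):
    """Hypothetical target: standard normal, log pi(q) = -q^2 / 2 up to a constant."""
    return -0.5 * q * q


def grad_log_density(q):
    return -q


def hmc_sample(n_samples, step_size=0.2, n_leapfrog=20, q0=0.0, seed=1):
    """Basic HMC with unit mass; returns (samples, acceptance rate)."""
    rng = random.Random(seed)
    q, samples, accepted = q0, [], 0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)  # latent momentum ~ N(0, 1)
        q_new, p_new = q, p
        # Leapfrog: half momentum step, alternating full steps, final half step.
        p_new += 0.5 * step_size * grad_log_density(q_new)
        for i in range(n_leapfrog):
            q_new += step_size * p_new
            if i < n_leapfrog - 1:
                p_new += step_size * grad_log_density(q_new)
        p_new += 0.5 * step_size * grad_log_density(q_new)
        # Hamiltonian H = -log pi(q) + p^2 / 2; accept with prob min(1, exp(H_old - H_new)).
        h_old = -log_density(q) + 0.5 * p * p
        h_new = -log_density(q_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
            accepted += 1
        samples.append(q)
    return samples, accepted / n_samples
```

Because the leapfrog integrator nearly conserves H, distant proposals are still accepted at a high rate. Random-walk proposals of comparable distance would be accepted far less often.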