Biostatistics and Health Data Science Works
Works authored by scholars from the Department of Biostatistics and Health Data Science, a dual department of the Richard M. Fairbanks School of Public Health and the IU School of Medicine.
Recent Submissions
Item A clinical trial method to show delay of onset in Huntington disease (Wiley, 2019-02) Paulsen, Jane S.; Lourens, Spencer; Kieburtz, Karl; Zhang, Ying; Biostatistics, School of Public Health
Background: Disease-modifying clinical trials in persons without symptoms are often limited in methods to assess the impact associated with experimental therapeutics. This study suggests sample enrichment approaches to facilitate preventive trials to delay disease onset in individuals with the dominant gene for Huntington disease. Methods: Using published onset prediction indexes, we conducted receiver operating characteristic (ROC) curve analysis for diagnosis within a 3-year clinical trial time frame. We determined optimal cut points on the indexes for participant recruitment and then conducted sample size and power calculations to detect varying effect sizes for treatment efficacy in reducing 3-year rates of disease onset (or diagnosis). Results: Area under the curve for 3 onset prediction indexes all demonstrated excellent value in sample enrichment methodology, with the best-performing index being the multivariate risk score (MRS). Conclusions: This study showed that conducting an intervention trial in premanifest and prodromal individuals with the gene expansion for Huntington disease is highly feasible using sample enrichment recruitment methods. Ongoing natural history studies are highly likely to indicate additional markers of disease prior to diagnosis. Statistical modeling of identified markers can facilitate participant enrichment to increase the likelihood of detecting a difference between treatment arms in a cost-effective and efficient manner.
Such variations may expedite translation of emerging therapies to persons in an earlier phase of the disease.
Item Factors Associated with Comprehensive Medication Review Completion Rates: A National Survey of Community Pharmacists (Elsevier, 2020-05) Snyder, Margie E.; Jaynes, Heather A.; Gernant, Stephanie A.; Lantaff, Wendy M.; Doucette, William R.; Suchanek Hudmon, Karen; Perkins, Susan M.; Biostatistics, School of Public Health
Background: Completion rates for medication therapy management (MTM) services have been lower than desired and the Centers for Medicare and Medicaid Services has added MTM comprehensive medication review (CMR) completion rates as a Part D plan star measure. Over half of plans utilize community pharmacists via contracts with MTM vendors. Objectives: The primary objective of this survey study was to identify factors associated with the CMR completion rates of community pharmacies contracted with a national MTM vendor. Methods: Representatives from 27,560 pharmacy locations contracted with a national MTM vendor were surveyed. The dependent variable of interest was the pharmacies' CMR completion rate. Independent variables included the pharmacy's progressiveness stratum and number of CMRs assigned by the MTM vendor during the time period, as well as self-reported data to characterize MTM facilitators, barriers, delivery strategies, staffing, selected items from a modified Assessment of Chronic Illness Care, and pharmacist/pharmacy demographics. Univariate negative binomial models were fit for each independent variable, and variables significant at p < 0.05 were entered into a multivariable model. Results: Representatives from 3836 (13.9%) pharmacy locations responded; of these, 90.9% (n = 3486) of responses were usable. The median CMR completion rate was 0.42.
Variables remaining significant at p < 0.05 in the multivariable model included: progressiveness strata; pharmacy type; scores on the facilitators scale; responses to two potential barriers items; scores on the patient/caregiver delivery strategies sub-scale; providing MTM at multiple locations; reporting that the MTM vendor sending the survey link is the primary MTM vendor for which the respondent provides MTM; and the number of hours per week that the pharmacy is open. Conclusions: Factors at the respondent (e.g., responses to facilitators scale) and pharmacy (e.g., pharmacy type) levels were associated with CMR completion rates. These findings could be used by MTM stakeholders to improve CMR completion rates.
Item Spatially and Robustly Hybrid Mixture Regression Model for Inference of Spatial Dependence (IEEE, 2021) Chang, Wennan; Dang, Pengdao; Wan, Changlin; Lu, Xiaoyu; Fang, Yue; Zhao, Tong; Zang, Yong; Li, Bo; Zhang, Chi; Cao, Sha; Biostatistics, School of Public Health
In this paper, we propose a Spatial Robust Mixture Regression model to investigate the relationship between a response variable and a set of explanatory variables over the spatial domain, assuming that the relationships may exhibit complex spatially dynamic patterns that cannot be captured by constant regression coefficients. Our method integrates the robust finite mixture Gaussian regression model with spatial constraints, to simultaneously handle spatial non-stationarity, local homogeneity, and outlier contamination. Compared with existing spatial regression models, our proposed model assumes the existence of a few distinct regression models that are estimated based on observations that exhibit similar response-predictor relationships. As such, the proposed model not only accounts for non-stationarity in the spatial trend, but also clusters observations into a few distinct and homogeneous groups.
This provides an advantage in interpretation, with a few stationary sub-processes identified that capture the predominant relationships between response and predictor variables. Moreover, the proposed method incorporates robust procedures to handle contamination from both regression outliers and spatial outliers. By doing so, we robustly segment the spatial domain into distinct local regions with similar regression coefficients, and sporadic locations that are purely outliers. A rigorous statistical hypothesis testing procedure has been designed to test the significance of such segmentation. Experimental results on many synthetic and real-world datasets demonstrate the robustness, accuracy, and effectiveness of our proposed method, compared with other robust finite mixture regression, spatial regression and spatial segmentation methods.
Item Developmental trajectory of subtle motor signs in attention-deficit/hyperactivity disorder: a longitudinal study from childhood to adolescence (Taylor & Francis, 2021-04) Crasta, Jewel E.; Zhao, Yi; Seymour, Karen E.; Suskauer, Stacy J.; Mostofsky, Stewart H.; Rosch, Keri S.; Biostatistics, School of Public Health
This study examined the developmental trajectory of neurodevelopmental motor signs among boys and girls with attention-deficit/hyperactivity disorder (ADHD) and typically-developing (TD) children. Seventy children with ADHD and 48 TD children, aged 8–17 years, were evaluated on at least two time-points using the Physical and Neurological Assessment of Subtle Signs (PANESS). Age-related changes in subtle motor signs (overflow, dysrhythmia, speed) were modeled using linear mixed-effects models to compare the developmental trajectories among four subgroups (ADHD girls and boys and TD girls and boys).
Across visits, both boys and girls with ADHD showed greater overflow, dysrhythmia, and slower speed on repetitive motor tasks compared to TD peers, whereas only girls with ADHD were slower on sequential motor tasks than TD girls. Developmental trajectory analyses revealed a greater reduction in overflow with age among boys with ADHD than TD boys, whereas trajectories did not differ among girls with and without ADHD, or among boys and girls with ADHD. For dysrhythmia and speed, there were no trajectory differences between the subgroups, with all groups showing similar reductions with age. Children with ADHD show developmental trajectories of subtle motor signs that are consistent with those of TD children, with one clear exception: Boys with ADHD show more significant reductions in overflow from childhood to adolescence than do their TD peers. Our findings affirm the presence of subtle motor signs in children with ADHD and suggest that some of these signs, particularly motor overflow in boys, resolve through adolescence while dysrhythmia and slow speed may persist.
Item CMAX3: A Robust Statistical Test for Genetic Association Accounting for Covariates (MDPI, 2021) Chen, Zhongxue; Zang, Yong; Biostatistics, School of Public Health
The additive genetic model as implemented in logistic regression has been widely used in genome-wide association studies (GWASs) for binary outcomes. Unfortunately, for many complex diseases, the underlying genetic models are generally unknown and a mis-specification of the genetic model can result in a substantial loss of power. To address this issue, the MAX3 test (the maximum of three separate test statistics) has been proposed as a robust test that performs plausibly regardless of the underlying genetic model. However, the original implementation of MAX3 utilizes the trend test so it cannot adjust for any covariates such as age and gender.
This drawback has significantly limited the application of the MAX3 in GWASs, as covariates account for a considerable amount of variability in these disorders. In this paper, we extended the MAX3 and proposed the CMAX3 (covariate-adjusted MAX3) based on logistic regression. The proposed test yielded a similar robust efficiency as the original MAX3 while easily adjusting for any covariate based on the likelihood framework. The asymptotic formula to calculate the p-value of the proposed test was also developed in this paper. The simulation results showed that the proposed test performed desirably under both the null and alternative hypotheses. For the purpose of illustration, we applied the proposed test to re-analyze a case-control GWAS dataset from the Collaborative Studies on Genetics of Alcoholism (COGA). The R code to implement the proposed test is also introduced in this paper and is available for free download.
Item Cisplatin +/− rucaparib after preoperative chemotherapy in patients with triple-negative or BRCA mutated breast cancer (Springer Nature, 2021-03-22) Kalra, Maitri; Tong, Yan; Jones, David R.; Walsh, Tom; Danso, Michael A.; Ma, Cynthia X.; Silverman, Paula; King, Mary-Claire; Badve, Sunil S.; Perkins, Susan M.; Miller, Kathy D.; Biostatistics, School of Public Health
Patients with triple-negative breast cancer (TNBC) who have residual disease after neoadjuvant therapy have a high risk of recurrence. We tested the impact of DNA-damaging chemotherapy alone or with PARP inhibition in this high-risk population. Patients with TNBC or deleterious BRCA mutation (TNBC/BRCAmut) who had >2 cm of invasive disease in the breast or persistent lymph node (LN) involvement after neoadjuvant therapy were assigned 1:1 to cisplatin alone or with rucaparib. Germline mutations were identified with BROCA analysis. The primary endpoint was 2-year disease-free survival (DFS) with 80% power to detect an HR of 0.5. From Feb 2010 to May 2013, 128 patients were enrolled.
Median tumor size at surgery was 1.9 cm (0-11.5 cm) with 1 (0-38) involved LN; median Residual Cancer Burden (RCB) score was 2.6. Six patients had known deleterious BRCA1 or BRCA2 mutations at study entry, but BROCA identified deleterious mutations in 22% of patients with available samples. Toxicity was similar in both arms. Despite frequent dose reductions (21% of patients) and delays (43.8% of patients), 73% of patients completed planned cisplatin. Rucaparib exposure was limited with median concentration 275 (82-4694) ng/mL post-infusion on day 3. The addition of rucaparib to cisplatin did not increase 2-year DFS (54.2% cisplatin vs. 64.1% cisplatin + rucaparib; P = 0.29). In the high-risk post preoperative TNBC/BRCAmut setting, the addition of low-dose rucaparib did not improve 2-year DFS or increase the toxicity of cisplatin. Genetic testing was underutilized in this high-risk population.
Item Adaptive empirical pattern transformation (ADEPT) with application to walking stride segmentation (Oxford University Press, 2021-04-10) Karas, Marta; Strączkiewicz, Marcin; Fadel, William; Harezlak, Jaroslaw; Crainiceanu, Ciprian M.; Urbanek, Jacek K.; Biostatistics, School of Public Health
Quantifying gait parameters and ambulatory monitoring of changes in these parameters have become increasingly important in epidemiological and clinical studies. Using high-density accelerometry measurements, we propose adaptive empirical pattern transformation (ADEPT), a fast, scalable, and accurate method for segmentation of individual walking strides. ADEPT computes the covariance between a scaled and translated pattern function and the data, an idea similar to the continuous wavelet transform. The difference is that ADEPT uses a data-based pattern function, allows multiple pattern functions, can use other distances instead of the covariance, and does not require the pattern function to satisfy the wavelet admissibility condition.
Compared to many existing approaches, ADEPT is designed to work with data collected at various body locations and is invariant to the direction of accelerometer axes relative to body orientation. The method is applied to and validated on accelerometry data collected during a equation M1-m outdoor walk of equation M2 study participants wearing accelerometers on the wrist, hip, and both ankles. Additionally, all scripts and data needed to reproduce presented results are included in supplementary material available at Biostatistics online.
Item Multi-Omics Analysis of Brain Metastasis Outcomes Following Craniotomy (Frontiers Media, 2021-04-06) Su, Jing; Song, Qianqian; Qasem, Shadi; O’Neill, Stacey; Lee, Jingyun; Furdui, Cristina M.; Pasche, Boris; Metheny-Barlow, Linda; Masters, Adrianna H.; Lo, Hui-Wen; Xing, Fei; Watabe, Kounosuke; Miller, Lance D.; Tatter, Stephen B.; Laxton, Adrian W.; Whitlow, Christopher T.; Chan, Michael D.; Soike, Michael H.; Ruiz, Jimmy; Biostatistics, School of Public Health
Background: The incidence of brain metastasis continues to increase as therapeutic strategies have improved for a number of solid tumors. The presence of brain metastasis is associated with worse prognosis but it is unclear if distinctive biomarkers can separate patients at risk for CNS related death. Methods: We executed a single institution retrospective collection of brain metastases from patients who were diagnosed with lung, breast, and other primary tumors. The brain metastatic samples were sent for RNA sequencing, proteomic and metabolomic analysis. The primary outcome was distant brain failure after definitive therapies that included craniotomy resection and radiation to surgical bed. Novel prognostic subtypes were discovered using transcriptomic data and sparse non-negative matrix factorization. Results: We discovered two molecular subtypes showing statistically significant differential prognosis irrespective of tumor subtype.
The median survival times of the good and the poor prognostic subtypes were 7.89 and 42.27 months, respectively. Further integrated characterization and analysis of these two distinctive prognostic subtypes using transcriptomic, proteomic, and metabolomic molecular profiles of patients identified key pathways and metabolites. The analysis suggested that the immune microenvironment landscape as well as proliferation and migration signaling pathways may be responsible for the observed survival difference. Conclusion: A multi-omics approach to characterization of brain metastasis provides an opportunity to identify clinically impactful biomarkers and associated prognostic subtypes and generate provocative integrative understanding of disease.
Item Combining non-probability and probability survey samples through mass imputation (Wiley, 2021-07) Kim, Jae Kwang; Park, Seho; Chen, Yilin; Wu, Changbao; Biostatistics, School of Public Health
Analysis of non-probability survey samples requires auxiliary information at the population level. Such information may also be obtained from an existing probability survey sample from the same finite population. Mass imputation has been used in practice for combining non-probability and probability survey samples and making inferences on the parameters of interest using the information collected only in the non-probability sample for the study variables. Under the assumption that the conditional mean function from the non-probability sample can be transported to the probability sample, we establish the consistency of the mass imputation estimator and derive its asymptotic variance formula. Variance estimators are developed using either linearization or bootstrap. Finite sample performances of the mass imputation estimator are investigated through simulation studies.
We also address important practical issues of the method through the analysis of a real-world non-probability survey sample collected by the Pew Research Centre.
Item Ten-eleven translocation protein 1 modulates medulloblastoma progression (BMC, 2021-04-29) Kim, Hyerim; Kang, Yunhee; Li, Yujing; Chen, Li; Lin, Li; Johnson, Nicholas D.; Zhu, Dan; Robinson, M. Hope; McSwain, Leon; Barwick, Benjamin G.; Yuan, Xianrui; Liao, Xinbin; Zhao, Jie; Zhang, Zhiping; Shu, Qiang; Chen, Jianjun; Allen, Emily G.; Kenney, Anna M.; Castellino, Robert C.; Van Meir, Erwin G.; Conneely, Karen N.; Vertino, Paula M.; Jin, Peng; Li, Jian; Biostatistics, School of Public Health
Background: Medulloblastoma (MB) is the most common malignant pediatric brain tumor that originates in the cerebellum and brainstem. Frequent somatic mutations and deregulated expression of epigenetic regulators in MB highlight the substantial role of epigenetic alterations. 5-hydroxymethylcytosine (5hmC) is a highly abundant cytosine modification in the developing cerebellum and is regulated by ten-eleven translocation (TET) enzymes. Results: We investigate the alterations of 5hmC and TET enzymes in MB and their significance to cerebellar cancer formation. We show total abundance of 5hmC is reduced in MB, but identify significant enrichment of MB-specific 5hmC marks at regulatory regions of genes implicated in stem-like properties and Nanog-binding motifs. While TET1 and TET2 levels are high in MBs, only knockout of Tet1 in the smoothened (SmoA1) mouse model attenuates uncontrolled proliferation, leading to a favorable prognosis. The pharmacological Tet1 inhibition reduces cell viability and platelet-derived growth factor signaling pathway-associated genes. Conclusions: These results together suggest a potential key role of 5hmC and indicate an oncogenic nature for TET1 in MB tumorigenesis, suggesting it as a potential therapeutic target for MBs.