Department of Computer and Information Science Works
Recent Submissions
Item ASPER: Attention-based Approach to Extract Syntactic Patterns denoting Semantic Relations in Sentential Context (2021)
Kabir, Md. Ahsanul; Phillips, Tyler; Luo, Xiao; Al Hasan, Mohammad; Computer and Information Science, School of Science
Semantic relationships, such as hyponym-hypernym, cause-effect, and meronym-holonym, between a pair of entities in a sentence are usually reflected through syntactic patterns. Automatic extraction of such patterns benefits several downstream tasks, including entity extraction, ontology building, and question answering. Unfortunately, automatic extraction of such patterns has not yet received much attention from NLP and information retrieval researchers. In this work, we propose an attention-based supervised deep learning model, ASPER, which extracts syntactic patterns between entities exhibiting a given semantic relation in the sentential context. We validate the performance of ASPER on three distinct semantic relations (hyponym-hypernym, cause-effect, and meronym-holonym) across six datasets. Experimental results show that for all these semantic relations, ASPER can automatically identify a collection of syntactic patterns reflecting the existence of such a relation between a pair of entities in a sentence. In comparison to existing methodologies of syntactic pattern extraction, ASPER's performance is substantially superior.

Item Energy-Efficient Device Selection in Federated Edge Learning (IEEE, 2021-07)
Peng, Cheng; Hu, Qin; Chen, Jianan; Kang, Kyubyung; Li, Feng; Zou, Xukai; Computer and Information Science, School of Science
Due to the increasing demand from mobile devices for the real-time response of cloud computing services, federated edge learning (FEL) emerges as a new computing paradigm, which utilizes edge devices to achieve efficient machine learning while protecting their data privacy. Implementing efficient FEL suffers from the challenges of devices' limited computing and communication resources, as well as unevenly distributed datasets, which has inspired several existing studies focusing on device selection to optimize time consumption and data diversity. However, these studies fail to consider the energy consumption of edge devices given their limited power supply, which can seriously affect the cost-efficiency of FEL through unexpected device dropouts. To fill this gap, we propose a device selection model capturing both energy consumption and data diversity optimization, under constraints on time consumption and training data amount. We then solve the optimization problem by reformulating the original model and designing a novel algorithm, named E2DS, which greatly reduces the time complexity. By comparing with two classical FEL schemes, we validate the superiority of our proposed device selection mechanism for FEL with extensive experimental results.
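As an illustrative sketch of the selection trade-off described in the abstract above (not the E2DS algorithm, whose formulation is not given here), the Python snippet below greedily picks edge devices by diversity gain per unit of energy, subject to an assumed energy budget, per-round deadline, and minimum amount of training data. The data structures and the greedy heuristic are assumptions for illustration only.

```python
# Illustrative sketch (not E2DS): greedy device selection for federated edge
# learning that trades off data diversity against energy cost, under a
# per-round deadline and a minimum amount of training data.
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Device:
    name: str
    energy_cost: float        # assumed energy to train and upload one round
    time_cost: float          # local training + communication time
    label_hist: np.ndarray    # label distribution of the device's local data
    num_samples: int

def diversity(selected: List[Device]) -> float:
    """Entropy of the pooled label distribution of the selected devices."""
    if not selected:
        return 0.0
    pooled = np.sum([d.label_hist * d.num_samples for d in selected], axis=0)
    p = pooled / pooled.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def select_devices(devices, energy_budget, deadline, min_samples):
    """Greedily add the device with the best diversity gain per unit energy."""
    selected, energy = [], 0.0
    candidates = [d for d in devices if d.time_cost <= deadline]
    while candidates:
        base = diversity(selected)
        best, best_score = None, -np.inf
        for d in candidates:
            if energy + d.energy_cost > energy_budget:
                continue
            score = (diversity(selected + [d]) - base) / d.energy_cost
            if score > best_score:
                best, best_score = d, score
        if best is None:          # no candidate fits the remaining budget
            break
        selected.append(best)
        energy += best.energy_cost
        candidates.remove(best)
        if sum(d.num_samples for d in selected) >= min_samples:
            break
    return selected
```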
Item Privacy-Aware Data Trading (IEEE Xplore, 2021-07)
Wang, Shengling; Shi, Lina; Hu, Qin; Zhang, Junshan; Cheng, Xiuzhen; Yu, Jiguo; Computer and Information Science, School of Science
The growing threat of personal data breaches in data trading pinpoints an urgent need to develop countermeasures for preserving individual privacy. The state-of-the-art work either endows the data collector with the responsibility of data privacy or reports only a privacy-preserving version of the data. The basic assumption of the former approach, that the data collector is trustworthy, does not always hold true in reality, whereas the latter approach reduces the value of the data. In this paper, we investigate the privacy leakage issue from its root source. Specifically, we take a fresh look to reverse the inferior position of the data provider by making her dominate the game with the collector, thereby resolving the dilemma in data trading. To this end, we propose noisy-sequentially zero-determinant (NSZD) strategies by tailoring the classical zero-determinant strategies, originally designed for simultaneous-move games, to the noisy sequential game. NSZD strategies empower the data provider to unilaterally set the expected payoff of the data collector or to enforce a positive relationship between her own and the data collector's expected payoffs. Both strategies can stimulate a rational data collector to behave honestly, boosting a healthy data trading market. Numerical simulations are used to examine the impacts of key parameters and the feasible region in which the data provider can be an NSZD player. Finally, we prove that the data collector cannot employ NSZD strategies to further dominate the data market and worsen privacy leakage.

Item Hippocampal Subregion and Gene Detection in Alzheimer's Disease Based on Genetic Clustering Random Forest (MDPI, 2021-05-01)
Li, Jin; Liu, Wenjie; Cao, Luolong; Luo, Haoran; Xu, Siwen; Bao, Peihua; Meng, Xianglian; Liang, Hong; Fang, Shiaofen; Computer and Information Science, School of Science
The distinguishable subregions that compose the hippocampus are differently involved in functions associated with Alzheimer's disease (AD). Thus, the identification of hippocampal subregions and genes that classify AD and healthy control (HC) groups with high accuracy is meaningful. In this study, by jointly analyzing multimodal data, we propose a novel method to construct fusion features and a classification method based on the random forest for identifying the important features. Specifically, we construct the fusion features using the gene sequence and subregion correlations to reduce the diversity within the same group. Moreover, samples and features are selected randomly to construct a random forest, and a genetic algorithm and clustering evolution are used to amplify the differences among the initial decision trees and to evolve the trees. The features in the resulting decision trees that reach the peak classification accuracy are the important "subregion gene pairs". The findings verify that our method performs well in classification and generalization. In particular, we identified some significant subregions and genes, such as the hippocampus amygdala transition area (HATA), fimbria, and parasubiculum, and genes including RYR3 and PRKCE. These discoveries provide new candidate genes for AD and demonstrate the contribution of hippocampal subregions and genes to AD.
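The genetic clustering random forest itself is not specified in the abstract above; as a rough, assumption-laden sketch of the surrounding workflow, the snippet below trains an off-the-shelf scikit-learn random forest on placeholder fused subregion-gene features and ranks candidate "subregion gene pairs" by feature importance. Feature names and data are placeholders, not the paper's inputs.

```python
# Illustrative sketch (not the paper's genetic clustering random forest):
# fit a standard random forest on fused "subregion x gene" features and rank
# features by importance to surface candidate subregion-gene pairs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
subregions = ["HATA", "fimbria", "parasubiculum"]
genes = ["RYR3", "PRKCE", "APOE"]
feature_names = [f"{s}|{g}" for s in subregions for g in genes]

# Placeholder fused features (one column per subregion-gene pair) and
# binary labels (1 = AD, 0 = healthy control).
X = rng.normal(size=(200, len(feature_names)))
y = rng.integers(0, 2, size=200)

forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X, y)

# Rank subregion-gene pairs by impurity-based feature importance.
ranked = sorted(zip(feature_names, forest.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.4f}")
```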
Item Advances in Mobile Communications and Computing (Hindawi, 2009)
Durresi, Arjan; Denko, Mieso; Computer and Information Science, School of Science

Item COVID CV: A System for Creating Holistic Academic CVs during a Global Pandemic (IEEE, 2021-05)
Raja, Umesh; Chowdhury, Nahida Sultana; Raje, Rajeev R.; Wheeler, Rachel; Williams, Jane; Ganci, Aaron; Computer and Information Science, School of Science
The effects of the Covid pandemic on academicians have been, as for the population at large, unequal: some groups have been more susceptible than others. Traditional CVs are inadequate to highlight these imbalances. CovidCV is a framework that allows academicians to document their lives in a holistic way during the pandemic. It creates a color-coded CV from the user's data entries, documenting work and home life and categorizing the corresponding events as good or bad. It thus provides a visual representation of an academician's life during the current pandemic. The user can mark any event as major or minor, indicating the impact of the event on their life. The CovidCV prototypical system is developed using a three-tier architecture. The first tier, the front end, is a user interface layer implemented as a web application. The back end consists of two tiers that handle the business logic and the data management, respectively. The CovidCV system design is described in this paper. Preliminary experimentation with the prototype highlights the usefulness of CovidCV.

Item Genome-wide variant-based study of genetic effects with the largest neuroanatomic coverage (BMC, 2021-04-30)
Li, Jin; Liu, Wenjie; Li, Huang; Chen, Feng; Luo, Haoran; Bao, Peihua; Li, Yanzhao; Jiang, Hailong; Gao, Yue; Liang, Hong; Fang, Shiaofen; Computer and Information Science, School of Science
Background: Brain image genetics provides enormous opportunities for examining the effects of genetic variations on the brain. Many studies have shown that the structure, function, and abnormality (e.g., those related to Alzheimer's disease) of the brain are heritable. However, which genetic variations contribute to these phenotypic changes is not completely clear. Advances in neuroimaging and genetics have enabled us to obtain detailed brain anatomy and genome-wide information. These data offer new opportunities to identify genetic variations, such as single nucleotide polymorphisms (SNPs), that affect brain structure. In this paper, we perform a genome-wide variant-based study and aim to identify top SNPs or SNP sets whose genetic effects have the largest neuroanatomic coverage at both the voxel and region-of-interest (ROI) levels. Based on the voxelwise genome-wide association study (GWAS) results, we use an exhaustive search to find the top SNPs or SNP sets with the largest voxel-based or ROI-based neuroanatomic coverage. For SNP sets with more than two SNPs, we propose an efficient genetic algorithm to identify top SNP sets that can cover all ROIs or a specific ROI. Results: We identified an ensemble of top SNPs, SNP pairs, and SNP sets whose effects have the largest neuroanatomic coverage. Experimental results on real imaging genetics data show that the proposed genetic algorithm is superior to the exhaustive search in terms of computational time for identifying top SNP sets. Conclusions: We proposed and applied an informatics strategy to identify top SNPs, SNP pairs, and SNP sets that have genetic effects with the largest neuroanatomic coverage. The proposed genetic algorithm offers an efficient solution to this task, especially for identifying top SNP sets.
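As a minimal sketch of the coverage objective described above (assuming each SNP has been reduced to the set of ROIs where its GWAS effect passes a significance threshold), the snippet below exhaustively searches for the SNP pair with the largest combined ROI coverage; for larger SNP sets the paper's genetic algorithm would replace this exhaustive loop. The SNP and ROI names are placeholders.

```python
# Illustrative sketch of the coverage idea (not the paper's pipeline):
# given, for each SNP, the set of ROIs where its effect is significant,
# exhaustively search for the SNP pair whose union covers the most ROIs.
from itertools import combinations

# Placeholder mapping from SNP id to covered ROIs (assumed to come from
# voxelwise/ROI-wise GWAS p-values below a chosen threshold).
snp_coverage = {
    "rs001": {"hippocampus_L", "hippocampus_R", "amygdala_L"},
    "rs002": {"precuneus_L", "precuneus_R"},
    "rs003": {"hippocampus_L", "entorhinal_R", "amygdala_R"},
}

def best_snp_pair(coverage):
    """Return the SNP pair with the largest union of covered ROIs."""
    best_pair, best_rois = None, set()
    for a, b in combinations(coverage, 2):
        rois = coverage[a] | coverage[b]
        if len(rois) > len(best_rois):
            best_pair, best_rois = (a, b), rois
    return best_pair, best_rois

pair, rois = best_snp_pair(snp_coverage)
print(pair, sorted(rois))
```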
Item A modified two-process Knox test for investigating the relationship between law enforcement opioid seizures and overdoses (The Royal Society, 2021-06)
Mohler, G.; Mishra, S.; Ray, B.; Magee, L.; Huynh, P.; Canada, M.; O'Donnell, D.; Flaxman, S.; Computer and Information Science, School of Science
Recent research has shown an association between monthly law enforcement drug seizure events and accidental drug overdose deaths using cross-sectional data in a single state, whereby increased seizures correlated with more deaths. In this study, we conduct a statistical analysis of street-level data on law enforcement drug seizures, along with street-level data on fatal and non-fatal overdose events, to determine possible micro-level causal associations between opioid-related drug seizures and overdoses. For this purpose, we introduce a novel, modified two-process Knox test that controls for self-excitation to measure the clustering of overdoses nearby in space and time following law enforcement seizures. We observe a small but statistically significant (p < 0.001) effect of 17.7 excess non-fatal overdoses per 1000 law enforcement seizures within three weeks and 250 m of a seizure. We discuss the potential causal mechanism for this association along with policy implications.
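For intuition, the snippet below sketches a basic Knox-style space-time permutation test, not the paper's modified two-process test (which additionally controls for self-excitation): it counts overdoses occurring within an assumed 250 m and 21 days after a seizure and compares that count with a null distribution obtained by permuting overdose times.

```python
# Illustrative sketch of a plain Knox-style space-time test (not the paper's
# modified two-process test): count overdoses within 250 m and 21 days *after*
# a seizure, then compare against a permutation null.
import numpy as np

def knox_count(seizures, overdoses, max_km=0.25, max_days=21.0):
    """seizures/overdoses: arrays with rows (x_km, y_km, t_days)."""
    count = 0
    for sx, sy, st in seizures:
        dt = overdoses[:, 2] - st
        dist = np.hypot(overdoses[:, 0] - sx, overdoses[:, 1] - sy)
        count += int(np.sum((dt > 0) & (dt <= max_days) & (dist <= max_km)))
    return count

def knox_pvalue(seizures, overdoses, n_perm=999, seed=0):
    """Permutation p-value: shuffle overdose times to break the space-time link."""
    rng = np.random.default_rng(seed)
    observed = knox_count(seizures, overdoses)
    null = []
    for _ in range(n_perm):
        shuffled = overdoses.copy()
        shuffled[:, 2] = rng.permutation(shuffled[:, 2])
        null.append(knox_count(seizures, shuffled))
    null = np.array(null)
    return observed, (1 + np.sum(null >= observed)) / (n_perm + 1)
```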
Item RASMA: a reverse search algorithm for mining maximal frequent subgraphs (BMC, 2021-03-16)
Salem, Saeed; Alokshiya, Mohammed; Hasan, Mohammad Al; Computer and Information Science, School of Science
Background: Given a collection of coexpression networks over a set of genes, identifying subnetworks that appear frequently is an important research problem known as mining frequent subgraphs. Maximal frequent subgraphs are a representative set of frequent subgraphs; a frequent subgraph is maximal if it does not have a supergraph that is frequent. In the bioinformatics discipline, methodologies for mining frequent and/or maximal frequent subgraphs can be used to discover interesting network motifs that elucidate complex interactions among genes, reflected through the edges of the frequent subnetworks. Further study of frequent coexpression subnetworks enhances the discovery of biological modules and biological signatures for gene expression and disease classification. Results: We propose a reverse search algorithm, called RASMA, for mining frequent and maximal frequent subgraphs in a given collection of graphs. A key innovation in RASMA is a connected subgraph enumerator that uses a reverse-search strategy to enumerate the connected subgraphs of an undirected graph. Using this enumeration strategy, RASMA obtains all maximal frequent subgraphs very efficiently. To overcome the computationally prohibitive task of enumerating all frequent subgraphs while mining for the maximal ones, RASMA employs several pruning strategies that substantially improve its overall runtime performance. Experimental results show that on large gene coexpression networks, the proposed algorithm efficiently mines biologically relevant maximal frequent subgraphs. Conclusion: Extracting recurrent gene coexpression subnetworks from multiple gene expression experiments enables the discovery of functional modules and subnetwork biomarkers. We have proposed a reverse search algorithm for mining maximal frequent subnetworks. Enrichment analysis of the extracted maximal frequent subnetworks reveals that the frequent subnetworks are highly enriched with known biological ontologies.

Item Learning network event sequences using long short-term memory and second-order statistic loss (Wiley, 2021-02)
Sha, Hao; Al Hasan, Mohammad; Mohler, George; Computer and Information Science, School of Science
Modeling temporal event sequences on the vertices of a network is an important problem with widespread applications; examples include modeling influence in social networks, preventing crimes by modeling their space–time occurrences, and forecasting earthquakes. Existing solutions for this problem use a parametric approach, whose applicability is limited to event sequences following well-known distributions, which is not the case for many real-life event datasets. To overcome this limitation, we propose a composite recurrent neural network model for learning events occurring on the vertices of a network over time. Our proposed model combines two long short-term memory units to capture the base intensity and the conditional intensity of an event sequence. We also introduce a second-order statistic loss that penalizes divergence between the generated and the target sequences' distributions of hop-count distances between consecutive events. Given a sequence of vertices of a network on which events have occurred, the proposed model predicts the vertex where the next event is most likely to occur. Experimental results on synthetic and real-world datasets validate the superiority of our proposed model over various baseline methods.
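The abstract does not spell out the exact form of the second-order statistic loss, so the snippet below is only an assumption-laden sketch of the underlying idea: compare the distributions of hop-count distances between consecutive event vertices in a target and a generated sequence, here with a simple squared L2 divergence and networkx used to compute hop counts on a connected graph.

```python
# Illustrative sketch of the "second-order statistic" idea (the exact
# divergence is an assumption): compare hop-count distance distributions of
# consecutive events in a target vs. a generated vertex sequence.
import numpy as np
import networkx as nx

def hop_count_hist(graph, vertex_sequence, max_hops=10):
    """Normalized histogram of shortest-path hop distances between consecutive events."""
    hist = np.zeros(max_hops + 1)
    for u, v in zip(vertex_sequence[:-1], vertex_sequence[1:]):
        hops = nx.shortest_path_length(graph, source=u, target=v)
        hist[min(hops, max_hops)] += 1
    return hist / max(hist.sum(), 1.0)

def second_order_loss(graph, target_seq, generated_seq):
    """Squared L2 divergence between the two hop-count distributions."""
    p = hop_count_hist(graph, target_seq)
    q = hop_count_hist(graph, generated_seq)
    return float(np.sum((p - q) ** 2))

# Toy usage on a small ring network (assumes the graph is connected).
G = nx.cycle_graph(8)
print(second_order_loss(G, target_seq=[0, 1, 2, 4, 5], generated_seq=[0, 4, 0, 4, 0]))
```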