Using Machine Learning Techniques to Classify and Predict Static Code Analysis Tool Warnings
Date
Language
Embargo Lift Date
Committee Members
Degree
Degree Year
Department
Grantor
Journal Title
Journal ISSN
Volume Title
Found At
Abstract
This paper discusses our work on using software engineering metrics (i.e., source code metrics) to classify an error message generated by a Static Code Analysis (SCA) tool as a true-positive, false-positive, or false-negative. Specifically, we compare the performance of Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Random Forests, and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) over eight datasets. The performance of the techniques is assessed by computing the F-measure metric, which is defined as the weighted harmonic mean of the precision and recall of the predicted model. The overall results of the study show that the F-measure value of the predicted model, which is generated using Random Forests technique, ranges from 83% to 98%. Additionally, the Random Forests technique outperforms the other techniques. Lastly, our results indicate that the complexity and coupling metrics have the most impact on whether a SCA tool with generate a false-positive warning or not.