Software Traceability

Software traceability is the ability to trace to and from software artifacts throughout the software development life cycle. It helps stakeholders identify inconsistencies and omissions in each phase of the development life cycle and assess the impact of changes. Although the importance of software traceability is widely recognized, implementing it is challenging because of the vast amount of data that needs to be traced and the large amount of manual effort this requires. To facilitate software traceability, techniques and algorithms that automatically generate traceability links have been proposed. The goals of software traceability research are to reduce human effort, to improve the accuracy of the traceability links generated by tools, and to make software traceability transparent (less obtrusive) throughout the software development life cycle.

I conducted a collaborative project to create an experimental environment for traceability research. In this project, I developed standards for benchmarking traceability techniques and participated in creating TraceLab, a framework for traceability experimentation. I also conducted research on tracing architectural tactics from source code and on evaluating the effectiveness of human-feedback-based traceability techniques.

Software Reliability/Security Metrics

Although organizations spend huge amounts of money on software testing, exhaustive testing is infeasible in practice. Detecting faults and vulnerabilities late in the software development life cycle costs much more than detecting them early. It is therefore critical to prioritize the code locations that are likely to contain the most faults and vulnerabilities before software release, and to inspect and test those locations first. This in turn makes it important to find software metrics that can indicate faults and vulnerabilities early in the development life cycle.

I performed research on identifying the relationship between software complexity and software vulnerabilities, and performed empirical studies by building vulnerability prediction models using complexity metrics together with machine learning/data mining and statistical methods. The complexity metrics that I examined include code complexity, OO design complexity, network dependency complexity, and execution complexity metrics. All of these metrics are indicative of vulnerable code locations. However, because vulnerabilities are reported in only a very small percentage of code, finding vulnerabilities is akin to finding needles in a haystack, and the prediction models produce high false positive rates. Further effort is required to identify the characteristics of faults and vulnerabilities and to find better metrics and prediction models that reduce the number of code locations falsely identified as vulnerable.
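
The needle-in-a-haystack effect can be illustrated with a little arithmetic. The sketch below is only an illustration under assumed numbers: the function name, the prevalence values, and the recall and false positive rate are all hypothetical and are not results from the studies described above. It shows that when very few code locations are actually vulnerable, most of the locations a model flags are false alarms, even if the model itself is reasonably accurate.

```python
def precision(prevalence, recall, false_positive_rate):
    """Fraction of flagged code locations that are truly vulnerable.

    prevalence: share of all code locations that are vulnerable
    recall: share of vulnerable locations the model flags
    false_positive_rate: share of non-vulnerable locations the model flags
    """
    true_pos = prevalence * recall
    false_pos = (1.0 - prevalence) * false_positive_rate
    flagged = true_pos + false_pos
    return true_pos / flagged if flagged > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical numbers chosen for illustration only.
    for prevalence in (0.01, 0.05, 0.20):
        p = precision(prevalence, recall=0.80, false_positive_rate=0.20)
        print(f"prevalence={prevalence:.2f}  precision={p:.3f}")
```

With the model's recall and false positive rate held fixed, the share of flagged locations that are truly vulnerable shrinks as vulnerabilities become rarer, which is why better metrics and models are needed to cut the inspection effort wasted on false alarms.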