BugPredict: Leveraging AI for Early Bug Prediction in Codebases

Software quality remains a challenge in today’s software industry, with enormous economic costs associated with poor software quality. Software defects cost 40 to 1,000 times more to correct after the release of the software [1].

 

The Consortium for Information & Software Quality (CISQ) issued a report in 2021 on the state of software quality in the US [2]. The report estimates that the cost of poor software quality in the US in 2021 was approximately $2.84 trillion. It further states that about 60% of software engineering effort was consumed by finding and fixing defects [2]. Bugs in production software can harm the business regardless of their severity. The same report claims that business disruptions caused by information technology defects can cost between $32.22 and $53.21 a minute [2]. Software defects will continue to make their way into production.

 

However, we should continue our efforts to prevent defects and resolve them before their release. This includes research efforts directed toward understanding how software teams can achieve higher software quality.

 

AI to predict potential bugs

Recently, researchers have become interested in leveraging Artificial Intelligence (AI) technologies to support software quality assurance. Detecting and repairing software bugs is most cost-effective when done before the product is released. Advancements in AI can give software development teams valuable information for predicting potential bugs in codebases based on historical data. Such a capability empowers developers to identify and mitigate issues proactively, avoiding costly production errors and freeing up resources to focus on innovation and quality improvements. By enabling predictive insights into potential defects at the outset of or during the development process, AI-driven tools are poised to transform software quality assurance from reactive problem-solving to preemptive action.

 

A proactive defect management strategy can significantly reduce the time and effort required for debugging, testing, and patching post-release issues, fostering a more sustainable software engineering process.


Binary classification is commonly used in existing software defect prediction studies [3]. However, the true value of AI lies in its potential to predict not just the presence of defects but their severity, necessary remediation efforts, and areas of code particularly vulnerable to defects in future releases. This enables a more nuanced approach to release planning, allowing development teams to anticipate risks and allocate resources effectively, thereby fostering a proactive approach to software quality and operational resilience.
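
As a rough illustration of the binary formulation mentioned above, the sketch below trains a tiny logistic-regression classifier that labels a code change as defect-prone or not. The two features (lines changed, prior defects in the touched files), the training rows, and all function names are hypothetical and chosen only for illustration; the studies surveyed in [3] use richer metrics and models.

```python
import math

# Hypothetical training data: (lines_changed, prior_defects) -> 1 if the
# change later turned out to introduce a defect, 0 otherwise.
TRAIN = [
    ((5, 0), 0), ((12, 0), 0), ((20, 1), 0), ((30, 0), 0),
    ((150, 3), 1), ((220, 2), 1), ((300, 5), 1), ((180, 4), 1),
]


def scale(features):
    """Bring both features to a comparable range (assumed scaling constants)."""
    lines_changed, prior_defects = features
    return (lines_changed / 100.0, prior_defects / 5.0)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, epochs=2000, lr=0.5):
    """Fit weights and bias with plain stochastic gradient descent on log loss."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for features, label in data:
            x = scale(features)
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - label
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def defect_probability(model, features):
    """Estimated probability that a change with these features is defective."""
    w, b = model
    x = scale(features)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def is_defect_prone(model, features, threshold=0.5):
    """Binary decision: the yes/no output typical of defect prediction studies."""
    return defect_probability(model, features) >= threshold
```

A binary output like this says nothing about severity or remediation effort, which is the limitation the paragraph above points to.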

 

By leveraging historical defect datasets, data from code version management systems (e.g., Bitbucket, GitHub), software development workflow data (e.g., JIRA tickets), codebases, and upcoming release scopes, potential bugs can be predicted for specific releases. In addition, by attributing predicted defects to a particular module or package, effort can be targeted at mitigating the risk of future errors there. Similarly, the resources needed to resolve a defect can be predicted from the attributes of similar defects in the historical data.
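
One simple way to picture the module-level idea is a risk score built from past defect records, with older defects counting less. The sketch below is a heuristic baseline under stated assumptions, not a trained model: the record layout, the module names, the severity weights, and the decay factor are all hypothetical.

```python
from collections import defaultdict

# Hypothetical historical defect records, e.g. exported from an issue tracker:
# (module, release_index, severity_weight)
HISTORY = [
    ("billing", 1, 3), ("billing", 2, 2), ("billing", 3, 3),
    ("auth", 1, 1), ("auth", 3, 2),
    ("reports", 2, 1),
]


def module_risk(history, current_release, decay=0.5):
    """Sum severity per module, halving the weight for each release of age."""
    scores = defaultdict(float)
    for module, release, severity in history:
        age = current_release - release
        scores[module] += severity * (decay ** age)
    return dict(scores)


def rank_release_scope(history, scope, current_release):
    """Order the modules in an upcoming release scope from highest to lowest risk."""
    risk = module_risk(history, current_release)
    return sorted(scope, key=lambda module: risk.get(module, 0.0), reverse=True)


if __name__ == "__main__":
    upcoming_scope = ["auth", "reports", "billing", "search"]
    print(rank_release_scope(HISTORY, upcoming_scope, current_release=4))
```

Modules with no defect history, such as the hypothetical "search" module here, receive a score of zero and sort last; a learned model would instead need code and change metrics to say anything about them.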

 

In this line of research at CoSELab, we investigate the following questions:
  • How can AI-driven predictive models leverage historical defect and codebase data to identify areas most vulnerable to defects in upcoming software releases?
  • How can defect prediction within specific modules or packages facilitate targeted mitigation strategies, and what impact does this approach have on reducing the frequency and severity of defects in production?
  • How can AI-driven models accurately predict the severity of potential defects in future releases and the effort required to remediate them, and how can this information optimize resource allocation when planning software development cycles (e.g., Sprints)?
  • In what ways can AI predictions of defects be integrated into code version management systems (e.g., Bitbucket, GitHub) and development workflows (e.g., JIRA) to enhance preventive strategies for software quality?
  • How can AI be leveraged to generate effective and targeted test cases for verifying predicted defects in software codebases?
  • How can AI be utilized to generate code fixes for predicted software defects, and how effective are these solutions in addressing predicted bugs?
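
To make the question about remediation effort more concrete, the sketch below estimates the effort for a predicted defect by averaging the fix times of its most similar resolved defects (a k-nearest-neighbour heuristic). The record fields, the distance function, and the sample data are assumptions made for illustration only and do not describe a method used at CoSELab.

```python
# Hypothetical resolved defects: (module, severity 1-3, files_touched, hours_to_fix)
RESOLVED = [
    ("billing", 3, 6, 16.0),
    ("billing", 2, 3, 6.0),
    ("auth", 3, 5, 12.0),
    ("auth", 1, 1, 2.0),
    ("reports", 1, 2, 3.0),
]


def distance(query, record):
    """Ad hoc dissimilarity: module mismatch + severity gap + scaled file-count gap."""
    module_penalty = 0.0 if query[0] == record[0] else 1.0
    return module_penalty + abs(query[1] - record[1]) + abs(query[2] - record[2]) / 5.0


def estimate_effort(resolved, module, severity, files_touched, k=2):
    """Average the hours spent on the k most similar resolved defects."""
    query = (module, severity, files_touched)
    nearest = sorted(resolved, key=lambda record: distance(query, record))[:k]
    return sum(record[3] for record in nearest) / len(nearest)


if __name__ == "__main__":
    hours = estimate_effort(RESOLVED, "billing", severity=3, files_touched=5)
    print(f"Estimated remediation effort: {hours:.1f} hours")
```

An estimate of this kind could feed sprint planning by reserving capacity for modules where high-effort defects are predicted.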

Sustainable software engineering processes

By investigating these questions, we aim to help shape more sustainable software engineering processes in which developers, managers, and product owners alike are empowered to make data-driven decisions that enhance software reliability, reduce costly post-release corrections, and create a more resilient development lifecycle. We envision a future of software engineering in which AI actively supports a culture of continuous quality improvement, making high-quality software a sustainable and achievable standard in the industry.
 



References
[1]    George Issac, Chandrasekharan Rajendran, and RN Anantharaman. Determinants of software quality: customer’s perspective. Total Quality Management & Business Excellence, 14(9):1053–1070, 2003.
[2]    Herb Krasner. The cost of poor software quality in the US: A 2020 report. Proc. Consortium Inf. Softw. Quality (CISQ), 2, 2021.
[3]    Jalaj Pachouly, Swati Ahirrao, Ketan Kotecha, Ganeshsree Selvachandran, and Ajith Abraham. A systematic literature review on software defect prediction using artificial intelligence: Datasets, data validation methods, approaches, and tools. Engineering Applications of Artificial Intelligence, 111:104773, 2022.

Last Updated 13.03.2025