During the past decade, collecting and analysing data from the open-source repositories of software projects has gained relevance in empirical software engineering research. This field, known as Mining Software Repositories (MSR), analyses and cross-links these data to uncover compelling and actionable information about software systems. In MSR-based studies, researchers select repositories that fit specific criteria, extract data from them and analyse those data to obtain evidence that answers their research questions. As a result, MSR is a flexible methodology, as it allows the empirical exploration of a wide range of questions. Example areas include software evolution, developer networks and characterisation, bug prediction and effort estimation. Consequently, the popularity and acceptance of MSRs have grown in recent years, increasing interest in this research area.
Since MSR-based studies are considered evidence-based research, guidelines for producing structured and reproducible studies are critical. However, although meta-studies have examined different aspects of MSRs, such as the validity of some data sources or the use of the data produced, there are no frameworks or methodologies that guide the entire process. This paper is motivated by the need to improve the reliability of MSR studies, understand how repositories are selected and define a systematic process for mining software repositories. Its objectives are to: (i) assess the current practice of conducting MSR-based studies in Software Engineering (SE) through a systematic literature review (SLR); (ii) identify gaps in MSR-based studies and compare their process to that of SLRs; and (iii) centralise the findings to propose guidelines for systematic MSRs that support an unbiased aggregation of empirical results.
For both practitioners and researchers, the main implication of this paper is a thorough process that consolidates multiple findings on how to conduct MSRs. This process is derived from established SLR practices for SE and is supported by the results of this SLR and by previous related work from other authors. Its value is further strengthened because it enables MSRs to be included in the evidence-based software engineering (EBSE) paradigm, by supporting the generation of replicable, comparable MSRs that can be applied across multiple domains or in large-scale studies.
The remainder of the paper is structured as follows.