Indicators for merge conflicts in the wild: survey and empirical study

Leßenich, Olaf; Siegmund, Janet; Apel, Sven; Kästner, Christian; Hunsen, Claus

doi:10.1007/s10515-017-0227-0


Published: 09 September 2017

Volume 25, pages 279–313 (2018)

Automated Software Engineering

Olaf Leßenich (ORCID: orcid.org/0000-0002-8996-1996)¹, Janet Siegmund¹, Sven Apel¹, Christian Kästner² & Claus Hunsen¹


Abstract

While the creation of new branches and forks is easy and fast with modern version-control systems, merging is often time-consuming. Especially when dealing with many branches or forks, a prediction of merge costs based on lightweight indicators would be desirable to help developers recognize problematic merging scenarios before potential conflicts become too severe in the evolution of a complex software project. We analyze the predictive power of several indicators, such as the number, size, or scattering degree of commits in each branch, derived either from the version-control system or directly from the source code. Based on a survey of 41 developers, we inferred 7 potential indicators to predict the number of merge conflicts. We tested corresponding hypotheses by studying 163 open-source projects, comprising 21,488 merge scenarios and 49,449,773 lines of code. A notable (negative) result is that none of the 7 indicators suggested by the survey participants has predictive power for the frequency of merge conflicts. We discuss this and other findings, as well as the perspectives they open up.

Notes

http://www.infosun.fim.uni-passau.de/spl/pythia/.
http://siemens.github.io/codeface/.
http://www.infosun.fim.uni-passau.de/spl/papers/conflict-prediction/.
http://www.githubarchive.org/.
n-way merges in Git.
http://fosd.de/JDime/.
http://www.r-project.org/.
With such a large data set, correlations easily become statistically significant, so significance alone has no value for any claims.
https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/stepAIC.html.
Only the last suggestion (studying only merge scenarios with conflicts) leads to high correlations; these are, however, meaningless, because this approach ignores the fact that merge scenarios often have zero conflicts.
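The note on the large data set can be made concrete with the standard t-test for a Pearson correlation coefficient \(r\) over \(n\) observations. The worked numbers below assume, for illustration only, that a weak correlation such as \( cor \) = 0.08 was computed over all \(n = 21{,}488\) merge scenarios named in the abstract and tested with this statistic; the exact test and sample size behind each reported value are not stated in this section.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.08\sqrt{21486}}{\sqrt{1-0.08^{2}}}
  \approx \frac{0.08 \cdot 146.58}{0.9968}
  \approx 11.8
\qquad\text{(critical value} \approx 1.96 \text{ for two-sided } \alpha = 0.05\text{)}

|r| > \frac{1.96}{\sqrt{n-2}} \approx \frac{1.96}{146.58} \approx 0.013
\qquad\text{(using } \sqrt{1-r^{2}} \approx 1\text{)},
\qquad r^{2} = 0.08^{2} = 0.0064
```

Under these assumptions, any correlation above roughly 0.013 is "significant", while \(r = 0.08\) explains only about 0.64% of the variance in the number of conflicts, which is why significance alone supports no claim of predictive power.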

Acknowledgements

We thank all survey participants for their insightful comments and suggestions. This work has been supported by the German Research Foundation (AP 206/4, AP 206/5, and AP 206/6).

Author information

Authors and Affiliations

University of Passau, Passau, Germany
Olaf Leßenich, Janet Siegmund, Sven Apel & Claus Hunsen
Carnegie Mellon University, Pittsburgh, PA, USA
Christian Kästner

Corresponding author

Correspondence to Olaf Leßenich.

Appendices

Further indicators

In addition to the indicators derived from our hypotheses in Sect. 2, we experimented with other potential indicators as well. Here, we present the results that we computed using these alternative indicators.

First, we computed the number of developers involved in each of the two competing branches of a merge scenario. Our rationale was that the more developers contribute to a merge scenario, the more likely conflicts become, because developers may be unaware of what the others are changing. Following the procedure we applied to our other indicators, we computed the geometric mean of both values and correlated it with the number of merge conflicts. We observed almost no correlation (\( cor \) = 0.08, \(p < 0.05\)), so we assume that this indicator is not useful for predicting merge conflicts. The corresponding scatter plot is shown in Fig. 13a.
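The procedure of combining a per-branch value into one indicator and correlating it with conflict counts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `MergeScenario` type and its field names are hypothetical, the data are toy values, and Pearson correlation is assumed because this section does not say which coefficient \( cor \) denotes.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MergeScenario:
    """A merge scenario: one indicator value per branch plus the observed conflict count."""
    left_value: float   # indicator on the left branch, e.g., number of developers (hypothetical field)
    right_value: float  # the same indicator measured on the right branch
    conflicts: int      # number of merge conflicts observed for this scenario


def combined_indicator(s: MergeScenario) -> float:
    """Geometric mean of the two branch values, i.e. sqrt(left * right)."""
    return math.sqrt(s.left_value * s.right_value)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length (at least 2)")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def indicator_correlation(scenarios: list[MergeScenario]) -> float:
    """Correlate the combined (geometric-mean) indicator with the number of conflicts."""
    xs = [combined_indicator(s) for s in scenarios]
    ys = [float(s.conflicts) for s in scenarios]
    return pearson(xs, ys)


if __name__ == "__main__":
    # Toy data only; these numbers are not from the study.
    toy = [
        MergeScenario(1, 4, 0),
        MergeScenario(2, 2, 1),
        MergeScenario(3, 12, 0),
        MergeScenario(5, 5, 2),
    ]
    print(f"cor = {indicator_correlation(toy):.3f}")
```

The geometric mean stays small whenever one branch is small, so it highlights scenarios in which both sides changed substantially. Because conflict counts are heavily skewed toward zero, a rank-based coefficient such as Spearman's would be a reasonable alternative to the Pearson coefficient assumed here.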

Another assumption, also mentioned in the survey, is that branches developed over a long time without a merge are more likely to lead to merge conflicts. To test this hypothesis, we computed the number of days of development for both branches and correlated their geometric mean with the number of conflicts. As shown in Fig. 13b, we did not observe even a weak correlation here (\( cor \) = 0.15, \(p < 0.05\)), so we had to reject this idea as well.
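One way to derive the "days of development" indicator is sketched below. It assumes a definition this section does not spell out: the span between the first and last commit on a branch since the merge base. The helper `commit_times` uses `git log --format=%ct base..tip` (committer timestamps of commits reachable from the branch tip but not from the merge base); the function names are hypothetical.

```python
from __future__ import annotations

import math
import subprocess
from datetime import datetime, timedelta, timezone


def commit_times(repo: str, merge_base: str, branch_tip: str) -> list[datetime]:
    """Committer timestamps of the commits on one branch since the merge base."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--format=%ct", f"{merge_base}..{branch_tip}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [datetime.fromtimestamp(int(ts), tz=timezone.utc) for ts in out.split()]


def development_days(times: list[datetime]) -> float:
    """Assumed definition: days between first and last commit (0.0 for fewer than 2 commits)."""
    if len(times) < 2:
        return 0.0
    return (max(times) - min(times)) / timedelta(days=1)


def combined_duration(left: list[datetime], right: list[datetime]) -> float:
    """Geometric mean of the two branch durations, as for the other indicators."""
    return math.sqrt(development_days(left) * development_days(right))


if __name__ == "__main__":
    # Toy timestamps; a real run would call commit_times() on a repository.
    t0 = datetime(2017, 1, 1, tzinfo=timezone.utc)
    left = [t0, t0 + timedelta(days=9)]
    right = [t0 + timedelta(days=2), t0 + timedelta(days=3), t0 + timedelta(days=6)]
    print(combined_duration(left, right))  # sqrt(9 * 4) = 6.0
```

With this definition, a branch with a single commit has a duration of zero and forces the geometric mean to zero; measuring from the merge-base commit instead would avoid that, so the choice of reference point matters for the resulting correlation.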

To test \(H_4\), we use the number of changed lines of code to capture the size of changes within a merge scenario. Alternatively, the size of changes can be expressed as the number of changed nodes in the abstract syntax trees that are merged. A scatter plot of this variation of \(H_4\) is shown in Fig. 13c. The correlation we observed is even lower (\( cor \) = 0.25, \(p < 0.05\)) than that of the line-based metric presented in Sect. 5 (\( cor \) = 0.43, \(p < 0.05\)).
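The difference between a line-based and a node-based size metric can be illustrated with the sketch below. It is only an approximation: the notes point to JDime, a structured-merge tool for Java, whereas this example parses Python with the standard `ast` module purely for illustration. `changed_lines` counts removed plus added lines with `difflib`; `changed_nodes` is a crude, hypothetical proxy that compares multisets of node labels instead of computing a real tree matching.

```python
from __future__ import annotations

import ast
import difflib
from collections import Counter


def changed_lines(base: str, branch: str) -> int:
    """Lines removed plus lines added when going from base to branch."""
    a = base.splitlines(keepends=True)
    b = branch.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(
        (i2 - i1) + (j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def node_label(node: ast.AST) -> str:
    """Node type plus its identifying leaf value, if any (e.g. 'Name:x', 'Constant:1')."""
    kind = type(node).__name__
    if isinstance(node, ast.Name):
        return f"{kind}:{node.id}"
    if isinstance(node, ast.Constant):
        return f"{kind}:{node.value!r}"
    if isinstance(node, ast.Attribute):
        return f"{kind}:{node.attr}"
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
        return f"{kind}:{node.name}"
    if isinstance(node, ast.arg):
        return f"{kind}:{node.arg}"
    return kind


def changed_nodes(base: str, branch: str) -> int:
    """Crude proxy: size of the multiset difference of node labels (no tree matching)."""
    a = Counter(node_label(n) for n in ast.walk(ast.parse(base)))
    b = Counter(node_label(n) for n in ast.walk(ast.parse(branch)))
    return sum((a - b).values()) + sum((b - a).values())


if __name__ == "__main__":
    base = "def f(x):\n    return x + 1\n"
    branch = "def f(x):\n    y = x * 2\n    return y + 1\n"
    print("changed lines:", changed_lines(base, branch))
    print("changed nodes:", changed_nodes(base, branch))
```

A single edited line can correspond to one or many changed nodes, which is one plausible reason why the two size metrics correlate differently with conflicts. A faithful node count would require tree differencing, which this proxy ignores (it also cannot detect moved code).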

As mentioned in Sect. 5, we tested a variation of \(H_2\) by looking at the last two weeks before the merge instead of only the last week. The result is displayed in Fig. 13d.
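Widening the time window before the merge amounts to changing a single parameter, as the sketch below shows. It assumes that \(H_2\) is measured as the number of commits per branch inside the window, combined by the geometric mean; this section does not restate the exact metric behind \(H_2\), and the function names are hypothetical.

```python
from __future__ import annotations

import math
from datetime import datetime, timedelta, timezone


def commits_in_window(times: list[datetime], merge_time: datetime, days: int) -> int:
    """Number of commits with merge_time - days <= timestamp < merge_time."""
    start = merge_time - timedelta(days=days)
    return sum(1 for t in times if start <= t < merge_time)


def windowed_indicator(
    left: list[datetime], right: list[datetime], merge_time: datetime, days: int = 7
) -> float:
    """Geometric mean of the per-branch commit counts inside the window."""
    return math.sqrt(
        commits_in_window(left, merge_time, days) * commits_in_window(right, merge_time, days)
    )


def day(offset: int) -> datetime:
    """Toy timestamp helper: midnight UTC, `offset` days after 1 January 2017."""
    return datetime(2017, 1, 1, tzinfo=timezone.utc) + timedelta(days=offset)


if __name__ == "__main__":
    left = [day(5), day(10), day(15), day(19)]
    right = [day(8), day(14), day(18)]
    merge = day(20)
    for window in (7, 14):  # one week (original H2) vs. two weeks (variation)
        print(window, windowed_indicator(left, right, merge, window))
```

For the toy data, the one-week window yields 2.0 and the two-week window 3.0; a longer window smooths out bursts of activity just before the merge but dilutes the "recent activity" signal the hypothesis is about.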

Developer survey

In what follows, we show the complete questionnaire from which we derived our hypotheses.

1.1 Introduction

In this survey, we assess the frequency, cause, and nature of merge conflicts in the context of version-control systems. There are 35 short questions; when asked for numbers, a rough estimate is sufficient. You can leave comments on most questions, but you do not have to. Also, every question is optional. Your data will, of course, be anonymized. If you are interested in the results, you can leave your e-mail address at the end of this survey. If you have any questions, please contact us.

Olaf Leßenich¹, Janet Siegmund¹, Christian Kästner², Sven Apel¹, Claus Hunsen¹

¹ University of Passau, ² Carnegie Mellon University

Thank you very much for your time. We highly appreciate your input. If you have any questions, please contact us.

About this article

Cite this article

Leßenich, O., Siegmund, J., Apel, S. et al. Indicators for merge conflicts in the wild: survey and empirical study. Autom Softw Eng 25, 279–313 (2018). https://doi.org/10.1007/s10515-017-0227-0

Received: 11 January 2017
Accepted: 30 August 2017
Published: 09 September 2017
Issue Date: June 2018
DOI: https://doi.org/10.1007/s10515-017-0227-0
