Indicators for merge conflicts in the wild: survey and empirical study

Automated Software Engineering

Abstract

While the creation of new branches and forks is easy and fast with modern version-control systems, merging is often time-consuming. Especially when dealing with many branches or forks, a prediction of merge costs based on lightweight indicators would be desirable to help developers recognize problematic merging scenarios before potential conflicts become too severe in the evolution of a complex software project. We analyze the predictive power of several indicators, such as the number, size, or scattering degree of commits in each branch, derived either from the version-control system or directly from the source code. Based on a survey of 41 developers, we inferred 7 potential indicators to predict the number of merge conflicts. We tested corresponding hypotheses by studying 163 open-source projects, including 21,488 merge scenarios and comprising 49,449,773 lines of code. A notable (negative) result is that none of the 7 indicators suggested by the participants of the developer survey has predictive power concerning the frequency of merge conflicts. We discuss this and other findings, as well as the perspectives they open up.


Notes

  1. http://www.infosun.fim.uni-passau.de/spl/pythia/.

  2. http://siemens.github.io/codeface/.

  3. http://www.infosun.fim.uni-passau.de/spl/papers/conflict-prediction/.

  4. http://www.githubarchive.org/.

  5. n-way merges in Git.

  6. http://fosd.de/JDime/.

  7. http://www.r-project.org/.

  8. Large data set: even weak correlations easily become significant, so significance alone has no value for any claim.

  9. https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/stepAIC.html.

  10. Only the last suggestion, studying only merge scenarios with conflicts, leads to high correlations that, however, are meaningless because this approach neglects the fact that merge scenarios indeed often have zero conflicts.
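Note 8 can be made concrete with the usual t statistic for a correlation coefficient \(r\) over \(n\) observations. Assuming, for illustration only, that all 21,488 merge scenarios enter a correlation and that the reported \(\mathit{cor}\) values behave like Pearson coefficients, even the weakest value in the appendix (\(\mathit{cor} = 0.08\)) clears the two-sided 5% threshold (about 1.96) by a wide margin, and any \(|r|\) above roughly 0.013 would already count as significant:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}
  = \frac{0.08\,\sqrt{21\,486}}{\sqrt{1-0.08^{2}}}
  \approx \frac{0.08 \times 146.58}{0.9968}
  \approx 11.8 \gg 1.96,
\qquad
|r_{\min}| = \frac{1.96}{\sqrt{n-2+1.96^{2}}}
  \approx \frac{1.96}{146.59}
  \approx 0.013
```

The second expression follows from solving the first for \(r\) at \(t = 1.96\), i.e. \(r^{2} = t^{2}/(n-2+t^{2})\).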


Acknowledgements

We thank all survey participants for their insightful comments and suggestions. This work has been supported by the German Research Foundation (AP 206/4, AP 206/5, and AP 206/6).

Author information


Correspondence to Olaf Leßenich.

Appendices


Further indicators

In addition to the indicators derived from our hypotheses in Sect. 2, we experimented with other potential indicators as well. Here, we present the results that we computed using these alternative indicators.

First, we computed the number of developers who were involved in each of the two competing branches of a merge scenario. Our assumption was that the more developers contribute to a merge scenario, the more likely conflicts become, because developers may be unaware of what the others are changing. Following the procedure we applied to our other indicators, we computed the geometric mean of both values and correlated it with the number of merge conflicts. We observed almost no correlation (\(\mathit{cor} = 0.08\), \(p < 0.05\)), so we consider this indicator not useful for predicting merge conflicts. The corresponding scatter plot is shown in Fig. 13a.
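The per-scenario procedure described above (one value per branch, combined by the geometric mean, then correlated with the conflict count) can be sketched as follows. The excerpt does not state which correlation coefficient the authors used; this sketch assumes Pearson's, and the function names are hypothetical.

```python
import math
from statistics import mean


def geometric_mean_of_branches(left: float, right: float) -> float:
    """Combine the two per-branch values of one merge scenario.

    Assumes non-negative inputs; if either branch has value 0, the result is 0.
    """
    return math.sqrt(left * right)


def pearson(xs: list, ys: list) -> float:
    """Pearson's correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (sx * sy)


def indicator_correlation(scenarios) -> float:
    """Correlate a branch-level indicator with conflicts.

    `scenarios` is an iterable of (left_value, right_value, num_conflicts),
    e.g. (developers on branch A, developers on branch B, conflicts).
    """
    xs, ys = [], []
    for left, right, conflicts in scenarios:
        xs.append(geometric_mean_of_branches(left, right))
        ys.append(conflicts)
    return pearson(xs, ys)
```

For example, `indicator_correlation([(2, 8, 1), (3, 3, 0), (1, 1, 0), (4, 9, 3)])` uses the made-up developer counts 4, 3, 1, and 6 (after taking geometric means). The geometric mean rewards scenarios in which both branches are active, rather than scenarios dominated by a single branch.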

Fig. 13

a Number of developers versus number of conflicts, b number of days of development versus number of conflicts, c number of changed AST nodes versus number of conflicts, d number of commits in the last 2 weeks versus number of conflicts

Another assumption, also mentioned in the survey, is that branches developed over a long time without a merge are more likely to lead to merge conflicts. To test this hypothesis, we computed the number of days of development for both branches and correlated their geometric mean with the number of conflicts. As shown in Fig. 13b, we did not observe even a weak correlation here (\(\mathit{cor} = 0.15\), \(p < 0.05\)), so we rejected this idea as well.
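The excerpt does not define "days of development" precisely. A minimal sketch, assuming it is the span between the earliest and latest commit of each branch since the merge base, combined by the geometric mean as for the other indicators (function names are hypothetical):

```python
import math
from datetime import datetime


def days_of_development(commit_times: list) -> float:
    """Span in days between the first and last commit of one branch.

    Assumption: only the branch's own commits since the merge base are
    passed in; a branch with fewer than two commits has a span of 0.
    """
    if len(commit_times) < 2:
        return 0.0
    span = max(commit_times) - min(commit_times)
    return span.total_seconds() / 86400


def development_days_indicator(left: list, right: list) -> float:
    """Geometric mean of both branches' development spans."""
    return math.sqrt(days_of_development(left) * days_of_development(right))


left = [datetime(2015, 3, 1), datetime(2015, 3, 5)]    # 4 days
right = [datetime(2015, 3, 2), datetime(2015, 3, 18)]  # 16 days
indicator = development_days_indicator(left, right)
```

Note that the indicator is 0 whenever either branch has a span of 0; the sketch keeps that behavior rather than guessing how the original analysis handled single-commit branches.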

To test H 4, we use the number of changed lines of code to capture the size of changes within a merge scenario. An alternative measure is the number of changed nodes in the abstract syntax trees being merged. A scatter plot of this variant of H 4 is shown in Fig. 13c. The correlation we observed is even lower (\(\mathit{cor} = 0.25\), \(p < 0.05\)) than that of the line-based metric presented in Sect. 5 (\(\mathit{cor} = 0.43\), \(p < 0.05\)).
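The difference between the two size measures can be illustrated on a toy change. This is only a sketch: it uses Python's own `ast` module on Python source, not the structured merge tooling the study relied on, and its AST measure is a crude proxy (a node counts as changed if its full subtree dump differs, so every ancestor of an edited leaf counts too).

```python
import ast
import difflib
from collections import Counter


def changed_lines(old: str, new: str) -> int:
    """Number of added plus removed lines in a line-based diff."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    # ndiff prefixes: "+ " added, "- " removed, "  " unchanged, "? " hint lines
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def _node_signatures(source: str) -> Counter:
    """Multiset of AST nodes, each identified by the dump of its subtree."""
    tree = ast.parse(source)
    return Counter(ast.dump(node) for node in ast.walk(tree))


def changed_ast_nodes(old: str, new: str) -> int:
    """Crude proxy: nodes occurring in one version's multiset but not the other's."""
    before, after = _node_signatures(old), _node_signatures(new)
    return sum(((before - after) + (after - before)).values())


old_src = "x = 1\ny = 2\n"
new_src = "x = 1\ny = 3\n"
```

On this toy pair, the line-based measure reports one removed and one added line, while the AST proxy also counts the enclosing assignment and module nodes. Which measure tracks conflicts better is exactly the empirical question the appendix addresses.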

As mentioned in Sect. 5, we tested a variation of H 2 by looking at the last two weeks before the merge instead of only the last week. The result is displayed in Fig. 13d.
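H 2 is defined in Sect. 5, which is not part of this excerpt; judging from the caption of Fig. 13d, it concerns the number of commits shortly before the merge. A minimal sketch of switching from a one-week to a two-week window, assuming commits are assigned to a window by their timestamps and the two branches are combined by the geometric mean (function names are hypothetical):

```python
import math
from datetime import datetime, timedelta


def commits_in_window(commit_times, merge_time: datetime, days: int = 7) -> int:
    """Count commits in [merge_time - days, merge_time)."""
    start = merge_time - timedelta(days=days)
    return sum(1 for t in commit_times if start <= t < merge_time)


def recent_commits_indicator(left, right, merge_time: datetime, days: int = 7) -> float:
    """Geometric mean of the recent-commit counts of both branches."""
    return math.sqrt(
        commits_in_window(left, merge_time, days)
        * commits_in_window(right, merge_time, days)
    )


merge = datetime(2015, 6, 15)
left = [datetime(2015, 6, 1), datetime(2015, 6, 10), datetime(2015, 6, 14)]
right = [datetime(2015, 6, 5), datetime(2015, 6, 12)]
one_week = recent_commits_indicator(left, right, merge, days=7)
two_weeks = recent_commits_indicator(left, right, merge, days=14)
```

Widening the window only adds commits to the count, so the two-week indicator is never smaller than the one-week one for the same scenario.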

Developer survey

In what follows, we show the complete questionnaire from which we derived our hypotheses.

1.1 Introduction

In this survey, we assess the frequency, cause, and nature of merge conflicts in the context of version-control systems. There are 35 small questions—when asked for numbers, a rough estimate is sufficient. You can leave comments on most questions, but you do not have to. Also, each question is optional. Your data will of course be anonymized. If you are interested in the results, you can leave your e-mail address at the end of this survey. If you have any questions, please contact us.

Olaf Leßenich¹, Janet Siegmund¹, Christian Kästner², Sven Apel¹, Claus Hunsen¹

¹ University of Passau, ² Carnegie Mellon University


Thank you very much for your time. We highly appreciate your input. If you have any questions, please contact us.

Olaf Leßenich¹, Janet Siegmund¹, Christian Kästner², Sven Apel¹, Claus Hunsen¹

¹ University of Passau, ² Carnegie Mellon University

About this article

Cite this article

Leßenich, O., Siegmund, J., Apel, S. et al. Indicators for merge conflicts in the wild: survey and empirical study. Autom Softw Eng 25, 279–313 (2018). https://doi.org/10.1007/s10515-017-0227-0

DOI: https://doi.org/10.1007/s10515-017-0227-0