
Leveraging cluster backbones for improving MAP inference in statistical relational models

Annals of Mathematics and Artificial Intelligence

Abstract

A wide range of important problems in machine learning, expert systems, social network analysis, bioinformatics, and information theory can be formulated as maximum a-posteriori (MAP) inference problems on statistical relational models. While off-the-shelf inference algorithms based on local search and message-passing may provide adequate solutions in some situations, they frequently give poor results when faced with models that possess high-density networks. Unfortunately, such networks arise routinely in models of real-world applications, so accurate and scalable MAP inference on these models remains a key challenge. In this paper, we first introduce a novel family of extended factor graphs parameterized by a smoothing parameter χ ∈ [0,1]. Applying belief propagation (BP) message-passing to this family yields a new family of Weighted Survey Propagation algorithms (WSP-χ) applicable to relational domains. Unlike off-the-shelf inference algorithms, WSP-χ detects the “backbone” ground atoms of a solution cluster that contains potentially optimal MAP solutions: the cluster backbone atoms are not only parts of the optimal solutions, but they can also be exploited to scale MAP inference by iteratively fixing them, simplifying the network until it can be solved accurately with any conventional MAP inference method. We also propose lazy variants of the WSP-χ family. Our experiments on several real-world problems show the efficiency of WSP-χ and its lazy variants over prominent existing MAP inference solvers such as MaxWalkSAT, RockIt, IPP, SP-Y, and WCSP.



Author information

Correspondence to Mohamed-Hamza Ibrahim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Derivation of WSP-χ’s update equations

Here we derive the update equations for WSP-χ’s message passing. For simplicity, and without loss of generality, we consider the derivation of WSP-1, the pure version of WSP-χ on \(\hat {\mathcal {G}}\) obtained by setting χ = 1 and γ = 0 in (8).

1.1 A1. Variable-to-factor

Let us start by computing the update of the component \(\mu ^{s}_{X_{j} \rightarrow \hat {f_{i}}}\). This component represents the probability that Xj is constrained by other extended factors to satisfy \(\hat {f_{i}}\); it is therefore specified by the event that the variable Xj = si,j and its mega-node \(P_{j} = Z^{j} \cup \{\hat {f_{i}}\}\). Let the notation \(P_{j} = Z^{j} \cup \{\hat {f_{i}}\}\) represent the following event for a ground atom Xj:

$$ \hat{f_{i}} \in P_{j} \quad\text{and}\quad Z^{j} = P_{j} \setminus \{\hat{f_{i}}\} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j) $$
(18)

Then we can compute \(\mu ^{s}_{X_{j} \rightarrow \hat {f_{i}}}\) as follows:

$$ \begin{array}{@{}rcl@{}} \mu^{s}_{X_{j} \rightarrow \hat{f_{i}}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=s_{i,j}, P_{j} = Z^{j} \cup\{\hat{f_{i}}\} \right\} \end{array} $$
(19a)
$$ \begin{array}{@{}rcl@{}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \prod\limits_{\hat{f_{k}} \in Z^{j}} \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j) \setminus Z^{j}} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(19b)
$$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left( \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right) \right] \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(19c)

The component \(\mu ^{u}_{X_{j} \rightarrow \hat {f_{i}}}\) is computed similarly. It is specified by the event that Xj = ui,j and its mega-node \(P_{j} \subseteq \mathcal {F}^{u}_{\hat {f_{i}}}(j)\). Thus, we have:

$$ \begin{array}{@{}rcl@{}} \mu^{u}_{X_{j} \rightarrow \hat{f_{i}}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \left\{\eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=u_{i,j}, P_{j} = Z^{j} \right\} \end{array} $$
(20a)
$$ \begin{array}{@{}rcl@{}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \prod\limits_{\hat{f_{k}} \in Z^{j}} \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j) \setminus Z^{j}} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(20b)
$$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \left( \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right] \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(20c)

where the subtracted term in (20b) removes the Z j = ∅ contribution, since a non-joker atom must be constrained by at least one satisfied factor.

Finally, \(\mu ^{*}_{X_{j} \rightarrow \hat {f_{i}}}\) is specified by the combination of two events: Xj = si,j with \(P_{j} \subseteq \mathcal {F}^{s}_{\hat {f_{i}}}(j)\), and Xj = ∗ with Pj = ∅. Thus we have the following:

$$ \begin{array}{@{}rcl@{}} \mu^{*}_{X_{j} \rightarrow \hat{f_{i}}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left\{\eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=s_{i,j}, P_{j} = Z^{j} \right\} + \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=*, P_{j} = \emptyset \right\} \end{array} $$
(21a)
$$ \begin{array}{@{}rcl@{}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \prod\limits_{\hat{f_{k}} \in Z^{j}} \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j) \setminus Z^{j}} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \\ && - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} + \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(21b)
$$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left( \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right] \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} + \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j) \cup \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(21c)
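
To make these closed forms concrete, here is a minimal Python sketch that evaluates (19c), (20c), and (21c) for a single pair (Xj, f̂i). It is our illustration, not the authors’ implementation: the function name, the (eta_s, eta_u, eta_star) tuple layout, and the index sets Fs and Fu are assumed data structures.

```python
from math import prod

def variable_to_factor(eta, Fs, Fu):
    """WSP-1 variable-to-factor message components for one pair (X_j, f_i).

    eta : dict mapping a factor id f_k to the incoming message
          (eta_s, eta_u, eta_star) sent from f_k to X_j
    Fs  : ids of the other factors that X_j satisfies, F^s_{f_i}(j)
    Fu  : ids of the other factors that X_j violates,  F^u_{f_i}(j)
    """
    s_plus_star_Fs = prod(eta[k][0] + eta[k][2] for k in Fs)
    s_plus_star_Fu = prod(eta[k][0] + eta[k][2] for k in Fu)
    star_Fs = prod(eta[k][2] for k in Fs)
    star_Fu = prod(eta[k][2] for k in Fu)
    u_Fs = prod(eta[k][1] for k in Fs)
    u_Fu = prod(eta[k][1] for k in Fu)

    mu_s = s_plus_star_Fs * u_Fu                                     # Eq. (19c)
    mu_u = (s_plus_star_Fu - star_Fu) * u_Fs                         # Eq. (20c)
    mu_star = (s_plus_star_Fs - star_Fs) * u_Fu + star_Fs * star_Fu  # Eq. (21c)
    return mu_s, mu_u, mu_star
```

Since math.prod over an empty iterable returns 1, the sketch behaves sensibly when one of the neighbourhoods is empty.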

1.2 A2. Factor-to-variable

Let us start with the component \(\eta ^{s}_{\hat {f_{i}} \rightarrow X_{j}}\). This component implies that Xj = si,j and \(\hat {f_{i}} \in P_{j}\), and that the only possible assignment for the other ground atoms \(X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) is ui,k, with mega-nodes \(P_{k} \subseteq \mathcal {F}^{u}_{\hat {f_{i}}}(k)\). That is, it takes the form:

$$ \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} = \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \overbrace{\sum\limits_{P_{k} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=u_{i,k}, P_{k} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(k)\right\}}^{\text{from (20a) this equals } \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}}} \right) \times \overbrace{e^{\hat{w}_{i} \cdot y}}^{\text{a reward term, see (6)}} $$
(22)

Note that since the component \(\eta ^{s}_{\hat {f_{i}} \rightarrow X_{j}}\) is constrained to satisfy \(\hat {f_{i}}\), we multiply the right-hand side of (22) by the term \(e^{\hat {w}_{i} \cdot y}\), the reward for satisfying \(\hat {f_{i}}\). Substituting the definition of \(\mu ^{u}_{X_{k} \rightarrow \hat {f_{i}}}\) from (20a) into (22), we obtain:

$$ \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} = \left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \right] \times e^{\hat{w}_{i} \cdot y} $$
(23)

Now consider the component \(\eta ^{u}_{\hat {f_{i}} \rightarrow X_{j}}\), which represents the probability that Xj can violate \(\hat {f_{i}}\); that is, Xj = ui,j and \(P_{j} \subseteq \mathcal {F}^{u}_{\hat {f_{i}}}(j)\). This probability combines three possibilities (with weights labeled W1, W2, and W3) for the other ground atoms \(X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) in a potential complete assignment:

  1. There is exactly one ground atom in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) satisfying \(\hat {f_{i}}\), and all the other ground atoms violate it:

    $$ \begin{array}{@{}rcl@{}} \text{W}_{1} & =& \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \overbrace{\sum\limits_{Z^{k} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(k)} \left\{ \mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=s_{i,k}, P_{k} = Z^{k} \cup \{\hat{f_{i}}\} \right\} }^{\text{from (19a) this equals } \mu^{s}_{X_{k} \rightarrow \hat{f_{i}}}} \\ && \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \overbrace{\sum\limits_{Z^{i} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(i)} \left\{ \mu_{X_{i} \rightarrow \hat{f_{i}}} \bigg| X_{i}=u_{i,i}, P_{i} = Z^{i} \right\} }^{\text{from (20a) this equals } \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}}} \end{array} $$
    (24a)
    $$ \begin{array}{@{}rcl@{}} & = &\sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{s}_{X_{k} \rightarrow \hat{f_{i}}} \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}} \end{array} $$
    (24b)
  2. Two or more ground atoms in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) either satisfy \(\hat {f_{i}}\) or take the joker value ∗, and all the other ground atoms violate it:

    $$ \begin{array}{@{}rcl@{}} \text{W}_{2} & =& \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left[\sum\limits_{Z^{k} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=s_{i,k}, P_{k} = Z^{k} \right\} + \left\{ \mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=*, P_{k} =\emptyset \right\} \right] \\ && \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \sum\limits_{Z^{i} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(i)} \left\{\mu_{X_{i} \rightarrow \hat{f_{i}}} \bigg| X_{i}=u_{i,i}, P_{i} = Z^{i} \right\} \end{array} $$
    (25a)
    $$ \begin{array}{@{}rcl@{}} & =& \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left[\mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} + \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \right] - \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}} \\ && - \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \end{array} $$
    (25b)

    Note that \({\prod }_{X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}} \left [\mu ^{u}_{X_{k} \rightarrow \hat {f_{i}}} + \mu ^{*}_{X_{k} \rightarrow \hat {f_{i}}} \right ]\) is the weight of the event that each ground atom is either violating, or satisfying or ∗; W2 is obtained by subtracting from this quantity the weight of the event that fewer than two ground atoms are ∗ or satisfying. The latter combines two disjoint events: either all other ground atoms in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) are violating (with weight \({\prod }_{X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}} \mu ^{u}_{X_{k} \rightarrow \hat {f_{i}}}\)), or exactly one ground atom is ∗ or satisfying (with weight \({\sum }_{X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}} \mu ^{*}_{X_{k} \rightarrow \hat {f_{i}}} \times {\prod }_{X_{i} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{k},X_{j}\}} \mu ^{u}_{X_{i} \rightarrow \hat {f_{i}}}\)). A numerical check of this inclusion-exclusion identity is given after this list.

  3. All other ground atoms in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) violate \(\hat {f_{i}}\); in this case, a penalty term \(e^{-\hat {w}_{i} \cdot y}\) for violating \(\hat {f_{i}}\) enters the message update:

    $$ \begin{array}{@{}rcl@{}} \text{W}_{3} & = &\left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \overbrace{\sum\limits_{Z^{k} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=u_{i,k}, P_{k} = Z^{k} \right\}}^{\text{from (20a) this equals } \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}}} \right] \times \overbrace{e^{-\hat{w}_{i} \cdot y}}^{\text{a penalty term, see (6)}} \end{array} $$
    (26a)
    $$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \right] \times e^{-\hat{w}_{i} \cdot y} \end{array} $$
    (26b)
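
As promised above, here is a small numerical check of the inclusion-exclusion step behind (25b). It is our self-contained sketch with made-up message values (mu_u and mu_star are random stand-ins for the message components), not part of the paper:

```python
import itertools
import random
from math import prod

# Brute-force check of the inclusion-exclusion identity behind Eq. (25b):
# among n other ground atoms, each is either violating (weight mu_u[k]) or
# "satisfying or *" (weight mu_star[k]); W2 sums the configurations in
# which at least two atoms are "satisfying or *".
random.seed(0)
n = 4
mu_u = [random.random() for _ in range(n)]
mu_star = [random.random() for _ in range(n)]

brute = sum(
    prod(mu_star[k] if c[k] else mu_u[k] for k in range(n))
    for c in itertools.product([0, 1], repeat=n)
    if sum(c) >= 2
)
closed = (
    prod(u + s for u, s in zip(mu_u, mu_star))            # every atom free
    - sum(mu_star[k] * prod(mu_u[l] for l in range(n) if l != k)
          for k in range(n))                              # exactly one special
    - prod(mu_u)                                          # none special
)
assert abs(brute - closed) < 1e-9
```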

Now, bringing together the weight forms of W1, W2, and W3 from (24b), (25b) and (26b) results in:

$$ \begin{array}{@{}rcl@{}} \eta^{u}_{\hat{f_{i}} \rightarrow X_{j}} & =& \left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} + \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \right) + \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \mu^{s}_{X_{k} \rightarrow \hat{f_{i}}} - \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \right) \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j},X_{k}\}} \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}} \right] \\ && - \overbrace{(1-e^{-\hat{w}_{i} \cdot y})}^{\text{penalty}} \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \end{array} $$
(27)

Finally, the component \(\eta ^{*}_{\hat {f_{i}} \rightarrow X_{j}}\) represents the probability that Xj is unconstrained by \(\hat {f_{i}}\). This probability combines two possibilities: either Xj satisfies \(\hat {f_{i}}\) while all other ground atoms are unconstrained, or Xj is unconstrained (i.e., Xj = ∗ with Pj = ∅). So we have:

$$ \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} = \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left[ \sum\limits_{Z^{k} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=s_{i,k}, P_{k} = Z^{k} \right\} + \left\{ \mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=*, P_{k} =\emptyset \right\} \right] \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \sum\limits_{Z^{i} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(i)} \left\{\mu_{X_{i} \rightarrow \hat{f_{i}}} \bigg| X_{i}=u_{i,i}, P_{i} = Z^{i} \right\} $$
(28)

Note that the first part of (25a), evaluated in (25b), is identical to (28). Substituting its evaluated form into (28), we obtain:

$$ \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} = \left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}}+\mu^{*}_{X_{k} \rightarrow \hat{f_{i}}}\right) \right] - \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} $$
(29)
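
The three factor-to-variable components can likewise be transcribed directly. Again, this is our hedged sketch of (23), (27), and (29), assuming the same illustrative (mu_s, mu_u, mu_star) message layout and treating the weight ŵi and the exponent y as plain floats:

```python
from math import exp, prod

def factor_to_variable(mu, others, w, y):
    """WSP-1 factor-to-variable message components for one pair (f_i, X_j).

    mu     : dict mapping an atom id X_k to the incoming message
             (mu_s, mu_u, mu_star) sent from X_k to f_i
    others : ids of the atoms of f_i other than X_j
    w, y   : the factor weight w_i and the reward/penalty exponent y
    """
    u_all = prod(mu[k][1] for k in others)
    u_plus_star = prod(mu[k][1] + mu[k][2] for k in others)
    # Sum over which single other atom supplies the (mu_s - mu_star) term
    one_term = sum(
        (mu[k][0] - mu[k][2]) * prod(mu[l][1] for l in others if l != k)
        for k in others
    )
    eta_s = u_all * exp(w * y)                                    # Eq. (23)
    eta_u = u_plus_star + one_term - (1.0 - exp(-w * y)) * u_all  # Eq. (27)
    eta_star = u_plus_star - u_all                                # Eq. (29)
    return eta_s, eta_u, eta_star
```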

1.3 A3. Estimating the Marginals

Now let us derive the ground atoms’ marginals over max-cores in \(\hat {\mathcal {G}}\). Computing the unnormalized positive marginal of a ground atom Xj requires multiplying the satisfying incoming messages from the ground clauses in which Xj appears positively by the violating incoming messages from the ground clauses in which Xj appears negatively:

$$ \begin{array}{@{}rcl@{}} \tilde{\theta}^{+}_{j} & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{s}(j)} {\sum}_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=s_{i,j}, P_{j} = \mathcal{F}^{s}(j) \right\} \times \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{u}(j)} {\sum}_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=u_{i,j}, P_{j} = \mathcal{F}^{u}(j) \right\} \end{array} $$
(30a)
$$ \begin{array}{@{}rcl@{}} & =& {\prod}_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \sum\limits_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=+, P_{j} = \mathcal{F}^{+}(j) \right\} \times \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \sum\limits_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=-, P_{j} = \mathcal{F}^{-}(j) \right\} \end{array} $$
(30b)
$$ \begin{array}{@{}rcl@{}} & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \eta^{u}_{\hat{f_{i}} \rightarrow X_{j}} \times \left[ \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \left( \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right] \end{array} $$
(30c)

Similarly, we obtain the unnormalized negative marginal by multiplying the satisfying incoming messages from the factors in which Xj appears negatively by the violating incoming messages from the factors in which Xj appears positively:

$$ \tilde{\theta}^{-}_{j} = \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \eta^{u}_{\hat{f_{i}} \rightarrow X_{j}} \times \left[ \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \left( \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right] $$
(31)

Finally, we can estimate the unnormalized joker marginal by multiplying all the unconstrained incoming messages from all factors in which Xj appears:

$$ \begin{array}{@{}rcl@{}} \tilde{\theta}^{*}_{j} & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}(j)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=*, P_{j} = \emptyset \right\} \\ & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}(j)} \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \end{array} $$
(32a)

Now by normalizing the quantities in (30c), (31) and (32a), we obtain the marginal of Xj as follows:

$$ \begin{array}{@{}rcl@{}} \theta^{+}_{j} & =& \mathcal{Z}_{j}^{-1} \tilde{\theta}^{+}_{j} \end{array} $$
(33a)
$$ \begin{array}{@{}rcl@{}} \theta^{-}_{j} & =& \mathcal{Z}_{j}^{-1} \tilde{\theta}^{-}_{j} \end{array} $$
(33b)
$$ \begin{array}{@{}rcl@{}} \theta^{*}_{j} & =& \mathcal{Z}_{j}^{-1} \tilde{\theta}^{*}_{j} \end{array} $$
(33c)

and

$$ \mathcal{Z}_{j} = \tilde{\theta}^{+}_{j} + \tilde{\theta}^{-}_{j} + \tilde{\theta}^{*}_{j} $$
(34)

where \(\mathcal {Z}_{j}\) is the normalizing constant, given the evidence E.
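
Finally, the marginal estimation of (30c), (31), (32a), and (33)-(34) reduces to a few products over the positive and negative occurrence sets. Here is a minimal sketch under the same assumed message layout; F_pos and F_neg are hypothetical lists of factor ids:

```python
from math import prod

def atom_marginals(eta, F_pos, F_neg):
    """Normalized marginals (theta_plus, theta_minus, theta_star) of X_j.

    eta   : dict mapping a factor id f_i to the incoming message
            (eta_s, eta_u, eta_star) sent from f_i to X_j
    F_pos : list of ids of factors where X_j appears positively, F^+(j)
    F_neg : list of ids of factors where X_j appears negatively, F^-(j)
    """
    def cavity(sat, vio):
        # [prod of (eta_s + eta_star) over sat - prod of eta_star] * prod of eta_u over vio
        return (prod(eta[i][0] + eta[i][2] for i in sat)
                - prod(eta[i][2] for i in sat)) * prod(eta[i][1] for i in vio)

    t_plus = cavity(F_pos, F_neg)                    # Eq. (30c)
    t_minus = cavity(F_neg, F_pos)                   # Eq. (31)
    t_star = prod(eta[i][2] for i in F_pos + F_neg)  # Eq. (32a)
    Z = t_plus + t_minus + t_star                    # Eq. (34)
    return t_plus / Z, t_minus / Z, t_star / Z       # Eqs. (33a)-(33c)
```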


Cite this article

Ibrahim, MH., Pal, C. & Pesant, G. Leveraging cluster backbones for improving MAP inference in statistical relational models. Ann Math Artif Intell 88, 907–949 (2020). https://doi.org/10.1007/s10472-020-09698-z
