
Leveraging cluster backbones for improving MAP inference in statistical relational models

Annals of Mathematics and Artificial Intelligence

Abstract

A wide range of important problems in machine learning, expert systems, social network analysis, bioinformatics, and information theory can be formulated as maximum a-posteriori (MAP) inference problems on statistical relational models. While off-the-shelf inference algorithms based on local search and message-passing may provide adequate solutions in some situations, they frequently give poor results when faced with models that possess high-density networks. Unfortunately, such networks arise routinely in models of real-world applications, so accurate and scalable MAP inference on these models remains a key challenge. In this paper, we first introduce a novel family of extended factor graphs parameterized by a smoothing parameter χ ∈ [0,1]. Applying belief propagation (BP) message-passing to this family yields a new family of Weighted Survey Propagation algorithms (WSP-χ) applicable to relational domains. Unlike off-the-shelf inference algorithms, WSP-χ detects the “backbone” ground atoms of a solution cluster that contains potentially optimal MAP solutions: the cluster backbone atoms are not only parts of the optimal solutions, but they can also be exploited to scale MAP inference by iteratively fixing them, simplifying the network until it can be solved accurately with any conventional MAP inference method. We also propose lazy variants of the WSP-χ family. Our experiments on several real-world problems show the efficiency of WSP-χ and its lazy variants over prominent existing MAP inference solvers such as MaxWalkSAT, RockIt, IPP, SP-Y, and WCSP.



Author information

Correspondence to Mohamed-Hamza Ibrahim.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Derivation of WSP-χ’s update equations

Here we derive the update equations for WSP-χ’s message passing. For simplicity, and without loss of generality, we consider the derivation of WSP-1, the pure version of WSP-χ on \(\hat {\mathcal {G}}\) obtained by setting χ = 1 and γ = 0 in (8).

1.1 A1. Variable-to-factor

Let us start by computing the update of the component \(\mu ^{s}_{X_{j} \rightarrow \hat {f_{i}}}\). This component represents the probability that Xj is constrained by other extended factors to satisfy \(\hat {f_{i}}\); it is therefore specified by the event that the variable Xj = si,j and its mega-node \(P_{j} = Z^{j} \cup \{\hat {f_{i}}\}\). Let the notation \(P_{j} = Z^{j} \cup \{\hat {f_{i}}\}\) represent the following event for a ground atom Xj:

$$ \hat{f_{i}} \in P_{j} \quad\text{and}\quad Z^{j} = P_{j} \setminus \{\hat{f_{i}}\} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j) $$
(18)

Then we can compute \(\mu ^{s}_{X_{j} \rightarrow \hat {f_{i}}}\) as follows:

$$ \begin{array}{@{}rcl@{}} \mu^{s}_{X_{j} \rightarrow \hat{f_{i}}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=s_{i,j}, P_{j} = Z^{j} \cup\{\hat{f_{i}}\} \right\} \end{array} $$
(19a)
$$ \begin{array}{@{}rcl@{}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \prod\limits_{\hat{f_{k}} \in Z^{j}} \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j) \setminus Z^{j}} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(19b)
$$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left( \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right) \right] \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(19c)

The component \(\mu ^{u}_{X_{j} \rightarrow \hat {f_{i}}}\) is computed similarly. It is specified by the event that Xj = ui,j and its mega-node \(P_{j} \subseteq \mathcal {F}^{u}_{\hat {f_{i}}}(j)\). Thus, we have:

$$ \begin{array}{@{}rcl@{}} \mu^{u}_{X_{j} \rightarrow \hat{f_{i}}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \left\{\eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=u_{i,j}, P_{j} = Z^{j} \right\} \end{array} $$
(20a)
$$ \begin{array}{@{}rcl@{}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \prod\limits_{\hat{f_{k}} \in Z^{j}} \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j) \setminus Z^{j}} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(20b)
$$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \left( \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right] \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(20c)

where the subtracted term in (20b) removes the Z j = ∅ contribution, since a non-joker atom must be constrained by at least one satisfied factor.

Finally, \(\mu ^{*}_{X_{j} \rightarrow \hat {f_{i}}}\) is specified by the combination of two events: Xj = si,j with \(P_{j} \subseteq \mathcal {F}^{s}_{\hat {f_{i}}}(j)\), and Xj = ∗ with Pj = ∅. Thus we have the following:

$$ \begin{array}{@{}rcl@{}} \mu^{*}_{X_{j} \rightarrow \hat{f_{i}}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left\{\eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=s_{i,j}, P_{j} = Z^{j} \right\} + \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=*, P_{j} = \emptyset \right\} \end{array} $$
(21a)
$$ \begin{array}{@{}rcl@{}} & =& \sum\limits_{Z^{j} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \prod\limits_{\hat{f_{k}} \in Z^{j}} \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j) \setminus Z^{j}} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} \\ && - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} + \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(21b)
$$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \left( \eta^{s}_{\hat{f_{k}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \right] \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{u}_{\hat{f_{k}} \rightarrow X_{j}} + \prod\limits_{\hat{f_{k}} \in \mathcal{F}^{s}_{\hat{f_{i}}}(j) \cup \mathcal{F}^{u}_{\hat{f_{i}}}(j)} \eta^{*}_{\hat{f_{k}} \rightarrow X_{j}} \end{array} $$
(21c)
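
To make these closed forms concrete, here is a minimal Python sketch that evaluates (19c), (20c), and (21c) for a single pair (Xj, f̂i). It is our illustration, not the authors’ implementation: the function name, the (eta_s, eta_u, eta_star) tuple layout, and the index sets Fs and Fu are assumed data structures.

```python
from math import prod

def variable_to_factor(eta, Fs, Fu):
    """WSP-1 variable-to-factor message components for one pair (X_j, f_i).

    eta : dict mapping a factor id f_k to the incoming message
          (eta_s, eta_u, eta_star) sent from f_k to X_j
    Fs  : ids of the other factors that X_j satisfies, F^s_{f_i}(j)
    Fu  : ids of the other factors that X_j violates,  F^u_{f_i}(j)
    """
    s_plus_star_Fs = prod(eta[k][0] + eta[k][2] for k in Fs)
    s_plus_star_Fu = prod(eta[k][0] + eta[k][2] for k in Fu)
    star_Fs = prod(eta[k][2] for k in Fs)
    star_Fu = prod(eta[k][2] for k in Fu)
    u_Fs = prod(eta[k][1] for k in Fs)
    u_Fu = prod(eta[k][1] for k in Fu)

    mu_s = s_plus_star_Fs * u_Fu                                     # Eq. (19c)
    mu_u = (s_plus_star_Fu - star_Fu) * u_Fs                         # Eq. (20c)
    mu_star = (s_plus_star_Fs - star_Fs) * u_Fu + star_Fs * star_Fu  # Eq. (21c)
    return mu_s, mu_u, mu_star
```

Since math.prod over an empty iterable returns 1, the sketch behaves sensibly when one of the neighbourhoods is empty.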

1.2 A2. Factor-to-variable

Let us start with the component \(\eta ^{s}_{\hat {f_{i}} \rightarrow X_{j}}\). This component implies that Xj = si,j and \(\hat {f_{i}} \in P_{j}\), and that the only possible assignment for the other ground atoms \(X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) is ui,k, with mega-nodes \(P_{k} \subseteq \mathcal {F}^{u}_{\hat {f_{i}}}(k)\). That is, it takes the form:

$$ \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} = \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \overbrace{\sum\limits_{P_{k} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=u_{i,k}, P_{k} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(k)\right\}}^{\text{from (20a) this equals } \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}}} \right) \times \overbrace{e^{\hat{w}_{i} \cdot y}}^{\text{a reward term, see (6)}} $$
(22)

Note that since the component \(\eta ^{s}_{\hat {f_{i}} \rightarrow X_{j}}\) is constrained to satisfy \(\hat {f_{i}}\), we multiply the right-hand side of (22) by the term \(e^{\hat {w}_{i} \cdot y}\), the reward for satisfying \(\hat {f_{i}}\). Substituting the definition of \(\mu ^{u}_{X_{k} \rightarrow \hat {f_{i}}}\) from (20a) into (22), we obtain:

$$ \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} = \left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \right] \times e^{\hat{w}_{i} \cdot y} $$
(23)

Now consider the component \(\eta ^{u}_{\hat {f_{i}} \rightarrow X_{j}}\), which represents the probability that Xj can violate \(\hat {f_{i}}\); that is, Xj = ui,j and \(P_{j} \subseteq \mathcal {F}^{u}_{\hat {f_{i}}}(j)\). This probability combines three possibilities (with weights labeled W1, W2, and W3) for the other ground atoms \(X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) in a potential complete assignment:

  1. There is exactly one ground atom in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) satisfying \(\hat {f_{i}}\), and all the other ground atoms violate it:

    $$ \begin{array}{@{}rcl@{}} \text{W}_{1} & =& \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \overbrace{\sum\limits_{Z^{k} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(k)} \left\{ \mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=s_{i,k}, P_{k} = Z^{k} \cup \{\hat{f_{i}}\} \right\} }^{\text{from (19a) this equals } \mu^{s}_{X_{k} \rightarrow \hat{f_{i}}}} \\ && \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \overbrace{\sum\limits_{Z^{i} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(i)} \left\{ \mu_{X_{i} \rightarrow \hat{f_{i}}} \bigg| X_{i}=u_{i,i}, P_{i} = Z^{i} \right\} }^{\text{from (20a) this equals } \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}}} \end{array} $$
    (24a)
    $$ \begin{array}{@{}rcl@{}} & = &\sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{s}_{X_{k} \rightarrow \hat{f_{i}}} \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}} \end{array} $$
    (24b)
  2. Two or more ground atoms in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) either satisfy \(\hat {f_{i}}\) or take the joker value ∗, and all the other ground atoms violate it:

    $$ \begin{array}{@{}rcl@{}} \text{W}_{2} & =& \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left[\sum\limits_{Z^{k} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=s_{i,k}, P_{k} = Z^{k} \right\} + \left\{ \mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=*, P_{k} =\emptyset \right\} \right] \\ && \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \sum\limits_{Z^{i} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(i)} \left\{\mu_{X_{i} \rightarrow \hat{f_{i}}} \bigg| X_{i}=u_{i,i}, P_{i} = Z^{i} \right\} \end{array} $$
    (25a)
    $$ \begin{array}{@{}rcl@{}} & =& \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left[\mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} + \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \right] - \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}} \\ && - \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \end{array} $$
    (25b)

    Note that \({\prod }_{X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}} \left [\mu ^{u}_{X_{k} \rightarrow \hat {f_{i}}} + \mu ^{*}_{X_{k} \rightarrow \hat {f_{i}}} \right ]\) is the weight of the event that each ground atom is either violating, or satisfying or ∗; W2 is obtained by subtracting from this quantity the weight of the event that fewer than two ground atoms are ∗ or satisfying. The latter combines two disjoint events: either all other ground atoms in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) are violating (with weight \({\prod }_{X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}} \mu ^{u}_{X_{k} \rightarrow \hat {f_{i}}}\)), or exactly one ground atom is ∗ or satisfying (with weight \({\sum }_{X_{k} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}} \mu ^{*}_{X_{k} \rightarrow \hat {f_{i}}} \times {\prod }_{X_{i} \in \mathcal {X}_{\hat {f_{i}}} \setminus \{X_{k},X_{j}\}} \mu ^{u}_{X_{i} \rightarrow \hat {f_{i}}}\)). A numerical check of this inclusion-exclusion identity is given after this list.

  3. All other ground atoms in \(\mathcal {X}_{\hat {f_{i}}} \setminus \{X_{j}\}\) violate \(\hat {f_{i}}\); in this case, a penalty term \(e^{-\hat {w}_{i} \cdot y}\) for violating \(\hat {f_{i}}\) enters the message update:

    $$ \begin{array}{@{}rcl@{}} \text{W}_{3} & = &\left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \overbrace{\sum\limits_{Z^{k} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=u_{i,k}, P_{k} = Z^{k} \right\}}^{\text{from (20a) this equals } \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}}} \right] \times \overbrace{e^{-\hat{w}_{i} \cdot y}}^{\text{a penalty term, see (6)}} \end{array} $$
    (26a)
    $$ \begin{array}{@{}rcl@{}} & =& \left[ \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \right] \times e^{-\hat{w}_{i} \cdot y} \end{array} $$
    (26b)
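
As promised above, here is a small numerical check of the inclusion-exclusion step behind (25b). It is our self-contained sketch with made-up message values (mu_u and mu_star are random stand-ins for the message components), not part of the paper:

```python
import itertools
import random
from math import prod

# Brute-force check of the inclusion-exclusion identity behind Eq. (25b):
# among n other ground atoms, each is either violating (weight mu_u[k]) or
# "satisfying or *" (weight mu_star[k]); W2 sums the configurations in
# which at least two atoms are "satisfying or *".
random.seed(0)
n = 4
mu_u = [random.random() for _ in range(n)]
mu_star = [random.random() for _ in range(n)]

brute = sum(
    prod(mu_star[k] if c[k] else mu_u[k] for k in range(n))
    for c in itertools.product([0, 1], repeat=n)
    if sum(c) >= 2
)
closed = (
    prod(u + s for u, s in zip(mu_u, mu_star))            # every atom free
    - sum(mu_star[k] * prod(mu_u[l] for l in range(n) if l != k)
          for k in range(n))                              # exactly one special
    - prod(mu_u)                                          # none special
)
assert abs(brute - closed) < 1e-9
```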

Now, bringing together the weight forms of W1, W2, and W3 from (24b), (25b) and (26b) results in:

$$ \begin{array}{@{}rcl@{}} \eta^{u}_{\hat{f_{i}} \rightarrow X_{j}} & =& \left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} + \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \right) + \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \mu^{s}_{X_{k} \rightarrow \hat{f_{i}}} - \mu^{*}_{X_{k} \rightarrow \hat{f_{i}}} \right) \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j},X_{k}\}} \mu^{u}_{X_{i} \rightarrow \hat{f_{i}}} \right] \\ && - \overbrace{(1-e^{-\hat{w}_{i} \cdot y})}^{\text{penalty}} \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} \end{array} $$
(27)

Finally, the component \(\eta ^{*}_{\hat {f_{i}} \rightarrow X_{j}}\) represents the probability that Xj is unconstrained by \(\hat {f_{i}}\). This probability combines two possibilities: either Xj satisfies \(\hat {f_{i}}\) while all other ground atoms are unconstrained, or Xj is unconstrained (i.e., Xj = ∗ with Pj = ∅). So we have:

$$ \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} = \sum\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left[ \sum\limits_{Z^{k} \subseteq \mathcal{F}^{s}_{\hat{f_{i}}}(k)} \left\{\mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=s_{i,k}, P_{k} = Z^{k} \right\} + \left\{ \mu_{X_{k} \rightarrow \hat{f_{i}}} \bigg| X_{k}=*, P_{k} =\emptyset \right\} \right] \times \prod\limits_{X_{i} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{k},X_{j}\}} \sum\limits_{Z^{i} \subseteq \mathcal{F}^{u}_{\hat{f_{i}}}(i)} \left\{\mu_{X_{i} \rightarrow \hat{f_{i}}} \bigg| X_{i}=u_{i,i}, P_{i} = Z^{i} \right\} $$
(28)

Note that the first part of (25a), evaluated in (25b), is identical to (28). Substituting its evaluated form into (28), we obtain:

$$ \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} = \left[\prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \left( \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}}+\mu^{*}_{X_{k} \rightarrow \hat{f_{i}}}\right) \right] - \prod\limits_{X_{k} \in \mathcal{X}_{\hat{f_{i}}} \setminus \{X_{j}\}} \mu^{u}_{X_{k} \rightarrow \hat{f_{i}}} $$
(29)
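
The three factor-to-variable components can likewise be transcribed directly. Again, this is our hedged sketch of (23), (27), and (29), assuming the same illustrative (mu_s, mu_u, mu_star) message layout and treating the weight ŵi and the exponent y as plain floats:

```python
from math import exp, prod

def factor_to_variable(mu, others, w, y):
    """WSP-1 factor-to-variable message components for one pair (f_i, X_j).

    mu     : dict mapping an atom id X_k to the incoming message
             (mu_s, mu_u, mu_star) sent from X_k to f_i
    others : ids of the atoms of f_i other than X_j
    w, y   : the factor weight w_i and the reward/penalty exponent y
    """
    u_all = prod(mu[k][1] for k in others)
    u_plus_star = prod(mu[k][1] + mu[k][2] for k in others)
    # Sum over which single other atom supplies the (mu_s - mu_star) term
    one_term = sum(
        (mu[k][0] - mu[k][2]) * prod(mu[l][1] for l in others if l != k)
        for k in others
    )
    eta_s = u_all * exp(w * y)                                    # Eq. (23)
    eta_u = u_plus_star + one_term - (1.0 - exp(-w * y)) * u_all  # Eq. (27)
    eta_star = u_plus_star - u_all                                # Eq. (29)
    return eta_s, eta_u, eta_star
```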

1.3 A3. Estimating the Marginals

Now let us derive the ground atoms’ marginals over max-cores in \(\hat {\mathcal {G}}\). Computing the unnormalized positive marginal of a ground atom Xj requires multiplying the satisfying incoming messages from the ground clauses in which Xj appears positively by the violating incoming messages from the ground clauses in which Xj appears negatively:

$$ \begin{array}{@{}rcl@{}} \tilde{\theta}^{+}_{j} & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{s}(j)} {\sum}_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=s_{i,j}, P_{j} = \mathcal{F}^{s}(j) \right\} \times \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{u}(j)} {\sum}_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=u_{i,j}, P_{j} = \mathcal{F}^{u}(j) \right\} \end{array} $$
(30a)
$$ \begin{array}{@{}rcl@{}} & =& {\prod}_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \sum\limits_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=+, P_{j} = \mathcal{F}^{+}(j) \right\} \times \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \sum\limits_{\mathcal{F}(i)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=-, P_{j} = \mathcal{F}^{-}(j) \right\} \end{array} $$
(30b)
$$ \begin{array}{@{}rcl@{}} & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \eta^{u}_{\hat{f_{i}} \rightarrow X_{j}} \times \left[ \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \left( \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right] \end{array} $$
(30c)

Similarly, we obtain the unnormalized negative marginal by multiplying the satisfying incoming messages from the factors in which Xj appears negatively by the violating incoming messages from the factors in which Xj appears positively:

$$ \tilde{\theta}^{-}_{j} = \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{+}(j)} \eta^{u}_{\hat{f_{i}} \rightarrow X_{j}} \times \left[ \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \left( \eta^{s}_{\hat{f_{i}} \rightarrow X_{j}} + \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right) - \prod\limits_{\hat{f_{i}} \in \mathcal{F}^{-}(j)} \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \right] $$
(31)

Finally, we can estimate the unnormalized joker marginal by multiplying all the unconstrained incoming messages from all factors in which Xj appears:

$$ \begin{array}{@{}rcl@{}} \tilde{\theta}^{*}_{j} & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}(j)} \left\{ \eta_{\hat{f_{i}} \rightarrow X_{j}} \bigg| X_{j}=*, P_{j} = \emptyset \right\} \\ & =& \prod\limits_{\hat{f_{i}} \in \mathcal{F}(j)} \eta^{*}_{\hat{f_{i}} \rightarrow X_{j}} \end{array} $$
(32a)

Now by normalizing the quantities in (30c), (31) and (32a), we obtain the marginal of Xj as follows:

$$ \begin{array}{@{}rcl@{}} \theta^{+}_{j} & =& \mathcal{Z}_{j}^{-1} \tilde{\theta}^{+}_{j} \end{array} $$
(33a)
$$ \begin{array}{@{}rcl@{}} \theta^{-}_{j} & =& \mathcal{Z}_{j}^{-1} \tilde{\theta}^{-}_{j} \end{array} $$
(33b)
$$ \begin{array}{@{}rcl@{}} \theta^{*}_{j} & =& \mathcal{Z}_{j}^{-1} \tilde{\theta}^{*}_{j} \end{array} $$
(33c)

and

$$ \mathcal{Z}_{j} = \tilde{\theta}^{+}_{j} + \tilde{\theta}^{-}_{j} + \tilde{\theta}^{*}_{j} $$
(34)

where \(\mathcal {Z}_{j}\) is the normalizing constant, given the evidence E.
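
Finally, the marginal estimation of (30c), (31), (32a), and (33)-(34) reduces to a few products over the positive and negative occurrence sets. Here is a minimal sketch under the same assumed message layout; F_pos and F_neg are hypothetical lists of factor ids:

```python
from math import prod

def atom_marginals(eta, F_pos, F_neg):
    """Normalized marginals (theta_plus, theta_minus, theta_star) of X_j.

    eta   : dict mapping a factor id f_i to the incoming message
            (eta_s, eta_u, eta_star) sent from f_i to X_j
    F_pos : list of ids of factors where X_j appears positively, F^+(j)
    F_neg : list of ids of factors where X_j appears negatively, F^-(j)
    """
    def cavity(sat, vio):
        # [prod of (eta_s + eta_star) over sat - prod of eta_star] * prod of eta_u over vio
        return (prod(eta[i][0] + eta[i][2] for i in sat)
                - prod(eta[i][2] for i in sat)) * prod(eta[i][1] for i in vio)

    t_plus = cavity(F_pos, F_neg)                    # Eq. (30c)
    t_minus = cavity(F_neg, F_pos)                   # Eq. (31)
    t_star = prod(eta[i][2] for i in F_pos + F_neg)  # Eq. (32a)
    Z = t_plus + t_minus + t_star                    # Eq. (34)
    return t_plus / Z, t_minus / Z, t_star / Z       # Eqs. (33a)-(33c)
```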


Cite this article

Ibrahim, MH., Pal, C. & Pesant, G. Leveraging cluster backbones for improving MAP inference in statistical relational models. Ann Math Artif Intell 88, 907–949 (2020). https://doi.org/10.1007/s10472-020-09698-z
