Abstract
This work evaluates and proposes matheuristics for the Distinguishing String Selection Problem (DSSP) and the Distinguishing Substring Selection Problem (DSSSP). Heuristics based on mathematical programming have already been proposed in the literature for string selection problems, and we are interested in adopting and testing different approaches for these two problems. We propose two matheuristics for both the DSSP and the DSSSP that combine the Variable Neighbourhood Search (VNS) metaheuristic with mathematical programming. We compare the linear relaxation, the lower bounds found through the branch-and-bound technique, and the matheuristics on three different groups of instances. Computational experiments show that the Basic Core Problem Algorithm (BCPA) finds better results overall for the DSSP. However, it was unable to provide any solution for some hard DSSSP instances within a reasonable time limit. The two VNS-based matheuristics each have their own niche among the different groups of instances, and they found good solutions for the DSSSP instances on which the BCPA failed. All the obtained data are available in our repository.
References
Chimani, M., Woste, M., & Böcker, S. (2011). A closer look at the closest string and closest substring problem. In Proceedings of the Meeting on Algorithm Engineering & Experiments (pp. 13–24).
Della Croce, F., & Salassa, F. (2012). Improved lp-based algorithms for the closest string problem. Computers & Operations Research, 39(3), 746–749.
Deng, X., Li, G., Li, Z., Ma, B., & Wang, L. (2003). Genetic design of drugs without side-effects. SIAM Journal on Computing, 32(4), 1073–1090.
Faro, S., & Pappalardo, E. (2010). Ant-csp: An ant colony optimization algorithm for the closest string problem. In Proceedings of the 36th Conference on Current Trends in Theory and Practice of Computer Science, Lecture Notes in Computer Science (Vol. 5901, pp. 370–381).
Gamrath, G., Fischer, T., Gally, T., Gleixner, A.M., Hendel, G., Koch, T., Maher, S.J., Miltenberger, M., Müller, B., Pfetsch, M.E., Puchert, C., Rehfeldt, D., Schenker, S., Schwarz, R., Serrano, F., Shinano, Y., Vigerske, S., Weninger, D., Winkler, M., Witt, J.T., & Witzig, J. (2016). The scip optimization suite 3.2. Tech. Rep. 15-60, ZIB, Berlin.
Gramm, J., Guo, J., & Niedermeier, R. (2006). Parameterized intractability of distinguishing substring selection. Theory of Computing Systems, 39(4), 545–560.
Hansen, P., Mladenović, N., & Moreno Pérez, J. A. (2008). Variable neighbourhood search: methods and applications. 4OR: A Quarterly Journal of Operations Research, 6(4), 319–360.
IBM (2013). IBM ILOG CPLEX v12.6 optimization studio CPLEX user’s manual.
Jean, T. (2018). DSSP. https://github.com/jeanpttorres/dssp . Accessed December 28, 2020.
Lanctot, J.K. (2000). Some string problems in computational biology. Ph.D. thesis, University of Waterloo, Ontario, Canada.
Lanctot, J. K., Li, M., Ma, B., Wang, S., & Zhang, L. (2003). Distinguishing string selection problems. Information and Computation, 185(1), 41–55. https://doi.org/10.1016/S0890-5401(03)00057-9.
Liu, X., Holger, M., Hao, Z., & Wu, G. (2008). A compounded genetic and simulated annealing algorithm for the closest string problem. In Proceedings of the 2nd International Conference on Bioinformatics and Biomedical Engineering (pp. 702–705). https://doi.org/10.1109/ICBBE.2008.171
Liu, X., Liu, S., Hao, Z., & Mauch, H. (2011). Exact algorithm and heuristic for the closest string problem. Computers & Operations Research, 38, 1513–1520.
Mauch, H., Melzer, M.J., & Hu, J.S. (2003). Genetic algorithm approach for the closest string problem. In Proceedings of the IEEE Computer Society Conference on Bioinformatics (p. 560). https://doi.org/10.1109/CSB.2003.1227407
Meneses, C.N. (2005). Combinatorial approaches for problems in bioinformatics. Ph.D. thesis, University of Florida, Florida, USA.
Meneses, C.N., Pardalos, P.M., Resende, M.G.C., & Vazacopoulos, A. (2005). Modeling and solving string selection problems. In Proceedings of the Second International Symposium on Mathematical and Computational Biology (pp. 54–64).
Proutski, V., & Holmes, E. C. (1996). Primer Master: a new program for the design and analysis of PCR primers. Bioinformatics, 12(3), 253–255. https://doi.org/10.1093/bioinformatics/12.3.253.
Stormo, G. D., Hartzell, G. W., & Hertz, G. Z. (1990). Identification of consensus patterns in unaligned DNA sequences known to be functionally related. Computer Applications in the Biosciences, 6(2), 81–92.
Torres, J., Silva, E., & Hoshino, E.A. (2018). Heuristic approaches to the distinguishing substring selection problem. In Proceedings of the 5th International Conference on Variable Neighborhood Search, Electronic Notes in Discrete Mathematics (Vol. 66, pp. 151–158). https://doi.org/10.1016/j.endm.2018.03.020
Torres, J.P., & Hoshino, E.A. (2018). Abordagens heurísticas para problemas de seleção de strings. In Proceedings of Simpósio Brasileiro de Matemática Aplicada e Computacional, Proceeding Series of the Brazilian Society of Computational and Applied Mathematics (Vol. 6).
Acknowledgements
We would like to thank the SCIP development team and IBM for the SCIP and CPLEX academic licenses. We also thank the anonymous reviewers, whose work and contributions helped us to improve the quality of the paper.
This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance Code 001.
Appendix: Detailed experimental results
Tables 5, 6, 7, 8, 9, and 10 show the results for each instance. Column frac presents the percentage of variables whose values in the optimal solution of the linear relaxation were fractional. Column root refers to the value of the optimal solution of the linear relaxation at the root node, whilst columns lb and ub represent the final lower and upper bounds found by the exact algorithm, respectively. Columns \(H_1\), \(H_2\), \(H_3\), and \(H_4\) show the value of the solution found by RA, BCPA, ILPBN-VNS, and ILPBS-VNS, respectively. Instances are named r-XX-YY-ZZ-N, where XX indicates \(|\varSigma |\), YY refers to \(|S^c|=|S^f|\), ZZ to |t|, and N distinguishes different test cases with similar structures. The best solutions found by the heuristics are highlighted in boldface. We also highlight the root lower bound and the final upper bound found by the exact approach when they are optimal. The symbol \(*\) is used to indicate that the information was not available after the time limit. We use the symbol − instead of the value of the solution found by a heuristic when no solution could be found within the time limit.
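The naming scheme and the frac column described above can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the names `InstanceName`, `parse_instance_name`, and `fractional_percentage`, the example name `r-4-10-50-1`, and the tolerance `tol` are all assumptions made for illustration, as is the reading of XX, YY, and ZZ as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceName:
    """Fields of a name of the form r-XX-YY-ZZ-N (field names are hypothetical)."""
    alphabet_size: int   # XX, i.e. |Sigma|
    num_strings: int     # YY, i.e. |S^c| = |S^f|
    target_length: int   # ZZ, i.e. |t|
    case_id: str         # N, kept as text because its exact format is assumed


def parse_instance_name(name: str) -> InstanceName:
    """Split an instance name into its four components."""
    parts = name.strip().split("-")
    if len(parts) != 5 or parts[0] != "r":
        raise ValueError(f"not an r-XX-YY-ZZ-N name: {name!r}")
    _, xx, yy, zz, n = parts
    return InstanceName(int(xx), int(yy), int(zz), n)


def fractional_percentage(values, tol=1e-6):
    """Percentage of LP variable values that are not integral.

    The tolerance is an assumption; the source does not state one.
    """
    values = list(values)
    if not values:
        return 0.0
    fractional = sum(1 for v in values if abs(v - round(v)) > tol)
    return 100.0 * fractional / len(values)


if __name__ == "__main__":
    print(parse_instance_name("r-4-10-50-1"))
    print(fractional_percentage([0.0, 1.0, 0.5, 0.25]))
```

Under these assumptions, a name such as `r-4-10-50-1` would denote an instance over a 4-letter alphabet with 10 strings in each of \(S^c\) and \(S^f\) and a target of length 50.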
About this article
Cite this article
Torres, J.P.T., Hoshino, E.A. LP-based heuristics for the distinguishing string and substring selection problems. Ann Oper Res 316, 1205–1234 (2022). https://doi.org/10.1007/s10479-021-04138-5
DOI: https://doi.org/10.1007/s10479-021-04138-5