Abstract
This work focuses on associating document fragments using deep metric learning. More precisely, we are interested in ancient papyrus fragments that must be reconstructed before papyrologists can analyze them. This task is challenging to automate with machine learning because labeled data is rare, often incomplete, imbalanced, and in inconsistent states of conservation. However, the papyrology community has a real need for such software, as reconstructing papyri by hand is extremely time-consuming and tedious. In this paper, we explore ways in which papyrologists can obtain useful matching suggestions on new data using deep convolutional Siamese networks. We emphasize low-to-no human intervention for annotating images. We show that the from-scratch self-supervised approach we propose is more effective than knowledge transfer from a large dataset, the former achieving a top-1 accuracy of 0.73 on a retrieval task involving 800 fragments.
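To make the reported metric concrete, the sketch below shows one common way to compute top-1 accuracy on a retrieval task: each fragment is used as a query, and the retrieval counts as a hit when its nearest other fragment in embedding space comes from the same papyrus. This is an illustration only, not the authors' evaluation code. The function name, the toy embeddings, and the use of Euclidean distance are assumptions made for the example.

```python
import math


def top1_retrieval_accuracy(embeddings, papyrus_ids):
    """Fraction of queries whose nearest other fragment shares their papyrus ID.

    Hypothetical helper: `embeddings` is a list of equal-length vectors (for
    example, outputs of a trained Siamese branch) and `papyrus_ids` gives the
    source papyrus of each fragment. Euclidean distance is an assumption.
    """
    hits = 0
    for q, query in enumerate(embeddings):
        # Find the closest fragment to the query, excluding the query itself.
        candidates = (i for i in range(len(embeddings)) if i != q)
        best = min(candidates, key=lambda i: math.dist(query, embeddings[i]))
        # A hit means the retrieved fragment belongs to the same papyrus.
        hits += papyrus_ids[best] == papyrus_ids[q]
    return hits / len(embeddings)


# Toy data (made up): two papyri, five fragments in a 2-D embedding space.
toy_embeddings = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.2, 0.1]]
toy_ids = [0, 0, 1, 1, 1]
score = top1_retrieval_accuracy(toy_embeddings, toy_ids)
print(score)
```

In the toy data, the last fragment belongs to papyrus 1 but lies closest to a fragment of papyrus 0, so it is the only query that misses.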
Notes
Found here: https://quod.lib.umich.edu/a/apis (accessed October 28, 2020).
Using OpenCV: https://opencv.org/, accessed November 3, 2020.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
The research leading to these results has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program under Grant agreement No. 758907 and is part of the GESHAEM Project, hosted by the Ausonius Institute. The source code (upon request) and data used in this article are available at https://morphoboid.labri.fr/self-supervised-papyrus.html.
Cite this article
Pirrone, A., Beurton-Aimar, M. & Journet, N. Self-supervised deep metric learning for ancient papyrus fragments retrieval. IJDAR 24, 219–234 (2021). https://doi.org/10.1007/s10032-021-00369-1
DOI: https://doi.org/10.1007/s10032-021-00369-1