
Deep learning-based automatic downbeat tracking: a brief review

  • Special Issue Paper
  • Published in: Multimedia Systems

Abstract

As an important form of multimedia, music pervades almost everyone’s life. Automatic music analysis is a key step toward effortless music retrieval and recommendation. Within this field, downbeat tracking has long been a fundamental problem in the Music Information Retrieval (MIR) area. Despite significant research effort, it remains a challenge. Previous work either focuses on feature engineering (extracting hand-crafted features via signal processing, a semi-automatic approach) or is limited in scope, modeling music audio recordings only within restricted time signatures and tempo ranges. Recently, deep learning has surpassed traditional machine learning methods and become the primary approach to feature learning, and combinations of traditional and deep learning methods have yielded further performance gains. In this paper, we begin with background on the downbeat tracking problem. We then give detailed discussions of the following topics: system architecture, feature extraction, deep neural network algorithms, data sets, and evaluation strategy. In addition, we review results from the annual benchmark evaluation—the Music Information Retrieval Evaluation eXchange (MIREX)—as well as developments in software implementations. Although much has been achieved in automatic downbeat tracking, some problems still remain. We point out these problems and conclude with possible directions and challenges for future research.
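To make the evaluation strategy mentioned above concrete: downbeat trackers are commonly scored by matching estimated downbeat times against annotated reference times within a small tolerance window and reporting an F-measure. The sketch below is a minimal pure-Python illustration of this tolerance-window scheme, not any specific MIREX implementation; the function name and the ±70 ms window are assumptions chosen for the example.

```python
def downbeat_f_measure(estimated, reference, tolerance=0.07):
    """F-measure for downbeat times (in seconds).

    An estimate counts as a hit if it lies within +/- `tolerance` seconds
    of a reference downbeat that has not already been matched.
    Simplified sketch; the 70 ms window is an assumed default.
    """
    matched = set()  # indices of reference downbeats already claimed
    hits = 0
    for est in estimated:
        for i, ref in enumerate(reference):
            if i not in matched and abs(est - ref) <= tolerance:
                matched.add(i)
                hits += 1
                break  # each estimate may match at most one reference
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with annotations at 0.5 s, 2.5 s, and 4.5 s, estimates of 0.52 s and 2.48 s are hits while 6.0 s is a false positive, giving precision and recall of 2/3 each.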



Notes

  1. For a more comprehensive survey of MIR, covering background, history, fundamentals, tasks, and applications, we refer readers to the overviews in [14, 18, 19, 20].

  2. A more complete list of data sets for MIR research is at: http://www.audiocontentanalysis.org/data-sets/.

  3. There are 13 duplicates which are pointed out by Bob Sturm: http://media.aau.dk/null_space_pursuits/2014/01/ballroom-dataset.html.

  4. http://www.sonicvisualiser.org/.

  5. http://www.music-ir.org/mirex/wiki/MIREX_HOME.

  6. Here we discuss only the six data sets used since 2014, because there are no comparisons for RWC Classical and GTZAN.

  7. https://github.com/CPJKU/madmom.

  8. https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/materials/mirtoolbox.

  9. http://essentia.upf.edu/documentation/.

  10. https://github.com/librosa/librosa.

  11. Note that almost all researchers working on automatic downbeat tracking have participated in the MIREX Audio Downbeat Estimation task, and the systems they propose in their papers are similar to their MIREX submissions. The results in Table 2 are therefore quite representative and sufficient for analysis.

References

  1. Lerdahl, F., Jackendoff, R.S.: A Generative Theory of Tonal Music. MIT Press, Cambridge (1985)


  2. Sigtia, S., Benetos, E., Cherla, S., Weyde, T., Garcez, A.S.d., Dixon, S.: RNN-based music language models for improving automatic music transcription. In: International Society for Music Information Retrieval Conference (2014)

  3. Sigtia, S., Benetos, E., Dixon, S.: An end-to-end neural network for polyphonic piano music transcription. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 24(5), 927–939 (2016)


  4. Sturm, B.L., Santos, J.F., Ben-Tal, O., Korshunova, I.: Music transcription modelling and composition using deep learning (2016). arXiv:1604.08723

  5. Cogliati, A., Duan, Z., Wohlberg, B.: Context-dependent piano music transcription with convolutional sparse coding. IEEE/ACM Trans. Audio Speech Lang. Process. 24(12), 2218–2230 (2016)


  6. Oudre, L., Févotte, C., Grenier, Y.: Probabilistic template-based chord recognition. IEEE Trans. Audio Speech Lang. Process. 19(8), 2249–2259 (2011)


  7. Di Giorgi, B., Zanoni, M., Sarti, A., Tubaro, S.: Automatic chord recognition based on the probabilistic modeling of diatonic modal harmony. In: Proceedings of the 8th International Workshop on Multidimensional Systems (nDS), pp. 1–6. VDE (2013)

  8. Maddage, N.C.: Automatic structure detection for popular music. IEEE Multimed. 13(1), 65–77 (2006)


  9. Serra, J., Müller, M., Grosche, P., Arcos, J.L.: Unsupervised music structure annotation by time series structure features and segment similarity. IEEE Trans. Multimed. 16(5), 1229–1240 (2014)


  10. Panagakis, Y., Kotropoulos, C.: Elastic net subspace clustering applied to pop/rock music structure analysis. Pattern Recogn. Lett. 38, 46–53 (2014)


  11. Pauwels, J., Kaiser, F., Peeters, G.: Combining Harmony-Based and Novelty-Based Approaches for Structural Segmentation. In: International Society for Music Information Retrieval Conference, pp. 601–606 (2013)

  12. Müller, M.: Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, New York (2015)


  13. Downie, J.S.: Music Information Retrieval Evaluation eXchange. http://www.music-ir.org/mirex/wiki/MIREX_HOME

  14. Downie, J.S.: Music information retrieval. Ann. Rev. Inf. Sci. Technol. 37(1), 295–340 (2003)


  15. Celma, O.: Music Recommendation. Springer, Berlin, Heidelberg (2010)


  16. Park, S.H., Ihm, S.Y., Jang, W.I., Nasridinov, A., Park, Y.H.: A Music Recommendation Method with Emotion Recognition Using Ranked Attributes. Springer, Berlin, Heidelberg (2015)


  17. Yang, X., Dong, Y., Li, J.: Review of data features-based music emotion recognition methods. Multimed. Syst. 24(4), 365–389 (2018)


  18. Typke, R., Wiering, F., Veltkamp, R.C.: A survey of music information retrieval systems. In: Proc. 6th International Conference on Music Information Retrieval, pp. 153–160. Queen Mary, University of London, London (2005)

  19. Casey, M.A., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., Slaney, M.: Content-based music information retrieval: Current directions and future challenges. Proc. IEEE 96(4), 668–696 (2008)


  20. Downie, J.S.: The music information retrieval evaluation exchange (2005–2007): A window into music information retrieval research. Acoust. Sci. Technol. 29(4), 247–255 (2008)


  21. Goto, M., Muraoka, Y.: A beat tracking system for acoustic signals of music. In: Proceedings of the Second ACM International Conference on Multimedia, pp. 365–372. ACM, New York (1994)

  22. Goto, M., Muraoka, Y.: A Real-Time Beat Tracking System for Audio Signals. In: ICMC (1995)

  23. Goto, M.: An audio-based real-time beat tracking system for music with or without drum-sounds. J. New Music Res. 30(2), 159–171 (2001)


  24. Davies, M.E., Plumbley, M.D.: Beat tracking with a two state model [music applications]. In: IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (ICASSP'05), vol. 3, pp. iii–241. IEEE (2005)

  25. Seppänen, J., Eronen, A.J., Hiipakka, J.: Joint Beat & Tatum Tracking from Music Signals. In: ISMIR, pp. 23–28 (2006)

  26. Gkiokas, A., Katsouros, V., Carayannis, G., Stajylakis, T.: Music tempo estimation and beat tracking by applying source separation and metrical relations. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 421–424. IEEE (2012)

  27. Peeters, G., Papadopoulos, H.: Simultaneous beat and downbeat-tracking using a probabilistic framework: Theory and large-scale evaluation. IEEE Trans. Audio Speech Lang. Process. 19(6), 1754–1769 (2011)


  28. Krebs, F., Böck, S., Widmer, G.: Rhythmic Pattern Modeling for Beat and Downbeat Tracking in Musical Audio. In: ISMIR, pp. 227–232 (2013)

  29. Krebs, F., Korzeniowski, F., Grachten, M., Widmer, G.: Unsupervised learning and refinement of rhythmic patterns for beat and downbeat tracking. In: 2014 Proceedings of the 22nd European, Signal Processing Conference (EUSIPCO), pp. 611–615. IEEE (2014)

  30. Böck, S., Krebs, F., Widmer, G.: Joint Beat and Downbeat Tracking with Recurrent Neural Networks. In: ISMIR, pp. 255–261 (2016)

  31. Goto, M., Muraoka, Y.: Real-time beat tracking for drumless audio signals: Chord change detection for musical decisions. Speech Commun. 27(3–4), 311 (1999)


  32. Davies, M.E., Plumbley, M.D.: A spectral difference approach to downbeat extraction in musical audio. In: Proceedings of the 14th European Signal Processing Conference (EUSIPCO), pp. 1–4 (2006)

  33. Durand, S., David, B., Richard, G.: Enhancing downbeat detection when facing different music styles. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3132–3136 (2014)

  34. Klapuri, A.P., Eronen, A.J., Astola, J.T.: Analysis of the meter of acoustic musical signals. IEEE Trans. Audio Speech Lang. Process. 14(1), 342–355 (2006)


  35. Jehan, T.: Downbeat prediction by listening and learning. In: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. pp. 267–270. IEEE (2005)

  36. Papadopoulos, H., Peeters, G.: Joint estimation of chords and downbeats from an audio signal. IEEE Trans. Audio Speech Lang. Process. 19(1), 138–152 (2011)


  37. Gärtner, D.: Unsupervised learning of the downbeat in drum patterns. In: Audio Engineering Society Conference: 53rd International Conference: Semantic Audio. Audio Engineering Society (2014)

  38. Hockman, J., Davies, M.E., Fujinaga, I.: One in the Jungle: Downbeat Detection in Hardcore, Jungle, and Drum and Bass. In: ISMIR, pp. 169–174 (2012)

  39. Srinivasamurthy, A., Holzapfel, A., Serra, X.: In search of automatic rhythm analysis methods for turkish and indian art music. J. New Music Res. 43(1), 94–114 (2014)


  40. Allan, H.: Bar lines and beyond-meter tracking in digital audio. Mémoire de DEA School Inf. Univ. Edinb. 27, 28 (2004)


  41. Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning, vol. 1. MIT Press, Cambridge (2016)


  42. Wang, X., Wang, Y.: Improving content-based and hybrid music recommendation using deep learning. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 627–636. ACM, New York (2014)

  43. Yan, Y., Chen, M., Shyu, M.L., Chen, S.C.: Deep learning for imbalanced multimedia data classification. In: 2015 IEEE International Symposium on Multimedia (ISM), pp. 483–488. IEEE (2015)

  44. Zou, H., Du, J.X., Zhai, C.M., Wang, J.: Deep learning and shared representation space learning based cross-modal multimedia retrieval. In: International Conference on Intelligent Computing, pp. 322–331. Springer, New York (2016)


  45. Nie, W., Cao, Q., Liu, A., Su, Y.: Convolutional deep learning for 3d object retrieval. Multimed. Syst. 23(3), 325–332 (2017)


  46. Durand, S., Bello, J.P., David, B., Richard, G.: Downbeat tracking with multiple features and deep neural networks. In: 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 409–413. IEEE (2015)

  47. Durand, S., Bello, J.P., David, B., Richard, G.: Feature adapted convolutional neural networks for downbeat tracking. In: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 296–300. IEEE (2016)

  48. Durand, S., Bello, J.P., David, B., Richard, G.: Robust downbeat tracking using an ensemble of convolutional networks. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 25(1), 76–89 (2017)


  49. Krebs, F., Böck, S., Dorfer, M., Widmer, G.: Downbeat Tracking Using Beat Synchronous Features with Recurrent Neural Networks. In: ISMIR, pp. 129–135 (2016)

  50. Graves, A.: Supervised sequence labelling. In: Supervised Sequence Labelling with Recurrent Neural Networks, pp. 5–13. Springer, Berlin, Heidelberg (2012)


  51. Kittler, J., Hatef, M., Duin, R.P., Matas, J.: On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 20(3), 226–239 (1998)


  52. Grosche, P., Müller, M.: Tempogram toolbox: Matlab implementations for tempo and pulse analysis of music recordings. In: Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR). Miami, FL, USA (2011)

  53. Dixon, S.: Evaluation of the audio beat tracking system beatroot. J. New Music Res. 36(1), 39–50 (2007)


  54. Khadkevich, M., Fillon, T., Richard, G., Omologo, M.: A probabilistic approach to simultaneous extraction of beats and downbeats. In: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 445–448. IEEE (2012)

  55. Holzapfel, A., Krebs, F., Srinivasamurthy, A.: Tracking the “odd”: Meter inference in a culturally diverse music corpus. In: ISMIR-International Conference on Music Information Retrieval, pp. 425–430. ISMIR (2014)

  56. Malm, W.P.: Music Cultures of the Pacific, the Near East, and Asia. Pearson College Division, London (1996)


  57. Bello, J.P., Pickens, J.: A Robust Mid-Level Representation for Harmonic Content in Music Signals. In: ISMIR, vol. 5, pp. 304–311 (2005)

  58. Müller, M., Ewert, S.: Chroma Toolbox: MATLAB implementations for extracting variants of chroma-based audio features. In: Proceedings of the 12th International Conference on Music Information Retrieval (ISMIR) (2011)

  59. Brookes, M.: VOICEBOX: Speech processing toolbox for MATLAB. Software, available from http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html (2011)

  60. Pfordresher, P.Q.: The role of melodic and rhythmic accents in musical structure. Music Percept. Interdiscip. J. 20(4), 431–464 (2003)


  61. Hannon, E.E., Snyder, J.S., Eerola, T., Krumhansl, C.L.: The role of melodic and temporal cues in perceiving musical meter. J. Exp. Psychol. Human Percept. Perform. 30(5), 956 (2004)


  62. Ellis, R.J., Jones, M.R.: The role of accent salience and joint accent structure in meter perception. J. Exp. Psychol. Human Percept. Perform. 35(1), 264 (2009)


  63. Krebs, F., Holzapfel, A., Srinivasamurthy, A.: MIREX 2014 audio downbeat tracking evaluation: KHS1. http://www.music-ir.org/mirex/abstracts/2014/KSH1.pdf (2014)

  64. Krebs, F., Widmer, G.: MIREX 2014 audio downbeat tracking evaluation: FK1. http://www.music-ir.org/mirex/abstracts/2014/FK3.pdf (2014)

  65. Krebs, F., Widmer, G.: MIREX 2014 audio downbeat tracking evaluation: FK2. http://www.music-ir.org/mirex/abstracts/2014/FK4.pdf (2014)

  66. Krebs, F., Böck, S.: MIREX 2015 audio beat and downbeat tracking submissions: FK1–FK6. http://www.music-ir.org/mirex/abstracts/2015/FK2.pdf (2015)

  67. Cannam, C., Benetos, E., Mauch, M., Davies, M.E.P., Dixon, S., Landone, C., Noland, K., Stowell, D.: MIREX 2016: Vamp plugins from the Centre for Digital Music. http://www.music-ir.org/mirex/abstracts/2016/CD4.pdf (2016)

  68. Davies, M., Stark, A., Robertson, A.: Downbeater: audio downbeat estimation task. http://www.music-ir.org/mirex/abstracts/2016/DSR1.pdf (2016)

  69. Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pp. 315–323 (2011)

  70. Goodfellow, I.J., Warde-Farley, D., Mirza, M., Courville, A., Bengio, Y.: Maxout networks (2013). arXiv:1302.4389

  71. Zhou, Y., Chellappa, R.: Computation of optical flow using a neural network. In: IEEE International Conference on Neural Networks, vol. 27, pp. 71–78 (1988)

  72. Sainath, T.N., Kingsbury, B., Mohamed, A.r., Dahl, G.E., Saon, G., Soltau, H., Beran, T., Aravkin, A.Y., Ramabhadran, B.: Improvements to deep convolutional neural networks for LVCSR. In: 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 315–320. IEEE (2013)

  73. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533 (1986)


  74. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)


  75. Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using rnn encoder-decoder for statistical machine translation (2014). arXiv:1406.1078

  76. Gouyon, F., Klapuri, A., Dixon, S., Alonso, M., Tzanetakis, G., Uhle, C., Cano, P.: An experimental comparison of audio tempo induction algorithms. IEEE Trans. Audio Speech Lang. Process. 14(5), 1832–1844 (2006)


  77. Harte, C.: Towards automatic extraction of harmony information from music signals. Ph.D. thesis (2010)

  78. Davies, M.E., Degara, N., Plumbley, M.D.: Evaluation methods for musical audio beat tracking algorithms. Queen Mary University of London, Centre for Digital Music, Tech. Rep. C4DM-TR-09-06 (2009)

  79. Srinivasamurthy, A., Serra, X.: A supervised approach to hierarchical metrical cycle tracking from audio music recordings. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5217–5221. IEEE (2014)

  80. Srinivasamurthy, A., Holzapfel, A., Cemgil, A.T., Serra, X.: Particle filters for efficient meter tracking with dynamic bayesian networks. In: International Society for Music Information Retrieval Conference (ISMIR) (2015)

  81. Tzanetakis, G., Cook, P.: Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10(5), 293–302 (2002)


  82. Marchand, U., Peeters, G.: Swing ratio estimation. In: Digital Audio Effects (DAFx-15) (2015)

  83. Hainsworth, S.W.: Techniques for the automated analysis of musical audio. PhD thesis. University of Cambridge (2003)

  84. Hainsworth, S.W., Macleod, M.D.: Particle filtering applied to musical tempo tracking. EURASIP J. Adv. Signal Process. 2004(15), 927847 (2004)


  85. Giorgi, B.D., Zanoni, M., Böck, S., Sarti, A.: Multipath beat tracking. J. Audio Eng. Soc. 64(7/8), 493–502 (2016)


  86. De Clercq, T., Temperley, D.: A corpus analysis of rock harmony. Popul. Music 30(1), 47–70 (2011)


  87. Goto, M., Hashiguchi, H., Nishimura, T., Oka, R.: RWC Music Database: Popular, Classical and Jazz Music Databases. In: ISMIR, vol. 2, pp. 287–288 (2002)

  88. Goto, M., et al.: Development of the RWC music database. In: Proceedings of the 18th International Congress on Acoustics (ICA 2004), vol. 1, pp. 553–556 (2004)

  89. Goto, M.: AIST Annotation for the RWC Music Database. In: ISMIR, pp. 359–360 (2006)

  90. Livshin, A., Rodet, X.: The importance of cross database evaluation in sound classification. In: ISMIR (2003)

  91. Durand, S., Bello, J.P., David, B., Richard, G.: MIREX 2014 audio downbeat estimation evaluation: DB1. http://www.music-ir.org/mirex/abstracts/2014/DBDR2.pdf (2014)

  92. Durand, S., Bello, J.P., David, B., Richard, G.: MIREX 2015 audio downbeat estimation submissions: DRDB2 and DRDB3. http://www.music-ir.org/mirex/abstracts/2015/DBDR2.pdf (2015)

  93. Durand, S., Bello, J.P., David, B., Richard, G.: MIREX 2016 audio downbeat estimation evaluation: DBDR_NOBE. http://www.music-ir.org/mirex/abstracts/2016/DBDR1.pdf (2016)

  94. Krebs, F., Böck, S.: MIREX 2016 audio downbeat tracking submissions: KB1, KB2. http://www.music-ir.org/mirex/abstracts/2016/KBDW1.pdf (2016)

  95. Böck, S., Krebs, F.: MIREX 2016 submission BK4. http://www.music-ir.org/mirex/abstracts/2016/BK4.pdf (2016)

  96. Böck, S., Korzeniowski, F., Schlüter, J., Krebs, F., Widmer, G.: Madmom: A new Python audio and music signal processing library. In: Proceedings of the 2016 ACM on Multimedia Conference, pp. 1174–1178. ACM (2016)

  97. Lartillot, O., Toiviainen, P.: A Matlab toolbox for musical feature extraction from audio. In: International conference on digital audio effects, pp. 237–244. Bordeaux, FR (2007)

  98. Lartillot, O., Toiviainen, P.: MIR in Matlab: A toolbox for musical feature extraction. In: Proceedings of the International Conference on Music Information Retrieval (2007)

  99. Bogdanov, D., Wack, N., Gómez, E., Gulati, S., Herrera, P., Mayor, O., Roma, G., Salamon, J., Zapata, J., Serra, X.: ESSENTIA: an open-source library for sound and music analysis. In: Proceedings of the 21st ACM international conference on Multimedia, pp. 855–858. ACM (2013)

  100. McFee, B., Raffel, C., Liang, D., Ellis, D.P., McVicar, M., Battenberg, E., Nieto, O.: librosa: Audio and music signal analysis in python. In: Proceedings of the 14th Python in Science Conference, pp. 18–25 (2015)

  101. Krebs, F., Holzapfel, A., Cemgil, A.T., Widmer, G.: Inferring metrical structure in music using particle filters. IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP) 23(5), 817–827 (2015)


  102. Mor, N., Wolf, L., Polyak, A., Taigman, Y.: A universal music translation network (2018). arXiv:1805.07848

  103. Zhang, H., Yang, Y., Luan, H., Yang, S., Chua, T.S.: Start from scratch: Towards automatically identifying, modeling, and naming visual attributes. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 187–196. ACM (2014)

  104. Dieleman, S., Schrauwen, B.: End-to-end learning for music audio. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6964–6968. IEEE (2014)

  105. Miao, Y., Gowayyed, M., Metze, F.: EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding. In: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 167–174. IEEE (2015)

  106. Ma, X., Hovy, E.: End-to-end sequence labeling via bi-directional lstm-cnns-crf (2016). arXiv:1603.01354

  107. Zhang, H., Wang, M., Hong, R., Chua, T.S.: Play and rewind: Optimizing binary representations of videos by self-supervised temporal hashing. In: Proceedings of the 2016 ACM on Multimedia Conference, pp. 781–790. ACM (2016)


Acknowledgements

This work is supported by the National Natural Science Fund for Distinguished Young Scholars (Grant No. 61625204) and partially supported by the State Key Program of the National Science Foundation of China (Grant Nos. 61836006 and 61432014).

Author information


Corresponding author

Correspondence to Jiancheng Lv.


About this article


Cite this article

Jia, B., Lv, J. & Liu, D. Deep learning-based automatic downbeat tracking: a brief review. Multimedia Systems 25, 617–638 (2019). https://doi.org/10.1007/s00530-019-00607-x

