Abstract
Although Deep Neural Networks (DNNs) have achieved great success in the machine learning domain, they typically perform poorly on few-shot learning tasks, in which a classifier must generalize quickly after seeing only a few samples from each class. Model-Agnostic Meta Learning (MAML) addresses this problem: a MAML model can adapt to new learning tasks using only a small amount of training data. As a baseline for image classification, a MAML model with a Convolutional Neural Network (CNN) architecture (rather than a plain DNN) is implemented and trained on the Omniglot dataset. However, this baseline suffers from a long training process and relatively low efficiency. To address these problems, we introduce Recurrent Neural Network (RNN) architectures into the MAML model, including Long Short-Term Memory (LSTM) and its variants LSTM-b and the Gated Recurrent Unit (GRU). The experimental results, measured by classification accuracy, demonstrate a considerable improvement in both image classification performance and training efficiency over the baseline model.
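To illustrate the adaptation scheme the abstract refers to, the following is a minimal sketch of the MAML idea on a toy 1-D regression problem (not the authors' CNN/RNN models). Each task is a line y = a·x with a random slope, the model is a single parameter w, and we use the first-order approximation of the MAML meta-gradient; the step sizes and task distribution are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_and_grad(w, x, y):
    # Mean-squared error of the linear model y_hat = w * x, and its gradient in w.
    err = w * x - y
    return np.mean(err ** 2), np.mean(2.0 * err * x)

w = 0.0                   # meta-initialization being learned
alpha, beta = 0.05, 0.01  # inner- and outer-loop step sizes (assumed values)

for step in range(2000):
    a = rng.uniform(1.0, 3.0)  # sample a task: target slope
    x_s, x_q = rng.normal(size=5), rng.normal(size=5)  # support / query inputs
    y_s, y_q = a * x_s, a * x_q

    # Inner loop: one gradient step on the task's support set.
    _, g_s = loss_and_grad(w, x_s, y_s)
    w_task = w - alpha * g_s

    # Outer loop (first-order MAML): update the initialization using
    # the query-set gradient evaluated at the adapted parameters.
    _, g_q = loss_and_grad(w_task, x_q, y_q)
    w -= beta * g_q

# The learned initialization settles near the center of the task
# distribution (slope ~2.0), from which one inner step adapts well.
print(w)
```

The point of the sketch is the two nested loops: the inner step adapts to a single task from a shared initialization, while the outer step moves that initialization so one gradient step suffices on a new task. Full MAML differentiates through the inner update (a second-order computation), which this first-order variant omits.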
CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.
Cite this article
Shaodong Chen and Ziyu Niu, The Research about Recurrent Model-Agnostic Meta Learning, Opt. Mem. Neural Networks 29, 56–67 (2020). https://doi.org/10.3103/S1060992X20010075