Research summary
A comparison of multilayer neural networks trained by back-propagation on handwritten character recognition showed that convolutional neural networks, designed to handle the variability of 2D shapes, outperform other techniques on a standard handwritten digit recognition task and can synthesise complex decision surfaces directly from high-dimensional patterns with minimal preprocessing [1]. An analysis of gradient-based training of recurrent neural networks demonstrated that capturing long-term temporal dependencies becomes increasingly difficult as the duration of those dependencies grows. It exposed a trade-off between efficient gradient-based learning and reliable latching of information over long intervals, motivating alternatives to standard gradient descent [7]. A study of standard gradient descent applied to deep feed-forward networks from random initialization examined the role of non-linear activation functions and how activations and gradients propagate across layers, clarifying why such training had failed before 2006 and informing subsequent initialization schemes [5]. An RNN encoder-decoder architecture was introduced for statistical machine translation, in which one RNN encodes a variable-length phrase into a fixed-length vector and a second RNN decodes it; using the learned phrase-pair scores as an additional feature improved translation quality [2]. An empirical evaluation of recurrent units with gating mechanisms, namely LSTM and the more recent gated recurrent unit (GRU), on polyphonic music modelling and speech-signal modelling found that gated units outperform traditional tanh units and that the GRU is broadly comparable to the LSTM [6]. A review of representation learning argued that the difficulty of machine-learning tasks reflects how well a representation disentangles the explanatory factors behind the data, and surveyed probabilistic models, autoencoders, manifold learning and deep networks under generic priors [4].
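The difficulty of learning long-term dependencies [7] can be illustrated with a toy scalar recurrent network. The sketch below assumes the hypothetical model h_t = tanh(w * h_{t-1} + x_t) with a recurrent weight |w| < 1; the function name `gradient_through_time` and the chosen values are illustrative only. It applies the chain rule through time, showing that the gradient of the final state with respect to the initial state is a product of per-step factors, each smaller than |w| in magnitude, so it shrinks exponentially with sequence length. This shows only the multiplicative shrinkage; the paper's own analysis, which concerns latching information in attractors, is more general.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the toy scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    Hypothetical illustration, not the method of any specific paper.
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - tanh(...)**2) = w * (1 - h_t**2)
        grad *= w * (1.0 - h * h)
    return grad


w = 0.9  # assumed recurrent weight, |w| < 1
for T in (5, 20, 50):
    # Constant input of 0.5 at every step (arbitrary choice for the demo)
    g = gradient_through_time(w, [0.5] * T)
    print(f"T={T:2d}  dh_T/dh_0 = {g:.3e}")
```

Because every factor lies strictly between 0 and |w|, the printed gradient falls below 0.9**T, so the signal from early inputs becomes negligible after a few dozen steps.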
A textbook on deep learning frames the field as a hierarchical learning paradigm in which simpler concepts compose into more complex ones, covering background in linear algebra, probability and information theory alongside the main model families [8]. Generative adversarial networks (GANs) are described as deep generative models that learn data distributions implicitly and can produce realistic high-resolution images [3]. Graph attention networks (GATs) use masked self-attention to assign different weights to each node's neighbours in graph-structured data, without costly matrix operations such as inversion [9].
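The masked self-attention used by GATs [9] can be sketched for a single attention head. This minimal sketch assumes a toy four-node graph with hand-picked weights; the functions `gat_attention` and `gat_update`, and all values, are hypothetical. Each node's features are projected by a shared matrix W, and a LeakyReLU score (negative slope 0.2) is computed for each pair from the concatenated projections. The scores are normalised with a softmax over the node's neighbours only. Nodes outside the neighbourhood are masked out and receive no weight. Multi-head attention and the output non-linearity are omitted.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gat_attention(h, adj, W, a, i):
    """Single-head attention weights of node i over its neighbours (hypothetical sketch)."""
    z = [matvec(W, hj) for hj in h]  # shared projection W h_j for every node
    scores = {}
    for j in adj[i]:  # masking: only neighbours of i (self-loop included if listed)
        concat = z[i] + z[j]  # [W h_i || W h_j]
        scores[j] = leaky_relu(sum(ak * ck for ak, ck in zip(a, concat)))
    m = max(scores.values())  # subtract the max for a numerically stable softmax
    exp = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exp.values())
    return {j: e / total for j, e in exp.items()}


def gat_update(h, adj, W, a, i):
    """New features of node i: attention-weighted sum of projected neighbours."""
    alpha = gat_attention(h, adj, W, a, i)
    z = [matvec(W, hj) for hj in h]
    return [sum(alpha[j] * z[j][k] for j in alpha) for k in range(len(W))]


# Toy graph: node 3 is not connected to node 0, so it gets no attention from 0
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
adj = {0: [0, 1, 2], 1: [0, 1], 2: [0, 2], 3: [3]}
W = [[1.0, 0.5], [-0.5, 1.0]]
a = [0.3, -0.2, 0.5, 0.1]

print(gat_attention(h, adj, W, a, 0))
print(gat_update(h, adj, W, a, 0))
```

The attention weights depend only on the features of each pair of nodes and on the neighbourhood mask. No operation over the whole adjacency matrix, such as inversion, is required.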
Recent publications
- Deep learning
- Gradient-based learning applied to document recognition
- Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation
- Generative adversarial networks
- Representation Learning: A Review and New Perspectives
- Understanding the difficulty of training deep feedforward neural networks
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Deep Learning
- Learning long-term dependencies with gradient descent is difficult
- Graph Attention Networks
The lab page does not clearly state whether the lab is currently accepting students. Email the professor directly to confirm.
How to apply
Email Yoshua Bengio 6-12 months before your application deadline. Read several of his recent papers and refer to specific work in your message. Our "How to email a Japanese professor" guide gives a recommended email structure.
If you are applying through the MEXT scholarship, see our complete MEXT 2027 guide and the university-specific University Recommendation track.
External profiles
- ORCID: https://orcid.org/0000-0002-9322-3515
- OpenAlex: openalex.org
Profile compiled from public sources (Researchmap, OpenAlex, Kumamoto University faculty directory). Last refreshed 2026-05.