Yoshua Bengio

Research summary

A comparison of multilayer neural networks trained by back-propagation on handwritten character recognition showed that convolutional neural networks, designed to handle the variability of 2D shapes, outperform other techniques on a standard handwritten digit task and can synthesise complex decision surfaces directly from high-dimensional patterns with minimal preprocessing [1]. An analysis of gradient-based training of recurrent neural networks demonstrated that capturing long-term temporal dependencies becomes increasingly difficult as the duration of those dependencies grows. It exposed a trade-off between efficient gradient-based learning and reliable latching of information over long intervals, motivating alternatives to standard gradient descent [7]. A study of why standard gradient descent from random initialization performs poorly on deep feed-forward networks, which had not been trained successfully before 2006, examined the role of non-linear activation functions and the propagation of activations and gradients across layers, and proposed a new initialization scheme that yields substantially faster convergence [5]. An RNN encoder-decoder architecture was introduced for statistical machine translation, learning phrase representations that improve translation quality [2]. An empirical comparison of recurrent units that implement a gating mechanism, namely the long short-term memory (LSTM) unit and the more recently proposed gated recurrent unit (GRU), on polyphonic music modelling and speech-signal modelling found that both gated units outperform traditional tanh units and that the GRU is comparable to the LSTM [6]. A review of representation learning argued that the success of machine-learning algorithms depends largely on data representation, because different representations can entangle and hide, to varying degrees, the explanatory factors of variation behind the data. It surveyed advances in probabilistic models, autoencoders, manifold learning and deep networks, and discussed learning representations with generic priors [4].
A textbook on deep learning frames the field as a hierarchical learning paradigm in which simpler concepts compose into more complex ones. It covers background in linear algebra, probability and information theory alongside practical deep network models and research-level topics [8]. Generative adversarial networks (GANs) are described as deep generative models that learn distributions implicitly and produce realistic high-resolution images [3]. Graph attention networks (GATs) use masked self-attention to assign different weights to neighbours in graph-structured data without costly matrix operations [9].

Most-cited publications

Deep learning · 2015 · Nature · 80,284 citations
Gradient-based learning applied to document recognition · 1998 · Proceedings of the IEEE · 57,592 citations
Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation · 2014 · 24,160 citations
Generative adversarial networks · 2020 · Communications of the ACM · 13,316 citations
Representation Learning: A Review and New Perspectives · 2013 · IEEE Transactions on Pattern Analysis and Machine Intelligence · 12,849 citations
Understanding the difficulty of training deep feedforward neural networks · 2010 · 12,676 citations
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling · 2014 · arXiv (Cornell University) · 10,772 citations
Deep Learning · 2016 · MIT Press eBooks · 8,959 citations
Learning long-term dependencies with gradient descent is difficult · 1994 · IEEE Transactions on Neural Networks · 8,376 citations
Graph Attention Networks · 2017 · arXiv (Cornell University) · 8,306 citations

The lab page does not clearly state student acceptance status. Email the professor directly to confirm.

How to apply

Email Yoshua Bengio 6–12 months before your application deadline. Read several of his recent papers and refer to specific work in your message. Our guide on how to email a Japanese professor gives a proven email structure.

For applications via the MEXT scholarship, see our complete MEXT 2027 guide and our guide to the university-specific University Recommendation track.

External profiles

ORCID: https://orcid.org/0000-0002-9322-3515
OpenAlex: openalex.org

Profile compiled from public sources (Researchmap, OpenAlex, Kumamoto University faculty directory). Last refreshed 2026-05.