Research summary
Wang's listed publications report deep learning methods for scene parsing, face analysis, image classification, person re-identification, 3D object detection, and object detection more broadly. A 2017 paper proposes the Pyramid Scene Parsing Network (PSPNet), which aggregates context from different regions through a pyramid pooling module to exploit global context for pixel-level prediction; it took first place in the ImageNet 2016 scene parsing challenge and reported state-of-the-art results on PASCAL VOC 2012 and Cityscapes [1]. A 2015 paper introduces a cascaded LNet/ANet framework for face attribute prediction in unconstrained images, with LNet pre-trained on general object categories for face localization and ANet pre-trained on face identities for attribute prediction, jointly fine-tuned with attribute tags [2]. A 2017 paper presents the Residual Attention Network, which stacks attention modules that generate attention-aware features through a bottom-up top-down feedforward structure, trained via attention residual learning to scale to very deep networks [3]. A 2019 paper proposes PointRCNN for 3D object detection directly from raw point clouds in two stages: bottom-up 3D proposal generation by foreground segmentation, and proposal refinement in canonical coordinates [4]. A 2019 paper surveys deep learning approaches to generic object detection, covering more than 300 contributions [5]. A 2014 paper introduces a Filter Pairing Neural Network (FPNN) for person re-identification that jointly handles misalignment, photometric and geometric transforms, occlusions, and background clutter [6]. A 2014 paper proposes DeepID features learned via multi-class face identification over about 10,000 identities, taken from the last hidden layer of convolutional networks and generalizing to face verification and unseen identities [7]. 
A 2020 paper presents PV-RCNN, integrating 3D voxel CNN features with PointNet-based set abstraction through a voxel set abstraction module that summarizes the scene with a small set of keypoints for 3D object detection [8]. A 2016 paper introduces DeepFashion, a clothes recognition dataset with over 800,000 images annotated with attributes, landmarks, and cross-scenario correspondence [9]. A 2020 paper proposes Deformable DETR, whose attention modules attend to a small set of sampling points around a reference, achieving better performance than DETR on COCO with about ten times fewer training epochs [10].
Recent publications
- Pyramid Scene Parsing Network
- Deep Learning Face Attributes in the Wild
- Residual Attention Network for Image Classification
- PointRCNN: 3D Object Proposal Generation and Detection From Point Cloud
- Deep Learning for Generic Object Detection: A Survey
- DeepReID: Deep Filter Pairing Neural Network for Person Re-identification
- Deep Learning Face Representation from Predicting 10,000 Classes
- PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
- DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations
- Deformable DETR: Deformable Transformers for End-to-End Object Detection
The lab page does not clearly state student acceptance status. Email the professor directly to confirm.
How to apply
Email Xiaogang Wang 6 to 12 months before your application deadline. Read several recent papers and reference specific work in your message. Use our guide on how to email a Japanese professor for the proven email structure.
For applications via the MEXT scholarship, see our MEXT 2027 complete guide and the university-specific University Recommendation track.
External profiles
- ORCID: https://orcid.org/0000-0002-7929-5889
- OpenAlex: openalex.org
Profile compiled from public sources (Researchmap, OpenAlex, The University of Tokyo faculty directory). Last refreshed 2026-05.