Learning Semantic Segmentation with Minimum Supervision




Suha Kwak is an assistant professor in Computer Science and Engineering at POSTECH. Before that, he spent one and a half years on the faculty of the Department of Information and Communication Engineering at DGIST. He was a postdoctoral researcher with Ivan Laptev and Jean Ponce in the WILLOW team at the Department of Computer Science of the École Normale Supérieure and Inria Paris. He completed his BS and PhD in 2007 and 2014, respectively, both at POSTECH. His research is in computer vision and machine learning. He is primarily interested in weakly supervised and unsupervised learning for fundamental visual recognition problems such as object detection and segmentation. His research interests also include topics in video analysis such as visual tracking, action recognition, and video event detection.



Semantic segmentation is a visual recognition task that aims to estimate pixel-level class labels in images. This problem has recently been addressed with Deep Convolutional Neural Networks (DCNNs), which achieve impressive results on public benchmarks. However, training a DCNN demands a large amount of annotated data, while segmentation annotations in existing datasets are significantly limited in both quantity and diversity because of the heavy annotation cost. Weakly supervised approaches have been proposed to handle this issue by using weak annotations such as bounding boxes and scribbles as supervision, since their low annotation cost makes them readily available or far easier to obtain than segmentation annotations. In this talk, I will introduce our recent approaches to weakly supervised semantic segmentation, which use image-level class labels as supervision. This is the minimum form of supervision, indicating only the presence or absence of a certain semantic entity in an image. We tackled this challenging problem by employing (1) unsupervised techniques that reveal low-level image structures, (2) web-crawled videos as additional data sources, and (3) DCNN architectures appropriate for learning segmentation with incomplete pixel-level annotations. I will conclude the talk with a few suggestions for future research directions worth investigating for further improvement.