• Author(s): Lingdong Kong, Youquan Liu, Lai Xing Ng, Benoit R. Cottereau, Wei Tsang Ooi

The paper presents OpenESS, a novel approach to event-based semantic segmentation (ESS), a fundamental yet challenging task in event camera sensing. The scalability of ESS is often limited by the difficulty of interpreting and annotating event data. While domain adaptation from images to event data can alleviate the annotation burden, the representational gap between dense image frames and sparse, asynchronous event streams poses additional challenges that must be addressed.

For the first time, OpenESS synergizes information from the image, text, and event-data domains to enable scalable ESS in an open-world, annotation-efficient manner. It achieves this by transferring the semantically rich knowledge that CLIP learns from image-text pairs to event streams.
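The open-vocabulary idea behind this transfer can be illustrated with a minimal sketch: an event segment's embedding is compared against CLIP-style text embeddings of class prompts, and the most similar class wins. The function names and toy 3-D embeddings below are hypothetical illustrations, not the paper's actual implementation (which uses full CLIP encoders).

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def open_vocab_predict(event_embedding, text_embeddings):
    """Zero-shot, CLIP-style assignment: pick the class whose text
    embedding is most similar to the event embedding."""
    scores = {cls: cosine(event_embedding, emb)
              for cls, emb in text_embeddings.items()}
    return max(scores, key=scores.get)

# Toy, hand-made embeddings purely for illustration (3-D instead of CLIP's 512-D).
text_embeddings = {
    "road":    [0.9, 0.1, 0.0],
    "vehicle": [0.1, 0.9, 0.2],
    "person":  [0.0, 0.2, 0.9],
}
event_embedding = [0.8, 0.2, 0.1]  # embedding of one event segment
print(open_vocab_predict(event_embedding, text_embeddings))  # -> road
```

Because classes are described by free-form text prompts rather than a fixed label set, the same mechanism extends to categories never annotated in the event domain, which is what makes the setting "open-world".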

To enhance cross-modality adaptation, the paper proposes a frame-to-event contrastive distillation and a text-to-event semantic consistency regularization. Experimental results on popular ESS benchmarks demonstrate that the approach outperforms existing methods. Notably, OpenESS achieves 53.93% and 43.31% mIoU on DDD17 and DSEC-Semantic, respectively, without using either event or frame labels.
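The frame-to-event contrastive distillation can be sketched as an InfoNCE-style objective: each event embedding should match the embedding of its paired camera frame more closely than any other frame in the batch. This is a generic contrastive-distillation sketch under that assumption; the loss name, temperature value, and pure-Python formulation are illustrative, not the paper's exact recipe.

```python
import math

def info_nce(frame_embs, event_embs, temperature=0.07):
    """Contrastive distillation loss: for each event embedding, the
    softmax over cosine similarities to all frame embeddings should
    put its mass on the paired frame (index-aligned positives)."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        return dot / (math.sqrt(sum(a * a for a in u)) *
                      math.sqrt(sum(b * b for b in v)))
    loss = 0.0
    n = len(event_embs)
    for i, e in enumerate(event_embs):
        logits = [cos(e, f) / temperature for f in frame_embs]
        m = max(logits)  # subtract max for numerical stability
        log_den = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += -(logits[i] - log_den)  # -log softmax of the positive pair
    return loss / n

# Aligned pairs give a much lower loss than mismatched ones.
frames = [[1.0, 0.0], [0.0, 1.0]]
aligned    = info_nce(frames, [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce(frames, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < mismatched)  # -> True
```

Minimizing this loss pulls each event embedding toward its paired frame embedding, so the event encoder inherits the structure of the (CLIP-aligned) image feature space without any event-domain labels.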

This paper represents a significant advancement in the field of event-based semantic segmentation, highlighting the potential of OpenESS in addressing the challenges associated with the interpretation and annotation of event data.