• Author(s): Junbao Zhou, Ziqi Pang, Yu-Xiong Wang

“RMem: Restricted Memory Banks Improve Video Object Segmentation” introduces a novel approach to video object segmentation (VOS) that challenges the prevailing trend of expanding memory banks to store extensive historical information. The authors propose a strategy of restricting the size of memory banks, which leads to a notable improvement in VOS accuracy.

The key insight behind this approach comes from a “memory deciphering” study conducted by the authors. The study reveals that although expanding memory banks may seem beneficial, the redundant information they accumulate makes it harder for VOS modules to extract relevant features. By limiting the memory banks to a small number of essential frames, the proposed method achieves better performance on VOS tasks.

The restricted memory bank approach balances the importance and freshness of frames to maintain an informative memory bank within a bounded capacity. This not only improves the accuracy of VOS but also reduces the discrepancy between training and inference in terms of memory lengths compared to continuous expansion methods. The reduced discrepancy opens up new opportunities for temporal reasoning and allows the introduction of “temporal positional embedding,” a previously overlooked concept in VOS.

The authors embody their insights in a system called “RMem” (“R” for restricted), which is a simple yet effective modification to existing VOS methods. RMem excels in challenging VOS scenarios and establishes new state-of-the-art results on two datasets: VOST, which focuses on object state changes, and Long Videos, which tests performance on extended video sequences.
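To make the idea of balancing importance and freshness within a bounded capacity more concrete, here is a minimal Python sketch. It is not the authors' implementation: the class names, the `1 / (1 + age)` freshness term, the additive scoring rule, the `freshness_weight` parameter, and the choice to always keep the first frame are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class MemoryEntry:
    frame_idx: int     # position of the frame in the video
    feature: Any       # placeholder for the stored frame features
    importance: float  # hypothetical relevance score (e.g. from attention)


@dataclass
class RestrictedMemoryBank:
    """Illustrative bounded memory bank (assumed design, not RMem itself)."""
    capacity: int
    freshness_weight: float = 1.0
    entries: List[MemoryEntry] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.capacity < 1:
            raise ValueError("capacity must be at least 1")

    def _score(self, entry: MemoryEntry, current_idx: int) -> float:
        # Assumed scoring rule: importance plus a freshness bonus that
        # decays with the age of the frame.
        age = current_idx - entry.frame_idx
        freshness = 1.0 / (1.0 + age)
        return entry.importance + self.freshness_weight * freshness

    def add(self, frame_idx: int, feature: Any, importance: float) -> None:
        self.entries.append(MemoryEntry(frame_idx, feature, importance))
        if len(self.entries) > self.capacity:
            # Assumption: the first stored frame (the annotated reference)
            # is never evicted; the lowest-scoring other entry is dropped.
            victim = min(
                range(1, len(self.entries)),
                key=lambda i: self._score(self.entries[i], frame_idx),
            )
            del self.entries[victim]

    def frame_indices(self) -> List[int]:
        return [entry.frame_idx for entry in self.entries]


# Usage: stream six frames through a bank that holds at most three.
bank = RestrictedMemoryBank(capacity=3)
for t, imp in enumerate([0.9, 0.1, 0.8, 0.2, 0.3, 0.7]):
    bank.add(t, f"feat{t}", imp)
print(bank.frame_indices())
```

However long the video runs, the bank never holds more than `capacity` frames, so the memory length seen at inference can stay close to the one used during training.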

Experimental results demonstrate the effectiveness of RMem in improving VOS accuracy. The paper provides quantitative evaluations and qualitative examples showcasing the benefits of restricted memory banks in handling complex VOS tasks. The authors also make their code and demo available to the research community, promoting reproducibility and further advancements in the field.

Overall, the paper makes a significant contribution to video object segmentation by challenging the conventional wisdom that larger memory banks are better and showing that a restricted memory bank improves accuracy. This research has the potential to impact various applications that rely on accurate object segmentation in videos, such as video editing, surveillance, and autonomous systems.