New RefAM Framework Sets Benchmark in Referring Segmentation
Researchers Anna Kukleva et al. have introduced RefAM, a novel framework for referring segmentation that leverages attention mechanisms in large generative diffusion models, without requiring task-specific training.
RefAM achieves competitive or state-of-the-art performance on benchmarks such as RefCOCO, RefCOCO+, RefCOCOg, and Ref-DAVIS17. The method extracts cross-attention features from large pre-trained diffusion models and pairs them with SAM2 for mask generation, so it inherits any biases present in those foundation models; when high-quality captions are unavailable, it can also fall back on language model-generated captions. Its central idea is the use of "attention magnets": stop words absorb surplus, irrelevant attention, and discarding their attention maps filters out background noise so the remaining attention concentrates on the referred region. This matters because referring segmentation typically requires extensive training or complex system designs, whereas RefAM reaches strong results on standard benchmarks with no task-specific training at all.
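To make the filtering step concrete, here is a minimal NumPy sketch of the attention-magnet idea, assuming per-token cross-attention maps have already been extracted from a diffusion model. The `MAGNET_WORDS` set, function name, and normalization details are illustrative assumptions, not the authors' implementation; in RefAM the refined map would then seed SAM2 to produce the final segmentation mask.

```python
import numpy as np

# Hypothetical stop-word set standing in for the paper's "attention magnets".
MAGNET_WORDS = {"a", "an", "the", "of", "on", "in", "and", "with"}

def refine_attention(cross_attn: np.ndarray, tokens: list[str], query: str) -> np.ndarray:
    """Drop stop-word 'magnet' maps and renormalize the remaining attention.

    cross_attn: array of shape (num_tokens, H, W), one cross-attention
    map per prompt token, as extracted from a diffusion model.
    """
    # Stop-word tokens soak up surplus background attention; discard them.
    keep = [i for i, t in enumerate(tokens) if t.lower() not in MAGNET_WORDS]
    kept = cross_attn[keep]
    # Renormalize per spatial location so attention again sums to 1.
    kept = kept / (kept.sum(axis=0, keepdims=True) + 1e-8)
    # Return the refined map for the referring word (must not be a stop word).
    return kept[[tokens[i] for i in keep].index(query)]

# Toy usage: 4 prompt tokens over an 8x8 latent grid.
tokens = ["the", "red", "car", "on"]
attn = np.random.rand(len(tokens), 8, 8)
attn /= attn.sum(axis=0, keepdims=True)   # softmax-like normalization over tokens
mask_seed = refine_attention(attn, tokens, "car")
print(mask_seed.shape)                    # (8, 8) map to prompt a segmenter
```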
In summary, RefAM by Anna Kukleva et al. marks a notable advance in referring segmentation: it reaches state-of-the-art results on zero-shot referring image and video segmentation benchmarks without additional training or architectural modifications, and its attention-magnet technique offers a promising direction for future research in this field.