New RefAM Framework Sets Benchmark in Referring Segmentation
Researchers Anna Kukleva et al. have introduced RefAM, a novel framework for referring segmentation that leverages attention mechanisms in large generative diffusion models, without requiring task-specific training.
RefAM achieves competitive or state-of-the-art performance on benchmarks such as RefCOCO, RefCOCO+, RefCOCOg, and Ref-DAVIS17. The method extracts cross-attention features from large pre-trained diffusion models and pairs them with SAM2 for mask generation, so it inherits any biases present in those foundation models; when high-quality captions are unavailable, it can also fall back on language model-generated captions. Its central idea is the use of "attention magnets": stop words absorb surplus, irrelevant attention, and discarding their attention maps filters out background noise so the remaining attention concentrates on the referred region. This matters because referring segmentation typically requires extensive training or complex system designs, whereas RefAM reaches strong results on standard benchmarks with no task-specific training at all.
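To make the filtering step concrete, here is a minimal NumPy sketch of the attention-magnet idea, assuming per-token cross-attention maps have already been extracted from a diffusion model. The `MAGNET_WORDS` set, function name, and normalization details are illustrative assumptions, not the authors' implementation; in RefAM the refined map would then seed SAM2 to produce the final segmentation mask.

```python
import numpy as np

# Hypothetical stop-word set standing in for the paper's "attention magnets".
MAGNET_WORDS = {"a", "an", "the", "of", "on", "in", "and", "with"}

def refine_attention(cross_attn: np.ndarray, tokens: list[str], query: str) -> np.ndarray:
    """Drop stop-word 'magnet' maps and renormalize the remaining attention.

    cross_attn: array of shape (num_tokens, H, W), one cross-attention
    map per prompt token, as extracted from a diffusion model.
    """
    # Stop-word tokens soak up surplus background attention; discard them.
    keep = [i for i, t in enumerate(tokens) if t.lower() not in MAGNET_WORDS]
    kept = cross_attn[keep]
    # Renormalize per spatial location so attention again sums to 1.
    kept = kept / (kept.sum(axis=0, keepdims=True) + 1e-8)
    # Return the refined map for the referring word (must not be a stop word).
    return kept[[tokens[i] for i in keep].index(query)]

# Toy usage: 4 prompt tokens over an 8x8 latent grid.
tokens = ["the", "red", "car", "on"]
attn = np.random.rand(len(tokens), 8, 8)
attn /= attn.sum(axis=0, keepdims=True)   # softmax-like normalization over tokens
mask_seed = refine_attention(attn, tokens, "car")
print(mask_seed.shape)                    # (8, 8) map to prompt a segmenter
```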
In summary, RefAM by Anna Kukleva et al. marks a notable advance in referring segmentation: it reaches state-of-the-art results on zero-shot referring image and video segmentation benchmarks without additional training or architectural modifications, and its attention-magnet technique offers a promising direction for future research in this field.