Abstract: Vision-language models (VLMs), such as CLIP, play a foundational role in various cross-modal applications. To fully leverage the potential of VLMs in adapting to downstream tasks, context ...
Abstract: Video data and algorithms have been driving advances in multi-object tracking (MOT). While existing MOT datasets focus on occlusion and appearance similarity, complex motion patterns are ...