PaddleVideo/applications/T2VLAD/README_en.md at develop - GitHub
Text-video retrieval is a challenging task that aims to search for relevant video content based on natural language descriptions. The key to this problem is measuring text-video similarities in a joint embedding space. T2VLAD designs an efficient global-local alignment method.
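To make the idea of measuring similarity in a joint embedding space concrete, here is a minimal sketch that ranks videos for one text query by cosine similarity. It assumes the text and video encoders have already produced vectors of the same dimension; the function names `l2_normalize` and `rank_videos` and the toy values are hypothetical and are not taken from the PaddleVideo code.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def rank_videos(text_emb, video_embs):
    """Return video indices sorted from most to least similar to the query,
    together with the cosine similarity of every video."""
    t = l2_normalize(text_emb)       # shape (D,)
    v = l2_normalize(video_embs)     # shape (num_videos, D)
    sims = v @ t                     # shape (num_videos,)
    order = np.argsort(-sims)        # descending similarity
    return order, sims

# Toy embeddings (hypothetical values, for illustration only).
query = np.array([1.0, 0.0, 0.0])
videos = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.1, 0.0],   # almost parallel to the query
    [1.0, 1.0, 0.0],   # 45 degrees from the query
])
order, sims = rank_videos(query, videos)
print(order, sims)
```

In a real retrieval system the embeddings would come from trained encoders; the ranking step itself stays this simple.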
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
In this paper, we propose an efficient global-local sequence alignment method for text-video retrieval. In the local perspective, we aim to utilize a number of learnable semantic topics to jointly summarize both texts and videos.
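One way to picture "learnable semantic topics" that summarize both modalities is a VLAD-style aggregation in which word features and frame features are softly assigned to the same set of topic centers. The sketch below assumes a NetVLAD-like soft assignment, which the text above does not spell out; the centers are random stand-ins for learned parameters, and `vlad_aggregate` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def vlad_aggregate(features, centers):
    """Summarize N local features into K topic descriptors.

    features: (N, D) local features (words or frames)
    centers:  (K, D) topic centers shared by both modalities
    returns:  (K, D) L2-normalized residual aggregates, one per topic
    """
    assign = softmax(features @ centers.T, axis=1)   # (N, K) soft assignment
    # sum_i a_ik * (x_i - c_k) = (a^T x)_k - (sum_i a_ik) * c_k
    agg = assign.T @ features - assign.sum(axis=0)[:, None] * centers
    norms = np.linalg.norm(agg, axis=1, keepdims=True)
    return agg / np.maximum(norms, 1e-8)

rng = np.random.default_rng(0)
centers = rng.normal(size=(4, 8))        # K=4 topics, D=8 (random, not learned)
word_feats = rng.normal(size=(6, 8))     # 6 word features for one sentence
frame_feats = rng.normal(size=(10, 8))   # 10 frame features for one video

text_local = vlad_aggregate(word_feats, centers)
video_local = vlad_aggregate(frame_feats, centers)
print(text_local.shape, video_local.shape)
```

Because both modalities use the same centers, row k of `text_local` and row k of `video_local` describe the same topic and can be compared directly, regardless of how many words or frames went in.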
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Global alignment gives a comprehensive similarity measurement. Local alignment provides fine-grained comparisons by computing the similarities between the local text-video features on the same semantic cues. Ranking: ours 0.06s vs. MMT 0.05s.
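The two alignment views described above can be combined into a single score. The following sketch assumes the global features are one vector per text or video and the local features are one vector per semantic cue, already aligned by cue index. The equal weighting (`weight=0.5`) and the function names are illustrative assumptions, not the published formulation.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors."""
    denom = max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    return float(a @ b / denom)

def global_local_similarity(text_global, video_global,
                            text_local, video_local, weight=0.5):
    """Blend a global similarity with the mean per-cue local similarity.

    text_local / video_local: (K, D) arrays; row k of each refers to the
    same semantic cue, so rows are compared pairwise.
    weight: hypothetical mixing coefficient for the global term.
    """
    s_global = cosine(text_global, video_global)
    s_local = float(np.mean([cosine(t, v)
                             for t, v in zip(text_local, video_local)]))
    return weight * s_global + (1.0 - weight) * s_local

# Toy features (hypothetical values, for illustration only).
text_global = np.array([1.0, 0.0])
video_global = np.array([1.0, 0.0])                # global similarity 1.0
text_local = np.array([[1.0, 0.0], [0.0, 1.0]])
video_local = np.array([[1.0, 0.0], [1.0, 0.0]])   # cue 0 matches, cue 1 does not

score = global_local_similarity(text_global, video_global,
                                text_local, video_local)
print(score)
```

Comparing local features only on matching cues keeps the cost at K comparisons per text-video pair rather than every word against every frame.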
T2VLAD achieves consistent improvements on three standard text-video retrieval benchmarks and outperforms the state of the art by a clear margin.