[2411.04997] LLM2CLIP: Powerful Language Model Unlocks Richer Visual ... We propose an efficient post-training strategy that integrates LLMs into pretrained CLIP. To address the challenge posed by the autoregressive nature of LLMs, we introduce a caption-to-caption contrastive fine-tuning framework, significantly enhancing the discriminative quality of LLM outputs.
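As a rough illustration of the caption-to-caption contrastive idea mentioned in that abstract, the sketch below computes a symmetric InfoNCE loss between embeddings of two caption variants of the same image, pulling same-image captions together and pushing different-image captions apart. The batch size, embedding dimension, and temperature are placeholders, not the paper's actual setup.

```python
import torch
import torch.nn.functional as F

def caption_contrastive_loss(emb_a: torch.Tensor,
                             emb_b: torch.Tensor,
                             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of caption embeddings.

    emb_a[i] and emb_b[i] are assumed to describe the same image
    (positive pair); all other pairings are treated as negatives.
    """
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature           # (B, B) similarity matrix
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    loss_a = F.cross_entropy(logits, targets)          # a -> b direction
    loss_b = F.cross_entropy(logits.t(), targets)      # b -> a direction
    return (loss_a + loss_b) / 2

# Toy usage with random "caption embedding" tensors (batch of 8, dim 512):
emb_a = torch.randn(8, 512)
emb_b = torch.randn(8, 512)
print(caption_contrastive_loss(emb_a, emb_b).item())
```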
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Contrastive Language-Image Pre-training (CLIP) plays an essential role in extracting valuable content information from images across diverse tasks. It aligns textual and visual modalities to comprehend the entire image, including all the details, even those irrelevant to specific tasks.
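For context on the standard whole-image alignment that Alpha-CLIP builds on, a minimal zero-shot scoring example with the Hugging Face transformers CLIP classes is sketched below. The checkpoint name, image path, and prompts are illustrative defaults; Alpha-CLIP itself additionally takes a per-pixel alpha mask that this vanilla CLIP call does not.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Vanilla CLIP: score one image against a few candidate texts.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                  # any local image
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarities scaled by the learned temperature.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```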
Long-CLIP: Unlocking the Long-Text Capability of CLIP Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities. Despite its widespread adoption, a significant limitation of CLIP lies in the inadequate length of text input.
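The length limitation referred to here is the 77-token context of CLIP's text encoder. The snippet below simply shows where a long caption gets cut off by the standard tokenizer (the checkpoint name is illustrative); this is the constraint Long-CLIP aims to relax.

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

long_caption = " ".join(["a very detailed description of the scene"] * 20)

# CLIP's text encoder accepts only 77 tokens (including special tokens),
# so anything beyond that is truncated away.
full = tokenizer(long_caption)
truncated = tokenizer(long_caption, truncation=True, max_length=77)
print(len(full["input_ids"]))        # full length, well over 77
print(len(truncated["input_ids"]))   # capped at 77
```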
EVA-CLIP: Improved Training Techniques for CLIP at Scale Contrastive language-image pre-training, CLIP for short, has gained increasing attention for its potential in various scenarios. In this paper, we propose EVA-CLIP, a series of models that significantly improve the efficiency and effectiveness of CLIP training.
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18 billion parameters.
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis To enable high-quality, efficient, fast, and controllable text-to-image synthesis, we propose Generative Adversarial CLIPs, namely GALIP. GALIP leverages the powerful pretrained CLIP model both in the discriminator and generator. Specifically, we propose a CLIP-based discriminator ...
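To make the notion of a CLIP-based discriminator concrete, here is a hedged sketch that freezes a CLIP vision backbone and trains only a small real/fake head on top of its pooled features; GALIP's actual discriminator is more elaborate, and the checkpoint and head sizes below are placeholders.

```python
import torch
import torch.nn as nn
from transformers import CLIPVisionModel

class CLIPFeatureDiscriminator(nn.Module):
    """Real/fake head on top of a frozen CLIP vision backbone (illustrative)."""

    def __init__(self, clip_name: str = "openai/clip-vit-base-patch32"):
        super().__init__()
        self.backbone = CLIPVisionModel.from_pretrained(clip_name)
        for p in self.backbone.parameters():         # keep CLIP frozen
            p.requires_grad = False
        hidden = self.backbone.config.hidden_size
        self.head = nn.Sequential(
            nn.Linear(hidden, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1),                       # single real/fake logit
        )

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(pixel_values=pixel_values).pooler_output
        return self.head(feats)

# Toy usage: a batch of 2 images at CLIP's 224x224 input resolution.
disc = CLIPFeatureDiscriminator()
fake_images = torch.randn(2, 3, 224, 224)
print(disc(fake_images).shape)                       # torch.Size([2, 1])
```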
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset. We develop 5 Chinese CLIP models of multiple sizes, spanning from 77 to 958 million parameters.
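A minimal usage sketch with the Chinese CLIP checkpoints published on the Hugging Face Hub is shown below; it assumes the OFA-Sys/chinese-clip-vit-base-patch16 checkpoint and the ChineseCLIPModel / ChineseCLIPProcessor classes in transformers, so adjust names to whichever release you actually use.

```python
import torch
from PIL import Image
from transformers import ChineseCLIPModel, ChineseCLIPProcessor

model = ChineseCLIPModel.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")
processor = ChineseCLIPProcessor.from_pretrained("OFA-Sys/chinese-clip-vit-base-patch16")

image = Image.open("example.jpg")                  # any local image
texts = ["一只猫", "一只狗"]                        # "a cat", "a dog"

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Softmax over image-text similarities gives zero-shot label probabilities.
probs = outputs.logits_per_image.softmax(dim=-1)
print(dict(zip(texts, probs[0].tolist())))
```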