GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis. To enable high-quality, efficient, fast, and controllable text-to-image synthesis, we propose Generative Adversarial CLIPs, namely GALIP. GALIP leverages the powerful pretrained CLIP model in both the discriminator and the generator.
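A minimal sketch of that idea, assuming a frozen pretrained CLIP image encoder that maps a batch of images to pooled features; the small scoring head and the concatenation with text features are illustrative choices, not the authors' exact architecture:

# Sketch only: reuse a frozen CLIP image encoder as the discriminator backbone.
# `clip_visual` is assumed to return (B, feat_dim) pooled image features.
import torch
import torch.nn as nn

class CLIPDiscriminator(nn.Module):
    def __init__(self, clip_visual, feat_dim=512):
        super().__init__()
        self.clip_visual = clip_visual            # frozen pretrained CLIP image encoder
        for p in self.clip_visual.parameters():
            p.requires_grad = False
        # learnable head scoring image features conditioned on CLIP text features
        self.head = nn.Sequential(
            nn.Linear(feat_dim * 2, feat_dim),
            nn.LeakyReLU(0.2),
            nn.Linear(feat_dim, 1),
        )

    def forward(self, images, text_feat):
        with torch.no_grad():
            img_feat = self.clip_visual(images)   # (B, feat_dim) CLIP image features
        return self.head(torch.cat([img_feat, text_feat], dim=-1))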
SEM-CLIP: Precise Few-Shot Learning for Nanoscale Defect Detection in ... SEM-CLIP requires little annotated data, substantially reducing labor demands in the semiconductor industry. Extensive experimental validation demonstrates that our model achieves impressive classification and segmentation results under few-shot learning scenarios.
[2411.04997] LLM2CLIP: Powerful Language Model Unlocks Richer Visual ... Motivated by the remarkable advancements in large language models (LLMs), this work explores how LLMs' superior text understanding and extensive open-world knowledge can enhance CLIP's capability, especially for processing longer and more complex image captions.
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want. To fulfill these requirements, we introduce Alpha-CLIP, an enhanced version of CLIP with an auxiliary alpha channel that indicates attentive regions, fine-tuned on millions of constructed RGBA region-text pairs.
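A minimal sketch of how such an auxiliary channel could be grafted onto a ViT-based CLIP visual tower, assuming the patch embedding is a standard Conv2d over RGB input; the module layout is an assumption for illustration and may differ from the released Alpha-CLIP code:

# Sketch only: extend the patch-embedding convolution from 3 (RGB) to 4 (RGBA) input
# channels, copying the pretrained RGB weights and initializing the alpha weights to zero
# so the model starts out behaving exactly like the original CLIP.
import torch
import torch.nn as nn

def add_alpha_channel(patch_embed: nn.Conv2d) -> nn.Conv2d:
    out_ch, in_ch, kh, kw = patch_embed.weight.shape      # expect in_ch == 3 for RGB
    new_conv = nn.Conv2d(in_ch + 1, out_ch, kernel_size=(kh, kw),
                         stride=patch_embed.stride,
                         bias=patch_embed.bias is not None)
    with torch.no_grad():
        new_conv.weight[:, :in_ch] = patch_embed.weight   # copy pretrained RGB weights
        new_conv.weight[:, in_ch:] = 0.0                  # alpha weights start at zero
        if patch_embed.bias is not None:
            new_conv.bias.copy_(patch_embed.bias)
    return new_conv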
[2404.16030] MoDE: CLIP Data Experts via Clustering - arXiv.org. The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering.
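A minimal sketch of the clustering-and-routing idea, assuming caption embeddings are already computed; the k-means partition and the similarity-based expert weights below are illustrative assumptions rather than the paper's exact recipe:

# Sketch only: partition training captions into clusters (one CLIP "data expert" per
# cluster), then weight experts for a downstream task by how close the task's text
# embedding lies to each cluster centroid.
import numpy as np
from sklearn.cluster import KMeans

def assign_data_experts(caption_embeddings: np.ndarray, n_experts: int = 4):
    """Cluster caption embeddings; each cluster's data trains its own CLIP expert."""
    km = KMeans(n_clusters=n_experts, n_init=10).fit(caption_embeddings)
    return km.labels_, km.cluster_centers_

def routing_weights(task_embedding: np.ndarray, centroids: np.ndarray, temp: float = 0.05):
    """Soft weights over experts from cosine similarity of the task to each centroid."""
    sims = centroids @ task_embedding / (
        np.linalg.norm(centroids, axis=1) * np.linalg.norm(task_embedding) + 1e-8)
    w = np.exp(sims / temp)
    return w / w.sum()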
Chinese CLIP: Contrastive Vision-Language Pretraining in Chinese. In this work, we construct a large-scale dataset of image-text pairs in Chinese, where most data are retrieved from publicly available datasets, and we pretrain Chinese CLIP models on the new dataset.
Long-CLIP: Unlocking the Long-Text Capability of CLIP. To this end, we propose Long-CLIP as a plug-and-play alternative to CLIP that supports long-text input, retains or even surpasses CLIP's zero-shot generalizability, and stays aligned with the CLIP latent space, so it can replace CLIP in downstream frameworks without any further adaptation.