Vision-Augmented Retrieval and Generation (VARAG) - Vision-first RAG Engine
Official code release for ARTEMIS: Attention-based Retrieval with Text-Explicit Matching and Implicit Similarity (published at ICLR 2022)
Evaluation code and datasets for the ACL 2024 paper "VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval"; the original code and model are available in FlagEmbedding.
This repository contains the dataset and source files to reproduce the results of Müller-Budack et al. (2021), "Multimodal news analytics using measures of cross-modal entity and context consistency", International Journal on Multimedia Information Retrieval (IJMIR), Vol. 10, Art. no. 2, 2021.
Explores early-fusion and late-fusion approaches for multimodal medical image retrieval (a fusion sketch follows this list).
A Survey of Multimodal Retrieval-Augmented Generation
[CVPR 2025] Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
"Formalizing Multimedia Recommendation through Multimodal Deep Learning", accepted for publication in ACM Transactions on Recommender Systems.
Multimodal retrieval in art with context embeddings.
A list of research papers on knowledge-enhanced multimodal learning
A generalized self-supervised training paradigm for unimodal and multimodal alignment and fusion (a contrastive-alignment sketch follows this list).
The official code of "Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search"
Mini-batch selective sampling for knowledge adaptation of VLMs for mammography.
Evaluating dense model-based approaches for multimodal medical case retrieval (a dense-retrieval sketch follows this list).
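For readers new to the fusion terminology in the medical image retrieval entry above, here is a minimal sketch of the early- vs late-fusion distinction. It is illustrative only and does not reproduce any listed repository's architecture; the embedding sizes, toy linear scorers, and variable names are all assumptions.

```python
# Illustrative sketch of early vs late fusion (not any listed repo's code).
import numpy as np

rng = np.random.default_rng(0)
image_emb = rng.normal(size=256)   # stand-in for an image encoder's output
text_emb = rng.normal(size=256)    # stand-in for a text encoder's output

# Early fusion: concatenate modality features first, then score them
# jointly with a single model (here a toy linear scorer).
joint = np.concatenate([image_emb, text_emb])          # shape (512,)
w_joint = rng.normal(size=joint.shape)
early_score = float(joint @ w_joint)

# Late fusion: score each modality independently, then merge the
# per-modality scores (here a simple weighted sum).
w_img = rng.normal(size=image_emb.shape)
w_txt = rng.normal(size=text_emb.shape)
late_score = 0.5 * float(image_emb @ w_img) + 0.5 * float(text_emb @ w_txt)

print(f"early-fusion score: {early_score:.3f}, late-fusion score: {late_score:.3f}")
```

In practice the fusion point trades off cross-modal interaction (early fusion lets the model mix modalities at the feature level) against modularity and per-modality tuning (late fusion keeps the pipelines independent until scoring).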
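The self-supervised alignment entry above does not name its objective; one widely used choice for multimodal alignment is a CLIP-style symmetric contrastive (InfoNCE) loss, sketched here on random stand-in embeddings. The batch size, embedding dimension, and temperature are assumptions, not values from the linked repository.

```python
# Illustrative CLIP-style symmetric contrastive loss (assumed objective).
import numpy as np

def info_nce(img, txt, temperature=0.07):
    """Symmetric InfoNCE loss; img and txt are (batch, dim), L2-normalized,
    with matching image/text pairs at the same batch index."""
    logits = img @ txt.T / temperature          # (batch, batch) similarities
    labels = np.arange(len(img))                # positives on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()     # -log p(positive) per row

    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 64)); img /= np.linalg.norm(img, axis=1, keepdims=True)
txt = rng.normal(size=(8, 64)); txt /= np.linalg.norm(txt, axis=1, keepdims=True)
print(f"contrastive loss on random pairs: {info_nce(img, txt):.3f}")
```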
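At its core, dense retrieval embeds a query and all candidates into one shared vector space and ranks candidates by similarity. The sketch below shows that generic pipeline using cosine similarity over random stand-in vectors; it does not reproduce the evaluated models, and all sizes and names are assumptions.

```python
# Illustrative dense retrieval by cosine similarity (generic pipeline).
import numpy as np

def cosine_rank(query_vec, corpus_vecs):
    """Return candidate indices sorted by descending cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    c = corpus_vecs / np.linalg.norm(corpus_vecs, axis=1, keepdims=True)
    sims = c @ q
    return np.argsort(-sims), sims

rng = np.random.default_rng(0)
query = rng.normal(size=128)            # stand-in for an embedded case query
corpus = rng.normal(size=(100, 128))    # stand-in for embedded candidate cases
order, sims = cosine_rank(query, corpus)
print("top-5 cases:", order[:5], "scores:", np.round(sims[order[:5]], 3))
```

In a real system the stand-in vectors would come from trained image/text encoders, and for large corpora the exhaustive argsort would typically be replaced by an approximate nearest-neighbor index.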