# RavenDB as a Vector Database
---

{NOTE: }

* In this page:
  * [What is a vector database](../ai-integration/ravendb-as-vector-database#what-is-a-vector-database)
  * [Why choose RavenDB as your vector database](../ai-integration/ravendb-as-vector-database#why-choose-ravendb-as-your-vector-database)

{NOTE/}

---

{PANEL: What is a vector database}

**What is a vector database**:

* A vector database stores data as high-dimensional vectors.
  These vectors, known as **embeddings**, are mathematical representations of your data.

* Each embedding is an array of numbers, where each dimension corresponds to specific characteristics of the data, capturing its semantic or contextual meaning.
  Words, phrases, entire documents, images, audio, and other types of data can all be vectorized.

* The raw data is converted into embeddings using [transformers](https://huggingface.co/docs/transformers).
  To reduce storage and computation, the embeddings can be encoded with lower-precision data types, such as 8-bit integers, through a technique called **quantization**.
  A minimal Python sketch of this flow appears after this list.

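To make the idea concrete, here is a minimal sketch of that flow, assuming the `sentence-transformers` library and a naive scale-and-round quantization scheme. Both are illustrative choices for this example, not RavenDB internals:

```python
# Illustrative sketch, not RavenDB internals: turning text into an embedding
# and quantizing it to 8-bit integers to save space.
# Assumes: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("TaylorAI/bge-micro-v2")  # a small sentence-transformer model

# Each input becomes an array of floating-point numbers -- the embedding.
embedding = model.encode("Wireless noise-cancelling headphones")
print(embedding.shape)   # (384,) for this model

# Naive int8 quantization: rescale the floats into the int8 range and round.
scale = 127.0 / float(np.max(np.abs(embedding)))
quantized = np.round(embedding * scale).astype(np.int8)
print(quantized[:8])     # same vector, one byte per dimension
```
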
**Storing embeddings and searching**:

* Embeddings are indexed and stored in a vector space. Their positions in the space reflect relationships and characteristics of the data.
  The distance between two embeddings in the vector space correlates with the semantic similarity of their original inputs.

* Vectors representing similar data are located close to each other in the vector space.
  To search this space efficiently, the database uses [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world), a vector search algorithm for indexing and querying embeddings.
  HNSW creates a graph-based structure to efficiently navigate the space and retrieve approximate nearest neighbors in high-dimensional spaces.

* This architecture enables **similarity searches**.
  Instead of conventional keyword-based queries, a vector database allows you to search for relevant data based on semantic and contextual meaning.
  A small example of such a search appears after this list.

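As a rough illustration of how an HNSW index answers a similarity query, the following sketch builds a small index over random embeddings and retrieves approximate nearest neighbors. The `hnswlib` library and all parameter values here are illustrative assumptions; RavenDB runs its own HNSW indexing server-side:

```python
# Illustrative sketch, not RavenDB internals: building a small HNSW index
# and running an approximate nearest-neighbor (similarity) search.
# Assumes: pip install hnswlib numpy
import numpy as np
import hnswlib

dim = 384                                            # embedding dimensionality
vectors = np.random.rand(1000, dim).astype(np.float32)

# With the 'cosine' space, vectors pointing in similar directions are "close".
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=1000, ef_construction=200, M=16)
index.add_items(vectors, np.arange(1000))

# Find the 5 stored embeddings most similar to a query vector.
query = np.random.rand(dim).astype(np.float32)
labels, distances = index.knn_query(query, k=5)
print(labels[0], distances[0])                       # neighbor ids and cosine distances
```
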
{PANEL/}

{PANEL: Why choose RavenDB as your vector database}

**An integrated solution**:
RavenDB provides an integrated solution that combines high-performance NoSQL capabilities with advanced vector indexing and querying features,
enabling efficient storage and management of high-dimensional vector data.

**Data privacy and ownership**:
With RavenDB, your data remains private.
There's no need to integrate with external vector databases, keeping your sensitive data secure within your own infrastructure.

**AI integration**:
You can use RavenDB as the vector database for your AI-powered applications, including those built on large language models (LLMs).
This eliminates the need to transfer data to expensive external services for vector similarity search,
providing a cost-effective and efficient solution for vector-based operations.

**Built-in embedding support**:

* **Textual input**:
  RavenDB uses the [bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2) model to embed textual input from your documents into 384-dimensional dense vectors.
  This highly efficient sentence-transformer model produces precise and compact vector representations.

* **Numerical arrays input**:
  Documents in RavenDB can contain numerical arrays with pre-made embeddings created elsewhere.
  Use RavenDB's dedicated data type, `RavenVector`, to store these embeddings in your document entities.
  This type is highly optimized to reduce storage space and enhance the speed of reading arrays from disk.
  A minimal storage sketch follows this list.

* All embeddings, whether generated from textual input or provided as pre-made numerical arrays,
  are indexed and searched using the [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) algorithm.

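Below is a minimal sketch of storing a document that carries a pre-made embedding, using the RavenDB Python client. The server URL, database name, and `Product` class are illustrative assumptions, and the embedding is kept as a plain numeric array; see the client documentation for the dedicated `RavenVector` type:

```python
# Minimal sketch: storing a document that carries a pre-made embedding.
# The URL, database name, and Product class are illustrative assumptions.
# Assumes: pip install ravendb
from ravendb import DocumentStore

class Product:
    def __init__(self, name, embedding):
        self.name = name
        self.embedding = embedding  # pre-made embedding produced elsewhere

store = DocumentStore(urls=["http://localhost:8080"], database="Demo")
store.initialize()

with store.open_session() as session:
    session.store(Product("Wireless headphones", [0.12, -0.48, 0.91, 0.05]), "products/1-A")
    session.save_changes()
```
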
**Multiple field types in indexes**:
An index can combine different field types, e.g., standard fields, spatial fields, full-text search fields,
and **vector-fields**, allowing queries to retrieve data from all these field types.

Document attachments can also be indexed as vector-fields.
Map-reduce indexes can include vector-fields in their reduce phase.

{PANEL/}

## Related Articles

### Client API

- [RQL](../client-api/session/querying/what-is-rql)
- [Query overview](../client-api/session/querying/how-to-query)

### Vector Search

- [Vector search using a dynamic query](../ai-integration/vector-search-using-dynamic-query)
- [Vector search using a static index](../ai-integration/vector-search-using-static-index)

### Server

- [Indexing configuration](../server/configuration/indexing-configuration)