Commit 563e760

RDoc-3122 Vector search - Overview
1 parent f03836e commit 563e760

4 files changed: +123 -1 lines changed

Documentation/7.0/Raven.Documentation.Pages/.docs.json

+5
@@ -39,6 +39,11 @@
     "Name": "Integrations",
     "Mappings": []
   },
+  {
+    "Path": "/ai-integration",
+    "Name": "AI Integration",
+    "Mappings": []
+  },
   {
     "Path": "/glossary",
     "Name": "Glossary",
@@ -0,0 +1,20 @@
[
  {
    "Path": "ravendb-as-vector-database.markdown",
    "Name": "RavenDB as a Vector Database",
    "DiscussionId": "3a848621-110f-4bbd-9147-58f743b0a950",
    "Mappings": []
  },
  {
    "Path": "vector-search-using-dynamic-query.markdown",
    "Name": "Vector Search using a Dynamic Query",
    "DiscussionId": "2b55d124-a9ff-474c-8171-d065daf938a3",
    "Mappings": []
  },
  {
    "Path": "vector-search-using-static-index.markdown",
    "Name": "Vector Search using a Static Index",
    "DiscussionId": "a25c7cca-e662-401f-9b66-8ce1102a2a09",
    "Mappings": []
  }
]
@@ -0,0 +1,93 @@
# RavenDB as a Vector Database
---

{NOTE: }

* In this page:
    * [What is a vector database](../ai-integration/ravendb-as-vector-database#what-is-a-vector-database)
    * [Why choose RavenDB as your vector database](../ai-integration/ravendb-as-vector-database#why-choose-ravendb-as-your-vector-database)

{NOTE/}

---

{PANEL: What is a vector database}

**What is a vector database**:

* A vector database stores data as vectors in a high-dimensional space.
  These vectors, known as **embeddings**, are mathematical representations of your data.

* Each embedding is an array of numbers, where each dimension corresponds to specific characteristics of the data, capturing its semantic or contextual meaning.
  Words, phrases, entire documents, images, audio, and other types of data can all be vectorized.

* The raw data is converted into embeddings using [transformers](https://huggingface.co/docs/transformers).
  To reduce storage and computation, transformers can encode embeddings with lower-precision data types, such as 8-bit integers, through a technique called **quantization**.

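To make the idea of quantization concrete, here is a minimal sketch of symmetric 8-bit scalar quantization in plain Python. It is an illustration only: the function names and the sample values are hypothetical, and it is not a description of how RavenDB or any particular transformer library quantizes embeddings internally.

```python
def quantize_int8(vector):
    """Map floats to integers in [-127, 127] using one scale per vector."""
    max_abs = max(abs(x) for x in vector)
    # Assumption: an all-zero vector keeps scale 1.0 to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(x / scale) for x in vector], scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the 8-bit representation."""
    return [q * scale for q in quantized]


# Hypothetical 4-dimensional embedding (real embeddings have hundreds of dimensions).
embedding = [0.12, -0.98, 0.45, 0.0]
quantized, scale = quantize_int8(embedding)
restored = dequantize_int8(quantized, scale)

print(quantized)  # small integers that fit in one byte each
print(restored)   # close to the original values, but not identical
```

Each value now needs one byte instead of four (for a 32-bit float), at the cost of a rounding error of at most half the scale per dimension.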
**Storing embeddings and searching**:

* Embeddings are indexed and stored in a vector space. Their positions in the space reflect relationships and characteristics of the data.
  The smaller the distance between two embeddings in the vector space, the more semantically similar their original inputs are.

* Vectors representing similar data are located close to each other in the vector space.
  Such neighbors are located using [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world), a vector search algorithm for indexing and querying embeddings.
  HNSW creates a graph-based structure to efficiently navigate and retrieve approximate nearest neighbors in high-dimensional spaces.

* This architecture enables **similarity searches**.
  Instead of conventional keyword-based queries, a vector database allows you to search for relevant data based on semantic and contextual meaning.

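The relationship between distance and similarity can be sketched with a small Python example. Note that this is an exhaustive (brute-force) search using cosine similarity, not HNSW: it compares the query against every stored vector, which is exactly the cost that an approximate index such as HNSW is designed to avoid. The document IDs and vector values are made up for illustration, and all vectors are assumed to be non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id)
              for doc_id, vec in embeddings.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical 3-dimensional embeddings keyed by hypothetical document IDs.
embeddings = {
    "products/1": [0.9, 0.1, 0.0],
    "products/2": [0.8, 0.3, 0.1],
    "products/3": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

top = nearest_neighbors(query, embeddings, k=2)
print(top)
```

The two vectors that point in nearly the same direction as the query rank first, while the third, which points elsewhere in the space, is left out.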
{PANEL/}

{PANEL: Why choose RavenDB as your vector database}

**An integrated solution**:
RavenDB provides an integrated solution that combines high-performance NoSQL capabilities with advanced vector indexing and querying features,
enabling efficient storage and management of high-dimensional vector data.

**Data privacy and ownership**:
With RavenDB, your data remains private.
There's no need to integrate with external vector databases, so your sensitive data stays secure within your own infrastructure.

**AI integration**:
You can use RavenDB as the vector database for your AI-powered applications, including large language models (LLMs).
This eliminates the need to transfer data to expensive external services for vector similarity search,
providing a cost-effective and efficient solution for vector-based operations.

**Built-in embedding support**:

* **Textual input**:
  RavenDB uses the [bge-micro-v2](https://huggingface.co/TaylorAI/bge-micro-v2) model to embed textual input from your documents into 384-dimensional dense vectors.
  This highly efficient sentence-transformer model produces precise and compact vector representations.

* **Numerical arrays input**:
  Documents in RavenDB can contain numerical arrays with pre-made embeddings created elsewhere.
  Use RavenDB's dedicated data type, `RavenVector`, to store these embeddings in your document entities.
  This type is highly optimized to reduce storage space and enhance the speed of reading arrays from disk.

* All embeddings, whether generated from textual input or pre-made numerical arrays,
  are indexed and searched using the [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) algorithm.

**Multiple field types in indexes**:
An index can combine different field types, e.g., standard fields, spatial fields, full-text search fields,
and **vector-fields**, allowing queries to retrieve data from all these field types.

Document attachments can also be indexed as vector-fields.
Map-reduce indexes can include vector-fields in their reduce phase.

{PANEL/}

## Related Articles

### Client API

- [RQL](../client-api/session/querying/what-is-rql)
- [Query overview](../client-api/session/querying/how-to-query)

### Vector Search

- [Vector search using a dynamic query](../ai-integration/vector-search-using-dynamic-query.markdown)
- [Vector search using a static index](../ai-integration/vector-search-using-static-index.markdown)

### Server

- [Indexing configuration](../server/configuration/indexing-configuration)

Raven.Documentation.Parser/Data/Category.cs

+5 -1
@@ -82,6 +82,10 @@ public enum Category
 
         [Prefix("document-extensions")]
         [Description("Document Extensions")]
-        DocumentExtensions
+        DocumentExtensions,
+
+        [Prefix("ai-integration")]
+        [Description("AI Integration")]
+        AiIntegration
     }
 }
