Demo showing how to index Astra DB data into Glean.
You can run this tutorial entirely in Google Colab, or run it locally by following the instructions below.
ℹ️ See the Astra Reference documentation.
✅ 1.1: Create an Astra account
Go to https://astra.datastax.com and register with a Google or GitHub account.
✅ 1.2: Create a Database in Astra DB
Go to the databases dashboard (click Databases in the left-hand navigation bar, expanding it if necessary), and click the [Create Database] button on the right.
ℹ️ Field Description
Field | Description |
---|---|
Vector Database vs Serverless Database | Choose Vector Database. In June 2023, Cassandra introduced support for vector search to enable Generative AI use cases. |
Database name | Database names are permanent. They must start and end with a letter or number, and they can contain no more than 50 characters, including letters, numbers, and the special characters & + - _ ( ) < > . , @ . It is recommended to have a database for each of your applications. The free tier is limited to 5 databases. |
Cloud Provider | Choose whatever you like. Click a cloud provider logo, pick an Area in the list and finally pick a region. We recommend choosing a region that is closest to you to reduce latency. In the free tier, there is very little difference. |
Cloud Region | Pick a region close to you, among those available for the selected cloud provider and your plan. |
If all fields are filled properly, clicking the "Create Database" button will start the process.
It should take a couple of minutes for your database to become Active.
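The database naming rules in the table above can be checked before you fill in the form. The following is a minimal sketch, not an official validator: the function name `is_valid_database_name` is hypothetical, and it assumes that "letters" means ASCII letters and that spaces are not allowed (the source lists the special characters separated by spaces, so this is an interpretation).

```python
import re

# Characters permitted by the naming rule quoted in the table.
# Assumption: ASCII letters and digits only, and no spaces.
_ALLOWED = r"A-Za-z0-9&+\-_()<>.,@"

# The name must start and end with a letter or number.
_NAME_RE = re.compile(rf"[A-Za-z0-9](?:[{_ALLOWED}]*[A-Za-z0-9])?")


def is_valid_database_name(name: str) -> bool:
    """Return True if `name` satisfies the naming rules listed in the table."""
    return len(name) <= 50 and _NAME_RE.fullmatch(name) is not None
```

For instance, `is_valid_database_name("my_app-db.v2")` passes these checks, while a name starting with `-` or longer than 50 characters does not.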
✅ 1.3: Create an Astra Database token
To connect to your database, you need the API endpoint and a Database token.
The API endpoint is available on the database screen; a small icon lets you copy the URL to your clipboard. It should look like https://<db-id>-<db-region>.apps.astra.datastax.com.
To get a token, click the [Generate Token] button on the right. It will generate a token that you can copy to your clipboard.
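A quick sanity check on the copied API endpoint can catch paste errors early. The sketch below is illustrative only: `parse_api_endpoint` is a hypothetical helper, and it assumes that `<db-id>` is a lowercase 36-character UUID and that `<db-region>` is a lowercase region slug such as `us-east1`. The source only shows the general URL shape.

```python
import re

# Assumption: <db-id> is a UUID (8-4-4-4-12 hex digits) and <db-region> is a
# lowercase slug made of letters, digits, and hyphens.
_ENDPOINT_RE = re.compile(
    r"https://(?P<db_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})"
    r"-(?P<region>[a-z0-9-]+)\.apps\.astra\.datastax\.com/?"
)


def parse_api_endpoint(url: str) -> tuple:
    """Split an Astra DB API endpoint into (db_id, region) or raise ValueError."""
    match = _ENDPOINT_RE.fullmatch(url.strip())
    if match is None:
        raise ValueError(f"Unexpected API endpoint format: {url!r}")
    return match.group("db_id"), match.group("region")
```

If the function raises `ValueError`, the value is probably not the URL copied from the database screen (for example, the token was pasted in its place).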
Admins can manage Glean API tokens via the API tokens page within Workspace Settings:
Workspace > Setup > API tokens > Indexing tokens tab
As a Glean admin, create a token and assign permissions (or have an admin do it for you).
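Once you have an indexing token, it is sent as a bearer token with each Indexing API request. The sketch below only builds a request object and does not send anything. It assumes the Indexing API base URL has the form `https://<customer>-be.glean.com/api/index/v1`; check this against your Glean deployment. `build_indexing_request` and its parameters are hypothetical names used for illustration, not part of the import job.

```python
import urllib.request


def build_indexing_request(customer: str, api_token: str, path: str, body: bytes):
    """Build (but do not send) a POST request for a Glean Indexing API path."""
    # Assumption: the backend hostname follows the "<customer>-be.glean.com" pattern.
    url = f"https://{customer}-be.glean.com/api/index/v1/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # The indexing token created above is passed as a bearer token.
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```

Keeping the request construction separate from sending makes it easy to inspect the URL and headers before any data leaves your machine.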
✅ 3.1: Create and activate a virtual environment. You need Python version 3.9 or higher.
python3 -m venv my_virtual_env
macOS/Linux:
source my_virtual_env/bin/activate
Windows:
my_virtual_env\Scripts\activate
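To confirm that the activated environment meets the Python 3.9 requirement from step 3.1, a short check like the following can help. This is a minimal sketch; `python_is_supported` and `MIN_PYTHON` are illustrative names, not part of the project.

```python
import sys

# Minimum interpreter version stated in step 3.1.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given (major, minor, ...) version is at least 3.9."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not python_is_supported():
        found = ".".join(str(part) for part in sys.version_info[:3])
        raise SystemExit(f"Python 3.9 or higher is required, found {found}")
    print("Python version OK")
```

Run it with the `python3` from the activated virtual environment so that it reports on the interpreter the import job will use.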
✅ 3.2: Install the dependencies:
pip install -r requirements.txt
Copy .env.example to .env, and edit its content with the Astra DB and Glean credentials:
# Astra Configuration
export ASTRA_DB_APPLICATION_TOKEN=<change_me>
export ASTRA_DB_API_ENDPOINT=<change_me>
export ASTRA_DB_COLLECTION_NAME="plain_collection"
# export ASTRA_DB_KEYSPACE="default_keyspace" # Optional
# Glean Configuration
export GLEAN_CUSTOMER=<you>
export GLEAN_DATASOURCE_NAME=<change_me>
export GLEAN_API_TOKEN=<change_me>
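Before running the job, it can be useful to verify that no placeholder such as `<change_me>` is left in `.env`. The following is a minimal sketch using only the standard library. It assumes that every uncommented variable in the example above is required and that any value of the form `<...>` is an unedited placeholder; `parse_env_text` and `missing_settings` are hypothetical helpers, not part of the import job.

```python
import re

# Assumption: all uncommented variables from .env.example are required.
REQUIRED_KEYS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_COLLECTION_NAME",
    "GLEAN_CUSTOMER",
    "GLEAN_DATASOURCE_NAME",
    "GLEAN_API_TOKEN",
)

_LINE_RE = re.compile(r"^(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")
_PLACEHOLDER_RE = re.compile(r"^<[^<>]*>$")


def parse_env_text(text: str) -> dict:
    """Parse `export KEY=VALUE` lines, skipping blank lines and comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _LINE_RE.match(line)
        if match is None:
            continue
        key, value = match.group(1), match.group(2).strip()
        if value[:1] in ('"', "'"):
            # Quoted value: keep the text between the opening and closing quote.
            end = value.find(value[0], 1)
            value = value[1:end] if end != -1 else value[1:]
        else:
            # Unquoted value: drop a trailing inline comment, if any.
            value = value.split(" #", 1)[0].strip()
        values[key] = value
    return values


def missing_settings(values: dict) -> list:
    """Return required keys that are absent, empty, or still placeholders."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or _PLACEHOLDER_RE.match(values[key])
    ]
```

A possible usage is `missing_settings(parse_env_text(open(".env").read()))`, which returns an empty list once every required value has been filled in.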
Run the import job:
python3 astra-glean-import-job.py
Congratulations: you have indexed data from an Astra DB collection into Glean!
You can inspect the Astra DB collection in your Astra dashboard: navigate to the database and find the "Data explorer" tab to locate your collection.
You can perform a test with Glean: search for the content you just indexed and verify the response contains information coming from the inserted dataset.
ℹ️ See the Glean integration page in the Astra DB documentation.