This project finds and compares scripture verses using embeddings and cosine similarity.
- Python 3.x
- Docker
- In your desired directory, clone the repository
git clone https://github.com/cahille/scripture-similarity.git
cd scripture-similarity
- Create and activate a virtual environment
python3 -m venv scripture-similarity
source scripture-similarity/bin/activate
- Install the required Python packages
pip install -r requirements.txt
- Set up the database
- Install and start Docker
- Start PostgreSQL via Docker. Take note of the password and database name
docker run --name scripture-similarity -e POSTGRES_USER=scripture-similarity -e POSTGRES_DB=scripture-similarity -e POSTGRES_PASSWORD=scripture-similarity -p 5432:5432 -d pgvector/pgvector:pg17 -c 'listen_addresses=*'
- Create the database tables using dbmate
dbmate up
- You can check the connection and database state with
docker exec -it `docker ps | grep pgvector | awk '{print $1}'` psql -U scripture-similarity
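dbmate reads its connection string from the `DATABASE_URL` environment variable. As a minimal sketch, assuming the user, password, database name, and port from the `docker run` command above and a local unencrypted connection (`sslmode=disable`), the URL could be assembled like this. The helper name `build_database_url` is hypothetical and not part of this repository.

```python
from urllib.parse import quote


def build_database_url(user, password, database, host="localhost", port=5432):
    """Build a PostgreSQL URL suitable for the DATABASE_URL variable."""
    # Percent-encode each part so special characters stay valid inside a URL.
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}?sslmode=disable"
    )


if __name__ == "__main__":
    # Assumed values, taken from the docker run command in this README.
    name = "scripture-similarity"
    print(build_database_url(name, name, name))
```

Export the printed value as `DATABASE_URL` in your shell before running `dbmate up`.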
- Index
- Download the standard works (Book of Mormon, Doctrine and Covenants, Pearl of Great Price, Old and New Testaments)
- Index the raw verses into the verse table
- Walk through all the verses and find similar verses in the other works.
python3 indexer.py --threshold 0.5 # 0.5 is the default threshold
- This will likely take a while! Indexing the raw verses will likely take a few minutes, and finding similar verses will take much longer 😁
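For intuition about what the threshold means: cosine similarity measures the angle between two embedding vectors, ranging from -1 (opposite direction) to 1 (same direction). The sketch below is illustrative only. The vectors are made-up toy values, the function names are hypothetical, and treating the threshold as an inclusive lower bound is an assumption; the actual `indexer.py` may compute similarity differently (for example inside PostgreSQL via pgvector).

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; report no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def similar_verses(query, candidates, threshold=0.5):
    """Keep the candidates whose similarity to `query` meets the threshold.

    Assumption: the threshold is an inclusive lower bound.
    """
    scored = ((ref, cosine_similarity(query, vec)) for ref, vec in candidates.items())
    return [(ref, score) for ref, score in scored if score >= threshold]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; real embeddings have many more dimensions.
    query = [1.0, 0.0, 0.0]
    candidates = {
        "verse A": [0.9, 0.1, 0.0],  # nearly the same direction as the query
        "verse B": [0.0, 1.0, 0.0],  # orthogonal to the query
    }
    print(similar_verses(query, candidates))
```

A higher threshold keeps fewer, closer matches; a lower one keeps more pairs and takes longer to store and review.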
- Verify
- You can verify that things are looking good
docker exec -it `docker ps | grep pgvector | awk '{print $1}'` psql -U scripture-similarity
-- you should see the verse and similar_verse tables
\dt
-- at current writing this should be 41995 verses, but there might be more later 😁
SELECT COUNT(*) FROM verse;
-- this will depend on the threshold 😁
SELECT COUNT(*) FROM similar_verse;
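The same row counts can be checked from Python. The following is a rough sketch, not part of this repository: `table_counts` is a hypothetical helper that works with any DB-API 2.0 connection. To stay runnable without PostgreSQL, the demo uses an in-memory SQLite database with placeholder tables as a stand-in; against the real database you would pass a connection from a PostgreSQL driver instead.

```python
import sqlite3


def table_counts(conn, tables=("verse", "similar_verse")):
    """Return a {table_name: row_count} dict for the given tables."""
    counts = {}
    cur = conn.cursor()
    for table in tables:
        # Table names come from a fixed tuple, not user input,
        # so formatting them into the SQL string is acceptable here.
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        counts[table] = cur.fetchone()[0]
    return counts


if __name__ == "__main__":
    # Stand-in database: the single-column schema is a placeholder, not the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE verse (id INTEGER)")
    conn.execute("CREATE TABLE similar_verse (id INTEGER)")
    conn.executemany("INSERT INTO verse VALUES (?)", [(1,), (2,)])
    print(table_counts(conn))
```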
- Enjoy!