
Commit fff4561

ProchazkaDavid and Terézia Slanináková authored
Merge all changes from dev to main so that dev can be deleted. (#2)
* Added Dockerfile, requirements, installation instructions, fixed contiguous array error in Python 3.10
* Fixed old email
* Added NAR link for AlphaFind

Co-authored-by: Terézia Slanináková <[email protected]>
1 parent 3e84db4 commit fff4561

File tree

6 files changed (+77, -20 lines)


Dockerfile

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+FROM python:3.10-slim
+
+LABEL author="Terézia Slanináková"
+LABEL email="[email protected]"
+LABEL website="https://disa.fi.muni.cz/complex-data-analysis/"
+
+# Set the version of the image to use, default: cpu
+ARG version=cpu
+
+# Install linux packages
+RUN apt-get update && apt-get install vim -y
+
+# Install required python packages
+COPY requirements-${version}.txt /tmp/
+COPY requirements.txt /tmp/
+RUN pip install --upgrade pip
+RUN pip install --no-cache-dir -r /tmp/requirements-${version}.txt
+
+# Create user, make it the owner of the home directory
+RUN addgroup --gid 1000 user && adduser --gid 1000 --uid 1000 --disabled-password --gecos user user
+USER root
+RUN chown -R user:user /home/user && chmod -R 755 /home/user
+
+# Copy the files from the host to the container and install the local package
+COPY . /home/user
+RUN pip install -e /home/user
+
+USER user
+WORKDIR /home/user
+
+CMD ["/bin/sh", "-c", "bash"]

README.md

Lines changed: 30 additions & 20 deletions
@@ -2,40 +2,49 @@
 
 Learned Metric Index (LMI) is an index for approximate nearest neighbor search on complex data using machine learning and probability-based navigation.
 
-
 # Getting started
 
 See examples of how to index and search in a dataset in: [01_Introduction.ipynb](01_Introduction.ipynb) notebook.
 
 ## Installation
 
-See also ``.github/workflows/ci.yml``
-
-### Using conda
+### Using virtualenv
 ```bash
-conda create -n env python=3.8
-conda activate env
-conda install matplotlib pandas scikit-learn jupyterlab
-pip install h5py flake8 setuptools tqdm faiss-cpu
-pip install torch --index-url https://download.pytorch.org/whl/cpu
+# 1) Clone the repo with submodules
+git clone --recursive [email protected]:LearnedMetricIndex/LearnedMetricIndex.git
+# 2) Create and activate a new virtual environment
+python -m venv lmi-env
+source lmi-env/bin/activate
+# 3) Install the dependencies
+pip install -r requirements-cpu.txt # alternatively requirements-gpu.txt
 pip install --editable .
 ```
 
-## Running
+### Using docker
 
-```bash
-jupyter-lab
-# and open 01_Introduction.ipynb
+Requirements:
+- [Docker](https://docs.docker.com/get-docker/)
+- At least 1.5 GB of disk space for the CPU version and up to 5.5 GB for the GPU version
 
-# or
-python3 search/search.py
+```bash
+# 1) Clone the repo with submodules
+git clone --recursive [email protected]:LearnedMetricIndex/LearnedMetricIndex.git
+# 2) Build the docker image (CPU version)
+docker build -t lmi -f Dockerfile --build-arg version=cpu .
+# alternatively: docker build -t lmi -f Dockerfile --build-arg version=gpu .
+# 3) Run the docker image
+docker run -p 8888:8888 -it lmi bash
 ```
 
-## Evaluation
+## Running
 
 ```bash
-python3 eval/eval.py
-python3 eval/plot.py res.csv
+# Run jupyterlab, copy the printed URL into the browser and open 01_Introduction.ipynb
+jupyter-lab --ip 0.0.0.0 --no-browser
+
+# Run the search on a 100k data subset, evaluate the results and plot them.
+# Expected time to run = ~5-10 mins
+python3 search/search.py && python3 eval/eval.py && python3 eval/plot.py res.csv
 ```
 
 ## Performance
@@ -64,7 +73,8 @@ python3 eval/plot.py res.csv
 - ~6h of runtime (varies depending on the hardware)
 
 # LMI in action
-🌐 [**Similarity search in 214M protein structures (AlphaFold DB)**](https://alphafind.fi.muni.cz/search)
+
+- 🌐 [**Similarity search in 214M protein structures (AlphaFold DB)**](https://alphafind.fi.muni.cz/)
 
 # Publications
 
@@ -86,7 +96,7 @@ python3 eval/plot.py res.csv
 - [**Web**](https://alphafind.fi.muni.cz/search)
 - [**Repository**](https://github.com/Coda-Research-Group/AlphaFind)
 - [**Data**](https://data.narodni-repozitar.cz/general/datasets/d35zf-1ja47)
-> PROCHÁZKA, David, Terézia SLANINÁKOVÁ, Jaroslav OĽHA, Adrián ROŠINEC, Katarína GREŠOVÁ, Miriama JÁNOŠOVÁ, Jakub ČILLÍK, Jana PORUBSKÁ, Radka SVOBODOVÁ, Vlastislav DOHNAL a Matej ANTOL.: [AlphaFind: Discover structure similarity across the entire known proteome](https://www.biorxiv.org/content/10.1101/2024.02.15.580465v1). BioRxiv (pre-print version)
+> Procházka, D., Slanináková, T., Oľha, J., Rošinec, A., Grešová, K., Jánošová, M., Čillík, J., Porubská, J., Svobodová, R., Dohnal, V., & Antol, M. (2024). [AlphaFind: discover structure similarity across the proteome in AlphaFold DB](https://academic.oup.com/nar/article/52/W1/W182/7673488). Nucleic Acids Research.
 
 
 ## Team

requirements-cpu.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+-r requirements.txt
+--extra-index-url=https://download.pytorch.org/whl/cpu
+torch==2.1.1

requirements-gpu.txt

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+-r requirements.txt
+torch==2.1.1

requirements.txt

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+numpy==1.26.2
+matplotlib==3.8.1
+pandas==2.1.3
+scikit-learn==1.3.2
+h5py==3.10.0
+flake8==6.1.0
+tqdm==4.66.1
+faiss-cpu==1.7.4
+setuptools==60.7.0
+jupyterlab==4.0.8

search/li/clustering/faiss_kmeans.py

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ def cluster(
     _, d = data.shape
 
     kmeans = Kmeans(d=d, k=n_clusters, **parameters)
+    data = np.ascontiguousarray(data.astype(np.float32))
     kmeans.train(data)
 
     labels = kmeans.index.search(data, 1)[1].T[0] # type: ignore
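The added `np.ascontiguousarray(...)` call is the "contiguous array error" fix mentioned in the commit message: faiss's k-means expects a C-contiguous float32 array, and data that is float64 or a non-contiguous NumPy view can otherwise fail at `train()`. A minimal standalone sketch of the same pattern, with toy random data and illustrative parameter values (not taken from the repository):

```python
import numpy as np
from faiss import Kmeans  # provided by the faiss-cpu package pinned in requirements.txt

# Toy data: float64 and non-contiguous after column slicing (illustrative only).
rng = np.random.default_rng(0)
data = rng.standard_normal((1000, 32))[:, ::2]

# faiss expects a C-contiguous float32 array of shape (n, d),
# which is what the line added in cluster() guarantees.
data = np.ascontiguousarray(data.astype(np.float32))

kmeans = Kmeans(d=data.shape[1], k=8, niter=20, verbose=False)
kmeans.train(data)

# Assign each point to its nearest centroid, mirroring the label extraction in cluster().
labels = kmeans.index.search(data, 1)[1].T[0]
print(labels.shape)  # (1000,)
```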

0 commit comments
