Canadian-Geospatial-Platform/similarity-engine

similarity_engine

Machine learning models to recommend similar UUIDs on geo.ca

Setup

A virtual environment (Anaconda or virtualenv) keeps this project's packages isolated from other projects.

The project uses Python 3.9.17 inside an Anaconda environment:

conda create -n similarity_engine python=3.9.17
conda activate similarity_engine

Install the required packages:

pip install -r requirements.txt

Models

The following models are used:

  1. Word2Vec
  2. BERT
  3. DistilBERT -> A smaller (about 40% smaller, via knowledge distillation during pretraining) and faster (about 60% faster) version of BERT that preserves about 95% of BERT's performance on the GLUE benchmark.
  4. RoBERTa (base models)
  5. stsb-roberta-large
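
This README does not describe how model outputs are turned into recommendations. One common approach, shown here as a minimal sketch, assumes each record's text has already been embedded into a fixed-length vector by one of the models above, and ranks the other records by cosine similarity. The function names `cosine_similarity` and `top_k_similar`, the placeholder UUIDs, and the toy vectors are all hypothetical and are not taken from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query_uuid, embeddings, k=2):
    """Return the k (uuid, score) pairs most similar to query_uuid."""
    query_vec = embeddings[query_uuid]
    scored = [
        (other_uuid, cosine_similarity(query_vec, vec))
        for other_uuid, vec in embeddings.items()
        if other_uuid != query_uuid  # never recommend the record itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real model embeddings (assumption).
embeddings = {
    "uuid-a": [1.0, 0.0, 0.0],
    "uuid-b": [0.9, 0.1, 0.0],
    "uuid-c": [0.5, 0.5, 0.0],
    "uuid-d": [0.0, 0.0, 1.0],
}

recommendations = top_k_similar("uuid-a", embeddings, k=2)
for uuid, score in recommendations:
    print(f"{uuid}: {score:.3f}")
```

In practice the vectors would come from whichever model is being compared, so the same ranking code can be reused across all five.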

Evaluation

Perplexity is the current evaluation metric. This may change to external metrics later.
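
For reference, perplexity is conventionally defined as the exponential of the average negative log-likelihood per token. The sketch below follows that standard definition and assumes the per-token natural-log probabilities have already been obtained from a model. How this repository computes them is not stated here, and the function name `perplexity` is hypothetical.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from a list of per-token natural-log probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_neg_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_likelihood)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among 4 options, so its perplexity is about 4.
uniform_log_probs = [math.log(0.25)] * 4
ppl = perplexity(uniform_log_probs)
print(f"perplexity: {ppl:.2f}")
```

Lower values mean the model assigns higher probability to the observed text.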
