NeoChemSynthWave: Data

Welcome to the NeoChemSynthWave: Data project!

Over the past decade, computer-assisted chemical synthesis has re-emerged as a prominent research subject. Even though the idea of using computers to assist chemical synthesis has existed for nearly as long as computers themselves, the inherent complexity of the problem has repeatedly exceeded the available resources. Recent machine learning approaches have shown the potential to change this. Their performance, however, depends on data that frequently suffer from limited quantity, quality, visibility, and accessibility, which poses a significant obstacle to scientific progress. Consequently, the primary objective of the NeoChemSynthWave: Data project is to provide access to essential open computer-assisted chemical synthesis data.

Installation

An environment can be created using the git and conda commands as follows:

git clone https://github.com/neo-chem-synth-wave/ncsw-data.git

cd ncsw-data

conda env create -f environment.yaml

conda activate ncsw-data-env

The ncsw_data package can be installed using the pip command as follows:

pip install .
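After installation, it can be useful to confirm that the package is visible to the active interpreter. The following is a minimal sketch using only the Python standard library; the helper name is_package_installed is hypothetical and not part of ncsw_data.

```python
import importlib.util


def is_package_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located by the current interpreter."""
    # find_spec returns None when no importable module with this name exists.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "ncsw_data" is the package name given in the installation step.
    print("ncsw_data installed:", is_package_installed("ncsw_data"))
```

If this reports False, the most likely cause is that the ncsw-data-env environment is not the active one.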

Utilization

The purpose of the case_study directory is to illustrate how to download, extract, and format the relevant data and subsequently construct, manage, and query a version of the Computer-assisted Chemical Synthesis (CaCS) database that reflects the current state of computer-assisted chemical synthesis data.

First, the a_download_extract_and_format_data script can be run as follows:

python case_study/scripts/a_download_extract_and_format_data.py \
  --data_source_category "reaction" \
  --data_source_name "uspto" \
  --data_source_version "v_50k_by_20171116_coley_c_w_et_al" \
  --output_directory_path "/path/to/the/output/directory"

Next, the b_insert_archive_data script can be run as follows:

python case_study/scripts/b_insert_archive_data.py \
  --sqlite_database_file_path "sqlite:////path/to/the/cacs_db.sqlite" \
  --input_csv_file_path "/path/to/the/xxx_uspto_v_50k_by_20171116_coley_c_w_et_al.csv" \
  --smiles_or_smarts_column_name "rxn_smiles" \
  --file_name_column_name "file_name" \
  --data_source_category "reaction" \
  --data_source_name "uspto" \
  --data_source_version "v_50k_by_20171116_coley_c_w_et_al"
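Because the insertion step is told which columns to read, a quick header check on the formatted file can catch a mismatch early. The sketch below assumes the formatted file is a comma-separated CSV with a header row, which is not stated explicitly here; the helper name find_missing_columns is hypothetical.

```python
import csv
from pathlib import Path


def find_missing_columns(csv_file_path, required_column_names):
    """Return the required column names that are absent from the CSV header."""
    with Path(csv_file_path).open(newline="", encoding="utf-8") as file_handle:
        # An empty file yields an empty header, so every column counts as missing.
        header = next(csv.reader(file_handle), [])
    return [name for name in required_column_names if name not in header]


if __name__ == "__main__":
    import sys

    # Usage: python check_columns.py /path/to/the/formatted_file.csv
    # The column names are the ones passed to the insertion script above.
    if len(sys.argv) > 1:
        print(find_missing_columns(sys.argv[1], ["rxn_smiles", "file_name"]))
```

An empty list means both columns named on the command line are present in the file.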

Next, the c_migrate_archive_to_workbench_data script can be run as follows:

python case_study/scripts/c_migrate_archive_to_workbench_data.py \
  --sqlite_database_file_path "sqlite:////path/to/the/cacs_db.sqlite" \
  --data_source_category "reaction"

Finally, the d_update_workbench_data script can be run as follows:

python case_study/scripts/d_update_workbench_data.py \
  --sqlite_database_file_path "sqlite:////path/to/the/cacs_db.sqlite"
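The four steps can also be chained from a single Python driver. The sketch below only reuses the script names and options shown in the commands above; the function names, the DATA_SOURCE dictionary, and the choice of subprocess are illustrative assumptions rather than part of the repository. The scripts directory is passed in explicitly and should be adjusted to wherever the scripts live in your checkout.

```python
import subprocess
import sys
from pathlib import Path

# Data source options taken from the commands above.
DATA_SOURCE = {
    "data_source_category": "reaction",
    "data_source_name": "uspto",
    "data_source_version": "v_50k_by_20171116_coley_c_w_et_al",
}


def build_step_command(scripts_directory, script_name, **options):
    """Build an argument list of the form: python <script> --key value ..."""
    command = [sys.executable, str(Path(scripts_directory) / script_name)]
    for option_name, option_value in options.items():
        command.extend([f"--{option_name}", str(option_value)])
    return command


def build_pipeline_commands(scripts_directory, database_url,
                            output_directory_path, input_csv_file_path):
    """Return the four step commands in the order they are meant to run."""
    return [
        build_step_command(
            scripts_directory, "a_download_extract_and_format_data.py",
            **DATA_SOURCE, output_directory_path=output_directory_path),
        build_step_command(
            scripts_directory, "b_insert_archive_data.py",
            sqlite_database_file_path=database_url,
            input_csv_file_path=input_csv_file_path,
            smiles_or_smarts_column_name="rxn_smiles",
            file_name_column_name="file_name",
            **DATA_SOURCE),
        build_step_command(
            scripts_directory, "c_migrate_archive_to_workbench_data.py",
            sqlite_database_file_path=database_url,
            data_source_category=DATA_SOURCE["data_source_category"]),
        build_step_command(
            scripts_directory, "d_update_workbench_data.py",
            sqlite_database_file_path=database_url),
    ]


def run_pipeline(commands):
    """Run each command in order, stopping at the first failing step."""
    for command in commands:
        subprocess.run(command, check=True)
```

Building the argument lists separately from running them makes it possible to print and inspect the commands before anything is executed.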

The relevant SQLite scripts and Jupyter notebooks of the case study illustrating the querying of the CaCS database can be found in the notebooks directory.
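Before opening the notebooks, the resulting database can be inspected with the sqlite3 module from the Python standard library. The sketch below makes no assumption about the CaCS table names and simply lists whatever tables exist. It does assume that the value passed to --sqlite_database_file_path is a SQLAlchemy-style URL whose "sqlite:///" prefix has to be removed to obtain a plain file path; both helper names are hypothetical.

```python
import sqlite3

SQLALCHEMY_SQLITE_PREFIX = "sqlite:///"


def database_url_to_file_path(database_url):
    """Strip the assumed 'sqlite:///' URL prefix, leaving a plain file path."""
    if database_url.startswith(SQLALCHEMY_SQLITE_PREFIX):
        return database_url[len(SQLALCHEMY_SQLITE_PREFIX):]
    return database_url


def list_table_names(database_file_path):
    """Return the names of all tables in the SQLite file, sorted alphabetically."""
    # Note: sqlite3.connect creates an empty database if the file does not exist.
    connection = sqlite3.connect(database_file_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [row[0] for row in rows]
```

For example, list_table_names(database_url_to_file_path("sqlite:////path/to/the/cacs_db.sqlite")) would return the table names of the database built in the steps above.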

License Information

The contents of this repository are published under the MIT license. Please refer to the individual references for more details regarding the license information of external resources utilized within the repository.

Contact

If you are interested in contributing to this research project by reporting bugs, suggesting improvements, or submitting feedback, feel free to do so using GitHub Issues.
