This repository was archived by the owner on May 9, 2023. It is now read-only.

minor updates; stylistic touch-ups #35

Open
wants to merge 3 commits into
base: master
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
**/.DS_Store

# User generated files
**/temp/
**/design-*.json
**/toy_*.json


Large diffs are not rendered by default.

Expand Up @@ -15,7 +15,7 @@
"\n",
"*Authors: Enze Chen, Eric Lundberg*\n",
"\n",
"In this notebook, we will cover how to *create* a data view using the [Citrination API](http://citrineinformatics.github.io/python-citrination-client/). Data views provide the configuration necessary in order to perform machine learning and identify relationships in your data. We will demonstrate this functionality using the [Band gaps from Strehlow and Cook](https://citrination.com/datasets/1160/show_search?searchMatchOption=fuzzyMatch) dataset, where we will create a view mapping: \n",
"In this notebook, we will cover how to *create* a data view using the [Citrination API](http://citrineinformatics.github.io/python-citrination-client/). Data views provide the configuration necessary in order to perform machine learning and data analysis. We will demonstrate this functionality using the [Band gaps from Strehlow and Cook](https://citrination.com/datasets/1160/show_search?searchMatchOption=fuzzyMatch) dataset, where we will create a view mapping: \n",
"\n",
"$$\\text{Chemical formula (inorganic) + Crystallinity (categorical)} \\longrightarrow \\boxed{\\text{ML model}} \\longrightarrow \\text{Band gap (real)}$$"
]
Expand Down Expand Up @@ -79,10 +79,9 @@
"outputs": [],
"source": [
"# Standard packages\n",
"import json\n",
"import os\n",
"import time\n",
"import uuid # generating random IDs\n",
"from os import environ # get environment variables\n",
"from time import sleep # wait time\n",
"from uuid import uuid4 # generating random IDs\n",
"\n",
"# Third-party packages\n",
"from citrination_client import *\n",
Expand All @@ -97,12 +96,18 @@
"\n",
"[Back to ToC](#Table-of-contents)\n",
"\n",
"The [`DataViewBuilder`](http://citrineinformatics.github.io/python-citrination-client/modules/views/ml_config_builder.html) class handles the configuration for data views and returns a **configuration** object that is an input for the `DataViewsClient`. The configuration specifies the datasets, model, and descriptors. Some of the important parameters to note are:\n",
"* **dataset_ids**: An array of strings, one for each dataset ID that should be included in the view.\n",
"* **descriptors**: A descriptor instance, which could be `{RealDescriptor, InorganicDescriptor, OrganicDescriptor, CategoricalDescriptor,` or `AlloyCompositionDescriptor}`.\n",
" * **Note 1**: Chemical formulas for the API take the key `formula`.\n",
" * **Note 2**: Properties take the key `Property <property name>`.\n",
"* **roles**: A role for each descriptor, as a string, which could be `{input, output, latentVariable, ignored}`."
"The [`DataViewBuilder`](http://citrineinformatics.github.io/python-citrination-client/modules/views/ml_config_builder.html) class handles the configuration for data views and returns a **configuration** object that is an input for the `DataViewsClient`. The configuration specifies:\n",
"* The datasets you want to include.\n",
"* The ML model you want to use.\n",
"* Which properties you want to use as descriptors. \n",
"\n",
"Some of the important parameters to note are:\n",
"* `dataset_ids`: An array of strings, one for each dataset ID that should be included in the view.\n",
"* `descriptors`: A descriptor instance, which is one of `{RealDescriptor, InorganicDescriptor, OrganicDescriptor, CategoricalDescriptor,` or `AlloyCompositionDescriptor}`.\n",
" * *Note 1*: Chemical formulas for the API take the key `\"formula\"`.\n",
" * *Note 2*: Properties take the key `\"Property [property name]\"`.\n",
" * *Note 3*: Strings are **Case-sensitive!**\n",
"* `roles`: A role for each descriptor, as a string, which is one of `{'input', 'output', 'latentVariable',` or `'ignored'}`."
]
},
{
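The key and role conventions listed in the markdown cell above are plain, case-sensitive strings, so they can be checked before anything is sent to the API. A minimal sketch, assuming two hypothetical helpers (`property_key` and `check_role` are illustrative names, not part of `citrination_client`):

```python
# Hypothetical helpers for illustration only; not part of citrination_client.
VALID_ROLES = {'input', 'output', 'latentVariable', 'ignored'}


def property_key(name):
    """Return the descriptor key for a property, e.g. 'Property Band gap'."""
    return 'Property {}'.format(name)


def check_role(role):
    """Return role unchanged, or raise ValueError if it is not an exact match."""
    if role not in VALID_ROLES:
        raise ValueError('Unknown role {!r}; expected one of {}'.format(
            role, sorted(VALID_ROLES)))
    return role


# (key, role) pairs matching the view built in this notebook
pairs = [
    ('formula', check_role('input')),                      # chemical formula
    (property_key('Crystallinity'), check_role('input')),  # categorical input
    (property_key('Band gap'), check_role('output')),      # real-valued output
]
```

Because the comparison is exact, a role such as `'Input'` or `'latentvariable'` is rejected here rather than surfacing later as a configuration error.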
Expand All @@ -115,16 +120,26 @@
"dv_builder = DataViewBuilder()\n",
"dv_builder.dataset_ids(['172242']) # ID number for band gaps dataset\n",
"\n",
"# Define descriptors\n",
"# Define crystallinity descriptor\n",
"crystallinity = ['Single crystalline', 'Polycrystalline', 'Amorphous'] # Obtained from dataset\n",
"desc_crystal = CategoricalDescriptor(key='Property Crystallinity', categories=crystallinity)\n",
"dv_builder.add_descriptor(descriptor=desc_crystal, role='input')\n",
"desc_crystal = CategoricalDescriptor(key='Property Crystallinity', \n",
" categories=crystallinity)\n",
"dv_builder.add_descriptor(descriptor=desc_crystal, \n",
" role='input')\n",
"\n",
"desc_formula = InorganicDescriptor(key='formula', threshold=1.0) # threshold <= 1.0; default in future releases\n",
"dv_builder.add_descriptor(descriptor=desc_formula, role='input')\n",
"# Define chemical formula descriptor\n",
"desc_formula = InorganicDescriptor(key='formula', \n",
" threshold=1.0)\n",
"dv_builder.add_descriptor(descriptor=desc_formula, \n",
" role='input')\n",
"\n",
"desc_bandgap = RealDescriptor(key='Property Band gap', lower_bound=0.0, upper_bound=1e2, units='eV')\n",
"dv_builder.add_descriptor(descriptor=desc_bandgap, role='output')\n",
"# Define band gap descriptor\n",
"desc_bandgap = RealDescriptor(key='Property Band gap', \n",
" lower_bound=0.0, \n",
" upper_bound=1e3, \n",
" units='eV')\n",
"dv_builder.add_descriptor(descriptor=desc_bandgap, \n",
" role='output')\n",
"\n",
"# Build the configuration once all the pieces are in place\n",
"view_config = dv_builder.build()"
Expand All @@ -138,7 +153,7 @@
"\n",
"[Back to ToC](#Table-of-contents)\n",
"\n",
"After obtaining your customized configuration, you have to initialize a [`DataViewsClient`](http://citrineinformatics.github.io/python-citrination-client/modules/views/data_views_client.html) instance in order to create a data view from the configuration you built. The `create()` method returns the ID for the data view, which you will need for subsequent analysis and retraining."
"After obtaining your customized configuration, you have to initialize a [`DataViewsClient`](http://citrineinformatics.github.io/python-citrination-client/modules/views/data_views_client.html) instance in order to create a data view from the configuration you built."
]
},
{
Expand All @@ -148,26 +163,48 @@
"outputs": [],
"source": [
"# Instantiate the base CitrinationClient\n",
"site = 'https://citrination.com' # site you want to access; we'll use the public site\n",
"client = CitrinationClient(api_key=os.environ.get('CITRINATION_API_KEY'), site=site)\n",
"site = 'https://citrination.com' # site you want to access; we'll use the public site\n",
"client = CitrinationClient(api_key=environ.get('CITRINATION_API_KEY'), \n",
" site=site)\n",
"\n",
"# Instantiate the DataViewsClient\n",
"views_client = client.data_views\n",
"views_client # reveal the methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `create()` method for the `DataViewsClient` takes as input:\n",
"* `configuration`: A view configuration, like the template you created above.\n",
"* `name`: A name for the data view (must be unique among your data views).\n",
"* `description`: A description for the data view.\n",
"\n",
"and returns the ID for the data view, which you will need for subsequent analysis and retraining."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a data view using the above configuration and store the ID\n",
"view_name = 'PyCC View ' + str(uuid.uuid4()) # random name to avoid clashes\n",
"view_name = 'PyCC View ' + str(uuid4())[:6] # random name to avoid clashes\n",
"view_desc = 'This view was created by the PyCC API tutorial.'\n",
"view_id = views_client.create(configuration=view_config, name=view_name, description=view_desc)\n",
"view_id = views_client.create(configuration=view_config, \n",
" name=view_name, \n",
" description=view_desc)\n",
"print('Data view {} was successfully created.'.format(view_id))\n",
"print('It can be accessed at {}/data_views/{}.'.format(site, view_id))"
"print('It can be accessed at {}/data_views/{}'.format(site, view_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clicking the above URL will take you to the data view you just created on your deployment of Citrination."
]
},
{
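On the shortened view name: `str(uuid4())[:6]` keeps six hexadecimal characters, which gives 16^6 (about 16.8 million) possible suffixes and trades some collision resistance for readability. A sketch of the same idea as a helper; `make_view_name` and its parameters are hypothetical, and it uses `uuid4().hex` so that longer suffixes never contain a hyphen:

```python
from uuid import uuid4


def make_view_name(prefix='PyCC View', n_chars=6):
    """Return prefix plus a random hexadecimal suffix of n_chars characters.

    Hypothetical helper for illustration. For n_chars <= 8 the suffix has
    the same form as str(uuid4())[:n_chars], which the notebook cell uses.
    """
    suffix = uuid4().hex[:n_chars]  # .hex is 32 hex digits with no hyphens
    return '{} {}'.format(prefix, suffix)


view_name = make_view_name()
print(view_name)
```

If name clashes ever become a concern, raising `n_chars` is likely the simplest adjustment.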
Expand Down Expand Up @@ -198,7 +235,7 @@
"metadata": {},
"source": [
"### Check status of services\n",
"If there's a lot of data, training might take some time, and you might want to check when `predict` services are ready. Other possible services include `experimental_design`, `data_reports`, and `model_reports`."
"If there's a lot of data, training might take some time, and you might want to check when certain services are ready. Possible services enabled by data views include `predict`, `experimental_design`, `data_reports`, and `model_reports`."
]
},
{
Expand All @@ -207,13 +244,19 @@
"metadata": {},
"outputs": [],
"source": [
"# Use a loop to monitor status\n",
"# Use a loop to monitor view status\n",
"while True:\n",
"    predict_state = views_client.get_data_view_service_status(view_id).predict.reason\n",
"    print(predict_state)\n",
"    if predict_state == 'Predict services are ready.':\n",
"    view_status = views_client.get_data_view_service_status(data_view_id=view_id)\n",
"    \n",
"    # Design and Predict are the most important endpoints to check\n",
"    if (view_status.experimental_design.ready and\n",
"        view_status.predict.event.normalized_progress == 1.0):\n",
"        print(\"Data view ready!\")\n",
"        print(\"Data view URL: {}/data_views/{}\".format(site, view_id))\n",
"        break\n",
"    time.sleep(10)"
"    else:\n",
"        print(\"Waiting for data view services...\")\n",
"        sleep(10)"
]
},
{
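The `while True` loop in the cell above never gives up if a service fails to come up. One possible variant, sketched under the assumption that the readiness check is wrapped in a callable, adds a timeout; `wait_until_ready` and its parameters are hypothetical names, not part of `citrination_client`:

```python
import time


def wait_until_ready(is_ready, interval=10.0, timeout=600.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll is_ready() until it returns True or timeout seconds elapse.

    Hypothetical helper for illustration. Returns True if the check
    succeeded and False if the deadline passed first.
    """
    deadline = clock() + timeout
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# With the notebook's objects, the check from the cell above could be wrapped as:
#
# def services_ready():
#     status = views_client.get_data_view_service_status(data_view_id=view_id)
#     return (status.experimental_design.ready and
#             status.predict.event.normalized_progress == 1.0)
#
# if not wait_until_ready(services_ready, interval=10, timeout=1800):
#     print("Data view services were not ready before the timeout.")
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without real waiting; the defaults behave like the notebook's 10-second polling.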
Expand Down Expand Up @@ -271,7 +314,7 @@
"To recap, this notebook went through the steps for creating a data view using the API.\n",
"1. First, we used the `DataViewBuilder` object to specify the configuration.\n",
"2. Then, we trained the model, which is simple as long as the configuration is correct.\n",
"3. Lastly, we explored some of the post-processing capabilities, such as retraining and submitting predictions."
"3. Lastly, we showed how to monitor the status of the various endpoints enabled by data views."
]
},
{