This repository was archived by the owner on May 9, 2023. It is now read-only.

minor updates; stylistic touch-ups #35

Open: wants to merge 3 commits into base: master
Changes from 2 commits
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
**/.DS_Store

# User generated files
**/temp/
**/design-*.json
**/toy_*.json


Large diffs are not rendered by default.

@@ -15,7 +15,7 @@
"\n",
"*Authors: Enze Chen, Eric Lundberg*\n",
"\n",
"In this notebook, we will cover how to *create* a data view using the [Citrination API](http://citrineinformatics.github.io/python-citrination-client/). Data views provide the configuration necessary in order to perform machine learning and identify relationships in your data. We will demonstrate this functionality using the [Band gaps from Strehlow and Cook](https://citrination.com/datasets/1160/show_search?searchMatchOption=fuzzyMatch) dataset, where we will create a view mapping: \n",
"In this notebook, we will cover how to *create* a data view using the [Citrination API](http://citrineinformatics.github.io/python-citrination-client/). Data views provide the configuration needed to perform machine learning and data analysis. We will demonstrate this functionality using the [Band gaps from Strehlow and Cook](https://citrination.com/datasets/1160/show_search?searchMatchOption=fuzzyMatch) dataset, where we will create a view mapping: \n",
"\n",
"$$\\text{Chemical formula (inorganic) + Crystallinity (categorical)} \\longrightarrow \\boxed{\\text{ML model}} \\longrightarrow \\text{Band gap (real)}$$"
]
@@ -79,10 +79,9 @@
"outputs": [],
"source": [
"# Standard packages\n",
"import json\n",
"import os\n",
"import time\n",
"import uuid # generating random IDs\n",
"from os import environ # get environment variables\n",
"from time import sleep # wait time\n",
"from uuid import uuid4 # generating random IDs\n",
"\n",
"# Third-party packages\n",
"from citrination_client import *\n",
@@ -97,12 +96,18 @@
"\n",
"[Back to ToC](#Table-of-contents)\n",
"\n",
"The [`DataViewBuilder`](http://citrineinformatics.github.io/python-citrination-client/modules/views/ml_config_builder.html) class handles the configuration for data views and returns a **configuration** object that is an input for the `DataViewsClient`. The configuration specifies the datasets, model, and descriptors. Some of the important parameters to note are:\n",
"* **dataset_ids**: An array of strings, one for each dataset ID that should be included in the view.\n",
"* **descriptors**: A descriptor instance, which could be `{RealDescriptor, InorganicDescriptor, OrganicDescriptor, CategoricalDescriptor,` or `AlloyCompositionDescriptor}`.\n",
" * **Note 1**: Chemical formulas for the API take the key `formula`.\n",
" * **Note 2**: Properties take the key `Property <property name>`.\n",
"* **roles**: A role for each descriptor, as a string, which could be `{input, output, latentVariable, ignored}`."
"The [`DataViewBuilder`](http://citrineinformatics.github.io/python-citrination-client/modules/views/ml_config_builder.html) class handles the configuration for data views and returns a **configuration** object that is an input for the `DataViewsClient`. The configuration specifies:\n",
"* The datasets you want to include.\n",
"* The ML model you want to use.\n",
"* The properties you want to use as descriptors.\n",
"\n",
"Some of the important parameters to note are:\n",
"* `dataset_ids`: An array of strings, one for each dataset ID that should be included in the view.\n",
"* `descriptors`: An instance of one of the descriptor classes `RealDescriptor`, `InorganicDescriptor`, `OrganicDescriptor`, `CategoricalDescriptor`, or `AlloyCompositionDescriptor`.\n",
" * *Note 1*: Chemical formulas for the API take the key `\"formula\"`.\n",
" * *Note 2*: Properties take the key `\"Property [property name]\"`.\n",
" * *Note 3*: Strings are **case-sensitive**.\n",
"* `roles`: A role for each descriptor, given as one of the strings `'input'`, `'output'`, `'latentVariable'`, or `'ignored'`."
]
},
{
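The key and role conventions in the cell above are plain strings, so a typo or wrong capitalization is easy to miss. A minimal, library-free sketch of those conventions follows; `FORMULA_KEY`, `VALID_ROLES`, `property_key`, and `check_role` are hypothetical names introduced for illustration (they are not part of `citrination_client`), and the role set is copied from the notebook text rather than read from the library.

```python
# Hypothetical helpers (not part of citrination_client) that encode the
# key and role conventions described in the notebook text.
FORMULA_KEY = "formula"  # chemical formulas use the key "formula"
VALID_ROLES = {"input", "output", "latentVariable", "ignored"}  # exact, case-sensitive


def property_key(property_name):
    """Return the descriptor key for a dataset property, e.g. 'Property Band gap'."""
    return "Property {}".format(property_name)


def check_role(role):
    """Return `role` unchanged if it is an allowed role; raise ValueError otherwise."""
    if role not in VALID_ROLES:
        raise ValueError("Unknown role {!r}; expected one of {}".format(
            role, sorted(VALID_ROLES)))
    return role


print(property_key("Crystallinity"))  # Property Crystallinity
print(check_role("latentVariable"))   # latentVariable
try:
    check_role("Input")  # wrong capitalization is rejected
except ValueError as err:
    print(err)
```

Validating roles up front like this surfaces a misspelled role as a local error instead of a confusing view configuration.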
@@ -115,16 +120,26 @@
"dv_builder = DataViewBuilder()\n",
"dv_builder.dataset_ids(['172242']) # ID number for band gaps dataset\n",
"\n",
"# Define descriptors\n",
"# Define crystallinity descriptor\n",
"crystallinity = ['Single crystalline', 'Polycrystalline', 'Amorphous'] # Obtained from dataset\n",
"desc_crystal = CategoricalDescriptor(key='Property Crystallinity', categories=crystallinity)\n",
"dv_builder.add_descriptor(descriptor=desc_crystal, role='input')\n",
"desc_crystal = CategoricalDescriptor(key='Property Crystallinity', \n",
" categories=crystallinity)\n",
"dv_builder.add_descriptor(descriptor=desc_crystal, \n",
" role='input')\n",
"\n",
"desc_formula = InorganicDescriptor(key='formula', threshold=1.0) # threshold <= 1.0; default in future releases\n",
"dv_builder.add_descriptor(descriptor=desc_formula, role='input')\n",
"# Define chemical formula descriptor\n",
"desc_formula = InorganicDescriptor(key='formula', \n",
" threshold=1.0)\n",
"dv_builder.add_descriptor(descriptor=desc_formula, \n",
" role='input')\n",
"\n",
"desc_bandgap = RealDescriptor(key='Property Band gap', lower_bound=0.0, upper_bound=1e2, units='eV')\n",
"dv_builder.add_descriptor(descriptor=desc_bandgap, role='output')\n",
"# Define band gap descriptor\n",
"desc_bandgap = RealDescriptor(key='Property Band gap', \n",
" lower_bound=0.0, \n",
" upper_bound=1e3, \n",
" units='eV')\n",
"dv_builder.add_descriptor(descriptor=desc_bandgap, \n",
" role='output')\n",
"\n",
"# Build the configuration once all the pieces are in place\n",
"view_config = dv_builder.build()"
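The builder calls above amount to a small table of (key, descriptor type, role) entries that realizes the mapping stated at the top of the notebook: formula and crystallinity in, band gap out. As a hedged sketch, that table can be written as plain data and inspected without a Citrination account. `DescriptorSpec`, `summarize`, and `in_bounds` are hypothetical; the real objects are built by `DataViewBuilder`, and treating the band gap bounds (0.0 to 1e3 eV in the updated cell) as an inclusive valid range is an assumption.

```python
from collections import namedtuple

# Hypothetical, library-free stand-in for the descriptors added with
# dv_builder.add_descriptor(...) in the cell above.
DescriptorSpec = namedtuple("DescriptorSpec", ["key", "kind", "role", "options"])

specs = [
    DescriptorSpec("Property Crystallinity", "Categorical", "input",
                   {"categories": ["Single crystalline", "Polycrystalline", "Amorphous"]}),
    DescriptorSpec("formula", "Inorganic", "input", {"threshold": 1.0}),
    DescriptorSpec("Property Band gap", "Real", "output",
                   {"lower_bound": 0.0, "upper_bound": 1e3, "units": "eV"}),
]


def summarize(specs):
    """Group descriptor keys by role, preserving the order they were added."""
    by_role = {}
    for spec in specs:
        by_role.setdefault(spec.role, []).append(spec.key)
    return by_role


def in_bounds(spec, value):
    """Check a value against a Real descriptor's inclusive [lower_bound, upper_bound]."""
    return spec.options["lower_bound"] <= value <= spec.options["upper_bound"]


print(summarize(specs))
# {'input': ['Property Crystallinity', 'formula'], 'output': ['Property Band gap']}
print(in_bounds(specs[2], 3.4))  # True
```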
@@ -138,7 +153,7 @@
"\n",
"[Back to ToC](#Table-of-contents)\n",
"\n",
"After obtaining your customized configuration, you have to initialize a [`DataViewsClient`](http://citrineinformatics.github.io/python-citrination-client/modules/views/data_views_client.html) instance in order to create a data view from the configuration you built. The `create()` method returns the ID for the data view, which you will need for subsequent analysis and retraining."
"After obtaining your customized configuration, you have to initialize a [`DataViewsClient`](http://citrineinformatics.github.io/python-citrination-client/modules/views/data_views_client.html) instance in order to create a data view from the configuration you built."
]
},
{
@@ -148,26 +163,48 @@
"outputs": [],
"source": [
"# Instantiate the base CitrinationClient\n",
"site = 'https://citrination.com' # site you want to access; we'll use the public site\n",
"client = CitrinationClient(api_key=os.environ.get('CITRINATION_API_KEY'), site=site)\n",
"site = 'https://citrination.com' # site you want to access; we'll use the public site\n",
"client = CitrinationClient(api_key=environ.get('CITRINATION_API_KEY'), \n",
" site=site)\n",
"\n",
"# Instantiate the DataViewsClient\n",
"views_client = client.data_views\n",
"views_client # reveal the methods"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `create()` method of the `DataViewsClient` takes as input:\n",
"* `configuration`: A view configuration, such as the one you built above with `DataViewBuilder`.\n",
"* `name`: A name for the data view (must be unique among your data views).\n",
"* `description`: A description for the data view.\n",
"\n",
"and returns the ID for the data view, which you will need for subsequent analysis and retraining."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Create a data view using the above configuration and store the ID\n",
"view_name = 'PyCC View ' + str(uuid.uuid4()) # random name to avoid clashes\n",
"view_name = 'PyCC View ' + str(uuid4())[:6] # random name to avoid clashes\n",
"view_desc = 'This view was created by the PyCC API tutorial.'\n",
"view_id = views_client.create(configuration=view_config, name=view_name, description=view_desc)\n",
"view_id = views_client.create(configuration=view_config, \n",
" name=view_name, \n",
" description=view_desc)\n",
"print('Data view {} was successfully created.'.format(view_id))\n",
"print('It can be accessed at {}/data_views/{}.'.format(site, view_id))"
"print('It can be accessed at {}/data_views/{}'.format(site, view_id))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Clicking the above URL will take you to the data view you just created on your deployment of Citrination."
]
},
{
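Three details in the cells above can fail quietly: `environ.get('CITRINATION_API_KEY')` returns `None` when the variable is unset, a 6-character `uuid4` prefix makes name clashes unlikely but not impossible, and the view URL is assembled by string formatting. A minimal sketch that makes each explicit, using only the standard library; `require_api_key`, `unique_view_name`, and `view_url` are hypothetical helpers, and how you obtain the set of existing view names is left open because this sketch does not assume any particular client method for listing views.

```python
import os
import uuid


def require_api_key(var_name="CITRINATION_API_KEY"):
    """Return the API key from the environment, failing early if it is unset.

    os.environ.get() returns None for a missing variable, which would otherwise
    only surface later as an authentication error.
    """
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError("Set the {} environment variable first.".format(var_name))
    return key


def unique_view_name(existing_names, prefix="PyCC View ", suffix_len=6, max_tries=10):
    """Return a name not in `existing_names`, using a short random uuid4 suffix.

    The first 6 characters of a uuid4 string are hex digits, giving 16**6
    (about 16.8 million) possible suffixes; retrying guards against clashes.
    """
    for _ in range(max_tries):
        name = prefix + str(uuid.uuid4())[:suffix_len]
        if name not in existing_names:
            return name
    raise RuntimeError("Could not find an unused view name.")


def view_url(site, view_id):
    """Build the data view URL in the same format the notebook prints."""
    return "{}/data_views/{}".format(site.rstrip("/"), view_id)


print(view_url("https://citrination.com", 1234))  # https://citrination.com/data_views/1234
```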
@@ -198,7 +235,7 @@
"metadata": {},
"source": [
"### Check status of services\n",
"If there's a lot of data, training might take some time, and you might want to check when `predict` services are ready. Other possible services include `experimental_design`, `data_reports`, and `model_reports`."
"If there's a lot of data, training might take some time, and you might want to check when certain services are ready. Possible services enabled by data views include `predict`, `experimental_design`, `data_reports`, and `model_reports`."
]
},
{
@@ -207,13 +244,19 @@
"metadata": {},
"outputs": [],
"source": [
"# Use a loop to monitor status\n",
"# Use a loop to monitor view status\n",
"while True:\n",
"    predict_state = views_client.get_data_view_service_status(view_id).predict.reason\n",
"    print(predict_state)\n",
"    if predict_state == 'Predict services are ready.':\n",
"    view_status = views_client.get_data_view_service_status(data_view_id=view_id)\n",
"    \n",
"    # The experimental design and predict services are the most important to check\n",
"    if (view_status.experimental_design.ready and\n",
"            view_status.predict.event.normalized_progress == 1.0):\n",
"        print(\"Data view ready!\")\n",
"        print(\"Data view URL: {}/data_views/{}\".format(site, view_id))\n",
"        break\n",
"    time.sleep(10)"
"    else:\n",
"        print(\"Waiting for data view services...\")\n",
"        sleep(10)"
]
},
{
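The polling loop above has no upper bound on how long it waits, and it reads `view_status.predict.event.normalized_progress` directly. A hedged sketch of the same idea with a timeout follows; `wait_until_ready` and `progress` are hypothetical helpers, the attribute names are taken from the notebook cell, and the guard for a missing `event` is an assumption about the status object rather than documented behavior. The `sleep` and `clock` arguments are injectable so the loop can be exercised without actually waiting.

```python
import time


def wait_until_ready(get_status, is_ready, interval=10.0, timeout=1800.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until is_ready(status) is true or `timeout` seconds pass.

    Returns the last status on success; raises TimeoutError otherwise.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if is_ready(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("Services not ready after {} s".format(timeout))
        print("Waiting for data view services...")
        sleep(interval)


def progress(service_status):
    """Return event.normalized_progress, or 0.0 if the service reports no event."""
    event = getattr(service_status, "event", None)
    if event is None:
        return 0.0
    return getattr(event, "normalized_progress", 0.0)


# Hedged usage with the notebook's objects (not run here):
# status = wait_until_ready(
#     lambda: views_client.get_data_view_service_status(data_view_id=view_id),
#     lambda s: s.experimental_design.ready and progress(s.predict) == 1.0)
```

Using `time.monotonic` for the deadline keeps the timeout correct even if the system clock is adjusted while the loop runs.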
@@ -271,7 +314,7 @@
"To recap, this notebook went through the steps for creating a data view using the API.\n",
"1. First, we used the `DataViewBuilder` object to specify the configuration.\n",
"2. Then, we trained the model, which is simple as long as the configuration is correct.\n",
"3. Lastly, we explored some of the post-processing capabilities, such as retraining and submitting predictions."
"3. Lastly, we showed how to monitor the status of the services enabled by data views."
]
},
{