Add codecov API token, small fixes to tutorials (#475)

amrit110 · web-flow · commit 64482ab2201b · 2023-08-24T20:10:24.000-04:00
diff --git a/.github/workflows/docs_deploy.yml b/.github/workflows/docs_deploy.yml
@@ -45,6 +45,7 @@ jobs:
         with:
           action: codecov/codecov-action@v3.1.3
           with: |
+            token: ${{ secrets.CODECOV_TOKEN }}
             file: ./coverage.xml
             name: codecov-umbrella
             fail_ci_if_error: true
diff --git a/.github/workflows/integration_tests.yml b/.github/workflows/integration_tests.yml
@@ -56,6 +56,7 @@ jobs:
         with:
           action: codecov/codecov-action@v3.1.3
           with: |
+            token: ${{ secrets.CODECOV_TOKEN }}
             file: ./coverage.xml
             name: codecov-umbrella
             fail_ci_if_error: true
diff --git a/docs/source/intro.rst b/docs/source/intro.rst
@@ -1,13 +1,14 @@
-.. figure:: https://github.com/VectorInstitute/cyclops/blob/main/docs/source/theme/static/cyclops_logo-dark.png?raw=true
+.. figure::
+   https://github.com/VectorInstitute/cyclops/blob/main/docs/source/theme/static/cyclops_logo-dark.png?raw=true
    :alt: cyclops Logo
 
 --------------
 
 |PyPI| |code checks| |integration tests| |docs| |codecov| |docker|
 |license|
 
-``cyclops`` is a framework for facilitating research and deployment of
-ML models for healthcare. It provides a few high-level APIs namely:
+``cyclops`` is a toolkit for facilitating research and deployment of ML
+models for healthcare. It provides a few high-level APIs namely:
 
 -  ``query`` - Query EHR databases (such as MIMIC-IV)
 -  ``data`` - Create datasets for training, inference and evaluation. We
@@ -24,8 +25,8 @@ ML models for healthcare. It provides a few high-level APIs namely:
 
 -  ``evaluate`` - Evaluate models on clinical prediction tasks
 -  ``monitor`` - Detect dataset shift relevant for clinical use cases
--  ``report`` - Create `model
-   cards <https://vectorinstitute.github.io/cyclops/api/tutorials/mimiciii/model_card.html>`__
+-  ``report`` - Create `model report
+   cards <https://vectorinstitute.github.io/cyclops/api/tutorials/kaggle/model_card.html>`__
    for clinical ML models
 
 ``cyclops`` also provides a library of end-to-end use cases on clinical
diff --git a/docs/source/tutorials/kaggle/heart_failure_prediction.ipynb b/docs/source/tutorials/kaggle/heart_failure_prediction.ipynb
@@ -143,7 +143,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Sex values"
+    "### Sex values"
    ]
   },
   {
@@ -187,7 +187,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "####  Age distribution"
+    "###  Age distribution"
    ]
   },
   {
@@ -233,7 +233,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Outcome distribution"
+    "### Outcome distribution"
    ]
   },
   {
@@ -346,7 +346,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Identifying feature types\n",
+    "### Identifying feature types\n",
     "\n",
     "Cyclops `TabularFeatures` class helps to identify feature types, an essential step before preprocessing the data. Understanding feature types (numerical/categorical/binary) allows us to apply appropriate preprocessing steps for each type."
    ]
@@ -372,7 +372,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "#### Creating data preprocessors\n",
+    "### Creating data preprocessors\n",
     "\n",
     "We create a data preprocessor using sklearn's ColumnTransformer. This helps in applying different preprocessing steps to different columns in the dataframe. For instance, binary features might be processed differently from numeric features."
    ]
diff --git a/docs/source/tutorials/synthea/los_prediction.ipynb b/docs/source/tutorials/synthea/los_prediction.ipynb
@@ -277,7 +277,7 @@
    "source": [
     "## Data Inspection and Preprocessing\n",
     "\n",
-    "#### Drop NaNs based on the `NAN_THRESHOLD`"
+    "### Drop NaNs based on the `NAN_THRESHOLD`"
    ]
   },
   {
@@ -348,7 +348,7 @@
    "id": "dc5b45cb-2406-4330-b2fc-3b4823ff0c17",
    "metadata": {},
    "source": [
-    "#### Length of stay distribution"
+    "### Length of stay distribution"
    ]
   },
   {
@@ -380,7 +380,7 @@
    "id": "05156094-56e8-49c5-8e3c-478a1797db62",
    "metadata": {},
    "source": [
-    "#### Outcome distribution"
+    "### Outcome distribution"
    ]
   },
   {
@@ -466,7 +466,7 @@
    "id": "e48376c2-a437-41f4-96fa-ea75f182f7b7",
    "metadata": {},
    "source": [
-    "#### Gender distribution"
+    "### Gender distribution"
    ]
   },
   {
@@ -518,7 +518,7 @@
     "tags": []
    },
    "source": [
-    "####  Age distribution"
+    "###  Age distribution"
    ]
   },
   {
@@ -570,7 +570,7 @@
    "id": "483c9bb5-57bf-4a2c-960f-35f7e76eff1d",
    "metadata": {},
    "source": [
-    "#### Identifying feature types\n",
+    "### Identifying feature types\n",
     "\n",
     "Cyclops `TabularFeatures` class helps to identify feature types, an essential step before preprocessing the data. Understanding feature types (numerical/categorical/binary) allows us to apply appropriate preprocessing steps for each type."
    ]
@@ -636,7 +636,7 @@
    "id": "a2738074-00be-46fa-999f-77f85add9469",
    "metadata": {},
    "source": [
-    "#### Creating data preprocessors\n",
+    "### Creating data preprocessors\n",
     "\n",
     "We create a data preprocessor using sklearn's ColumnTransformer. This helps in applying different preprocessing steps to different columns in the dataframe. For instance, binary features might be processed differently from numeric features."
    ]
diff --git a/docs/source/tutorials_use_cases.rst b/docs/source/tutorials_use_cases.rst
@@ -1,34 +1,40 @@
 Example use cases
 =================
 
-
-Binary classification using tabular data
-----------------------------------------
-
+Tabular data
+------------
 
 Kaggle Heart Failure Prediction
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This is a binary classification problem where the goal is to predict
-risk of heart disease. The `dataset <https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction>`_
+risk of heart disease. The `heart failure dataset <https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction>`_
 is available on Kaggle. The dataset contains 11 features and 1 target
 variable.
 
-
 .. toctree::
 
     tutorials/kaggle/heart_failure_prediction.ipynb
 
 
-Chest X-ray classification
---------------------------
+Synthea Prolonged Length of Stay Prediction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This is a binary classification problem where the goal is to predict
+whether a patient will have a prolonged length of stay in the hospital
+(more than 7 days). The `synthea dataset <https://github.com/synthetichealth/synthea>`_
+is generated using Synthea which is a synthetic patient generator. The dataset
+contains observations, medications and procedures as features.
+
+.. toctree::
 
-The `CXRClassificationTask` task is a multi-label classification task that predicts the
-presence of different thoracic diseases given a chest X-ray image.
+    tutorials/synthea/los_prediction.ipynb
 
+Image data
+----------
 
-NIH Chest X-ray dataset
-^^^^^^^^^^^^^^^^^^^^^^^
+NIH Chest X-ray classification
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 This tutorial showcases the use of the ``tasks`` API to implement a chest X-ray
 classification task. The dataset used is the `NIH Chest X-ray dataset <https://nihcc.app.box.com/v/ChestXray-NIHCC>`__, which contains 112,120 frontal-view X-ray images of 30,805 unique patients with 14 disease labels.
diff --git a/poetry.lock b/poetry.lock

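
Note on the tutorial text touched above: both notebooks state that identifying feature types (numerical/categorical/binary) comes before choosing preprocessing steps. The sketch below shows one way such a classification could work on plain Python lists. It is an illustrative heuristic only, not the Cyclops `TabularFeatures` implementation; the function name `infer_feature_types`, the two-unique-values rule, and the toy column names and values are all assumptions made for this example.

```python
from typing import Dict, Sequence


def infer_feature_types(columns: Dict[str, Sequence[object]]) -> Dict[str, str]:
    """Label each column as 'binary', 'numerical', or 'categorical'.

    Hypothetical heuristic for illustration; NOT how Cyclops
    `TabularFeatures` decides feature types.
    """
    types: Dict[str, str] = {}
    for name, values in columns.items():
        # Ignore missing entries when inspecting the column.
        present = [v for v in values if v is not None]
        unique = set(present)
        if len(unique) <= 2:
            # At most two distinct values: treat as binary.
            types[name] = "binary"
        elif all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        ):
            # More than two distinct values, all numeric.
            types[name] = "numerical"
        else:
            # Everything else (e.g. strings with 3+ levels).
            types[name] = "categorical"
    return types


# Toy columns with invented values (assumed names, not read from the dataset).
toy = {
    "Age": [40, 49, 37, 54],
    "Sex": ["M", "F", "M", "M"],
    "ChestPainType": ["ATA", "NAP", "ASY", "NAP"],
}
feature_types = infer_feature_types(toy)
```

With a mapping like `feature_types` in hand, each group of columns can be routed to its own preprocessing step, which is the role the notebooks assign to sklearn's `ColumnTransformer`.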