
Commit 1cfdb25

Updated watermark, improved text
1 parent 85e1145 commit 1cfdb25

2 files changed: +73 -168 lines changed


examples/samplers/fast_sampling_with_jax_and_numba.ipynb

Lines changed: 51 additions & 147 deletions
@@ -27,7 +27,7 @@
 "pm.sample()\n",
 "```\n",
 "\n",
-"The default PyMC sampler uses a Python-based NUTS implementation that provides maximum compatibility with all PyMC features. This sampler is always used when working with models that contain discrete variables, as it's the only option that supports non-gradient based samplers like Slice and Metropolis. While this sampler can compile the underlying model to different backends (C, Numba, or JAX) using the `compile_kwargs` parameter, it still maintains Python overhead that can limit performance for large models.\n",
+"The default PyMC sampler uses a Python-based NUTS implementation that provides maximum compatibility with all PyMC features. This sampler is required when working with models that contain discrete variables, as it's the only option that supports non-gradient based samplers like Slice and Metropolis. While this sampler can compile the underlying model to different backends (C, Numba, or JAX) using PyTensor's compilation system via the `compile_kwargs` parameter, it maintains Python overhead that can limit performance for large models.\n",
 "\n",
 "### Nutpie Sampler\n",
 "\n",
@@ -37,7 +37,7 @@
 "pm.sample(nuts_sampler=\"nutpie\", nuts_sampler_kwargs={\"backend\": \"jax\", \"gradient_backend\": \"pytensor\"})\n",
 "```\n",
 "\n",
-"Nutpie is on the cutting-edge of PyMC sampling performance. Written in Rust, it eliminates most Python overhead and provides exceptional performance for continuous models. The Numba backend typically offers the highest performance for most use cases, while the JAX backend excels with very large models and provides GPU acceleration capabilities. Nutpie is particularly well-suited for production workflows where sampling speed is critical.\n",
+"Nutpie is PyMC's cutting-edge performance sampler. Written in Rust, it eliminates Python overhead and provides exceptional performance for continuous models. The Numba backend typically offers the highest performance for most use cases, while the JAX backend excels with very large models and provides GPU acceleration capabilities. Nutpie is particularly well-suited for production workflows where sampling speed is critical.\n",
 "\n",
 "### NumPyro Sampler\n",
 "\n",
@@ -47,33 +47,30 @@
 "pm.sample(nuts_sampler=\"numpyro\", nuts_sampler_kwargs={\"chain_method\": \"vectorized\"})\n",
 "```\n",
 "\n",
-"NumPyro provides a mature JAX-based sampling implementation that integrates seamlessly with the broader JAX ecosystem. This sampler typically performs best with small to medium-sized models and offers excellent GPU support. NumPyro benefits from years of development within the JAX community and provides reliable performance characteristics, though it may have compilation overhead for very large models.\n",
+"NumPyro provides a mature JAX-based sampling implementation that integrates seamlessly with the broader JAX ecosystem. This sampler benefits from years of development within the JAX community and provides reliable performance characteristics, with excellent GPU support for accelerated computation.\n",
 "\n",
 "### BlackJAX Sampler\n",
 "\n",
 "```python\n",
 "pm.sample(nuts_sampler=\"blackjax\")\n",
 "```\n",
 "\n",
-"BlackJAX offers another JAX-based sampling implementation focused on flexibility and research applications. While it provides similar capabilities to NumPyro, it's less commonly used in production environments. BlackJAX can be valuable for experimental workflows or when specific JAX-based features are required that aren't available in other samplers.\n",
-"\n",
+"BlackJAX offers another JAX-based sampling implementation focused on flexibility and research applications. While it provides similar capabilities to NumPyro, it's less commonly used in production environments. BlackJAX can be valuable for experimental workflows or when specific JAX-based features are required."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
 "## Performance Guidelines\n",
 "\n",
 "Understanding when to use each sampler depends on several key factors including model size, variable types, and computational requirements.\n",
 "\n",
-"**Model Size Considerations**\n",
-"\n",
-"For small models, NumPyro typically provides the best balance of performance and reliability. The compilation overhead is minimal, and the mature JAX implementation handles these models efficiently. Larger models often benefit from Nutpie with the Numba backend, which provides excellent performance without the memory overhead sometimes associated with JAX compilation.\n",
-"\n",
-"Large models generally perform best with either Nutpie's JAX backend or Nutpie's Numba backend. The choice between these depends on whether GPU acceleration is needed and how the model's computational graph interacts with each backend's optimization strategies.\n",
+"For **small models**, NumPyro typically provides the best balance of performance and reliability. The compilation overhead is minimal, and its mature JAX implementation handles these models efficiently. **Large models** generally perform best with Nutpie's Numba backend for consistent CPU performance or Nutpie's JAX backend when GPU acceleration is needed or memory efficiency is critical.\n",
 "\n",
-"**Variable Type Requirements**\n",
+"Models containing **discrete variables** must use PyMC's built-in sampler, as it's the only implementation that supports compatible (*i.e.*, non-gradient based) sampling algorithms. For purely continuous models, all sampling backends are available, making performance the primary consideration.\n",
 "\n",
-"Models containing discrete variables have no choice but to use PyMC's built-in sampler, as it's the only implementation that supports the necessary Slice and Metropolis sampling algorithms. For purely continuous models, all sampling backends are available, making performance the primary consideration.\n",
-"\n",
-"**Computational Backend Selection**\n",
-"\n",
-"Numba excels at CPU optimization and provides consistent performance across different model types. It's particularly effective for models with complex mathematical operations that benefit from just-in-time compilation. JAX offers superior performance for very large models and provides natural GPU acceleration, making it ideal when computational resources are a limiting factor. The traditional C backend serves as a reliable fallback option with broad compatibility but typically offers lower performance than the alternatives."
+"**Numba** excels at CPU optimization and provides consistent performance across different model types. It's particularly effective for models with complex mathematical operations that benefit from just-in-time compilation. **JAX** offers superior performance for very large models and provides natural GPU acceleration, making it ideal when computational resources are a limiting factor. The **C** backend serves as a reliable fallback option with broad compatibility but typically offers lower performance than the alternatives."
 ]
 },
 {
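The performance guidance in the cell above can be read as a small decision table. The sketch below is editorial and not part of the committed notebook: it maps each recommendation onto a `pm.sample()` call. The toy model is a placeholder and the `"numba"` backend string for Nutpie is an assumption; the other arguments appear verbatim elsewhere in this diff.

```python
# Sketch only: how the guidance above maps to pm.sample() calls.
# The toy model is a placeholder; "numba" as a Nutpie backend name is assumed.
import pymc as pm

with pm.Model() as model:
    x = pm.Normal("x", 0.0, 1.0)

    # Small to medium continuous models: NumPyro is a reliable default.
    idata_numpyro = pm.sample(nuts_sampler="numpyro")

    # Large continuous models on CPU: Nutpie with the Numba backend.
    idata_nutpie_numba = pm.sample(
        nuts_sampler="nutpie", nuts_sampler_kwargs={"backend": "numba"}
    )

    # Very large models or GPU acceleration: Nutpie's JAX backend.
    idata_nutpie_jax = pm.sample(
        nuts_sampler="nutpie",
        nuts_sampler_kwargs={"backend": "jax", "gradient_backend": "pytensor"},
    )

    # The built-in sampler (required when discrete variables are present),
    # optionally compiled to a faster backend via compile_kwargs.
    idata_pymc = pm.sample(nuts_sampler="pymc", compile_kwargs={"mode": "fast_run"})
```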
@@ -90,11 +87,18 @@
 }
 ],
 "source": [
+"import platform\n",
+"\n",
 "import arviz as az\n",
 "import matplotlib.pyplot as plt\n",
 "import numpy as np\n",
 "import pymc as pm\n",
 "\n",
+"if platform.system() == \"linux\":\n",
+"    import multiprocessing\n",
+"\n",
+"    multiprocessing.set_start_method(\"spawn\", force=True)\n",
+"\n",
 "rng = np.random.default_rng(seed=42)\n",
 "print(f\"Running on PyMC v{pm.__version__}\")"
 ]
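The `spawn` guard added above relates to the `os.fork()` RuntimeWarning that appears later in this diff: JAX initializes threads at import time, and forking a multithreaded process can deadlock. A minimal standalone sketch of the same guard follows; note that `platform.system()` returns the capitalized string `"Linux"` on Linux systems.

```python
# Minimal sketch of the start-method guard: use "spawn" so multiprocess sampling
# does not fork a JAX-initialized (multithreaded) interpreter.
import multiprocessing
import platform

if platform.system() == "Linux":  # platform.system() reports "Linux", capitalized
    multiprocessing.set_start_method("spawn", force=True)
```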
@@ -228,55 +232,23 @@
 "text": [
 "Initializing NUTS using jitter+adapt_diag...\n",
 "Multiprocess sampling (4 chains in 4 jobs)\n",
-"NUTS: [w, z]\n"
-]
-},
-{
-"data": {
-"application/vnd.jupyter.widget-view+json": {
-"model_id": "c34297902e6f4d118f552495bdace798",
-"version_major": 2,
-"version_minor": 0
-},
-"text/plain": [
-"Output()"
-]
-},
-"metadata": {},
-"output_type": "display_data"
-},
-{
-"data": {
-"text/html": [
-"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
-],
-"text/plain": []
-},
-"metadata": {},
-"output_type": "display_data"
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.\n",
-"The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n",
-"The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details\n"
+"NUTS: [w, z]\n",
+"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.\n"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"CPU times: user 16 s, sys: 417 ms, total: 16.4 s\n",
-"Wall time: 22.4 s\n"
+"CPU times: user 7.8 s, sys: 375 ms, total: 8.17 s\n",
+"Wall time: 13.8 s\n"
 ]
 }
 ],
 "source": [
 "%%time\n",
 "with PPCA:\n",
-"    idata_pymc = pm.sample()"
+"    idata_pymc = pm.sample(progressbar=False)"
 ]
 },
 {
@@ -295,8 +267,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"CPU times: user 45.6 s, sys: 813 ms, total: 46.5 s\n",
-"Wall time: 35.6 s\n"
+"CPU times: user 42.3 s, sys: 798 ms, total: 43.1 s\n",
+"Wall time: 32.2 s\n"
 ]
 }
 ],
@@ -324,8 +296,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"CPU times: user 33.8 s, sys: 9.67 s, total: 43.5 s\n",
-"Wall time: 16.9 s\n"
+"CPU times: user 32.7 s, sys: 11.7 s, total: 44.4 s\n",
+"Wall time: 17.1 s\n"
 ]
 }
 ],
@@ -363,8 +335,8 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"CPU times: user 53.8 s, sys: 2.47 s, total: 56.3 s\n",
-"Wall time: 44.3 s\n"
+"CPU times: user 53.1 s, sys: 2.65 s, total: 55.8 s\n",
+"Wall time: 43.4 s\n"
 ]
 }
 ],
@@ -409,60 +381,24 @@
 "Multiprocess sampling (4 chains in 4 jobs)\n",
 "NUTS: [w, z]\n",
 "/var/home/fonnesbeck/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n",
-" self.pid = os.fork()\n"
-]
-},
-{
-"data": {
-"application/vnd.jupyter.widget-view+json": {
-"model_id": "715185d8daef43cdaed775149ca32369",
-"version_major": 2,
-"version_minor": 0
-},
-"text/plain": [
-"Output()"
-]
-},
-"metadata": {},
-"output_type": "display_data"
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
+" self.pid = os.fork()\n",
 "/var/home/fonnesbeck/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n",
-" self.pid = os.fork()\n"
-]
-},
-{
-"data": {
-"text/html": [
-"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
-],
-"text/plain": []
-},
-"metadata": {},
-"output_type": "display_data"
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.\n",
+" self.pid = os.fork()\n",
+"Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 6 seconds.\n",
 "The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n",
 "The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details\n"
 ]
 }
 ],
 "source": [
 "with PPCA:\n",
-"    idata_c = pm.sample(nuts_sampler=\"pymc\", compile_kwargs={\"mode\": \"fast_run\"})\n",
+"    idata_c = pm.sample(nuts_sampler=\"pymc\", compile_kwargs={\"mode\": \"fast_run\"}, progressbar=False)\n",
 "\n",
 "# with PPCA:\n",
-"# idata_pymc_numba = pm.sample(nuts_sampler=\"pymc\", compile_kwargs={\"mode\": \"numba\"})\n",
+"# idata_pymc_numba = pm.sample(nuts_sampler=\"pymc\", compile_kwargs={\"mode\": \"numba\"}, progressbar=False)\n",
 "\n",
 "# with PPCA:\n",
-"# idata_pymc_jax = pm.sample(nuts_sampler=\"pymc\", compile_kwargs={\"mode\": \"jax\"})"
+"# idata_pymc_jax = pm.sample(nuts_sampler=\"pymc\", compile_kwargs={\"mode\": \"jax\"}, progressbar=False)"
 ]
 },
 {
@@ -495,47 +431,11 @@
 ">BinaryGibbsMetropolis: [cluster]\n",
 ">NUTS: [mu, sigma]\n",
 "/var/home/fonnesbeck/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n",
-" self.pid = os.fork()\n"
-]
-},
-{
-"data": {
-"application/vnd.jupyter.widget-view+json": {
-"model_id": "0d08b347ee5d43dca776a9844f714ae6",
-"version_major": 2,
-"version_minor": 0
-},
-"text/plain": [
-"Output()"
-]
-},
-"metadata": {},
-"output_type": "display_data"
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
+" self.pid = os.fork()\n",
 "/var/home/fonnesbeck/repos/pymc-examples/.pixi/envs/default/lib/python3.12/multiprocessing/popen_fork.py:66: RuntimeWarning: os.fork() was called. os.fork() is incompatible with multithreaded code, and JAX is multithreaded, so this will likely lead to a deadlock.\n",
-" self.pid = os.fork()\n"
-]
-},
-{
-"data": {
-"text/html": [
-"<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\"></pre>\n"
-],
-"text/plain": []
-},
-"metadata": {},
-"output_type": "display_data"
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
+" self.pid = os.fork()\n",
 "Sampling 4 chains for 1_000 tune and 1_000 draw iterations (4_000 + 4_000 draws total) took 5 seconds.\n",
-"There were 19 divergences after tuning. Increase `target_accept` or reparameterize.\n",
+"There were 9 divergences after tuning. Increase `target_accept` or reparameterize.\n",
 "The rhat statistic is larger than 1.01 for some parameters. This indicates problems during sampling. See https://arxiv.org/abs/1903.08008 for details\n",
 "The effective sample size per chain is smaller than 100 for some parameters. A higher number is needed for reliable rhat and ess computation. See https://arxiv.org/abs/1903.08008 for details\n"
 ]
@@ -548,7 +448,7 @@
 "    sigma = pm.HalfNormal(\"sigma\", 1, shape=2)\n",
 "    obs = pm.Normal(\"obs\", mu=mu[cluster], sigma=sigma[cluster], observed=rng.normal(0, 1, 100))\n",
 "\n",
-"    trace_discrete = pm.sample()"
+"    trace_discrete = pm.sample(progressbar=False)"
 ]
 },
 {
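For reference, a self-contained version of the discrete-variable example whose tail appears in the hunk above. Only the `sigma`, `obs`, and `pm.sample()` lines plus the sampler assignment (`BinaryGibbsMetropolis: [cluster]`, `NUTS: [mu, sigma]`) are visible in this diff, so the `cluster` and `mu` priors below are assumptions.

```python
# Hedged reconstruction: a binary indicator prior is implied by the
# BinaryGibbsMetropolis step assigned to `cluster`; the mu prior is assumed.
import numpy as np
import pymc as pm

rng = np.random.default_rng(seed=42)

with pm.Model() as mixture:
    cluster = pm.Bernoulli("cluster", p=0.5, shape=100)  # assumed prior
    mu = pm.Normal("mu", 0, 1, shape=2)                  # assumed prior
    sigma = pm.HalfNormal("sigma", 1, shape=2)
    obs = pm.Normal("obs", mu=mu[cluster], sigma=sigma[cluster], observed=rng.normal(0, 1, 100))

    # Discrete variables force the built-in PyMC sampler (NUTS + BinaryGibbsMetropolis).
    trace_discrete = pm.sample(progressbar=False)
```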
@@ -570,20 +470,24 @@
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Last updated: Sat May 24 2025\n",
+"Last updated: Mon May 26 2025\n",
 "\n",
 "Python implementation: CPython\n",
 "Python version : 3.12.10\n",
 "IPython version : 9.2.0\n",
 "\n",
 "pytensor: 2.30.3\n",
-"aeppl : not installed\n",
-"xarray : 2025.4.0\n",
+"arviz : 0.21.0\n",
+"pymc : 5.22.0\n",
+"numpyro : 0.18.0\n",
+"blackjax: 0.0.0\n",
+"nutpie : 0.14.3\n",
 "\n",
-"numpy : 2.2.6\n",
 "pymc : 5.22.0\n",
-"matplotlib: 3.10.3\n",
 "arviz : 0.21.0\n",
+"platform : 1.0.8\n",
+"numpy : 2.2.6\n",
+"matplotlib: 3.10.3\n",
 "\n",
 "Watermark: 2.5.0\n",
 "\n"
@@ -592,7 +496,7 @@
 ],
 "source": [
 "%load_ext watermark\n",
-"%watermark -n -u -v -iv -w -p pytensor,aeppl,xarray"
+"%watermark -n -u -v -iv -w -p pytensor,arviz,pymc,numpyro,blackjax,nutpie"
 ]
 },
 {
