|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "# Demystifying GAN Loss Function" |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "Now that we have understood how GANs work in detail, we will examine the loss function of GAN. Before going ahead let us recap the notations. \n", |
| 15 | + "\n", |
| 16 | + "* A noise which is fed as an input to the generator is represented by $z$ \n", |
| 17 | + "\n", |
| 18 | + "* Uniform or normal distribution from which the noise $z$ is sampled is represented by $p_z$\n", |
| 19 | + "\n", |
| 20 | + "* An input image is represented by $x$\n", |
| 21 | + "\n", |
| 22 | + "* Real data distribution i.e distribution of our training set is represented by $p_r$\n", |
| 23 | + "\n", |
| 24 | + "* Fake data distribution i.e distribution of the generator is represented by $p_g$\n", |
| 25 | + "\n", |
| 26 | + "When we write, $x \\sim p_{r}(x)$ , it implies that image $x$ is sampled from the real distribution $p_r$\n", |
| 27 | + ". Similarly, $x \\sim p_{g}(x)$ denotes that image $x$ is sampled from the generator\n", |
| 28 | + "distribution $p_g$ and $z \\sim p_{z}(z)$ implies that the generator input $z$ is sampled from the\n", |
| 29 | + "uniform distribution $p_z$." |
| 30 | + ] |
| 31 | + }, |
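| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "To make this notation concrete, here is a minimal sketch of how the noise $z$ could be sampled in code. It assumes NumPy is available; the batch size and noise dimension are arbitrary illustrative choices." |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "# Sample a batch of 64 noise vectors z ~ p_z, each of dimension 100.\n", |
| | + "# p_z can be a uniform or a normal distribution; both are shown here.\n", |
| | + "batch_size, z_dim = 64, 100\n", |
| | + "z_uniform = np.random.uniform(-1, 1, size=(batch_size, z_dim))\n", |
| | + "z_normal = np.random.randn(batch_size, z_dim)\n", |
| | + "print(z_uniform.shape)  # (64, 100)" |
| | + ] |
| | + }, |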
| 32 | + { |
| 33 | + "cell_type": "markdown", |
| 34 | + "metadata": {}, |
| 35 | + "source": [ |
| 36 | + "As we learned that both the generator and discriminator are neural networks and both of\n", |
| 37 | + "them update their parameters through backpropagation. We need to find the\n", |
| 38 | + "optimal generator parameter $\\theta_g $ and discriminator parameter $\\theta_d$." |
| 39 | + ] |
| 40 | + }, |
| 41 | + { |
| 42 | + "cell_type": "markdown", |
| 43 | + "metadata": {}, |
| 44 | + "source": [ |
| 45 | + "## Discriminator Loss " |
| 46 | + ] |
| 47 | + }, |
| 48 | + { |
| 49 | + "cell_type": "markdown", |
| 50 | + "metadata": {}, |
| 51 | + "source": [ |
| 52 | + "Now we will see the loss function of the discriminator. We know that the goal of the\n", |
| 53 | + "discriminator is to classify whether the image is real or fake image. Let us denote\n", |
| 54 | + "discriminator by $D$.\n", |
| 55 | + "\n", |
| 56 | + "The loss function of the discriminator is given as, \n", |
| 57 | + "\n", |
| 58 | + "$$\\max _{d} L(D, G)=\\mathbb{E}_{x \\sim p_{r}(x)}\\left[\\log D\\left(x ; \\theta_{d}\\right)\\right]+\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$" |
| 59 | + ] |
| 60 | + }, |
| 61 | + { |
| 62 | + "cell_type": "markdown", |
| 63 | + "metadata": {}, |
| 64 | + "source": [ |
| 65 | + "What does this mean though? Let us see each term by term. \n", |
| 66 | + "\n", |
| 67 | + "### First term\n", |
| 68 | + "\n", |
| 69 | + "Let us look at the first term,\n", |
| 70 | + "\n", |
| 71 | + "$$ \\mathbb{E}_{x \\sim p_{r}} \\log (D(x))$$" |
| 72 | + ] |
| 73 | + }, |
| 74 | + { |
| 75 | + "cell_type": "markdown", |
| 76 | + "metadata": {}, |
| 77 | + "source": [ |
| 78 | + "* $x \\sim p_{r}(x)$ implies we are sampling input $x$ from the real data distribution $p_r$, so $x$ is a\n", |
| 79 | + "real image. \n", |
| 80 | + "\n", |
| 81 | + "* $D(x)$ implies that we are feeding the input image $x$ to the discriminator $D$ and it will\n", |
| 82 | + "return the probability of input image $x$ to be a real image. \n", |
| 83 | + "\n", |
| 84 | + "Since we know that $x$ is a real image i.e from a real data distribution, we need to maximize the probability of $D(x)$:\n", |
| 85 | + "\n", |
| 86 | + "$$\\max D(x)$$\n", |
| 87 | + "\n", |
| 88 | + "But instead of maximizing raw probabilities we maximize log probabilities as we learned in\n", |
| 89 | + "chapter 7, we can write, \n", |
| 90 | + "\n", |
| 91 | + "$$ \\max \\log D(x)$$\n", |
| 92 | + "\n", |
| 93 | + "So our final equation becomes:\n", |
| 94 | + "\n", |
| 95 | + "$$\\max \\mathbb{E}_{x \\sim p_{r}(x)}[\\log D(x)]$$\n", |
| 96 | + "\n", |
| 97 | + "__$\\mathbb{E}_{x \\sim p_{r}(x)}[\\log D(x)]$ implies the expectations of the log likelihood of\n", |
| 98 | + "input images sampled from the real data distribution being real.__" |
| 99 | + ] |
| 100 | + }, |
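| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "In practice, this expectation is estimated by averaging over a mini-batch. Here is a quick sketch, where the probabilities are made-up discriminator outputs rather than the output of a real network:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "# Hypothetical discriminator outputs D(x) for a batch of four real images.\n", |
| | + "# Values close to 1 mean D is confident the images are real.\n", |
| | + "d_real = np.array([0.9, 0.8, 0.95, 0.7])\n", |
| | + "\n", |
| | + "# Mini-batch estimate of E[log D(x)] over real images.\n", |
| | + "first_term = np.mean(np.log(d_real))\n", |
| | + "print(first_term)  # approaches 0, its maximum, as D(x) approaches 1" |
| | + ] |
| | + }, |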
| 101 | + { |
| 102 | + "cell_type": "markdown", |
| 103 | + "metadata": {}, |
| 104 | + "source": [ |
| 105 | + "### Second term" |
| 106 | + ] |
| 107 | + }, |
| 108 | + { |
| 109 | + "cell_type": "markdown", |
| 110 | + "metadata": {}, |
| 111 | + "source": [ |
| 112 | + "Now, let us look at the second term\n", |
| 113 | + "\n", |
| 114 | + "$$\\mathbb{E}_{z \\sim p_{(z)}}[\\log (1-D(G(z)))] $$\n", |
| 115 | + "\n", |
| 116 | + "\n", |
| 117 | + "* $z \\sim p_{z}(z)$ implies we are sampling a random noise $z$ from the uniform distribution $p_z$.\n", |
| 118 | + "\n", |
| 119 | + "* $G(z)$ implies that the generator $G$ takes the random noise $z$ as an input and returns an\n", |
| 120 | + "image based on its implicitly learned distribution $p_g$.\n", |
| 121 | + "\n", |
| 122 | + "* $D(G(z))$ implies we are feeding the image generated by the generator to the\n", |
| 123 | + "discriminator $D$ and it will return the probability that input image to be a real image. \n", |
| 124 | + "\n", |
| 125 | + "\n", |
| 126 | + "If we subtract 1 from $D(G(z))$ then it will return the probability of the input image being\n", |
| 127 | + "a fake image.\n", |
| 128 | + "\n", |
| 129 | + "$$1-D(G(z))$$\n", |
| 130 | + "\n", |
| 131 | + "Since we know $z$ is not a real image, the discriminator will maximize this probability,\n", |
| 132 | + "ie discriminator maximizes the probability $z$ of being classified as a fake image. So we write\n", |
| 133 | + "\n", |
| 134 | + "$\\max 1-D(G(z))$\n", |
| 135 | + "\n", |
| 136 | + "Instead of maximizing raw probabilities, we maximize the log probability, so we write,\n", |
| 137 | + "\n", |
| 138 | + "$$ \\max \\log (1-D(G(z)))$$\n", |
| 139 | + "\n", |
| 140 | + "__$\\mathbb{E}_{z \\sim p_{z}(z)}[\\log (1-D(G(z)))]_{\\mathrm{i}}$ implies the expectations i.e expectations of the log\n", |
| 141 | + "likelihood of input images generated by the generator being fake.__" |
| 142 | + ] |
| 143 | + }, |
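| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "The second term is estimated in the same way, by averaging over a batch of generated images. Again, the probabilities below are hypothetical values standing in for real discriminator outputs:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "# Hypothetical discriminator outputs D(G(z)) for a batch of four fake images.\n", |
| | + "# Values close to 0 mean D is confident the images are fake.\n", |
| | + "d_fake = np.array([0.1, 0.25, 0.05, 0.2])\n", |
| | + "\n", |
| | + "# Mini-batch estimate of E[log(1 - D(G(z)))] over generated images.\n", |
| | + "second_term = np.mean(np.log(1.0 - d_fake))\n", |
| | + "print(second_term)  # approaches 0, its maximum, as D(G(z)) approaches 0" |
| | + ] |
| | + }, |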
| 144 | + { |
| 145 | + "cell_type": "markdown", |
| 146 | + "metadata": {}, |
| 147 | + "source": [ |
| 148 | + "### Final term" |
| 149 | + ] |
| 150 | + }, |
| 151 | + { |
| 152 | + "cell_type": "markdown", |
| 153 | + "metadata": {}, |
| 154 | + "source": [ |
| 155 | + "So, combining these two terms, loss function of the discriminator is given as,\n", |
| 156 | + "\n", |
| 157 | + "$$ \\max _{d} L(D, G)=\\mathbb{E}_{x \\sim p_{r}(x)}\\left[\\log D\\left(x ; \\theta_{d}\\right)\\right]+\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$\n", |
| 158 | + "\n", |
| 159 | + "Where $\\theta_d$ and $\\theta_g$ are the parameters of the discriminator and generator network\n", |
| 160 | + "respectively" |
| 161 | + ] |
| 162 | + }, |
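| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Putting the two terms together, here is a minimal sketch of the full discriminator objective as a plain function of hypothetical discriminator outputs:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "def discriminator_objective(d_real, d_fake):\n", |
| | + "    # Mini-batch estimate of E[log D(x)] + E[log(1 - D(G(z)))],\n", |
| | + "    # the quantity the discriminator tries to maximize.\n", |
| | + "    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))\n", |
| | + "\n", |
| | + "# A confident discriminator scores close to 0, the maximum:\n", |
| | + "print(discriminator_objective(np.array([0.9, 0.95]), np.array([0.05, 0.1])))\n", |
| | + "\n", |
| | + "# A discriminator that cannot tell real from fake scores 2 log(0.5) = -1.386:\n", |
| | + "print(discriminator_objective(np.array([0.5, 0.5]), np.array([0.5, 0.5])))" |
| | + ] |
| | + }, |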
| 163 | + { |
| 164 | + "cell_type": "markdown", |
| 165 | + "metadata": {}, |
| 166 | + "source": [ |
| 167 | + "## Generator loss\n", |
| 168 | + "\n", |
| 169 | + "The loss function of the generator can be given as," |
| 170 | + ] |
| 171 | + }, |
| 172 | + { |
| 173 | + "cell_type": "markdown", |
| 174 | + "metadata": {}, |
| 175 | + "source": [ |
| 176 | + "$$ \\min _{g} L(D, G)=\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$\n", |
| 177 | + "\n", |
| 178 | + "We know that the goal of the generator is to fool the discriminator to classify the fake image\n", |
| 179 | + "as a real image. \n", |
| 180 | + "\n", |
| 181 | + "In the previous section, we saw, $\\mathbb{E}_{z \\sim p_{z}(z)}[\\log (1-D(G(z)))]_{\\mathrm{}}$ implies the probability of classifying the input image as a\n", |
| 182 | + "fake image and the discriminator maximizes this probabilities for correctly classifying the\n", |
| 183 | + "fake image as fake. \n", |
| 184 | + "\n", |
| 185 | + "\n", |
| 186 | + "But the generator wants to minimize this probability. As the generator wants to fool the\n", |
| 187 | + "discriminator, it minimizes this probability of input image being classified as fake. The loss\n", |
| 188 | + "function of the generator can be given as,\n", |
| 189 | + "\n", |
| 190 | + "$$\\min _{g} L(D, G)=\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$" |
| 191 | + ] |
| 192 | + }, |
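| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "A matching sketch for the generator objective. It is the same second term, but the generator wants it to be as small as possible, i.e., it wants $D(G(z))$ to be large:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "def generator_objective(d_fake):\n", |
| | + "    # Mini-batch estimate of E[log(1 - D(G(z)))], which the generator minimizes.\n", |
| | + "    return np.mean(np.log(1.0 - d_fake))\n", |
| | + "\n", |
| | + "# When the generator fools D, i.e., D(G(z)) is near 1, the objective is very negative:\n", |
| | + "print(generator_objective(np.array([0.9, 0.95])))\n", |
| | + "\n", |
| | + "# When D easily spots the fakes, the objective is near its maximum of 0:\n", |
| | + "print(generator_objective(np.array([0.05, 0.1])))" |
| | + ] |
| | + }, |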
| 193 | + { |
| 194 | + "cell_type": "markdown", |
| 195 | + "metadata": {}, |
| 196 | + "source": [ |
| 197 | + "## Total Loss\n", |
| 198 | + "\n", |
| 199 | + "\n", |
| 200 | + "We just learned the loss function of generator and discriminator, combining these two\n", |
| 201 | + "losses, we write our final loss function can be written as," |
| 202 | + ] |
| 203 | + }, |
| 204 | + { |
| 205 | + "cell_type": "markdown", |
| 206 | + "metadata": {}, |
| 207 | + "source": [ |
| 208 | + "$$ \\min _{G} \\max _{D} L(D, G)=\\mathbb{E}_{x \\sim p_{r}(x)}[\\log D(x)]+\\mathbb{E}_{z \\sim p_{z}(z)}[\\log (1-D(G(z)))]$$\n", |
| 209 | + "\n", |
| 210 | + "\n", |
| 211 | + "So our objective function is basically a min-max objective function i.e maximization for the\n", |
| 212 | + "discriminator and minimization for the generator and we find the optimal generator\n", |
| 213 | + "parameter $\\theta_g$ and discriminator parameter $\\theta_d$ through backpropagating the respective\n", |
| 214 | + "networks.\n", |
| 215 | + "\n" |
| 216 | + ] |
| 217 | + }, |
| 218 | + { |
| 219 | + "cell_type": "markdown", |
| 220 | + "metadata": {}, |
| 221 | + "source": [ |
| 222 | + "So we perform gradient ascent i.e maximization on the discriminator and update the discriminator parameter $\\theta_d$:\n", |
| 223 | + " \n", |
| 224 | + " $$ \\nabla_{\\theta_{d}} \\frac{1}{m} \\sum_{i=1}^{m}\\left[\\log D\\left(\\boldsymbol{x}^{(i)}\\right)+\\log \\left(1-D\\left(G\\left(\\boldsymbol{z}^{(i)}\\right)\\right)\\right)\\right]$$\n", |
| 225 | + " \n", |
| 226 | + " \n", |
| 227 | + "And gradient descent i.e minimization on the generator and update the generator parameter $\\theta_g$:\n", |
| 228 | + "\n", |
| 229 | + "$$\\nabla_{\\theta_{g}} \\frac{1}{m} \\sum_{i=1}^{m} \\log \\left(1-D\\left(G\\left(\\boldsymbol{z}^{(i)}\\right)\\right)\\right)$$" |
| 230 | + ] |
| 231 | + }, |
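| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "These two alternating updates are easiest to see in code. Below is a minimal sketch of a single discriminator step and a single generator step, assuming PyTorch is available; the two tiny networks, the random data, and all sizes are placeholders rather than a real GAN." |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import torch\n", |
| | + "\n", |
| | + "# Placeholder networks: D maps an image vector to a probability, G maps noise to an image vector.\n", |
| | + "z_dim, x_dim = 10, 4\n", |
| | + "D = torch.nn.Sequential(torch.nn.Linear(x_dim, 16), torch.nn.ReLU(),\n", |
| | + "                        torch.nn.Linear(16, 1), torch.nn.Sigmoid())\n", |
| | + "G = torch.nn.Sequential(torch.nn.Linear(z_dim, 16), torch.nn.ReLU(),\n", |
| | + "                        torch.nn.Linear(16, x_dim))\n", |
| | + "opt_d = torch.optim.SGD(D.parameters(), lr=0.01)\n", |
| | + "opt_g = torch.optim.SGD(G.parameters(), lr=0.01)\n", |
| | + "\n", |
| | + "x_real = torch.randn(32, x_dim)    # stand-in for a batch of real images\n", |
| | + "z = torch.rand(32, z_dim) * 2 - 1  # z ~ Uniform(-1, 1)\n", |
| | + "\n", |
| | + "# Gradient ascent on the discriminator objective = descent on its negative.\n", |
| | + "# detach() keeps this update from flowing into the generator's parameters.\n", |
| | + "d_loss = -(torch.log(D(x_real)).mean() + torch.log(1 - D(G(z).detach())).mean())\n", |
| | + "opt_d.zero_grad()\n", |
| | + "d_loss.backward()\n", |
| | + "opt_d.step()\n", |
| | + "\n", |
| | + "# Gradient descent on the generator objective log(1 - D(G(z))).\n", |
| | + "g_loss = torch.log(1 - D(G(z))).mean()\n", |
| | + "opt_g.zero_grad()\n", |
| | + "g_loss.backward()\n", |
| | + "opt_g.step()" |
| | + ] |
| | + }, |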
| 232 | + { |
| 233 | + "cell_type": "markdown", |
| 234 | + "metadata": {}, |
| 235 | + "source": [ |
| 236 | + "However, optimizing the above generator objective does not work properly and causes a\n", |
| 237 | + "stability issue. So we introduce a new form of loss called heuristic loss. " |
| 238 | + ] |
| 239 | + }, |
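| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Here is a small numerical sketch of this saturation problem. It compares the gradient of the original objective $\\log(1-p)$ with the gradient of the heuristic form $-\\log p$ derived below, with respect to $p = D(G(z))$, when $p$ is close to 0:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# Early in training the discriminator easily rejects generated images,\n", |
| | + "# so p = D(G(z)) is close to 0.\n", |
| | + "p = 0.01\n", |
| | + "grad_original = -1.0 / (1.0 - p)  # d/dp of log(1 - p): about -1.01, a weak signal\n", |
| | + "grad_heuristic = -1.0 / p         # d/dp of -log(p): -100, a strong signal\n", |
| | + "print(grad_original)\n", |
| | + "print(grad_heuristic)" |
| | + ] |
| | + }, |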
| 240 | + { |
| 241 | + "cell_type": "markdown", |
| 242 | + "metadata": {}, |
| 243 | + "source": [ |
| 244 | + "## Heuristic Loss\n", |
| 245 | + "\n", |
| 246 | + "There is no change in the loss function of the discriminator it is written as,\n", |
| 247 | + "\n", |
| 248 | + "$$ \\max _{d} L(D, G)=\\mathbb{E}_{x \\sim p_{r}(x)}\\left[\\log D\\left(x ; \\theta_{d}\\right)\\right]+\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$" |
| 249 | + ] |
| 250 | + }, |
| 251 | + { |
| 252 | + "cell_type": "markdown", |
| 253 | + "metadata": {}, |
| 254 | + "source": [ |
| 255 | + "Now, let us look at the generator loss, \n", |
| 256 | + "\n", |
| 257 | + "$$ \\min _{g} L(D, G)=\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right] $$" |
| 258 | + ] |
| 259 | + }, |
| 260 | + { |
| 261 | + "cell_type": "markdown", |
| 262 | + "metadata": {}, |
| 263 | + "source": [ |
| 264 | + "Can we change it to a maximizing equation just like our discriminators? How can we do\n", |
| 265 | + "that? We know that $ 1-D(G(Z)$ returns the probability of input image being fake and\n", |
| 266 | + "generator is minimizing this probability. \n", |
| 267 | + "\n", |
| 268 | + "\n", |
| 269 | + "Instead of doing this, we can write $D(G(z))$ it implies the probability of input image\n", |
| 270 | + "being real and now our generator can maximize this probability. It implies a generator is\n", |
| 271 | + "maxing the probability of the input fake image being classified as a real image. So the loss\n", |
| 272 | + "function of our generator now becomes,\n", |
| 273 | + "\n", |
| 274 | + "$$\\max _{g} L(D, G)=\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$\n", |
| 275 | + "\n", |
| 276 | + "So, now we have both the loss function of our discriminator and generator as maximizing\n", |
| 277 | + "terms i.e,\n", |
| 278 | + "\n", |
| 279 | + "$$\\max _{d} L(D, G)=\\mathbb{E}_{x \\sim p_{r}(x)}\\left[\\log D\\left(x ; \\theta_{d}\\right)\\right]+\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(1-D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$\n", |
| 280 | + "\n", |
| 281 | + "$$\\max _{g} L(D, G)=\\mathbb{E}_{z \\sim p_{z}(z)}\\left[\\log \\left(D\\left(G\\left(z ; \\theta_{g}\\right) ; \\theta_{d}\\right)\\right)\\right]$$\n", |
| 282 | + "\n", |
| 283 | + "\n", |
| 284 | + "\n", |
| 285 | + "But instead of maximizing, if we can minimize the loss then we can apply our favorite\n", |
| 286 | + "gradient descent algorithms. Now how can we convert our maximizing problem into a\n", |
| 287 | + "minimization problem? It;'s so simple, just add a negative sign.\n", |
| 288 | + "So, our final loss function for the discriminator is given as,\n", |
| 289 | + "\n", |
| 290 | + "\n", |
| 291 | + "$$ \\boxed{L^{D}=-\\mathbb{E}_{x \\sim p_{r}(x)}[\\log D(x)]-\\mathbb{E}_{z \\sim p_{z}(z)}[\\log (1-D(G(z))]}$$\n" |
| 292 | + ] |
| 293 | + }, |
| 294 | + { |
| 295 | + "cell_type": "markdown", |
| 296 | + "metadata": {}, |
| 297 | + "source": [ |
| 298 | + "\n", |
| 299 | + "and the generator loss is,\n", |
| 300 | + "\n", |
| 301 | + "$$ \\boxed{L^{G}=-\\mathbb{E}_{z \\sim p_{z}(z)}[\\log (D(G(z)))]}$$" |
| 302 | + ] |
| 303 | + }, |
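| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "Both boxed losses have the familiar form of a binary cross-entropy, so one last sketch makes them concrete. As before, the probabilities are hypothetical discriminator outputs rather than values from a real network:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import numpy as np\n", |
| | + "\n", |
| | + "# Hypothetical discriminator outputs for one batch.\n", |
| | + "d_real = np.array([0.9, 0.8, 0.95])  # D(x) on real images\n", |
| | + "d_fake = np.array([0.1, 0.3, 0.2])   # D(G(z)) on generated images\n", |
| | + "\n", |
| | + "# The final minimization-form losses from the boxed equations above.\n", |
| | + "L_D = -np.mean(np.log(d_real)) - np.mean(np.log(1.0 - d_fake))\n", |
| | + "L_G = -np.mean(np.log(d_fake))  # often called the non-saturating generator loss\n", |
| | + "print(L_D)\n", |
| | + "print(L_G)" |
| | + ] |
| | + }, |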
| 304 | + { |
| 305 | + "cell_type": "markdown", |
| 306 | + "metadata": {}, |
| 307 | + "source": [ |
| 308 | + "In the next section, we will learn how to use GAN to generate images of handwritten digits. " |
| 309 | + ] |
| 310 | + } |
| 311 | + ], |
| 312 | + "metadata": { |
| 313 | + "kernelspec": { |
| 314 | + "display_name": "Python 2", |
| 315 | + "language": "python", |
| 316 | + "name": "python2" |
| 317 | + }, |
| 318 | + "language_info": { |
| 319 | + "codemirror_mode": { |
| 320 | + "name": "ipython", |
| 321 | + "version": 2 |
| 322 | + }, |
| 323 | + "file_extension": ".py", |
| 324 | + "mimetype": "text/x-python", |
| 325 | + "name": "python", |
| 326 | + "nbconvert_exporter": "python", |
| 327 | + "pygments_lexer": "ipython2", |
| 328 | + "version": "2.7.12" |
| 329 | + } |
| 330 | + }, |
| 331 | + "nbformat": 4, |
| 332 | + "nbformat_minor": 2 |
| 333 | +} |