{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "RlHTqK9CeZlQ"
      },
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import tensorflow as tf\n",
        "from tensorflow import keras\n",
        "%matplotlib inline \n",
        "import matplotlib.pyplot as plt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GbiiGCT2epyL"
      },
      "source": [
        "## Higher derivatives with respect to multi-dimensional tensors\n",
        "Recall that if $x \\in \\mathbb{R}^d$ and $y \\in \\mathbb{R}^D$, then $g := grad(y,x) \\in \\mathbb{R}^d$ with $g_i = \\sum_{j=1}^D \\frac{\\partial y_j}{\\partial x_i}$, i.e. the derivatives of all components of $y$ are summed.\n",
        "\n",
        "Let us consider the following example. Define two tensors\n",
        "$$x = \\begin{bmatrix}x_1\\\\x_2\\\\x_3\\end{bmatrix}= \\begin{bmatrix}2.0\\\\3.0\\\\4.0\\end{bmatrix}, \\quad y = \\begin{bmatrix}y_1\\\\y_2\\\\y_3\\end{bmatrix}= \\begin{bmatrix}-1.0\\\\2.0\\\\-4.0\\end{bmatrix}$$\n",
        "and the concatenated tensor\n",
        "$$\n",
        "z = \\begin{bmatrix}x_1 & y_1\\\\x_2 & y_2\\\\x_3 & y_3\\end{bmatrix} = \\begin{bmatrix}2.0 & -1.0 \\\\3.0 &2.0 \\\\4.0 & -4.0\\end{bmatrix}\n",
        "$$\n",
        "Let us define a scalar tensor\n",
        "$$u = 3 x \\cdot x + 2 x\\cdot y + y \\cdot y = \\sum_{i=1}^3(3x_i^2 + 2x_i y_i + y_i^2)$$\n",
        "Then the following is true for the gradient operations on $u$\n",
        "$$\n",
        "a := grad(u,x) = \\begin{bmatrix} \\frac{\\partial u}{\\partial x_1} \\\\ \\frac{\\partial u}{\\partial x_2} \\\\ \\frac{\\partial u}{\\partial x_3}\\end{bmatrix} =\n",
        "\\begin{bmatrix} 6x_1 + 2y_1 \\\\ 6x_2 + 2y_2 \\\\ 6x_3 + 2y_3\\end{bmatrix} = \\begin{bmatrix} 10 \\\\ 22 \\\\ 16\\end{bmatrix}\n",
        "$$  \n",
        "$$\n",
        "b := grad(a,x) = \\begin{bmatrix} \\sum_{j=1}^3 \\frac{\\partial a_j}{\\partial x_1} \\\\ \\sum_{j=1}^3 \\frac{\\partial a_j}{\\partial x_2} \\\\ \\sum_{j=1}^3 \\frac{\\partial a_j}{\\partial x_3}\\end{bmatrix} =\n",
        "\\begin{bmatrix} 6 \\\\ 6 \\\\ 6\\end{bmatrix}\n",
        "$$  \n",
        "$$\n",
        "c := grad(u,y) = \\begin{bmatrix} \\frac{\\partial u}{\\partial y_1} \\\\ \\frac{\\partial u}{\\partial y_2} \\\\ \\frac{\\partial u}{\\partial y_3}\\end{bmatrix} =\n",
        "\\begin{bmatrix} 2x_1 + 2y_1 \\\\ 2x_2 + 2y_2 \\\\ 2x_3 + 2y_3\\end{bmatrix} = \\begin{bmatrix} 2 \\\\ 10 \\\\ 0\\end{bmatrix}\n",
        "$$ \n",
        "$$\n",
        "d := grad(c,y) = \\begin{bmatrix} \\sum_{j=1}^3 \\frac{\\partial c_j}{\\partial y_1} \\\\ \\sum_{j=1}^3 \\frac{\\partial c_j}{\\partial y_2} \\\\ \\sum_{j=1}^3 \\frac{\\partial c_j}{\\partial y_3}\\end{bmatrix} =\n",
        "\\begin{bmatrix} 2 \\\\ 2 \\\\ 2\\end{bmatrix}\n",
        "$$ \n",
        "$$\n",
        "e := grad(u,z) = \\begin{bmatrix} \\frac{\\partial u}{\\partial z_{11}} & \\frac{\\partial u}{\\partial z_{12}} \\\\  \\frac{\\partial u}{\\partial z_{21}} & \\frac{\\partial u}{\\partial z_{22}} \\\\  \\frac{\\partial u}{\\partial z_{31}} & \\frac{\\partial u}{\\partial z_{32}}\\end{bmatrix} =\n",
        "\\begin{bmatrix} 6x_1 + 2y_1 & 2x_1 + 2y_1 \\\\ 6x_2 + 2y_2 & 2x_2 + 2y_2 \\\\ 6x_3 + 2y_3 & 2x_3 + 2y_3\\end{bmatrix} = \\begin{bmatrix} 10 &2 \\\\ 22 & 10 \\\\ 16 & 0\\end{bmatrix}\n",
        "$$ \n",
        "$$\n",
        "f := grad(e,z) = \\begin{bmatrix} \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{11}} & \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{12}} \\\\  \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{21}} & \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{22}} \\\\  \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{31}} & \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{32}}\\end{bmatrix} =\n",
        "\\begin{bmatrix} 8 & 4 \\\\ 8 & 4 \\\\8 & 4\\end{bmatrix}\n",
        "$$ \n",
        "**Note that the columns of $f$ are not equal $b$ or $d$. This is because the computation of $f$ also involved cross-derivative terms. This is demonstrated below**"
      ]
    },
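    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cross-derivative contribution to $f$ can be made explicit with a short worked derivation. This is a sketch that uses only the definitions of $u$, $e$ and $z$ above, with $z_{i1} = x_i$ and $z_{i2} = y_i$. Writing the entries of $e$ in terms of $z$,\n",
        "$$\n",
        "e_{i1} = 6z_{i1} + 2z_{i2}, \\quad e_{i2} = 2z_{i1} + 2z_{i2},\n",
        "$$\n",
        "and differentiating the sum of all entries of $e$ gives, for each row $k$,\n",
        "$$\n",
        "f_{k1} = \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{k1}} = \\underbrace{6}_{\\partial^2 u/\\partial x_k^2} + \\underbrace{2}_{\\partial^2 u/\\partial x_k \\partial y_k} = 8, \\quad\n",
        "f_{k2} = \\sum_{i=1}^3\\sum_{j=1}^2\\frac{\\partial e_{ij}}{\\partial z_{k2}} = \\underbrace{2}_{\\partial^2 u/\\partial y_k \\partial x_k} + \\underbrace{2}_{\\partial^2 u/\\partial y_k^2} = 4.\n",
        "$$\n",
        "The first term in $f_{k1}$ is $b_k$ and the last term in $f_{k2}$ is $d_k$; the remaining $2$ in each entry is the mixed second derivative of $u$."
      ]
    },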
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "xWpCZ0Tlejz_",
        "outputId": "f658c4e3-4de7-4cf0-d26a-ac19dfb91189"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "x:  [[2.]\n",
            " [3.]\n",
            " [4.]]\n",
            "\n",
            "y:  [[-1.]\n",
            " [ 2.]\n",
            " [-4.]]\n",
            "\n",
            "z:  [[ 2. -1.]\n",
            " [ 3.  2.]\n",
            " [ 4. -4.]]\n",
            "\n",
            "u:  [[ 9.]\n",
            " [43.]\n",
            " [32.]]\n",
            "\n",
            "grad(u,x):  [[10.]\n",
            " [22.]\n",
            " [16.]]\n",
            "\n",
            "grad(dux,x):  [[6.]\n",
            " [6.]\n",
            " [6.]]\n",
            "\n",
            "grad(u,y):  [[ 2.]\n",
            " [10.]\n",
            " [ 0.]]\n",
            "\n",
            "grad(duy,y):  [[2.]\n",
            " [2.]\n",
            " [2.]]\n",
            "\n",
            "grad(u,z):  [[10.  2.]\n",
            " [22. 10.]\n",
            " [16.  0.]]\n",
            "\n",
            "grad(du,z):  [[8. 4.]\n",
            " [8. 4.]\n",
            " [8. 4.]]\n"
          ]
        }
      ],
      "source": [
        "x = tf.Variable([[2.0],[3.0],[4.0]])\n",
        "y = tf.Variable([[-1.0],[2.0],[-4.0]])\n",
        "z = tf.Variable(tf.concat((x,y),axis=1))\n",
        "\n",
        "print('\\nx: ',x.numpy())\n",
        "print('\\ny: ',y.numpy())\n",
        "print('\\nz: ',z.numpy())\n",
        "\n",
        "with tf.GradientTape(persistent=True) as t1:\n",
        "  with tf.GradientTape(persistent=True) as t2:\n",
        "    u = 3*x*x + 2*x*y + y*y\n",
        "  dux = t2.gradient(u,x)\n",
        "  duy = t2.gradient(u,y)\n",
        "d2ux = t1.gradient(dux,x)   \n",
        "d2uy = t1.gradient(duy,y)   \n",
        "\n",
        "del t1,t2\n",
        "\n",
        "print('\\nu: ',u.numpy())\n",
        "print('\\ngrad(u,x): ',dux.numpy())\n",
        "print('\\ngrad(dux,x): ',d2ux.numpy())\n",
        "print('\\ngrad(u,y): ',duy.numpy())\n",
        "print('\\ngrad(duy,y): ',d2uy.numpy())\n",
        "\n",
        "with tf.GradientTape() as t1:\n",
        "  with tf.GradientTape() as t2:\n",
        "    u = 3*z[:,0]*z[:,0] + 2*z[:,0]*z[:,1] + z[:,1]*z[:,1]\n",
        "  duz = t2.gradient(u,z)\n",
        "d2uz = t1.gradient(duz,z)    \n",
        "\n",
        "print('\\ngrad(u,z): ',duz.numpy())\n",
        "print('\\ngrad(du,z): ',d2uz.numpy())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Bd3KqR3aobdz"
      },
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "colab": {
      "collapsed_sections": [],
      "name": "MultiDimGardient.ipynb",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.8.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}