Sunday, September 18, 2022

Complex Derivatives of Functions of Several Variables

Introduction

Frequently, in the field of signal processing, the most appropriate interpretation of the data is given by writing it in terms of complex numbers ($\mathbb{C}$). The common operations with different signals can be simplified with the rules of addition and multiplication in $\mathbb{C}$, namely:
$$[a_1,\;a_2] + [b_1,\;b_2] = [a_1+b_1,\;a_2+b_2]$$
and
$$[a_1,\;a_2] \cdot [b_1,\;b_2] = [a_1b_1-a_2b_2,\;a_1b_2+a_2b_1]$$
so that you can simply write $a+b$ and $a\cdot b$ instead, where $a=[a_1,\;a_2]$ and $b=[b_1,\;b_2]$.
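As a quick sanity check, the two rules above can be compared against Python's built-in complex type. This is only an illustrative sketch: the helper names `add`, `mul` and `as_pair` are made up for this post.

```python
def add(a, b):
    """Add two complex numbers stored as (real, imag) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def mul(a, b):
    """Multiply two complex numbers stored as (real, imag) pairs."""
    return (a[0] * b[0] - a[1] * b[1], a[0] * b[1] + a[1] * b[0])


def as_pair(c):
    """Convert a built-in complex number to a (real, imag) pair."""
    return (c.real, c.imag)


a, b = (1.0, 2.0), (3.0, -1.0)
za, zb = complex(*a), complex(*b)

pair_sum = add(a, b)
pair_product = mul(a, b)

# Both representations should agree.
print(pair_sum, as_pair(za + zb))
print(pair_product, as_pair(za * zb))
```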

Many signal processing textbooks present the real and complex forms of each algorithm separately, and some omit the complex form of certain algorithms altogether (e.g. [1]). Converting an algorithm from the complex form to the real one is usually trivial, but the other direction usually isn't. This shows the importance of an adequate definition of derivatives of functions $f:\mathbb{C}^N\to\mathbb{C}^M$ that is the closest possible analogue of the real derivative.

In the literature of complex analysis, it's hard to find a good reference for this problem, because most textbooks don't develop the study of Wirtinger derivatives of non-analytic functions (e.g. [2]), even though such functions have many applications, such as complex least squares problems. On Wikipedia, we find the following quote:
Despite their ubiquitous use, it seems that there is no text listing all the properties of Wirtinger derivatives: however, fairly complete references are the short course on multidimensional complex analysis by Andreotti (1976, pp. 3–5), the monograph of Gunning & Rossi (1965, pp. 3–6), and the monograph of Kaup & Kaup (1983, p. 2,4) which are used as general references in this and the following sections.

https://en.wikipedia.org/w/index.php?title=Wirtinger_derivatives&oldid=692448620&#Formal_definition (Permalink)
In this post, we present rigorous definitions of derivatives of functions $f:\mathbb{C}^N\to\mathbb{C}^M$ that apply to certain non-analytic functions, which arise in many practical applications. Our definitions, apart from transposition operations, are, in essence, the same as the ones given in [3], with the advantage of simplifying the analogy between the real and the complex cases.


Wirtinger Derivatives

Overview

The real line gives us only two directions from which to approach a particular point. If one wants to calculate the derivative of a real function $f$ at $x=x_0$, both limits
$$\lim_{x \to x_0^+} \frac{f(x)-f(x_0)}{x-x_0}$$
and
$$\lim_{x \to x_0^-} \frac{f(x)-f(x_0)}{x-x_0}$$
must exist and yield the same result. In the complex plane, however, there are infinitely many possible directions to choose. The traditional way to define derivatives of complex functions requires that all limits, from all directions, give the same result. More precisely:
$$\frac{df}{dz}(z_0) = \lim_{\|z-z_0\|\to 0} \frac{f(z)-f(z_0)}{z-z_0}.$$
This is a very restrictive definition, but with incredibly powerful results:
In mathematics, holomorphic functions are the central objects of study in complex analysis. A holomorphic function is a complex-valued function of one or more complex variables that is complex differentiable in a neighborhood of every point in its domain. The existence of a complex derivative in a neighborhood is a very strong condition, for it implies that any holomorphic function is actually infinitely differentiable and equal to its own Taylor series.

The term analytic function is often used interchangeably with "holomorphic function", although the word "analytic" is also used in a broader sense to describe any function (real, complex, or of more general type) that can be written as a convergent power series in a neighborhood of each point in its domain. The fact that all holomorphic functions are complex analytic functions, and vice versa, is a major theorem in complex analysis.

https://en.wikipedia.org/w/index.php?title=Holomorphic_function&oldid=689604303 (Permalink)
In the literature of signal processing, a common definition of derivative of one-dimensional complex functions ($f:\mathbb{C}\to\mathbb{C}$) is the Wirtinger derivative, which will be described later in this section. In practice, it's usually cumbersome to apply the definition of this derivative directly. Instead, the literature usually presents an alternative way of calculating it, involving the partial derivatives $\partial f/\partial z$ and $\partial f/\partial \bar{z}$ (where $\bar{z}$ is the complex conjugate of $z$). But a crucial problem is overlooked by all the textbooks I came across: this definition is, in principle, ambiguous. We could interpret the expression $\|z\|^2$, for example, as a function of $z$ alone or as a function of both $z$ and $\bar{z}$ if we write it as $z\bar{z}$. Furthermore, the classical definition of partial derivatives is only valid for independent variables. We can't simply vary $z$ without automatically varying $\bar{z}$ at the same time. This problem seems to be ignored by even the most rigorous sources I could find (e.g., [4]). In this section, we fill this gap by giving a mathematically rigorous treatment of those definitions. But in order to do so, some preliminary results need to be presented.
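To see how restrictive the classical definition is, the difference quotient can be evaluated numerically along a few directions. The sketch below is only an illustration (the helper `difference_quotient` and the step size are arbitrary choices made here): for $z^2$ every direction gives approximately $2z_0$, while for $\|z\|^2 = z\bar{z}$ the result depends on the direction, so the classical complex derivative does not exist at $z_0 \neq 0$.

```python
import cmath
import math


def difference_quotient(f, z0, direction_angle, step=1e-6):
    """Difference quotient of f at z0, approaching along a given direction."""
    dz = step * cmath.exp(1j * direction_angle)
    return (f(z0 + dz) - f(z0)) / dz


def square(z):
    return z * z


def squared_modulus(z):
    return z * z.conjugate()


z0 = 1 + 1j
angles = [0.0, math.pi / 4, math.pi / 2]

# Holomorphic case: all directions agree (approximately 2 * z0).
square_quotients = [difference_quotient(square, z0, t) for t in angles]

# Non-holomorphic case: the quotient changes with the direction of approach.
modulus_quotients = [difference_quotient(squared_modulus, z0, t) for t in angles]

print(square_quotients)
print(modulus_quotients)
```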

Preliminary Results

Theorem 1

(Mean Value Theorem for Complex Functions) Suppose that $\Omega\subseteq\mathbb{C}$ is an open convex set, that $f:\Omega\to\mathbb{C}$ is an analytic function in $\Omega$ and that $a$ and $b$ are distinct points in $\Omega$. It follows that there are points $u$ and $v$ in $L_{a,b}$ (the open segment between $a$ and $b$) such that:
\begin{align}\Re\left\{\frac{f(a)-f(b)}{a-b}\right\} &= \Re\{f'(u)\}\\\\\Im\left\{\frac{f(a)-f(b)}{a-b}\right\} &= \Im\{f'(v)\}\end{align}
where $\Re$ and $\Im$ indicate the real and imaginary parts of a complex number, respectively, and $f'$ is the derivative of $f$.

Proof: See [5].

Theorem 2

(Hartogs' Theorem) If a complex function $f:\mathbb{C}^N\to\mathbb{C}$ is holomorphic in each of its variables separately in an open domain $D\subseteq\mathbb{C}^N$, then $f$ is holomorphic in $D$.

Proof: See [6].

Theorem 3

Let $f\left(z_1,z_2\right):\mathbb{C}\times\mathbb{C}\rightarrow\mathbb{C}$ be such that, $\forall a,b,z\in\mathbb{C}$, the functions $f_{1a}(z)\triangleq f(a,z)$ and $f_{2b}(z)\triangleq f(z,b)$ are holomorphic. Suppose that $z_1$ and $z_2$ are $C^1$ functions $\mathbb{R}\times\mathbb{R}\rightarrow\mathbb{C}$ of the arguments $x$ and $y$. It follows that the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ exist in $\mathbb{C}$.

Proof: We start by showing that $\partial f/\partial x$ exists. The proof for $\partial f/\partial y$ is analogous. To do so, we first show that the following limit exists:
$$\lim_{\Delta x\to 0} \frac{f\left(z_1+\Delta z_1,z_2+\Delta z_2\right) - f\left(z_1,z_2\right)}{\Delta x} \label{lim}\tag{1}$$
where $z_1 \triangleq z_1\left(x, y\right)$, $z_2 \triangleq z_2\left(x, y\right)$ and
\begin{align}\Delta z_1 &\triangleq z_1\left(x+\Delta x, y\right)-z_1 \\\\ \Delta z_2 &\triangleq z_2\left(x+\Delta x, y\right)-z_2.\end{align}

We can rewrite the numerator of (\ref{lim}) as:
$$\begin{gather}f\left(z_1+\Delta z_1,z_2+\Delta z_2\right) - f\left(z_1,z_2+\Delta z_2\right) \\\\ + f\left(z_1,z_2+\Delta z_2\right) - f\left(z_1,z_2\right).\end{gather}$$

Thus we can complete the proof by showing that the following limits exist:
\begin{gather}  \lim_{\Delta x\to 0} \frac{f\left(z_1+\Delta z_1,z_2+\Delta z_2\right) - f\left(z_1,z_2+\Delta z_2\right)}{\Delta x} \\\\  \lim_{\Delta x\to 0} \frac{f\left(z_1,z_2+\Delta z_2\right) - f\left(z_1,z_2\right)}{\Delta x}. \end{gather}

Defining the expressions inside the limits above as $L_1(\Delta x)$ and $L_2(\Delta x)$, respectively, after some algebraic manipulations we can apply Theorem 1 to show that there are points $u_1$ and $v_1$ between $z_1$ and $z_1+\Delta z_1$ and points $u_2$ and $v_2$ between $z_2$ and $z_2+\Delta z_2$ such that:

\begin{align} L_1(\Delta x) = &\Re\left\{\dfrac{\partial f}{\partial z_1}\left(u_1,z_2+\Delta z_2\right)\right\}\frac{\Delta z_1}{\Delta x} \\\\ &+ i\Im\left\{\dfrac{\partial f}{\partial z_1}\left(v_1,z_2+\Delta z_2\right)\right\}\frac{\Delta z_1}{\Delta x} \label{L1}\tag{2} \\\\ L_2(\Delta x) = &\Re\left\{\dfrac{\partial f}{\partial z_2}\left(z_1,u_2\right)\right\}\frac{\Delta z_2}{\Delta x} \\\\ &+ i\Im\left\{\dfrac{\partial f}{\partial z_2}\left(z_1,v_2\right)\right\}\frac{\Delta z_2}{\Delta x} \label{L2}\tag{3}.\end{align}

Now we proceed to show that the limits of the partial derivatives in the expressions above exist. Given that $z_1$ and $z_2$ are $C^1$, the limits of $\Delta z_1/\Delta x$ and $\Delta z_2/\Delta x$ when $\Delta x$ tends to zero exist and equal $\partial z_1/\partial x$ and $\partial z_2/\partial x$, respectively. We also note that, by the squeeze theorem, we have:
\begin{align} \lim_{\Delta x\to 0} u_1 &= \lim_{\Delta x\to 0} v_1 = z_1 \\\\ \lim_{\Delta x\to 0} u_2 &= \lim_{\Delta x\to 0} v_2 = z_2. \end{align}

By Theorem 2, $f$ is holomorphic in both variables jointly, so its partial derivatives are continuous. Consequently, the limits of the partial derivatives in (\ref{L1}) and (\ref{L2}) exist and are given by:
\begin{align} \lim_{\Delta x\to 0} &\dfrac{\partial f}{\partial z_1}\left(u_1,z_2+\Delta z_2\right) = \dfrac{\partial f}{\partial z_1}\left(z_1,z_2\right) \\\\  \lim_{\Delta x\to 0} &\dfrac{\partial f}{\partial z_1}\left(v_1,z_2+\Delta z_2\right) = \dfrac{\partial f}{\partial z_1}\left(z_1,z_2\right) \\\\  \lim_{\Delta x\to 0} &\dfrac{\partial f}{\partial z_2}\left(z_1,u_2\right) = \dfrac{\partial f}{\partial z_2}\left(z_1,z_2\right) \\\\  \lim_{\Delta x\to 0} &\dfrac{\partial f}{\partial z_2}\left(z_1,v_2\right) = \dfrac{\partial f}{\partial z_2}\left(z_1,z_2\right).\end{align}
$$\tag*{$\Box$}$$

Theorem 4

Let $f,g:\mathbb{C}\times\mathbb{C}\rightarrow\mathbb{C}$ such that, $\forall a,b,z\in\mathbb{C}$, the functions $f_{1a}(z)\triangleq f(a,z)$, $f_{2b}(z)\triangleq f(z,b)$, $g_{1a}(z)\triangleq g(a,z)$ and $g_{2b}(z)\triangleq g(z,b)$ are holomorphic. Furthermore, suppose that $\forall z\in\mathbb{C}, f(z,\bar{z})=g(z,\bar{z})$. Then:
$$ \left.\begin{bmatrix}\dfrac{\partial f}{\partial a}(a,b) \\[2.2ex] \dfrac{\partial f}{\partial b}(a,b)\end{bmatrix}\right\rvert_{\begin{aligned}a&=z \\ b&=\bar{z}\end{aligned}}
  = \left.\begin{bmatrix}\dfrac{\partial g}{\partial a}(a,b) \\[2.2ex] \dfrac{\partial g}{\partial b}(a,b)\end{bmatrix}\right\rvert_{\begin{aligned}a&=z \\ b&=\bar{z}\end{aligned}} $$

Proof: Let $x = \Re\{z\}$ be the real part of $z$ and $y = \Im\{z\}$ the imaginary part of $z$. Define:
\begin{align} z_1 &\triangleq x + iy \label{z1}\tag{4} \\\\  z_2 &\triangleq \bar{z_1} = x - iy \label{z2}\tag{5}\end{align}

Note that $f$, $z_1$ and $z_2$ obey the conditions of Theorem 3. Hence, we can differentiate $f(z_1,z_2)$ with respect to $x$ and $y$, which gives:
\begin{align} \dfrac{\partial f}{\partial x} &= \dfrac{\partial f}{\partial z_1}\dfrac{\partial z_1}{\partial x} + \dfrac{\partial f}{\partial z_2}\dfrac{\partial z_2}{\partial x} \label{dfdx}\tag{6} \\\\ \dfrac{\partial f}{\partial y} &= \dfrac{\partial f}{\partial z_1}\dfrac{\partial z_1}{\partial y} + \dfrac{\partial f}{\partial z_2}\dfrac{\partial z_2}{\partial y} \label{dfdy}\tag{7} \end{align}
where:
\begin{align} \dfrac{\partial f}{\partial z_1} &\triangleq \dfrac{df_{2z_2}}{dz_1}(z_1) = \left.\dfrac{\partial f}{\partial a}(a,b)\right\rvert_{\begin{aligned}a&=z_1 \\ b&=z_2\end{aligned}} \label{defPartial1}\tag{8} \\\\ \dfrac{\partial f}{\partial z_2} &\triangleq \dfrac{df_{1z_1}}{dz_2}(z_2) = \left.\dfrac{\partial f}{\partial b}(a,b)\right\rvert_{\begin{aligned}a&=z_1 \\ b&=z_2\end{aligned}} \label{defPartial2}\tag{9} \end{align}

From equations (\ref{z1}) and (\ref{z2}), we have:
\begin{align} \dfrac{\partial z_1}{\partial x} &= \dfrac{\partial z_2}{\partial x} = 1 \\\\ \dfrac{\partial z_1}{\partial y} &= i \\\\ \dfrac{\partial z_2}{\partial y} &= -i \end{align}

Substituting that into equations (\ref{dfdx}) and (\ref{dfdy}), we get:
\begin{align} \dfrac{\partial f}{\partial x} &= \dfrac{\partial f}{\partial z_1} + \dfrac{\partial f}{\partial z_2} \\\\ \dfrac{\partial f}{\partial y} &= i\dfrac{\partial f}{\partial z_1} - i\dfrac{\partial f}{\partial z_2} \end{align}

Solving the system above for $\partial f/\partial z_1$ and $\partial f/\partial z_2$ yields:
\begin{align} \dfrac{\partial f}{\partial z_1} &= \frac{1}{2}\left(\dfrac{\partial f}{\partial x}-i\dfrac{\partial f}{\partial y}\right) \label{wirtinger1}\tag{10} \\\\ \dfrac{\partial f}{\partial z_2} &= \frac{1}{2}\left(\dfrac{\partial f}{\partial x}+i\dfrac{\partial f}{\partial y}\right) \label{wirtinger2}\tag{11} \end{align}

Now, since $f(z,\bar{z})=g(z,\bar{z})$, we have:
\begin{align} \dfrac{\partial f}{\partial x} &= \dfrac{\partial g}{\partial x} \\\\ \dfrac{\partial f}{\partial y} &= \dfrac{\partial g}{\partial y} \end{align}
and, consequently:
\begin{align} \dfrac{\partial f}{\partial z_1} &= \frac{1}{2}\left(\dfrac{\partial g}{\partial x}-i\dfrac{\partial g}{\partial y}\right) = \dfrac{\partial g}{\partial z_1} \\\\ \dfrac{\partial f}{\partial z_2} &= \frac{1}{2}\left(\dfrac{\partial g}{\partial x}+i\dfrac{\partial g}{\partial y}\right) = \dfrac{\partial g}{\partial z_2}\end{align}
$$\tag*{$\Box$}$$

This theorem, though it might seem obvious at first glance, does not hold for its real analogue. For example, take $f:\mathbb{R}\to\mathbb{R}$ given by $f(z) = 0$, where $z\in\mathbb{R}$. Instead of taking a single value of $z$, suppose that now $z$ is a real function of two variables, $x$ and $y$. In the complex case, we would have $z = x+iy$, but now we need a real value, so let's simply use $0$ instead of $i$, so that $z(x,y) = x$. Let's also define $\bar{z}(x,y) \triangleq z(x,-y) = x$. We can write two equivalent representations of $f$ in terms of $z$ and $\bar{z}$: $f \equiv h_1(z,\bar{z})$ and $f \equiv h_2(z,\bar{z})$, where $h_1(a,b) \triangleq 0$ and $h_2(a,b) \triangleq a-b$. But the two representations of that same function give us different partial derivatives:
\begin{align} \left.\dfrac{\partial h_1}{\partial a}(a,b)\right\rvert_{\begin{aligned}a&=z(x,y) \\ b&=\bar{z}(x,y)\end{aligned}} &= 0 \\\\ \left.\dfrac{\partial h_2}{\partial a}(a,b)\right\rvert_{\begin{aligned}a&=z(x,y) \\ b&=\bar{z}(x,y)\end{aligned}} &= 1 \end{align}

What's going on? This happens because the proof of Theorem 4 relies on solving the system of equations (\ref{dfdx}) and (\ref{dfdy}), but now we have $\partial z_1/\partial y = 0$ and $\partial z_2/\partial y = 0$, which cancels $\partial f/\partial z_1$ and $\partial f/\partial z_2$ in equation (\ref{dfdy}).

But what if we replace $i$ with a nonzero real number? To be precise, suppose $f(a,b)$ and $g(a,b)$ are functions $\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ with partial derivatives ($\partial f/\partial a$, $\partial f/\partial b$, $\partial g/\partial a$ and $\partial g/\partial b$) in an open domain $D\subseteq\mathbb{R}^2$. Pick a nonzero $k\in\mathbb{R}$. If for every $x$ and $y$ such that $(x+ky, x-ky)\in D$ we have $f(x+ky, x-ky) = g(x+ky, x-ky)$, then does the following identity hold in $D$?
$$ \left.\begin{bmatrix}\dfrac{\partial f}{\partial a}(a,b) \\[2.2ex] \dfrac{\partial f}{\partial b}(a,b)\end{bmatrix}\right\rvert_{\begin{aligned}a&=x+ky \\ b&=x-ky\end{aligned}}
  = \left.\begin{bmatrix}\dfrac{\partial g}{\partial a}(a,b) \\[2.2ex] \dfrac{\partial g}{\partial b}(a,b)\end{bmatrix}\right\rvert_{\begin{aligned}a&=x+ky \\ b&=x-ky\end{aligned}} $$
Well, we can simply pick $x=\frac{a+b}{2}$ and $y=\frac{a-b}{2k}$ to show that $f$ and $g$ are actually the same functions in $D$, so the identity above indeed holds. But note that we can't use that same trick for the complex case, because $x$ and $y$ need to be real numbers, while now $\frac{a+b}{2}$ and $\frac{a-b}{2i}$ can be complex. Worse yet, there's an interesting example in the real case where the function is differentiable with respect to both $a$ and $b$, but not with respect to $x$ and $y$:
$$f(a,b) =
\begin{cases}
\cfrac{ab}{a^2+b^2},  & \text{if $a^2+b^2 \neq 0$} \\[2ex]
0, & \text{if $a^2+b^2 = 0$}
\end{cases}$$
This is because Theorem 2 (Hartogs' Theorem) doesn't hold for its real analogue and we used it to prove Theorem 3.
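The behaviour of this function at the origin can be checked numerically. The following is a minimal sketch, assuming $k=1$ in the change of variables $a = x+ky$, $b = x-ky$ (the helper `f_of_xy` is hypothetical): the difference quotients with respect to $a$ and $b$ vanish at the origin, while along the $x$ direction the function jumps from $0$ to $1/2$, so $\partial f/\partial x$ cannot exist there.

```python
def f(a, b):
    """The real function from the text: ab / (a^2 + b^2), with f(0, 0) = 0."""
    if a == 0.0 and b == 0.0:
        return 0.0
    return a * b / (a * a + b * b)


def f_of_xy(x, y, k=1.0):
    """Same function after the change of variables a = x + k*y, b = x - k*y."""
    return f(x + k * y, x - k * y)


h = 1e-6

# Difference quotients at the origin with respect to a and b: both are exactly 0,
# because f vanishes on both coordinate axes.
df_da = (f(h, 0.0) - f(0.0, 0.0)) / h
df_db = (f(0.0, h) - f(0.0, 0.0)) / h

# Along the x direction (y = 0) we have a = b = x, so f equals 1/2 for every x != 0.
values_along_x = [f_of_xy(x, 0.0) for x in (1e-2, 1e-4, 1e-8)]

# The corresponding difference quotient blows up as h shrinks.
quotient_along_x = (f_of_xy(h, 0.0) - f_of_xy(0.0, 0.0)) / h

print(df_da, df_db, values_along_x, quotient_along_x)
```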

Definition of Wirtinger Derivatives

The uniqueness of the partial derivatives with respect to the representation of a complex function $f(z,\bar{z})$ that obeys the conditions of Theorem 4 guarantees that there is no ambiguity in the abuse of notation $\partial f/\partial z$ and $\partial f/\partial \bar{z}$, as long as these derivatives are taken as defined in (\ref{defPartial1}) and (\ref{defPartial2}). The conditions matter, though. For example, take $f(a,b) = a$ and $g(a,b) = \bar{b}$. We have that $f(z,\bar{z}) = g(z,\bar{z}) = z$, but $\partial f/\partial z \neq \partial g/\partial z$:
\begin{align} \dfrac{\partial f}{\partial z} &= \left.\dfrac{\partial f}{\partial a}(a,b)\right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = \left.1 \right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = 1 \\\\ \dfrac{\partial g}{\partial z} &= \left.\dfrac{\partial g}{\partial a}(a,b)\right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = \left.0 \right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = 0 \end{align}
That's because only $f$ obeys the conditions of Theorem 4.

The relations found in (\ref{wirtinger1}) and (\ref{wirtinger2}) motivate a generalization of the complex derivative to functions that don't obey the conditions of Theorem 4, presented by Wilhelm Wirtinger in [7].

Definition 1

(Wirtinger Derivatives) Let $f(z):\mathbb{C}\to\mathbb{C}$ be a function with partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$, where $x=\Re\{z\}$ and $y=\Im\{z\}$. We define the complex derivative of $f$ with respect to $z$ and $\bar{z}$, respectively, as:
\begin{align} \dfrac{\partial f}{\partial z} &\triangleq \frac{1}{2}\left(\dfrac{\partial f}{\partial x}-i\dfrac{\partial f}{\partial y}\right) \\\\ \dfrac{\partial f}{\partial \bar{z}} &\triangleq \frac{1}{2}\left(\dfrac{\partial f}{\partial x}+i\dfrac{\partial f}{\partial y}\right) \end{align}
$$\tag*{$\triangle$}$$
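Definition 1 translates almost literally into a numerical recipe. The sketch below approximates $\partial f/\partial x$ and $\partial f/\partial y$ with central differences and combines them as in the definition; the function name `wirtinger` and the step size `h` are assumptions made for illustration, not part of any library.

```python
def wirtinger(f, z, h=1e-6):
    """Approximate (df/dz, df/dz_bar) at z from Definition 1 with central differences."""
    # Partial derivative along the real axis.
    df_dx = (f(z + h) - f(z - h)) / (2 * h)
    # Partial derivative along the imaginary axis.
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (df_dx - 1j * df_dy), 0.5 * (df_dx + 1j * df_dy)


z0 = 0.5 - 2j

# Holomorphic function: df/dz is the ordinary derivative 2z and df/dz_bar vanishes.
d_square = wirtinger(lambda z: z * z, z0)

# The conjugate is not holomorphic: df/dz = 0 and df/dz_bar = 1.
d_conj = wirtinger(lambda z: z.conjugate(), z0)

print(d_square)
print(d_conj)
```

For a holomorphic function the second component is (numerically) zero, which is just the Cauchy-Riemann equations written in terms of $\partial f/\partial \bar{z}$.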

The conclusion is that we can calculate the derivative of a function $f(z):\mathbb{C}\to\mathbb{C}$ in two alternative ways. One way is to first verify if the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ exist. If the answer is negative, the derivative does not exist and there's nothing we can do. But if the answer is positive, we can apply the Wirtinger derivatives directly. If this operation is too cumbersome, we can follow an alternative path: we verify if $f(z)$ can be written as $g(z,\bar{z})$ such that, $\forall a,b,z\in\mathbb{C}$, the functions $g_{1a}(z)\triangleq g(a,z)$ and $g_{2b}(z)\triangleq g(z,b)$ are holomorphic. If that is possible, we can calculate $\partial f/\partial z$ and $\partial f/\partial \bar{z}$ as in (\ref{defPartial1}) and (\ref{defPartial2}). The result, according to (\ref{wirtinger1}) and (\ref{wirtinger2}), will coincide with the Wirtinger derivatives. The latter approach is usually easier in practice.

Example 1

Calculate the derivatives of $f(z)=\|z\|^2$, where $z\in\mathbb{C}$.

Solution: Writing $z = z(x,y) = x+iy$ and $f(z) = f(z(x,y)) = x^2+y^2$, we have, by the Wirtinger derivatives:
\begin{align} \dfrac{\partial f}{\partial z} &\triangleq \frac{1}{2}\left(2x-i2y\right) = \bar{z} \\\\ \dfrac{\partial f}{\partial \bar{z}} &\triangleq \frac{1}{2}\left(2x+i2y\right) = z \end{align}

Alternatively, we can write $f(z)$ as $f(z) = g(z,\bar{z}) = z\bar{z}$, where $g(a,b) = ab$. Since $g$ is simply a linear function in $a$ and $b$ separately, the functions $g_{1a}(z)\triangleq g(a,z) = az$ and $g_{2b}(z)\triangleq g(z,b) = zb$ are holomorphic. Thus:
\begin{align} \dfrac{\partial f}{\partial z} &= \left.\dfrac{\partial g}{\partial a}(a,b)\right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = \left.b \right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = \bar{z} \\\\ \dfrac{\partial f}{\partial \bar{z}} &= \left.\dfrac{\partial g}{\partial b}(a,b)\right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = \left.a \right\rvert_{\begin{aligned}a&=z \\[-4pt] b&=\bar{z}\end{aligned}} = z \end{align}
$$\tag*{$\Diamond$}$$

Also note that a small variation $\delta z = \delta x + i\delta y$ gives a variation of $\delta f$ which can be approximated by:
\begin{align} \delta f &\approx \dfrac{\partial f}{\partial x}\delta x + \dfrac{\partial f}{\partial y}\delta y \\\\ &= \frac{1}{2}\left(\dfrac{\partial f}{\partial x}-i\dfrac{\partial f}{\partial y}\right)\left(\delta x+i\delta y\right) \\\\ &\quad + \frac{1}{2}\left(\dfrac{\partial f}{\partial x}+i\dfrac{\partial f}{\partial y}\right)\left(\delta x-i\delta y\right) \\\\ &= \dfrac{\partial f}{\partial z}\delta z + \dfrac{\partial f}{\partial \bar{z}}\delta \bar{z} \end{align}
If the function is $f:\mathbb{C}\to\mathbb{R}$, we have $\partial f/\partial \bar{z} = \overline{\partial f/\partial z}$. Thus, the expression above simplifies to:
$$ \delta f \approx 2\Re\left\{\dfrac{\partial f}{\partial z}\delta z\right\}$$
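This first-order approximation can be checked on $f(z)=\|z\|^2$ from Example 1, using $\partial f/\partial z = \bar{z}$. The numbers below are arbitrary test values chosen for illustration; for this particular $f$ the mismatch between the exact change and the approximation is exactly the second-order term $\|\delta z\|^2$.

```python
def f(z):
    """Squared modulus, a real-valued function of a complex variable."""
    return (z * z.conjugate()).real


z = 2 - 1j
dz = 1e-4 * (3 + 4j)  # small perturbation with |dz| = 5e-4

df_dz = z.conjugate()  # Wirtinger derivative of |z|^2 with respect to z (Example 1)

exact_change = f(z + dz) - f(z)
first_order = 2 * (df_dz * dz).real

# The mismatch is the second-order term |dz|^2.
error = abs(exact_change - first_order)
print(exact_change, first_order, error)
```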

Complex Gradient

In complex linear algebra, transposition and conjugation are often taken simultaneously, which motivates the definition of the hermitian of a vector or matrix $A$: $A^*\triangleq \bar{A}^T$. In this and in the subsequent sections, we're going to show some of the definitions of derivatives of functions of several variables that exist in the signal processing literature and then suggest the definition (or notation) that minimizes the use of operations of transposition and conjugation separately. The goal is to create an intuition that, by using the appropriate definitions, the hermitian in the complex case can be interpreted as the natural generalization of transposition in the real case.

Using the Wirtinger derivatives, the definitions of complex gradient given by Hjørungnes [3], Brandwood [4] and Diniz [1] for functions $f:\mathbb{C}^N\to\mathbb{C}$ are:
\begin{align} \nabla_z' f &\triangleq \begin{bmatrix}\dfrac{\partial f}{\partial z_1} \\[2ex] \dfrac{\partial f}{\partial z_2} \\ \vdots \\ \dfrac{\partial f}{\partial z_N}\end{bmatrix} \qquad\text{and}\qquad \nabla_{\bar{z}}' f \triangleq \begin{bmatrix}\dfrac{\partial f}{\partial z_1^*} \\[2ex] \dfrac{\partial f}{\partial z_2^*} \\ \vdots \\ \dfrac{\partial f}{\partial z_N^*}\end{bmatrix} \end{align}
where $\{z_1,z_2,\cdots,z_N\}$ are the arguments of $f$ and the apostrophe in $\nabla'$ was used only to avoid ambiguity in the notation for other definitions. This representation of both gradients as column vectors seems to be the most widely used in the literature, possibly to maintain consistency with the works of Brandwood [4] and Hjørungnes [8].

Another definition, given by Haykin [9], is:
$$ \nabla'' f \triangleq 2\nabla_{\bar{z}}' f $$

In [9], $\nabla_z' f$ and $\nabla_{\bar{z}}' f$ are written as $\partial f/\partial z$ and $\partial f/\partial \bar{z}$, respectively.

Another definition, offered by van den Bos [10], is:
$$ \nabla''' f \triangleq \begin{bmatrix}\dfrac{\partial f}{\partial z_1} \\[2ex] \dfrac{\partial f}{\partial z_1^*} \\ \vdots \\ \dfrac{\partial f}{\partial z_N} \\[2ex] \dfrac{\partial f}{\partial z_N^*}\end{bmatrix}_{2N\times 1} $$

Finally, the proposed definitions are the ones suggested by Sayed [11], described below.

Definition 2

(Complex Gradient) Let $f$ be a function $f:\mathbb{C}^N\to \mathbb{C}$. We define the complex gradients $\nabla_z f$ and $\nabla_{z^*}f$, given that the partial derivatives below exist, as:
$$ \nabla_z f \triangleq \begin{bmatrix}\dfrac{\partial f}{\partial z_1} & \dfrac{\partial f}{\partial z_2} & \cdots & \dfrac{\partial f}{\partial z_N}\end{bmatrix} = \nabla_z'^T f $$
$$ \nabla_{z^*} f \triangleq \begin{bmatrix}\dfrac{\partial f}{\partial z_1^*} \\[2ex] \dfrac{\partial f}{\partial z_2^*} \\ \vdots \\ \dfrac{\partial f}{\partial z_N^*}\end{bmatrix} = \nabla_{\bar{z}}'f $$
$$\tag*{$\triangle$}$$

Notice that $\nabla_z f$ is defined as a row vector, but $z$ is a column vector. In the tensor jargon for the real case, this is because the gradient is a covariant vector, while $z$ is contravariant. Also note that if $f$ is $f:\mathbb{C}^N\to\mathbb{R}$, then $\nabla_{z^*} f = (\nabla_z f)^*$. Now we can establish the following theorems:

Theorem 5

Let $f(z):\mathbb{C}^N\to\mathbb{R}$ be a function with holomorphic representations in $z$ and $z^*$ separately. Then either of the conditions $\nabla_z f = 0$ or $\nabla_{z^*} f = 0$ is necessary and sufficient to determine a stationary point of $f$.

Proof: According to [4], the conditions are given by $\nabla_z' f = 0$ or $\nabla_{\bar{z}}' f = 0$.

Theorem 6

Let $f$ be defined as in Theorem 5. The complex conjugate gradient $\nabla_{z^*}f = (\nabla_z f)^*$ indicates the direction of maximum variation of $f$. Furthermore, an infinitesimal change $dz$ in $z$ corresponds to a change in $f$ given by:
\begin{align} df &= \nabla_z f dz + dz^*\nabla_{z^*} f \\\\  &= 2\Re\left\{\nabla_z f dz\right\} \end{align}

Proof: According to [4], $df$ is given by $2\Re\left\{\left(\nabla_{\bar{z}}' f\right)^* dz\right\}$ and the direction of maximum variation is given by $\nabla_{\bar{z}}'f$. Since $f$ is real, we have $\nabla_{\bar{z}}'f = \nabla_{z^*}f = (\nabla_z f)^*$.
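Theorems 5 and 6 can be illustrated on a simple quadratic cost. The sketch below assumes the hypothetical function $f(z)=\sum_k \|z_k-c_k\|^2$, for which $\partial f/\partial z_k = \overline{z_k - c_k}$ and $\partial f/\partial z_k^* = z_k - c_k$. It checks the first-order formula $df = 2\Re\{\nabla_z f\,dz\}$, takes one step against $\nabla_{z^*} f$ (with an arbitrary step size `mu`), and confirms that the gradient vanishes at the minimizer $z=c$.

```python
def f(z, c):
    """f(z) = sum_k |z_k - c_k|^2, a real function of a complex vector z."""
    return sum(abs(zk - ck) ** 2 for zk, ck in zip(z, c))


def grad_z(z, c):
    """Row gradient: entries df/dz_k = conj(z_k - c_k)."""
    return [(zk - ck).conjugate() for zk, ck in zip(z, c)]


def grad_z_conj(z, c):
    """Column gradient: entries df/dz_k^* = z_k - c_k (the hermitian of grad_z)."""
    return [zk - ck for zk, ck in zip(z, c)]


c = [1 + 2j, -3j]
z = [0.5 - 1j, 2 + 1j]
dz = [1e-5 * (1 - 2j), 1e-5 * (-3 + 1j)]

# First-order change predicted by df = 2 Re{grad_z f dz}.
predicted = 2 * sum(g * d for g, d in zip(grad_z(z, c), dz)).real
actual = f([zk + dk for zk, dk in zip(z, dz)], c) - f(z, c)

# A small step against the conjugate gradient decreases f.
mu = 0.1
z_next = [zk - mu * gk for zk, gk in zip(z, grad_z_conj(z, c))]

# At the minimizer z = c the gradient vanishes (stationary point).
grad_at_minimum = grad_z_conj(c, c)

print(predicted, actual, f(z, c), f(z_next, c))
```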

Complex Jacobian

In [3], Hjørungnes defines jacobians $\mathcal{D}_z f$ and $\mathcal{D}_{\bar{z}} f$ of functions $f:\mathbb{C}^N\to \mathbb{C}^M$ as:
\begin{align}  \mathcal{D}_z f &\triangleq \begin{bmatrix} \dfrac{\partial f_1}{\partial z_1} & \cdots & \dfrac{\partial f_1}{\partial z_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_M}{\partial z_1} & \cdots & \dfrac{\partial f_M}{\partial z_N} \end{bmatrix}_{M\times N} \\ \mathcal{D}_{\bar{z}} f &\triangleq \begin{bmatrix} \dfrac{\partial f_1}{\partial z_1^*} & \cdots & \dfrac{\partial f_1}{\partial z_N^*} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_M}{\partial z_1^*} & \cdots & \dfrac{\partial f_M}{\partial z_N^*} \end{bmatrix}_{M\times N} \end{align}

Our definition will be a straightforward generalization of the complex gradients previously defined, such that for functions $f:\mathbb{C}^N\to \mathbb{C}$, jacobians and complex gradients are the same.

Definition 3

(Complex Jacobian) Let $f:\mathbb{C}^N\to \mathbb{C}^M$. We define the complex jacobians $\mathcal{J}_z f$ and $\mathcal{J}_{z^*} f$, if they exist, as:

\begin{align} \mathcal{J}_z f &\triangleq \begin{bmatrix} \dfrac{\partial f_1}{\partial z_1} & \cdots & \dfrac{\partial f_1}{\partial z_N} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_M}{\partial z_1} & \cdots & \dfrac{\partial f_M}{\partial z_N} \end{bmatrix}_{M\times N} = \mathcal{D}_z f \\ \mathcal{J}_{z^*} f &\triangleq \begin{bmatrix} \dfrac{\partial f_1}{\partial z_1^*} & \cdots & \dfrac{\partial f_M}{\partial z_1^*} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_1}{\partial z_N^*} & \cdots & \dfrac{\partial f_M}{\partial z_N^*} \end{bmatrix}_{N\times M} = \mathcal{D}_{\bar{z}}^T f \end{align}
$$\tag*{$\triangle$}$$
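The index conventions of Definition 3 are easy to mix up, so here is a small numerical sketch that builds both jacobians with central differences for a hypothetical map $f:\mathbb{C}^2\to\mathbb{C}^3$. The function names and the test map are assumptions chosen for illustration. Note the shapes: $\mathcal{J}_z f$ is $M\times N$, while $\mathcal{J}_{z^*} f$ is $N\times M$.

```python
def wirtinger_partials(f, z, k, h=1e-6):
    """Approximate (df/dz_k, df/dz_k^*) for a vector-valued f via central differences."""
    def shifted(delta):
        w = list(z)
        w[k] = w[k] + delta
        return f(w)

    fx_plus, fx_minus = shifted(h), shifted(-h)
    fy_plus, fy_minus = shifted(1j * h), shifted(-1j * h)
    d_dz, d_dzc = [], []
    for m in range(len(fx_plus)):
        df_dx = (fx_plus[m] - fx_minus[m]) / (2 * h)
        df_dy = (fy_plus[m] - fy_minus[m]) / (2 * h)
        d_dz.append(0.5 * (df_dx - 1j * df_dy))
        d_dzc.append(0.5 * (df_dx + 1j * df_dy))
    return d_dz, d_dzc


def complex_jacobians(f, z):
    """Return (J_z f, J_{z^*} f) as nested lists of shapes M x N and N x M."""
    columns = [wirtinger_partials(f, z, k) for k in range(len(z))]
    n, m = len(z), len(columns[0][0])
    # Entry (i, k) of J_z is df_i/dz_k.
    j_z = [[columns[k][0][i] for k in range(n)] for i in range(m)]
    # Entry (k, i) of J_{z^*} is df_i/dz_k^*.
    j_zc = [[columns[k][1][i] for i in range(m)] for k in range(n)]
    return j_z, j_zc


def f(z):
    """Hypothetical test map C^2 -> C^3."""
    z1, z2 = z
    return [z1 * z2, z1 * z2.conjugate(), z1.conjugate()]


z0 = [1 + 2j, 3 - 1j]
J_z, J_zc = complex_jacobians(f, z0)
print(J_z)
print(J_zc)
```

For this test map the exact values are $\mathcal{J}_z f = \begin{bmatrix} z_2 & z_1 \\ \bar{z}_2 & 0 \\ 0 & 0\end{bmatrix}$ and $\mathcal{J}_{z^*} f = \begin{bmatrix} 0 & 0 & 1 \\ 0 & z_1 & 0\end{bmatrix}$.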

Complex Hessian

The following theorem will be useful for interpreting complex hessians:

Theorem 7

Let $f:\mathbb{C}^N\to \mathbb{C}$ be such that the second-order derivatives $\frac{\partial^2 f}{\partial z_i\partial z_k}$, $\frac{\partial^2 f}{\partial z_i\partial z_k^*}$, $\frac{\partial^2 f}{\partial z_i^*\partial z_k}$, and $\frac{\partial^2 f}{\partial z_i^*\partial z_k^*}$ exist $\forall i,k\in\{1,2,\dots,N\}$. Then:

\begin{align} \frac{\partial^2 f}{\partial z_i\partial z_k} &= \frac{\partial^2 f}{\partial z_k\partial z_i}\\ \frac{\partial^2 f}{\partial z_i\partial z_k^*} &= \frac{\partial^2 f}{\partial z_k^*\partial z_i} \\ \frac{\partial^2 f}{\partial z_i^*\partial z_k} &= \frac{\partial^2 f}{\partial z_k\partial z_i^*} \\ \frac{\partial^2 f}{\partial z_i^*\partial z_k^*} &= \frac{\partial^2 f}{\partial z_k^*\partial z_i^*} \end{align}

Proof: See [8].

The definition of complex hessian for functions $f:\mathbb{C}^N\to\mathbb{C}$ suggested by van den Bos [10] is the matrix of size $2N\times 2N$ given by:

\begin{align} \mathcal{H}'''_f &\triangleq \begin{bmatrix} \dfrac{\partial^2 f}{\partial z_1^*\partial z_1} & \dfrac{\partial^2 f}{\partial z_1^*\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_1^*\partial z_N} & \dfrac{\partial^2 f}{\partial z_1^*\partial z_N^*} \\[2ex] \dfrac{\partial^2 f}{\partial z_1\partial z_1} & \dfrac{\partial^2 f}{\partial z_1\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_1\partial z_N} & \dfrac{\partial^2 f}{\partial z_1\partial z_N^*} \\ \vdots & \vdots & \ddots & \vdots & \vdots \\[1ex] \dfrac{\partial^2 f}{\partial z_N^*\partial z_1} & \dfrac{\partial^2 f}{\partial z_N^*\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_N^*\partial z_N} & \dfrac{\partial^2 f}{\partial z_N^*\partial z_N^*} \\[2ex] \dfrac{\partial^2 f}{\partial z_N\partial z_1} & \dfrac{\partial^2 f}{\partial z_N\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_N\partial z_N} & \dfrac{\partial^2 f}{\partial z_N\partial z_N^*} \end{bmatrix} \end{align}

Hjørungnes [8], in contrast, generalizes the idea of defining two distinct gradients for the complex case to define four distinct complex hessians for functions $f:\mathbb{C}^N\to\mathbb{C}$ that satisfy certain criteria:

Definition 4

(Complex Hessian) Let $f:\mathbb{C}^N\to \mathbb{C}$ such that the second-order derivatives $\frac{\partial^2 f}{\partial z_i\partial z_k}$, $\frac{\partial^2 f}{\partial z_i\partial z_k^*}$, $\frac{\partial^2 f}{\partial z_i^*\partial z_k}$, and $\frac{\partial^2 f}{\partial z_i^*\partial z_k^*}$ exist $\forall i,k\in\{1,2,\dots,N\}$, and for which their second-order differential can be written as:

\begin{align} d^2f = \begin{bmatrix}dz^* & d^T z\end{bmatrix}\begin{bmatrix}A_{0,0} & A_{0,1} \\ A_{1,0} & A_{1,1}\end{bmatrix}\begin{bmatrix}dz \\ d^T z^*\end{bmatrix} \end{align}

where the matrices $A_{i,k}\in\mathbb{C}^{N\times N}$, $i,k\in\{0,1\}$, may depend on $z$ and $\bar{z}$, but not on $dz$ and $dz^*$. Then the complex hessian matrices are defined as:

\begin{align} \mathcal{H}_{z^*,z}f &\triangleq \frac{A_{0,0}+A_{1,1}^T}{2} = (\mathcal{H}'_{\bar{z},z}f)^T \\ \mathcal{H}_{z,z^*}f &\triangleq \frac{A_{0,0}^T+A_{1,1}}{2} = (\mathcal{H}'_{z,\bar{z}}f)^T \\ \mathcal{H}_{z,z}f &\triangleq \frac{A_{1,0}+A_{1,0}^T}{2} = (\mathcal{H}'_{z,z}f)^T \\ \mathcal{H}_{z^*,z^*}f &\triangleq \frac{A_{0,1}+A_{0,1}^T}{2} = (\mathcal{H}'_{\bar{z},\bar{z}}f)^T \end{align}

Explicitly, we have:

\begin{align} \mathcal{H}_{z^*,z}f &= \begin{bmatrix}\dfrac{\partial^2 f}{\partial z_1^*\partial z_1} & \cdots & \dfrac{\partial^2 f}{\partial z_1^*\partial z_N} \\ \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial^2 f}{\partial z_N^*\partial z_1} & \cdots & \dfrac{\partial^2 f}{\partial z_N^*\partial z_N} \end{bmatrix} = \mathcal{J}_{z^*}(\mathcal{J}_z f) \\ \mathcal{H}_{z,z^*}f &= \begin{bmatrix}\dfrac{\partial^2 f}{\partial z_1\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_1\partial z_N^*} \\ \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial^2 f}{\partial z_N\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_N\partial z_N^*} \end{bmatrix} = \mathcal{J}_z(\mathcal{J}_{z^*} f) \\ \mathcal{H}_{z,z}f &= \begin{bmatrix}\dfrac{\partial^2 f}{\partial z_1\partial z_1} & \cdots & \dfrac{\partial^2 f}{\partial z_N\partial z_1} \\ \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial^2 f}{\partial z_1\partial z_N} & \cdots & \dfrac{\partial^2 f}{\partial z_N\partial z_N} \end{bmatrix} = \mathcal{J}_z(\mathcal{J}_z f)^T \\ \mathcal{H}_{z^*,z^*}f &= \begin{bmatrix}\dfrac{\partial^2 f}{\partial z_1^*\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_1^*\partial z_N^*} \\ \vdots & \ddots & \vdots \\[1ex] \dfrac{\partial^2 f}{\partial z_N^*\partial z_1^*} & \cdots & \dfrac{\partial^2 f}{\partial z_N^*\partial z_N^*} \end{bmatrix} = \mathcal{J}_{z^*}(\mathcal{J}_{z^*} f)^T \end{align}
$$\tag*{$\triangle$}$$

In order to determine whether a stationary point of a function $f:\mathbb{C}^N\to\mathbb{R}$ is a local minimum, local maximum, or a saddle point, one should check if the matrix:

\begin{align} \mathcal{H}_f \triangleq \begin{bmatrix}\mathcal{H}_{z^*,z}f & \mathcal{H}_{z^*,z^*}f \\ \mathcal{H}_{z,z}f & \mathcal{H}_{z,z^*}f\end{bmatrix}_{2N\times 2N} \end{align}

is positive-definite, negative-definite, or indefinite, respectively. In [8], an example is given of a function whose hessian $\mathcal{H}_{z^*,z}f$ is positive-definite, but whose corresponding stationary point is a saddle point, because the matrix $\mathcal{H}_f$ is indefinite.
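To make this concrete, here is a small one-dimensional illustration. It is a hypothetical example constructed for this post, not the one given in [8]. Take $f(z) = z\bar{z} + z^2 + \bar{z}^2 = 3x^2 - y^2$, which has a stationary point at $z=0$. Its second-order Wirtinger derivatives are constants, $\mathcal{H}_{z^*,z}f = 1$ is positive, and yet $\mathcal{H}_f$ is indefinite.

```python
import math


def f(z):
    """Hypothetical example: f(z) = |z|^2 + z^2 + conj(z)^2 = 3x^2 - y^2."""
    return (z * z.conjugate() + z * z + z.conjugate() ** 2).real


# Second-order Wirtinger derivatives of f, computed by hand (N = 1, so each block is 1 x 1).
H_zc_z = 1.0   # d^2 f / (dz^* dz)
H_zc_zc = 2.0  # d^2 f / (dz^* dz^*)
H_z_z = 2.0    # d^2 f / (dz dz)
H_z_zc = 1.0   # d^2 f / (dz dz^*)

# H_f = [[H_zc_z, H_zc_zc], [H_z_z, H_z_zc]] is real symmetric here, so its
# eigenvalues follow from the closed form for 2 x 2 symmetric matrices.
mean = 0.5 * (H_zc_z + H_z_zc)
radius = math.sqrt((0.5 * (H_zc_z - H_z_zc)) ** 2 + H_zc_zc * H_z_z)
eigenvalues = (mean - radius, mean + radius)

# f increases along the real axis and decreases along the imaginary axis.
along_real = f(0.1 + 0j)
along_imag = f(0.1j)

print(eigenvalues, along_real, along_imag)
```

The eigenvalues have opposite signs, and $f$ takes both positive and negative values arbitrarily close to the origin, so $z=0$ is a saddle point even though $\mathcal{H}_{z^*,z}f > 0$.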

References

[1] Paulo S. R. Diniz, Adaptive Filtering—Algorithms and Practical Implementation. Springer, Fourth Edition, (2013).

[2] Marcio G. Soares, Cálculo em uma Variável Complexa. Instituto Nacional de Matemática Pura e Aplicada (IMPA), Coleção Matemática Universitária, 4ª ed., (2007). (In Portuguese)

[3] Are Hjørungnes, Complex-Valued Matrix Derivatives: With Applications in Signal Processing and Communications. Cambridge University Press, (2011).

[4] D. H. Brandwood, A Complex Gradient Operator and its Application in Adaptive Array Theory. Communications, Radar and Signal Processing, IEE Proceedings F, Volume 130, Issue 1, pp. 11-16, (February 1983).

[5] J.-Cl. Evard, F. Jafari, A Complex Rolle's Theorem. American Mathematical Monthly, Vol. 99, Issue 9, pp. 858-861, (Nov. 1992).

[6] Robert C. Gunning, Introduction to Holomorphic Functions of Several Variables, Volume I: Function Theory. Wadsworth & Brooks/Cole, Mathematics Series, p. 15, (1990).

[7] Wirtinger, Wilhelm, Zur formalen Theorie der Funktionen von mehr komplexen Veränderlichen. Mathematische Annalen, 97: 357-375, (1926). (In German)

[8] Are Hjørungnes, Complex-Valued Matrix Differentiation: Techniques and Key Results. IEEE Transactions on Signal Processing, Volume 55, Issue 6, pp. 2740-2746, (June 2007).

[9] Simon S. Haykin, Adaptive Filter Theory. Prentice Hall, 3rd Edition (1996).

[10] A. van den Bos, Complex Gradient and Hessian. Vision, Image and Signal Processing, IEE Proceedings, Vol. 141,  Issue 6, (Dec. 1994).

[11] Ali H. Sayed, Adaptive Filters. Wiley, (2008).
