# Julian's musings

## Implicit differentiation I

I’ve been thinking about implicit differentiation with my colleagues recently. How do we teach it (at high school level), and what subtleties are involved? It started with trying to understand what we mean by the equation

\begin{equation} \frac{dy}{dx}=1\biggm/\frac{dx}{dy}. \label{eq:recip} \end{equation}

Some questions raised by this include:

(a) What does this equation mean?

(b) How can we explain to students what it means and why it is true?

(c) Where would this result be useful to them (besides in artificial exam questions)?

In this post, I will offer some thoughts on (a) and (b), but I’m still fairly stuck on (c).

A typical textbook explanation of the formula begins as follows: “Suppose that $x$ is given as a function of $y$” and then goes on to give a reasonable-looking explanation involving $\delta x$ and $\delta y$. Some books draw a sketch to illustrate this, while others just use algebra.

In a particular commonly-used textbook, a few examples then show how this can be used when we have $x=f(y)$ for some function $f$. One of them is $x=y^2$. Here we have $\frac{dx}{dy}=2y$, so $\frac{dy}{dx}=\frac{1}{2y}$. The textbook notes that although this could be written as $\frac{dy}{dx}=\frac{1}{2\sqrt{x}}$, it is more common to leave it as a function of $y$, matching the form of the original relation.

But if we sketch the graph of $x=y^2$, it becomes clear that this note is simply incorrect. If we take the derivative to be $\frac{1}{2\sqrt{x}}$, then at both $A(4,2)$ and $B(4,-2)$ we would obtain $\frac{dy}{dx}=\frac{1}{4}$, which is clearly wrong at $B$, where the curve slopes downwards. The original version $\frac{1}{2y}$, however, gives a derivative of $\frac{1}{4}$ at $A$ but $-\frac{1}{4}$ at $B$. (And we can’t fix things by saying, “Well, the derivative is $\pm\frac{1}{2\sqrt{x}}$”, because how do we decide which sign to take at any particular value of $x$?)
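A quick numerical check makes the same point. The following is only a sketch: the helper names (`slope_on_branch`, `upper`, `lower`) are my own, and the slope along each branch is estimated with a central difference rather than computed exactly.

```python
import math

def slope_on_branch(branch, x, h=1e-6):
    """Central-difference estimate of dy/dx along one branch y = branch(x)."""
    return (branch(x + h) - branch(x - h)) / (2 * h)

def upper(x):
    return math.sqrt(x)       # the branch of x = y^2 through A(4, 2)

def lower(x):
    return -math.sqrt(x)      # the branch of x = y^2 through B(4, -2)

for name, branch in (("A", upper), ("B", lower)):
    x = 4.0
    y = branch(x)
    numeric = slope_on_branch(branch, x)       # slope measured from the curve
    in_terms_of_y = 1 / (2 * y)                # the formula 1/(2y)
    in_terms_of_x = 1 / (2 * math.sqrt(x))     # the formula 1/(2*sqrt(x))
    print(name, round(numeric, 6), in_terms_of_y, in_terms_of_x)
```

At $A$ all three numbers agree, while at $B$ the measured slope and $\frac{1}{2y}$ are both negative but $\frac{1}{2\sqrt{x}}$ is still positive.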

So there is something inherently different about the two offered forms of the derivative: the one given as a function of $y$ “works”, while the one given as a function of $x$ fails. The reason is that we are given $x$ as a function of $y$, so $\frac{dx}{dy}$, and hence its reciprocal, is a meaningful function of $y$; but $y$ is not a function of $x$ here, since each $x>0$ corresponds to two values of $y$.

Another point to note is that when we write $\frac{dy}{dx}$, we are thinking of $y$ as a function of $x$ and asking how $y$ changes as $x$ changes. Likewise, when we write $\frac{dx}{dy}$, we are thinking of $x$ as a function of $y$ (as it is in our case) and asking how $x$ changes as $y$ changes. So the original equation \eqref{eq:recip} is actually relating the behaviour of $x$ as a function of $y$ to the behaviour of $y$ as a function of $x$. It is not even obvious that this makes sense, as we have seen that $y$ may not be a function of $x$!

There is a theorem from analysis called “The Inverse Function Theorem” which sheds light on this. I’ll briefly describe that later, but in our context, it (roughly) tells us the following:

Consider the function $x=f(y)$, and assume that at $(x_0, y_0)$ (where $x_0=f(y_0)$), the derivative $f'(y_0)$ is non-zero. Then we can restrict the domain of $f$ to an interval containing $y_0$ so that it becomes invertible, with inverse $y=g(x)$, say. Then $g$ is differentiable and we have

\begin{equation*} g'(x)=\frac{1}{f'(y)}, \end{equation*}

where $x=f(y)$ and $y$ lies in this restricted domain. In other notation, this equation reads

\begin{equation*} \frac{dy}{dx}=1\biggm/\frac{dx}{dy}. \end{equation*}

So in our case of $x=y^2$, when looking at the point $A(4,2)$, we could restrict the domain of the function to $1<y<3$. (We could alternatively have restricted to $y>0$, but it makes no difference to the derivative at $A$.) Then the function is one-to-one on the domain $1<y<3$, so it has an inverse $y=+\sqrt{x}$ there, and we have $\frac{dy}{dx}=1\bigm/\frac{dx}{dy}$ as required. And if we wish, we could write the derivative in terms of $x$ as $\frac{dy}{dx}=\frac{1}{2\sqrt{x}}$.

If, on the other hand, we looked at the point $B(4,-2)$, then we could restrict the domain to $-3<y<-1$ and find that the inverse function is $y=-\sqrt{x}$. In this case, then, $\frac{dy}{dx}=-\frac{1}{2\sqrt{x}}$.

Finally, at the origin, we have $\frac{dx}{dy}=0$: the function does not have a local inverse there, and we do not have a value for $\frac{dy}{dx}$. (There is some sense in which it is infinite at the origin, as the tangent there is vertical.)
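The restrict-then-invert idea can also be tried out numerically. This is a rough sketch rather than anything definitive: `local_inverse` and `dy_dx` are names I have made up, the inverse is found by bisection (which relies on $f$ being monotonic on the restricted interval), and the derivative of the inverse is only a central-difference estimate.

```python
def f(y):
    return y * y              # the relation x = y^2

def f_prime(y):
    return 2 * y              # dx/dy

def local_inverse(x, lo, hi, tol=1e-12):
    """Solve f(y) = x for y in [lo, hi] by bisection.

    Assumes f is monotonic on [lo, hi], which is exactly what
    restricting the domain is meant to achieve.
    """
    increasing = f(hi) > f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half-interval that still contains the solution.
        if (f(mid) < x) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def dy_dx(x, lo, hi, h=1e-5):
    """Central-difference estimate of the derivative of the local inverse at x."""
    return (local_inverse(x + h, lo, hi) - local_inverse(x - h, lo, hi)) / (2 * h)

# Near A we restrict to 1 <= y <= 3; near B to -3 <= y <= -1.
for label, (lo, hi) in (("A", (1.0, 3.0)), ("B", (-3.0, -1.0))):
    y0 = local_inverse(4.0, lo, hi)
    print(label, y0, dy_dx(4.0, lo, hi), 1 / f_prime(y0))
```

On each restricted interval, the estimated slope of the local inverse at $x=4$ should agree closely with $1\bigm/\frac{dx}{dy}$ evaluated at the corresponding $y$, with opposite signs at $A$ and $B$.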

How can we explain this subtlety to students?

One way may just be to offer them examples such as the above, and ask how we can write the derivative $\frac{dy}{dx}$.

The visual argument that the textbook offered for the relationship between $\frac{dy}{dx}$ and $\frac{dx}{dy}$ is one good approach, once we understand that we are talking about a function and its (local) inverse.

An alternative argument, which is more algebraic, is to use the chain rule: if $y=g(x)$ is the (local) inverse of $x=f(y)$, then we have $g(f(y))=y$. If we differentiate both sides with respect to $y$ (assuming that $g$ is differentiable), we obtain

\begin{equation*} g'(f(y))\,f'(y)=1. \end{equation*}

If we write $x=f(y)$, then this becomes our familiar $g'(x)\,f'(y)=1$, or $g'(x)=1/f'(y)$.

(It may also be worth noting that $x=f(y)$ may have an inverse even if $f'(y_0)=0$: for example, $x=y^3$ has the inverse $y=\sqrt[3]{x}$, but this inverse is not differentiable at the origin.)
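To see the failure of differentiability for the cube-root inverse of $x=y^3$, we can watch the difference quotients at the origin grow as the step shrinks. This is just an illustrative sketch; `cube_root` and `difference_quotient` are my own helper names.

```python
import math

def cube_root(x):
    """Real cube root, valid for negative x as well."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def difference_quotient(h):
    """Symmetric difference quotient of the cube root at x = 0."""
    return (cube_root(h) - cube_root(-h)) / (2 * h)

# Shrink the step h and record the quotient each time.
quotients = [difference_quotient(10.0 ** -k) for k in (2, 4, 6, 8)]
print(quotients)
```

The quotient equals $h^{-2/3}$, so the printed values keep increasing rather than settling down to a limit, which is the numerical signature of a vertical tangent.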

This still doesn’t give a reason why students might want to use this result! After all, any time that we want to find $\frac{dy}{dx}$ and we are given $x$ as a function of $y$, we can differentiate both sides with respect to $x$ using implicit differentiation: for $x=y^2$, this gives $1=2y\frac{dy}{dx}$, and so $\frac{dy}{dx}=\frac{1}{2y}$ as before. That renders this result somewhat pointless for school calculus. So any thoughts on why students might find a need for it would be welcomed!

### The Inverse Function Theorem

I mentioned the Inverse Function Theorem earlier. Here’s a statement of the theorem from Tom Apostol’s “Mathematical Analysis” (2nd edition).

Theorem 13.6 (The Inverse Function Theorem) Assume $\mathbf{f}=(f_1,\dots,f_n)\in C'$ (i.e., continuously differentiable) on an open set $S$ in $\mathbb{R}^n$, and let $T=\mathbf{f}(S)$. If the Jacobian determinant $J_{\mathbf{f}}(\mathbf{a})\ne 0$ for some point $\mathbf{a}$ in $S$, then there are two open sets $X\subseteq S$ and $Y\subseteq T$ and a uniquely determined function $\mathbf{g}$ such that

(a) $\mathbf{a}\in X$ and $\mathbf{f}(\mathbf{a})\in Y$,

(b) $Y=\mathbf{f}(X)$,

(c) $\mathbf{f}$ is one-to-one on $X$,

(d) $\mathbf{g}$ is defined on $Y$, $\mathbf{g}(Y)=X$, and $\mathbf{g}[\mathbf{f}(\mathbf{x})]=\mathbf{x}$ for every $\mathbf{x}$ in $X$,

(e) $\mathbf{g}\in C'$ on $Y$.

(I won’t attempt to explain the technical terms here, as this post is too long already; the internet has much on these for the interested reader.)

We can apply this theorem to our context. We are dealing initially with a function $x=f(y)$, so we take $n=1$ and let $\mathbf{f}=(f_1)=(f)$. Our functions at high school level are almost all well-behaved (that is, smooth), except perhaps at an occasional point, so we will just ignore the $C'$ issue and take $S$ to be the domain of the function $f$ and $T$ to be its range.

The Jacobian determinant for our one-dimensional function $f$ is just $f'(y)$, so this theorem simplifies to the (less precisely stated) result we gave above, noting though that the $\mathbf{x}$ of the theorem is our $y$, and $\mathbf{a}$ is our $y_0$. The relationship between the derivatives follows from (d) using the chain rule, as we described above.