Julian's Musings

The PyTorch add_module() function

Julian Gilbey — 2021-04-23T18:00:00+01:00

I have been building some bespoke PyTorch models, and have just been stung by a bug: it turns out that using the add_module() method is sometimes critical to making a PyTorch model work. Without it, the program may simply crash, or it may appear to run while giving completely meaningless results.

Though there do not seem to be any hints about this in the documentation, it seems that PyTorch determines the layers or other Modules used in a particular Module by looking at the type of object stored in each member of the Module. If that object is not a Module (a plain Python list of layers, for instance), PyTorch does not register it, so the parameters of those layers are missing from parameters() and are never updated during training.
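The mechanism can be illustrated without PyTorch at all. The following is a deliberately simplified stand-in, not PyTorch's actual source: ToyModule is a hypothetical class whose __setattr__ records any attribute that is itself a ToyModule, which is roughly what nn.Module does for sub-modules. A plain Python list is not a ToyModule, so whatever it contains goes unnoticed.

```python
class ToyModule:
    """Hypothetical, much-simplified imitation of how nn.Module tracks children."""

    def __init__(self):
        # Bypass our own __setattr__ while creating the registry itself.
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        # Only attributes that are themselves ToyModules get registered.
        if isinstance(value, ToyModule):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def children(self):
        return list(self._modules.values())


class Direct(ToyModule):
    def __init__(self):
        super().__init__()
        self.layer = ToyModule()      # registered: the value is a ToyModule


class InList(ToyModule):
    def __init__(self):
        super().__init__()
        self.layers = [ToyModule()]   # not registered: the value is a list


print(len(Direct().children()))  # 1
print(len(InList().children()))  # 0
```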

Here is an example. We take the Quickstart from the PyTorch tutorials webpage. The neural network is defined in it as follows:

from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

Note that the layers, nn.Flatten() and nn.Sequential(...), are stored as members of the NeuralNetwork object, by writing self.flatten = ... and so on.

Let us now change the code to store these layers in a list instead:

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        flatten = nn.Flatten()
        linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )
        self.layers = [flatten, linear_relu_stack]

    def forward(self, x):
        x = self.layers[0](x)
        logits = self.layers[1](x)
        return logits

The forward() method is also modified to use the appropriate element of the list of layers. But now PyTorch ignores the self.layers member, as it is not a Module, and the rest of the Quickstart code breaks quite badly.
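One way to see the problem directly is to count the parameters that PyTorch knows about. This is a minimal sketch, assuming torch is installed; ListNetwork is a shortened, hypothetical version of the class above, with smaller layers to keep it quick.

```python
import torch
from torch import nn


class ListNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        # Stored in a plain Python list, so PyTorch does not register them.
        self.layers = [nn.Flatten(), nn.Linear(28 * 28, 10)]

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


model = ListNetwork()
# The forward pass still runs on the CPU ...
output = model(torch.zeros(1, 28, 28))
print(output.shape)                   # torch.Size([1, 10])
# ... but the model reports no parameters and no child modules.
print(len(list(model.parameters())))  # 0
print(len(list(model.children())))    # 0
```

With no registered parameters, an optimizer built from model.parameters() has nothing to update (torch.optim.SGD, for example, raises a ValueError when given an empty parameter list), and model.to(device) leaves the unregistered layers where they are.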

The simplest way to fix this, while keeping the layers as a list, is to inform PyTorch about the existence of these layers using the add_module() method, as follows:

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        flatten = nn.Flatten()
        linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )
        self.layers = [flatten, linear_relu_stack]
        for i, layer in enumerate(self.layers):
            self.add_module(f"layer_{i}", layer)

    def forward(self, x):
        x = self.layers[0](x)
        logits = self.layers[1](x)
        return logits

The first parameter of the add_module() method is a name that PyTorch will use to refer to the layer when printing the neural network model, while the second is the layer itself. The name can also be used to refer to the layer as an attribute of the Module object, so it is presumably important that the names are unique within the Module and potentially helpful if they are valid Python identifiers (though if they are not, they can still be accessed using getattr()).
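The two ways of reaching a layer by name can be seen in a small sketch, assuming torch is installed. The class name Named and the layer names "layer_0" and "1st" are made up for illustration; "1st" is chosen because it is not a valid Python identifier.

```python
from torch import nn


class Named(nn.Module):
    def __init__(self):
        super().__init__()
        # A name that is a valid Python identifier ...
        self.add_module("layer_0", nn.Linear(4, 3))
        # ... and one that is not (it starts with a digit).
        self.add_module("1st", nn.Linear(3, 2))


model = Named()
print(model.layer_0)           # ordinary attribute access works
print(getattr(model, "1st"))   # this name can only be reached with getattr()
print([name for name, _ in model.named_children()])  # ['layer_0', '1st']
print(len(list(model.parameters())))  # 4: a weight and a bias for each layer
```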

And with that addition, the code once again works.

(If you are wondering why we would store the layers in a list in the first place, I had a use case where the network was constructed with a variable number of layers passed to __init__() as a list.)

Edits 27 April 2021

A colleague has just pointed out the nn.ModuleList class to me. So this problem could also be solved in a simpler way as follows:

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        flatten = nn.Flatten()
        linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
            nn.ReLU()
        )
        self.layers = nn.ModuleList([flatten, linear_relu_stack])

    def forward(self, x):
        x = self.layers[0](x)
        logits = self.layers[1](x)
        return logits
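For the use case mentioned earlier, where the layers arrive as a list of unknown length, a sketch along the following lines might work (assuming torch is installed). VariableNetwork and its layers argument are hypothetical names chosen for illustration.

```python
import torch
from torch import nn


class VariableNetwork(nn.Module):
    def __init__(self, layers):
        super().__init__()
        # nn.ModuleList registers every element, however many there are.
        self.layers = nn.ModuleList(layers)

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


model = VariableNetwork(
    [nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(), nn.Linear(32, 10)]
)
print(model(torch.zeros(2, 28, 28)).shape)  # torch.Size([2, 10])
print(len(list(model.parameters())))        # 4: two weights and two biases
```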

Installing PyTorch on Debian bullseye (Debian 11)

Julian Gilbey — 2021-04-12T09:20:00+01:00

I have just rebuilt PyTorch for my Debian bullseye machine (Debian 11 as it will become) from source. This blog post documents how I did this.

System setup

I have the following key packages/libraries installed. There will certainly be others that I have overlooked, of course; please feel free to let me know of anything I’ve overlooked.

Python: python3-dev (currently version 3.9.2-2)
Essential packages (according to the PyTorch installation page) are:
- ninja-build
- cmake
- libmagma-dev
- python3-numpy
- python3-yaml
- python3-setuptools
- python3-cffi
- python3-typing-extensions
- python3-future
- python3-six
- python3-requests
The PyTorch website also says that mkl should be included, which is probably provided by the libmkl-dev package. But that provides alternatives to the libblas library packages, so it is fine to install one of the open source libblas-dev packages instead (libblas-dev, libblas64-dev, libopenblas-dev or libopenblas64-dev).
Though the PyTorch page says that dataclasses is required, this is only for Python versions less than 3.7, so there is no need to install this.
CUDA: The Debian non-free archive includes the package nvidia-cuda-toolkit, which does everything required (unlike the case with TensorFlow).
cuDNN: libcudnn8-dev (version 8.1.0.77-1+cuda11.2; this package was downloaded directly from the NVIDIA website)

Building PyTorch

I started by following the guidance on the PyTorch installing from source webpage.

Downloading the PyTorch source code

I followed the instructions as given; I also checked out the latest release branch, which at the time of writing is r1.8.1.

$ git clone --recursive https://github.com/pytorch/pytorch
$ cd pytorch
$ git checkout -t origin/release/1.8
$ git submodule sync
$ git submodule update --init --recursive

(where the last two commands are needed if updating an existing checkout, though they do not do any harm if not).

Pre-build

The version number in the PyTorch sources does not match the official version number for some reason. So I “corrected” it:

$ echo 1.8.1 > version.txt

Building the package

It’s a one-line command:

$ python3 setup.py install --user

This compilation step is pretty long, but that is all that is needed.

To clean the build directory after it had finished, I ran:

$ python3 setup.py clean

Building torchvision and torchaudio

Installing dependencies

torchvision looks for certain graphics libraries during the build. I’m not sure which are strictly required, but here are the packages to install to ensure that everything is present (in addition to having PyTorch itself already installed as above):

libpng-dev
libpng-tools
libjpeg-dev
ffmpeg
libavcodec-dev
libavformat-dev
libavutil-dev
libswresample-dev
libswscale-dev

There is also an extra dependency on python3-scipy in the setup.py extras_require option; I am unclear whether this is needed.

torchaudio does not have any dependencies beyond PyTorch itself.

Downloading and building

I used pip (or pip3) to install rather than directly using setup.py, as otherwise the package gets installed in egg format.

I installed torchvision as follows:

$ git clone https://github.com/pytorch/vision.git
$ cd vision
$ git checkout -t origin/release/0.9
$ echo 0.9.1 > version.txt
$ pip install --user .

Cleaning, if needed, apparently still needs to be done via setup.py:

$ python3 setup.py clean

torchaudio was only a tiny bit trickier:

$ git clone --recursive https://github.com/pytorch/audio.git
$ cd audio
$ git checkout -t origin/release/0.8
$ git submodule sync
$ git submodule update --init --recursive

(As with PyTorch, the last two commands are needed if updating an existing checkout, though they do not do any harm if not).

Then I edited setup.py, modifying lines 14-16 to read:

# Creating the version file
version = '0.8.1'
sha = 'e4e171a51714b2b2bd79e1aea199c3f658eddf9a'

Then a one-line build command:

$ pip install --user .

and I was done.
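A quick sanity check after the three builds is to ask Python which versions it can see. This sketch uses only the standard library (importlib.metadata, available from Python 3.8); installed_version is a helper name made up here. If the version edits above took effect, I would expect the reported versions to begin with 1.8.1, 0.9.1 and 0.8.1, possibly followed by a local suffix.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "torchvision", "torchaudio"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```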

Installing TensorFlow and TensorFlow Addons on Debian bullseye (Debian 11)

Julian Gilbey — 2021-03-24T11:25:00+00:00

Though I usually use PyTorch for my deep learning work, I have just been given a piece of code written using TensorFlow, so I needed to install TensorFlow and TensorFlow Addons on my Debian testing system (aka Debian bullseye, which will shortly become Debian 11). Unfortunately, the binaries available on PyPI are only built for Python 3.6-3.8, but Debian bullseye now runs Python 3.9.

This blog post documents how I managed to build these packages for my system (which was far more effort than it probably should have been!).

System setup

I have the following key packages/libraries installed. There will certainly be others that I have overlooked, of course; please feel free to let me know of anything I’ve overlooked.

Python: python3-dev (currently version 3.9.2-2)
Essential Python packages (according to the TensorFlow installation page) are pip, numpy, wheel and keras_preprocessing. The Debian packages providing these are:
- python3-keras-preprocessing
- python3-numpy
- python3-pip
- python3-wheel
though the version of keras_preprocessing currently in the Debian archive is older than that required by the TensorFlow installation, so it may be wiser to install it with
```
pip install -U --user keras_preprocessing --no-deps
```
as explained on the TensorFlow installation page.
Python packages: either install Debian versions of these or let pip install them during the package installation. The relevant packages seem to be:
- python3-flatbuffers
- python3-google-auth-oauthlib
- python3-grpcio (*)
- python3-h5py
- python3-markdown
- python3-protobuf
- python3-requests
- python3-setuptools
- python3-six
- python3-termcolor
- python3-typeguard (*)
- python3-typing-extensions
- python3-werkzeug
- python3-wrapt
and their dependencies.
(*): In these cases, the current Debian version is too old for TensorFlow 2.4.1.
cuDNN: libcudnn8-dev (version 8.1.0.77-1+cuda11.2; this package was downloaded directly from the NVIDIA website)
Go compiler: golang-go - this is needed for bazelisk.

Unfortunately, it turns out that the Debian-packaged CUDA packages do not work when building TensorFlow; see this GitHub issue. It is likely that this will not be fixed. On the other hand, the NVIDIA-provided Debian packages don’t seem to work nicely with some of the other parts of the system, so I never used them.

To get around this, I downloaded cuda_11.2.1_460.32.03_linux.run from the NVIDIA website. (The current version, though, is 11.2.2, but as my Debian CUDA packages are 11.2.1, I downloaded the older version from the Archive of Previous CUDA Releases.)

I needed write permission on the parent directory of the desired target location, so I created a directory /usr/local/cuda-11.2.1 (as root) and then ran

$ chown jdg:jdg /usr/local/cuda-11.2.1

(I could equally have created this directory in some other location without needing to be root.) I then unpacked the CUDA package into it:

$ sh cuda_11.2.1_460.32.03_linux.run --installpath=/usr/local/cuda-11.2.1/cuda

After the installation, I tidied up:

$ mv /usr/local/cuda-11.2.1/cuda/* /usr/local/cuda-11.2.1/
$ rmdir /usr/local/cuda-11.2.1/cuda/

so that everything is now directly in /usr/local/cuda-11.2.1. (At the end of the build process, it appears that this entire directory can be deleted, as long as the relevant Debian CUDA packages are still present.)

Building TensorFlow

I started by following the guidance on the TensorFlow installing from source webpage.

Installing Bazel

There will eventually be a Debian bazel package, but unfortunately that is some way in the future still (there is a team working on this, but there are currently technical difficulties). So I installed Bazelisk, following the instructions on the Bazelisk GitHub page:

$ go get github.com/bazelbuild/bazelisk
$ export PATH=$PATH:$(go env GOPATH)/bin
$ (cd $(go env GOPATH)/bin && ln -s bazelisk bazel)

(The export may well be extraneous, as PATH is usually already exported.)

Build directory setup

Since I am building both TensorFlow and TensorFlow Addons, I created a directory called ~/packages/tensorflow and cloned the git repositories into that directory; the intention is to build the wheels in that same directory, so everything is together.

Downloading the TensorFlow source code

I followed the instructions as given; I also checked out the latest release branch, which at the time of writing is r2.4.

$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout -t origin/r2.4

Configuring the build

Here begins the fun! Some of these lines have been wrapped to fit better on the screen.

$ ./configure
You have bazel 3.1.0 installed.
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/lib/python3.9/dist-packages
  /usr/local/lib/python3.9/dist-packages
  /home/jdg/lib/python
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]
/home/jdg/.local/lib/python3.9/site-packages

I can’t write to the system directory, and it seems as though this is where some libraries may be written, so I set it to be my local repository instead. I don’t know whether leaving this as /usr/lib/... would work equally well.

Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.

I accepted the defaults for both of these; I don’t have TensorRT installed.

Inconsistent CUDA toolkit path: /usr vs /usr/libAsking for detailed CUDA configuration...

Please specify the CUDA SDK version you want to use.
[Leave empty to default to CUDA 10]: 11.2


Please specify the cuDNN version you want to use.
[Leave empty to default to cuDNN 7]: 8.1

I have CUDA 11.2.1 installed, so I responded 11.2. Perhaps this would be better as just 11, but I’m not sure. Likewise, I have cuDNN 8.1.0.77 installed, so I responded 8.1.

Please specify the locally installed NCCL version you want to use.
[Leave empty to use http://github.com/nvidia/nccl]: 

I left this empty, as I do not have NCCL installed.

Please specify the comma-separated list of base paths to look for CUDA
libraries and headers. [Leave empty to use the default]:
/usr/local/cuda-11.2.1

This is the point at which everything goes wrong with the Debian-packaged version of the CUDA libraries. The Debian versions can be installed on the system simultaneously during the build, it seems, but they do not work for building TensorFlow. So I gave the path to the libraries unpacked from the NVIDIA installer.

Please specify a list of comma-separated CUDA compute capabilities you
want to build with.

You can find the compute capability of your device at:
https://developer.nvidia.com/cuda-gpus.
Each capability can be specified as "x.y" or "compute_xy" to include
both virtual and binary GPU code, or as "sm_xy" to only include the
binary code.

Please note that each additional compute capability significantly
increases your build time and binary size, and that TensorFlow only
supports compute capabilities >= 3.5 [Default is: 3.5,7.0]:

I set this to 3.5,7.5 based on the information on the webpage referred to. Maybe I don’t need the 3.5 part, but I’m not sure, so I left it in.

Do you want to use clang as CUDA compiler? [y/N]: 

Please specify which gcc should be used by nvcc as the host
compiler. [Default is /usr/bin/gcc]: 

Please specify optimization flags to use during compilation when bazel
option "--config=opt" is specified [Default is -Wno-sign-compare]: 

Would you like to interactively configure ./WORKSPACE for Android
builds? [y/N]:

I left all of these with their default settings.

And the configuration is finished!

Building the pip package

Unfortunately, some of the scripts in the TensorFlow sources call python using the shebang formulation #!/usr/bin/env python, which breaks unless there is a python on PATH; this should presumably be Python 3.x (though I haven’t checked). In Debian 10, python was python2, but in Debian 11, there is no python executable by default (though there is a python-is-python3 package which could be installed, which creates /usr/bin/python as a symlink to /usr/bin/python3). Since I don’t want things to break silently, I didn’t install that package, but instead set up a local symlink for the purpose of this build. (Note that the bazel environment variable PYTHON_BIN_PATH does not help at all.)

$ mkdir ../bin
$ ln -s /usr/bin/python3 ../bin/python
$ PATH=$(realpath ../bin):$PATH
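To confirm what python resolves to before starting the long build, something like the following could be used. It relies only on the standard library; find_command is a hypothetical helper, and its optional search_path argument exists only so that a specific PATH can be tested.

```python
import os
import shutil


def find_command(name, search_path=None):
    """Return the full path that `name` resolves to on the given PATH, or None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    return shutil.which(name, path=search_path)


# May print None if nothing on PATH provides a plain `python`.
print(find_command("python"))
print(find_command("python3"))
```

If the first line prints None, the scripts using #!/usr/bin/env python will fail in the way described above.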

I then ran the bazel command:

$ bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package

This compilation step is pretty long.

Building and installing the pip package

Following the instructions, I ran:

$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ..

so that the resulting wheel ended up in the parent directory.

I then installed the wheel; in my case, the command line was:

$ pip install --user ../tensorflow-2.4.1-cp39-cp39-linux_x86_64.whl
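The cp39 in that filename is what ties the wheel to Python 3.9, and is why the PyPI wheels built for 3.6-3.8 were of no use here. As a rough illustration, the filename can be split into its parts following the wheel naming convention (name, version, Python tag, ABI tag, platform tag). parse_wheel_name is a hypothetical helper, and it ignores the optional build tag that some wheels carry.

```python
import sys
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename):
    """Split a simple wheel filename (no build tag) into its five components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        raise ValueError(f"unexpected number of fields in: {filename}")
    return WheelName(*parts)


wheel = parse_wheel_name("tensorflow-2.4.1-cp39-cp39-linux_x86_64.whl")
print(wheel.python_tag)  # cp39

# The tag for the interpreter running this script, assuming CPython
# (for example cp39 for CPython 3.9).
running_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(wheel.python_tag == running_tag)
```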

Installing TensorFlow Addons

The instructions for this are on the TensorFlow website here.

I first cloned the repository; again, I started in the directory ~/packages/tensorflow.

$ git clone https://github.com/tensorflow/addons.git
$ cd addons
$ git checkout -t origin/r0.12

The exports, though, are not quite as described. They should instead be the following:

$ export TF_NEED_CUDA=1
$ export CUDA_TOOLKIT_PATH=/usr/local/cuda-11.2.1

(and I have just reported it, so this may well be fixed very soon). The rest ran smoothly:

$ python3 ./configure.py
$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg ..
$ pip install ../tensorflow_addons-0.12.2-cp39-cp39-linux_x86_64.whl

And after this, I removed (actually just temporarily renamed) /usr/local/cuda-11.2.1, and everything still works.

What is mathematics, really?

Julian Gilbey — 2019-09-06T09:40:00+01:00

Greg Ashman recently published two provocative posts (the first and the second) in response to Dan Meyer’s post, claiming that “A lot of people don’t seem to understand what mathematics is”. Dan Meyer’s statement:

“Math is only objective, inarguable, and abstract for questions defined so narrowly they’re almost useless to students, teachers, and the world itself.”

formed the starting point of this. Greg’s central thesis is that mathematics uses deductive reasoning, as opposed to other subjects which use inductive reasoning. From this follows the argument that calculating things like p-values is mathematics, whereas evaluating their meaning or usefulness is science. (I encourage you to read the full posts; I have just picked a couple of points out.)

But this seems to be quite a strange argument. Let us consider a metaphor: what is Art? A painting or a sculpture would be examples of Art (though I would not like to begin defining “Art” itself). We might admire them, study them, and so forth, but that would not be “doing Art”. An artist may well do this to gain inspiration, as well as studying art techniques so that they can produce their own novel art.

Mathematics is similar. Let us focus first on pure mathematics.

Doing “pure” mathematics

A beautifully presented piece of deductive reasoning is a piece of mathematics. Mathematicians will study existing such arguments to learn ideas and techniques, but one of their main focuses is to generate novel such arguments, showing that some result is true. This is always a creative process, and sometimes hugely so. The process is messy, exploratory, and full of inductive reasoning. It often takes the form of “I’ve noticed that in all these cases I’ve tried, such-and-such seems to happen. Maybe I can prove that, and that will help me to show that the main result is true.” The deductive argument which results is a product of doing mathematics, just as the painting is the result of the artist’s messy exploration, trial and error, and so on.

And even having a deductive argument is not the end of the story. When presented with a deductive argument, how does one check that it is correct? Perhaps one could do so just by following the given argument and checking every step? Unfortunately, the challenge of justifying any but the simplest theorems using automated theorem provers shows that this is generally far more involved and complicated than it may superficially seem. The skill of seeking counterexamples to arguments, of finding errors in proofs and ambiguities in terminology is a highly creative one, not a logical-deductive one. So even in the seemingly purely logical world of deductive mathematical arguments lies a huge amount of exploration, insight and perhaps even induction.

And those questions which have an objectively correct answer at school level are generally calculations. And even there, we can start asking interesting (and valuable) non-objective questions such as: Is this the best approach? How do these two different methods compare? When would you use one rather than the other? When would we want/need to perform such a calculation? These questions are as important or - given the ability of calculators and computers to do the grunt-work - arguably more important than the calculations themselves.

Another vital aspect of (pure) mathematicians’ work is the generation of conjectures. This is what spurs others on to try to find arguments (proofs) for these conjectures being true, or to show that they are false. At the extreme lie the Millennium Problems, which are very important conjectures which had defied all attempts at solving them. (One has since been solved.) There are also many smaller (but still significant) conjectures, which are some of what continues to drive mathematics. But how did mathematicians come up with them? By doing experiments, and lots of … you guessed it … inductive reasoning.

I am not questioning the central importance of teaching and learning calculation techniques (in the very broadest sense), for it is impossible to do mathematics without them, and I am not arguing for or against any particular pedagogical approach. I likewise strongly agree with Greg that we must teach deductive reasoning in mathematics, because it is so central to the subject and one of the unique qualities of school-level mathematics in comparison to other subjects. (Indeed, I am in the middle of writing a book about this.) But I do claim that just performing calculations, such as calculating p-values or whatever, is not “doing mathematics” - it is simply doing calculations. And likewise, reading other people’s deductive arguments is not “doing mathematics” either (though it is a prerequisite to creating one’s own arguments). “Doing mathematics” must involve some form of exploration, generation of conjectures, and justification (to paraphrase John Mason), and that must take place in the classroom, just as school art lessons are not only about learning specific techniques.

Applied mathematics

So much for pure mathematics being deductive; where does that leave us when we consider applied mathematics? We must first recognise that the boundary between Mathematics and Non-Mathematics is quite fuzzy; a cosmologist or fluid dynamicist might equally find themselves in a Physics Department or an Applied Mathematics department, depending upon the institution. (Likewise, a pure mathematician might end up in a Philosophy Department or Computer Science department if they are studying logic or category theory.)

But returning to the statistical examples with which we began, how do we handle those? I would contend that calculating a p-value is just performing a calculation; without thinking about its meaning or significance, that cannot be considered “doing mathematics”. Is considering the meaning of such calculations actually part of a different discipline, such as Statistics or Physics? That is returning to the question of where one draws the dividing lines, but what seems clear is that we should not be teaching the calculation of something without also teaching the meaning of the calculation. Whether this takes place in a maths lesson or a physics lesson or a statistics lesson is immaterial.

So how does applied mathematics fit into the mathematical family, and should we consider statistics as part of mathematics? That itself is a long discussion, and I am sure that disagreements abound. So a brief thought will have to suffice here: applied mathematicians use their mathematical reasoning powers to apply mathematical tools to interesting and novel problems in the “real world”. And that is, in some sense, very similar to pure mathematicians using their mathematical reasoning powers to apply mathematical tools to interesting and novel problems in the “mathematical world”. So perhaps they are not so different, after all.

Finally, a key educational question

Perhaps the argument between Dan Meyer and Greg Ashman comes down to the question of what we want to teach in our classrooms. Do we want to just teach pure calculation? Or do we want to teach the creativity and messiness of mathematics, alongside the deductive aspects?

Some students are very well-served by having a subject in which they can be objectively right or wrong: it gives them a stability and safety which is lacking in many other subjects. But for other students, who could end up becoming excellent and creative mathematicians, this objectivity is stifling and off-putting, and they will never see mathematics for what it truly is. In the world today, we don’t need people who can just perform calculations: computers are so much better than humans at that. We rather need people who can think and ask good questions, and as a small part of that, who can decide what calculations need to be performed.

The debate about how we educate our students is hugely important. There is no “right” answer. It is, instead, a question of what we believe a mathematics education is aiming to achieve, and there it seems that Dan and Greg fundamentally disagree.

Increasing functions and functions increasing

Julian Gilbey — 2018-10-07T20:00:00+01:00

Here’s the graph of $y=-\dfrac{1}{x}$ for $x\ne0$.

Where is this function increasing? Is it an increasing function?

Looking at various recent examination papers, it has become clear to me that there is significant confusion between these two questions. This post is intended to bring some clarity to the situation.

At the start of this post, I will give an example of the confusion as it appears in exam questions (and probably elsewhere), and clarify what the two different phrases mean using the above example. I will then delve more deeply into the mathematics of these two things, going beyond A-level content, and use some undergraduate analysis to find equivalent conditions for them in terms of the derivatives of the functions. It is fine to skip over the technical stuff and just look at the results (theorems)!

(Exactly the same applies to the use of the term “decreasing”, but for simplicity we will focus on increasing functions in this post.)

Here is an example of a question (based on a real exam question) which typifies the confusion.

The equation of a curve is $y=x^3+4x^2-5x$.

Find the set of values of $x$ for which $y$ is an increasing function of $x$.

If we replace “increasing function” by another familiar A-level term describing functions, “one-to-one function”, the question becomes:

A function is given by $f(x)=x^3+4x^2-5x$.

Find the set of values of $x$ for which $f(x)$ is a one-to-one function of $x$.

This is clearly nonsensical, because whether a function is one-to-one or not is a property of the function as a whole, not a property of the function values at any particular input value.

Likewise, a function either is or is not an increasing function; it is a property of the function as a whole.

Informally (and not quite correctly), we can describe the difference as follows:

A function is an increasing function if larger input values give larger output values.
A function is increasing at a point if at that point, the function has a positive gradient.

An example which shows that these are not the same is the function $f(x)=-\dfrac{1}{x}$ for $x\ne0$ shown above. This function is increasing at every value of $x\ne0$, as the gradient is always positive. However, it is not an increasing function, because $f(1)<f(-1)$ even though $1>-1$. If we restricted the domain to $x>0$, then it would be an increasing function.

So the above-quoted exam question does not make any sense, just as the modified version did not: either $y$ is an increasing function of $x$ or it is not. If the question had instead asked “Find the set of values of $x$ at which $y$ is increasing,” it would have been fine.
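For the reworded version of the question, the working might run as follows (a sketch of the expected calculation, not taken from any mark scheme). Differentiating $y=x^3+4x^2-5x$ gives

$$\frac{dy}{dx}=3x^2+8x-5,$$

which is zero when

$$x=\frac{-8\pm\sqrt{64+60}}{6}=\frac{-4\pm\sqrt{31}}{3}.$$

Since the coefficient of $x^2$ is positive, the gradient is positive outside these roots, so $y$ is increasing at those $x$ with $x<\dfrac{-4-\sqrt{31}}{3}$ or $x>\dfrac{-4+\sqrt{31}}{3}$.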

Incidentally, the idea of increasing and decreasing functions connects very well with the issue of rearranging inequalities (increasing the depth of connections within the subject): a function can be applied to both sides of an inequality without changing the direction of the inequality if the function is (strictly) increasing; it can be applied but with a change in the direction of the inequality if the function is (strictly) decreasing, and if the function is neither, then the function cannot be applied to the inequality. So we cannot square both sides of an inequality unless we are restricted to non-negative values, and we cannot take the reciprocal of an inequality unless we have the same restriction (and in that case, we must also reverse the direction of the inequality).
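Two small numerical cases, chosen here purely for illustration, make the point. Squaring is not an increasing function on the whole of $\mathbb{R}$:

$$-2<1 \quad\text{but}\quad (-2)^2=4>1=1^2,$$

whereas on the non-negative reals it is strictly increasing, so $2<3$ gives $4<9$. Taking reciprocals is strictly decreasing on the positive reals, so the direction of the inequality reverses:

$$2<3 \quad\text{but}\quad \tfrac{1}{2}>\tfrac{1}{3}.$$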

It seems reasonable to assert that if a function is an increasing function, then it will be increasing at every point. There turns out to be some subtlety to this, which we now explore a little more deeply.

A formal definition of increasing

We can give a formal definition of an increasing function. For example, this definition is from Apostol, Mathematical Analysis, 2nd ed, p94, and identical definitions appear on the internet:

Definition 1: Let $f$ be a real-valued function whose domain is a subset $S$ of $\mathbb{R}$. Then $f$ is said to be an increasing (or nondecreasing) function if for every pair of points $x$ and $y$ in $S$, $x<y$ implies $f(x)\le f(y)$. If $x<y$ implies $f(x)<f(y)$, then $f$ is said to be a strictly increasing function. (Decreasing functions are similarly defined.)

Note the distinction between increasing and strictly increasing here: a constant function such as $f(x)=0$ for $x\in\mathbb{R}$ is both an increasing and decreasing function, though it is not a strictly increasing function.
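Definition 1 can be tested numerically on a finite sample of the domain, which can refute but never prove that a function is increasing. The sketch below does this for the function $f(x)=-\dfrac{1}{x}$ from the start of the post, taking the definition in the form "$x<y$ implies $f(x)\le f(y)$"; is_increasing_on is a hypothetical helper written for this illustration.

```python
from itertools import combinations


def is_increasing_on(f, points, strict=False):
    """Check Definition 1 on a finite sample: x < y implies f(x) <= f(y).

    With strict=True, check that x < y implies f(x) < f(y) instead.
    """
    for x, y in combinations(sorted(set(points)), 2):
        # Pairs drawn from a sorted, de-duplicated list always have x < y.
        if strict:
            if not f(x) < f(y):
                return False
        elif not f(x) <= f(y):
            return False
    return True


def f(x):
    return -1 / x


whole_domain = [-3, -2, -1, -0.5, 0.5, 1, 2, 3]
positive_only = [0.5, 1, 2, 3]

print(is_increasing_on(f, whole_domain))                # False: f(-1) = 1 but f(1) = -1
print(is_increasing_on(f, positive_only, strict=True))  # True
print(is_increasing_on(lambda x: 0, whole_domain))      # True: a constant function qualifies
print(is_increasing_on(lambda x: 0, whole_domain, strict=True))  # False
```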

We could also try to come up with a definition of increasing at a point. There are no standard definitions of this idea, and the following proposed definition is certainly beyond A-level in its formality. It is based on the definition of continuity, which is about the behaviour of a function “near” to a point.

Definition 2: Let $f$ be a real-valued function whose domain is a subset $S$ of $\mathbb{R}$. Then $f$ is said to be increasing at the point $x$ in $S$ if there is some $\delta>0$ such that:

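for every $y$ in $S$ with $x<y<x+\delta$, we have $f(x)\le f(y)$, and
for every $y$ in $S$ with $x-\delta<y<x$, we have $f(y)\le f(x)$.

If the $\le$ signs are replaced by $<$ signs in these two inequalities, then $f$ is said to be strictly increasing at $x$.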

With this definition, the above exam question (reworded) makes sense, and the correct final answer is what the examiner would expect. (One might wonder whether one could make such a local definition of one-to-one, and indeed, this is done when considering the Inverse Function and Implicit Function theorems. But that is a story for another day.)

Using calculus

So far, no calculus has appeared, yet we typically teach our students to determine whether a function is an increasing function or to find where it is increasing by differentiating the function. So let us now consider how we could use calculus to help us.

For us to be able to use calculus, we need to assume that our function is differentiable throughout $S$. We could then propose the following theorem:

Theorem 1 (incorrect attempt): Let $f$ be a real-valued continuous function whose domain is a subset $S$ of $\mathbb{R}$ and is differentiable at every (interior) point of $S$. Then $f$ is an increasing function if and only if $f’(x)>0$ for all $x$ in (the interior of) $S$.

(The use of “interior” is to avoid certain technical complications.)

Unfortunately this fails immediately: the constant function $f(x)=0$ for $x\in\mathbb{R}$ is increasing, yet $f’(x)=0$.

We could try changing this to say that $f$ is a strictly increasing function, but that fails if the function has a point of inflection. For example, $f(x)=x^3$ is a strictly increasing function, even though its derivative is zero at $x=0$.

We could also try changing the condition to say that $f’(x)\ge0$ for all $x$ in $S$. However, this also fails: if the graph has a discontinuity, such as the function $f(x)=-\dfrac{1}{x}$ for $x\ne0$ that we looked at before, then it might have $f’(x)>0$ for all $x$ in $S$, yet not be an increasing function.
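To make the failure concrete, we can check two sample points of our own choosing, one on each side of the hole:

\[f(-1)=1 \quad\text{and}\quad f(1)=-1, \qquad\text{so}\quad -1<1 \quad\text{but}\quad f(-1)>f(1),\]

even though $f'(x)=\dfrac{1}{x^2}>0$ at every point of the domain.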

This feels more hopeful, though: after all, the only problem now is the “hole” in the domain $S$. And it turns out that if we restrict the domain to be an interval (that is, a subset of the reals with no “holes”), then it will work:

Theorem 1 (correct version): Let $f$ be a real-valued continuous function whose domain is an interval $I$ of $\mathbb{R}$ and is differentiable at every point in (the interior of) $I$. Then $f$ is an increasing function if and only if $f’(x)\ge 0$ for all $x$ in (the interior of) $I$.

The formal proof of this is found below, and though it is quite technical, the theorem itself seems clearly true, and school students could probably be convinced to believe it (at least once it is written in more student-friendly language).

What can we say, though, about whether a (differentiable) function is increasing at a point? Using Definition 2 above, we get the corresponding theorem:

Theorem 2: Let $f$ be a real-valued continuous function whose domain is an interval $I$ of $\mathbb{R}$ and which is differentiable at every (interior) point of $I$. Then $f$ is increasing at the point $x$ in $I$ if and only if there is some $\delta>0$ for which $f’(y)\ge0$ for all $y$ in (the interior of) $I$ with $x-\delta<y<x+\delta$.

Why is it not sufficient to just require $f’(x)\ge0$? Well, consider the functions $f(x)=x^3$ and $f(x)=-x^3$. They both have $f’(0)=0$, yet the first is increasing (indeed, even strictly increasing) at $x=0$, while the second is decreasing at $x=0$. And a function such as $f(x)=x^2$ is neither increasing nor decreasing at $x=0$. So we really do need to consider a small interval around the point of interest.
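These three examples can be probed numerically. The sketch below reads Definition 2 as requiring $f(y)\le f(x)$ just to the left of $x$ and $f(x)\le f(y)$ just to the right. The function name `increasing_at`, the fixed value of `delta` and the grid size are all assumptions made for the illustration. The definition only asks for some $\delta$ to exist, whereas the code tries a single one on finitely many points, so it is a probe rather than a proof.

```python
def increasing_at(f, x, delta=1e-3, samples=200, strict=False):
    """Probe Definition 2 at x using one fixed delta and a finite grid.

    Assumes the domain of f contains the whole interval (x - delta, x + delta).
    """
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)  # 0 < h < delta
        if strict:
            left_ok = f(x - h) < f(x)
            right_ok = f(x) < f(x + h)
        else:
            left_ok = f(x - h) <= f(x)
            right_ok = f(x) <= f(x + h)
        if not (left_ok and right_ok):
            return False
    return True


def decreasing_at(f, x, **kwargs):
    # f is decreasing at x exactly when -f is increasing at x.
    return increasing_at(lambda t: -f(t), x, **kwargs)


examples = {
    "x^3": lambda t: t ** 3,
    "-x^3": lambda t: -t ** 3,
    "x^2": lambda t: t ** 2,
}

for name, f in examples.items():
    print(name, increasing_at(f, 0.0), decreasing_at(f, 0.0))
```

All three functions have zero derivative at the origin, yet the probe reports three different behaviours there.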

(Theorem 2 could be extended, with care, to more general subsets of $\mathbb{R}$, as we are only discussing a local property of the function. But it is not particularly interesting to do so.)

So the question of determining at which points a function is increasing (or decreasing) is more subtle than it appears: not only does one have to find where the function has derivative $\ge0$ (and not just $>0$), but one also has to determine what is happening at those points where the derivative is zero, as there are different types of stationary points. (At those points where the derivative is strictly positive, the function is certainly strictly increasing, which follows from Theorem 4 below.)

Things get more complicated if we now wish to consider strictly increasing (or decreasing) functions. There is a relatively weak theorem which will suffice much of the time:

Theorem 3: Let $f$ be a continuous real-valued function whose domain is an interval $I$ of $\mathbb{R}$ and which is differentiable at every (interior) point of $I$. Then if $f’(x)>0$ throughout $I$, $f$ is a strictly increasing function.

Note that this is a one-directional theorem; $f(x)=x^3$ for $x\in\mathbb{R}$ is our standard example of a strictly increasing function which does not have $f’(x)>0$ throughout the domain because of the point of inflection at the origin. The proof of Theorem 3 follows exactly as that of Theorem 1.

An easy corollary of this is the following (local) theorem:

Theorem 4: Let $f$ be a continuous real-valued function whose domain is a subset $S$ of $\mathbb{R}$. If $f$ is differentiable at the point $x$ in the interior of $S$ and $f’(x)>0$, then $f$ is strictly increasing at $x$.

This is the theorem which is typically used when answering A-level exam questions such as the one above. Unfortunately, as we see from our example of $f(x)=x^3$, this too is a one-directional theorem: every point at which $f’(x)>0$ is a point at which the function is strictly increasing, but there may be other points where this is the case but where $f’(x)=0$. (If $f’(x)<0$, then the function is strictly decreasing at this point, so it cannot be increasing.) The question of using calculus to determine where a function is increasing, rather than strictly increasing, is somewhat more complicated, as we see from Theorem 2 above. But at A-level, the functions are always nice enough that the only difficulties will be at the stationary points.

There is actually a necessary and sufficient condition for a function to be strictly increasing, but this is more subtle:

Theorem 5: Let $f$ be a continuous real-valued function whose domain is an interval $I$ of $\mathbb{R}$ and which is differentiable at every interior point of $I$. Then $f$ is strictly increasing on $I$ if and only if $f’(x)\ge0$ throughout $I$ and there is no non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of $J$.

The proof can be found below.
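As a quick illustration of Theorem 5, using two functions of our own choosing, consider $f(x)=x+\sin x$ and $g(x)=\bigl(\max(0,x)\bigr)^3$ for $x\in\mathbb{R}$:

\[\begin{align*} f'(x) &= 1+\cos x \ge 0, &&\text{with } f'(x)=0 \text{ only at the isolated points } x=(2k+1)\pi; \\ g'(x) &= 3\bigl(\max(0,x)\bigr)^2 \ge 0, &&\text{with } g'(x)=0 \text{ for every } x\le0. \end{align*}\]

There is no non-trivial interval on which $f'$ vanishes, so $f$ is strictly increasing even though it has infinitely many stationary points. By contrast, $g'$ vanishes throughout $(-\infty,0]$, so $g$ is increasing but not strictly increasing: it is constant there.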

Teaching this topic at A-level

Putting this all together, we see that Theorem 4 is the crucial theorem for school use. Teaching the meaning of the term “increasing function” (Definition 1) and a simplified explanation of “increasing at a point” (Definition 2), along with Theorem 4 should give a good grounding. It would also be wise to caution that it is a one-way theorem by comparing and contrasting examples such as $f(x)=x^2$ and $f(x)=x^3$.

Proofs of Theorems 1 and 5

This technical appendix uses tools from undergraduate analysis. The proofs of the other three theorems are very similar to these or they follow immediately from these.

Theorem 1

Let $f$ be a real-valued continuous function whose domain is an interval $I$ of $\mathbb{R}$ and is differentiable at every point in the interior of $I$. Then $f$ is an increasing function if and only if $f’(x)\ge 0$ for all $x$ in the interior of $I$.

Proof

We show first that if $f$ is an increasing function, then $f’(x)\ge0$ for all $x$ in the interior of $I$, and we argue by contradiction. Assume that $f’(x_0)<0$ for some $x_0$ in the interior of $I$. Using the definition of derivative, this means that $\lim\limits_{\substack{x\to x_0\\ x\in I}}\dfrac{f(x)-f(x_0)}{x-x_0}<0$. So there is some $x_1\in I$ (where $x_1\ne x_0$) with $\dfrac{f(x_1)-f(x_0)}{x_1-x_0}<0$ (otherwise the limit would be $\ge0$). If $x_1>x_0$, then multiplying by $x_1-x_0$ gives $f(x_1)-f(x_0)<0$, so $f(x_1)<f(x_0)$. If $x_1<x_0$, then multiplying by $x_1-x_0$ (which is negative) gives $f(x_1)-f(x_0)>0$, so $f(x_1)>f(x_0)$. Either way, this shows that the function is not increasing on $I$, and we have our desired contradiction. Thus if $f$ is an increasing function, we must have $f’(x)\ge0$ for all $x$ in the interior of $I$.

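Conversely, if $f’(x)\ge0$ for all $x$ in the interior of $I$, then let $x<y$ be two points of $I$. By the mean-value theorem, there is some $c$ with $x<c<y$ such that $\dfrac{f(y)-f(x)}{y-x}=f’(c)\ge0$. Since $y-x>0$, it follows that $f(y)-f(x)\ge0$, so $f(x)\le f(y)$. Therefore $f$ is an increasing function.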

Theorem 5

Let $f$ be a continuous real-valued function whose domain is an interval $I$ of $\mathbb{R}$ and which is differentiable at every interior point of $I$. Then $f$ is strictly increasing on $I$ if and only if $f’(x)\ge0$ throughout $I$ and there is no non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of $J$.

Proof

We first prove that if the derivative condition is not met, then $f$ is not strictly increasing on $I$. If $f’(x)<0$ at any point in $I$, then $f$ is not increasing (by Theorem 1), so it is certainly not strictly increasing. If $f’(x)\ge0$ throughout $I$ but there is a non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of $J$, then $f$ is constant throughout $J$ (by the mean-value theorem). In particular, there are $y<z$ in $J$ with $f(y)=f(z)$, so $f$ is not strictly increasing.

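Conversely, if $f’(x)\ge0$ throughout $I$, then $f$ is increasing by Theorem 1. Assume now that there is no non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of $J$. But if $f$ were not strictly increasing, then there would be $y<z$ in $I$ with $f(y)\ge f(z)$, and hence $f(y)=f(z)$ as $f$ is increasing. Then $f$ would be constant on the interval $[y,z]$. (If there were some $x$ with $y<x<z$ and $f(x)\ne f(y)$, then either $f(x)<f(y)$ or $f(x)>f(z)$, contradicting $f$ increasing.) Therefore $f’(x)=0$ throughout this interval, contradicting our assumption. So $f$ must be strictly increasing.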

A visit to Michaela

Julian Gilbey — 2018-07-20T15:00:00+01:00

Having recently listened to about 5.5 hours of Craig Barton interviewing Dani Quinn (part 1 and part 2), the Head of Mathematics at Michaela Community School, I decided that it was worth visiting the school to see their principles in action for myself, so last week, I took to the buses to visit Wembley.

Though my main interest was the maths teaching, I was fascinated by the whole experience, so that is what I will focus most of my attention on here. I used to teach in a school (“W”) with a broadly similar type of intake: it was in an area with many students from ethnic minorities and many students on free school meals; that school was also in an area in which there was a grammar school system, so many of the highest-attaining students in the catchment area attended the more selective local schools. This gave me an interesting basis for comparison.

The most obvious thing which struck me was the atmosphere that Katharine and her staff have established in the school. It was very purposeful, and the students I met generally seemed happy and to like the school. They were polite to me, and some were genuinely interested in talking to me. (Or at least they gave the convincing impression that they were!) Some students were immensely proud of what they were doing and showed off their work to me (without my even asking).

Many have written about the very strictly enforced behaviour policies. But what I had not expected was the huge warmth pouring forth from the staff to the students in their lessons, and the humanity pervading the school. Whilst demerits were regularly given for infringements of the school’s very strict behaviour policies - generally accompanied by just a few seconds’ calm explanation of the positive benefits of doing what was expected or the negative impact the behaviour was having on others - merits were given even more liberally (and fairly consistently between lessons) for behaviours the school wants to encourage, such as good vocal projection when answering a question, asking good questions and giving good explanations. And these were always accompanied by brief warm words. This contrasts so dramatically with my experience at “W”, where though some teachers managed their classes well, there wasn’t anything close to a consistent school-wide system at this level of detail. There is clearly a benefit to be gained from having such a consistently enforced system throughout the school, though it is tough for teachers. (Mind you, it is not as tough as teaching in a school where students throw things at teachers on a semi-regular basis.)

The most challenging class I saw was a small bottom-set year 10 class, several of whom had already been permanently excluded from one or more other schools. Yet there they were, behaving and mostly participating in the lesson, learning and targeting a grade 4 or 5 at GCSE Mathematics. Wow. At “W”, lower-middle sets were only targeting a grade D (on the old system, the equivalent of a grade 3 on the new system), and most of them did not achieve even that. The contrast could not be greater.

A few things struck me immediately during the day, without even entering a classroom. The first was the immaculate state of the building: not a speck of litter to be seen during the course of the day. This is in stark contrast to most of the schools I’ve taught in and visited over the years, and vastly different from “W”. The students have clearly been taught to respect their environment.

Is the school’s approach a good thing? This is a difficult question for me to answer. I certainly had a sense that the school was infusing students with British culture (whatever that means), and yet for students living in the UK, is this not a good thing? It will give them significant (British) cultural capital on which they will be able to draw in later life, and which they might well otherwise not gain.

On the other hand, students are constantly being watched, as are staff: for example, visitors (with DBS certificates) are permitted to just walk into any lesson, yet the teachers and students generally didn’t bat an eyelid when I quietly walked in. Yet this significantly reduces the chances of bullying and destructive behaviour: there are no “safe spaces” within the school for bullying or other damaging behaviour to take place without a teacher seeing.

I observed parts of about eight maths lessons during the day (as well as a smattering of other subjects). My concern was that they would be very procedural in nature, given the rigidity of the school system. However, I was pleasantly surprised: while they were clearly teacher-led, the questioning did include a good mix of knowledge and deeper understanding questions. For example, in a lesson on Pythagoras’s Theorem, there were early questions designed to ensure that the students knew which side was the hypotenuse, and later questions which required more thought, such as “If I have a triangle with side lengths 6, 7, 8, can I draw a right-angle here?” (I am not overly concerned with Year 8 students not clearly distinguishing between Pythagoras’s Theorem and its converse. Students were spending enough effort getting to grips with what the question meant.) Some time was spent working on questions from their workbooks, but this was far from the majority of the time.

During one of the lessons, students were asked to read out from their workbooks. I was surprised - though I probably should not have been - at how difficult they found it to read technical vocabulary; how often do we ask our students to read a piece of technical material?

There were also opportunities for discussion in pairs; these were short and effective, and the students were continually encouraged to use the time productively, as anyone could be picked on to answer a question after the discussion time.

Dani spoke much more about the planning process and lesson structure at Michaela in her podcast, which was fascinating, so I won’t say more about it here.

Returning home and reflecting, I have two big questions about this model, and in particular with regard to maths. The first (somewhat mathematics-specific) question is whether students get enough opportunity to think about any (mathematics) problem for a protracted period of time. Hearing about other countries’ approaches, I wonder whether this is a potentially missed opportunity, especially once behaviour is so well-managed that there is a good learning atmosphere.

The second, more pervasive question, is about the use of streaming within the school. (“Streaming” means that students are put in groups which are dependent upon their academic performance across a number of subjects. They remain in these groups for all of their subjects. As far as I could tell, it is used in Years 7-9 and possibly in Year 10 as well.) It is very effective for behaviour management, as the entire class is together the whole time, including at lesson changeover time. However, I am very unconvinced that it is good for equity, which is part of the school’s mission. Hearing of the experiences of primary and secondary schools which have moved away from ability (or better: attainment) grouping to mixed-attainment grouping, one has to ask whether this would be better for the majority of the students within the school, certainly at Key Stage 3 (11-14 year olds) and possibly older too. Teachers’ academic expectations of students are lower when they are teaching lower-attaining groups, and I strongly doubt that Michaela’s excellent teachers are any less affected by this.

And would I consider teaching at Michaela? I’m not sure it would be the “right” school for me, but I would take it over “W” any day.

Finally, the “family lunch”. The initial poetry reading was like being at a summer youth camp: the energy, enthusiasm and fun were palpable. There was quite a buzz in the room during this! And the discussion over lunch - this time about volunteering, in light of the outstanding work of volunteers in the Thailand cave rescue - was fascinating.

It was a pleasure to visit the school, and I thank the staff for being so open and welcoming. I look forward to hearing of their results, both academic and beyond, in the years to come.

Small angle approximations - an application

Julian Gilbey — 2018-07-08T22:00:00+01:00

I thought a bit more about my previous post on small angle approximations, and decided it might be helpful to describe an application of the small angle approximations. While this example contains non-examinable aspects (at least in single maths A-level), the context should be fairly familiar (or can easily be demonstrated), and the mathematics is accessible to single maths students (at least as a demonstration). It also ties together ideas from mechanics and pure maths, so is helpful in this regard.

The question is: what is the period of a pendulum?

We can model the pendulum as a thin rod (inextensible and rigid) of length $L$, freely pivoted about a point $O$, with a single point mass $P$ of mass $m$ on the end of the rod, as shown here (where $T$ is the tension in the rod):

The velocity and acceleration of $P$ are as follows, where $\dot\theta$ means $\dfrac{d\theta}{dt}$ and $\ddot\theta$ means $\dfrac{d^2\theta}{dt^2}$; a derivation of these can be found at the end:

We can now apply Newton’s second law (“$F=ma$”) to the situation: working perpendicular to the rod, this gives $-mg\sin\theta=mL\ddot\theta$ (the minus sign is because the component of the force $mg$ is in the opposite direction to the $L\ddot\theta$ on our diagram). Rearranging this, we get the differential equation:

\[\dfrac{d^2\theta}{dt^2}=-\dfrac{g}{L}\sin\theta.\]

Unfortunately, this equation is impossible to solve in terms of simple functions. But if we assume that the swing of the pendulum is small, so that $\theta$ is small, then we can approximate $\sin\theta$ by $\theta$, and our differential equation becomes

\[\dfrac{d^2\theta}{dt^2}=-\dfrac{g}{L}\theta.\]

This differential equation (an example of simple harmonic motion) has a solution

\[\theta=A \sin\left(\sqrt{\dfrac{g}{L}}\,t\right)\]

(which is easy to check), where $A$ is the amplitude (maximum angle) of the swing. The period of this swing is $2\pi\sqrt{\dfrac{L}{g}}$, which is independent of the amplitude and the mass at the end of the rod! So as long as the swing is relatively small, the period is only dependent upon the length of the pendulum (and the acceleration due to gravity), which is likely to be a surprising result the first time it is met. This would have had great significance for clock-makers in times gone by.
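To get a feel for how small "relatively small" needs to be, here is a rough numerical sketch. It compares the small-angle period $2\pi\sqrt{L/g}$ with an estimate obtained by stepping the full equation $\ddot\theta=-\frac{g}{L}\sin\theta$ forward in time. The function names, the value $g=9.81$, the time step and the simple semi-implicit Euler scheme are all assumptions made for this illustration, not part of the argument above.

```python
import math


def small_angle_period(L, g=9.81):
    """Period predicted by the small-angle model: 2*pi*sqrt(L/g)."""
    return 2 * math.pi * math.sqrt(L / g)


def simulated_period(L, amplitude, g=9.81, dt=1e-5):
    """Estimate the period of theta'' = -(g/L) sin(theta).

    The pendulum is released from rest at `amplitude` radians (assumed to lie
    strictly between 0 and pi). By symmetry, the time taken to first reach
    theta = 0 is a quarter of the period. Uses semi-implicit Euler steps.
    """
    theta, omega, t = amplitude, 0.0, 0.0
    while theta > 0:
        omega -= (g / L) * math.sin(theta) * dt  # update angular velocity
        theta += omega * dt                      # then update the angle
        t += dt
    return 4 * t


L = 1.0  # a one-metre pendulum, chosen for illustration
print(f"small-angle model: {small_angle_period(L):.4f} s")
for amplitude in (0.1, 0.5, 1.0):
    print(f"amplitude {amplitude} rad: {simulated_period(L, amplitude):.4f} s")
```

For a swing of 0.1 radians the two values agree to well within a fraction of a percent, while for a swing of 1 radian the simulated period is noticeably longer, which is the sense in which the result depends on the swing being small.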

Deriving the formulae for the velocity and acceleration of $P$

We can work out the velocity and acceleration of $P$ in several different ways. One way is to use coordinates, where $O$ is the origin, and the vertical line is the $y$-axis. Then when $P$ is at an angle of $\theta$, it has a position vector of

\[\mathbf{r}=\begin{pmatrix} L\sin\theta \\ -L\cos\theta\end{pmatrix}.\]

A unit vector in the direction of $\overrightarrow{OP}$ is

\[\mathbf{e}_r=\begin{pmatrix}\sin\theta\\ -\cos\theta\end{pmatrix},\]

and a unit vector perpendicular to this in the direction of increasing $\theta$ is

\[\mathbf{e}_\theta=\begin{pmatrix}\cos\theta \\ \sin\theta\end{pmatrix},\]

as shown in this diagram:

The velocity of $P$ can be found by differentiating $\mathbf{r}$ with respect to time, giving:

\[\dot{\mathbf{r}}=\begin{pmatrix} L\cos\theta.\dot\theta \\ L\sin\theta.\dot\theta\end{pmatrix} = L\dot\theta\mathbf{e}_\theta\]

Then the acceleration can be found by differentiating again (using the product rule on both of the components of $\dot{\mathbf{r}}$) to obtain:

\[\ddot{\mathbf{r}}=\begin{pmatrix} -L\sin\theta.\dot\theta^2 + L\cos\theta.\ddot\theta \\ L\cos\theta.\dot\theta^2 + L\sin\theta. \ddot\theta \end{pmatrix} = -L\dot\theta^2 \mathbf{e}_r + L\ddot\theta \mathbf{e}_\theta.\]

These are the components of the velocity and acceleration shown above.

Without as much rigour, one could observe that the distance of $P$ along the circumference of the circle is given by $L\theta$, so it is reasonable to suggest that its speed is $L\dot\theta$ (as $L$ is a constant). Then the acceleration in this direction is plausibly $L\ddot\theta$, while the radial acceleration - which we are not interested in for this application - is a result of the velocity changing direction.

Small angle approximations

Julian Gilbey — 2018-07-05T22:10:00+01:00

At a conference run by the BBO Maths Hub today, Jo Morgan mentioned that small angle approximations are a topic recently (re)introduced to the single maths A-level course, and many teachers may be unfamiliar with it.

During the day and on my journey home, I thought about this and some of the connections between it and other areas of the syllabus. So here are a few quick thoughts on ways we could think about them, making connections between this and other areas of the syllabus. I hope that this post offers some different perspectives on the topic.

This is a diagram probably familiar from most A-level textbooks (I don’t have one to hand, unfortunately). We have our familiar unit circle, and draw a right-angled triangle with angle $\theta$, opposite $\sin\theta$ and adjacent $\cos\theta$. We also see that the arc length subtended by the angle $\theta$ is $r\theta=\theta$ as the radius is 1. (We must be working in radians for this to be correct!) Already in this diagram, $\sin\theta$ and $\theta$ do not look very different, so $\sin\theta\approx\theta$. On the other hand, $\cos\theta$ looks pretty close to $1$, so we have $\cos\theta\approx1$. Visually, say using GeoGebra, we see that these approximations get better as $\theta$ gets smaller: the arc and the half-chord become closer and closer to each other. We can then work out $\tan\theta=\dfrac{\sin\theta}{\cos\theta}\approx \dfrac{\theta}{1}=\theta$.

Another way of seeing this approximation to $\tan\theta$ is to draw the triangle with adjacent equal to $1$:

If we take $\sin\theta\approx\theta$, then we can work out a better approximation for $\cos\theta$ using the binomial theorem. We have, for small $\theta$ (positive or negative):

\[\begin{align*} \cos\theta &= \sqrt{1-\sin^2\theta} \\ &\approx \sqrt{1-\theta^2} \\ &= (1-\theta^2)^{\frac{1}{2}} \\ &= 1 - \tfrac{1}{2}\theta^2 + \cdots \end{align*}\]

where we have used the first two terms of the binomial expansion on the last line. So $\cos\theta\approx 1-\frac{1}{2}\theta^2$.

Another way of obtaining the approximation for $\cos\theta$ is to relate cos and sin using a double-angle formula:

\[\cos 2\theta = 1 - 2\sin^2\theta\]

\[\begin{align*} \cos\theta &= 1 - 2 \sin^2 \tfrac{1}{2}\theta \\ &\approx 1 - 2 \bigl(\tfrac{1}{2}\theta\bigr)^2 \\ &= 1 - \tfrac{1}{2}\theta^2 \end{align*}\]

where we have used $\sin\tfrac{1}{2}\theta\approx\tfrac{1}{2}\theta$ on the second line.
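The three approximations $\sin\theta\approx\theta$, $\cos\theta\approx1-\frac{1}{2}\theta^2$ and $\tan\theta\approx\theta$ can also be checked numerically. The following is a small sketch in which the function name and the sample angles are our own choices; the angles are in radians.

```python
import math


def small_angle_errors(theta):
    """Absolute errors of the three small-angle approximations at theta (radians)."""
    return {
        "sin": abs(math.sin(theta) - theta),
        "cos": abs(math.cos(theta) - (1 - theta ** 2 / 2)),
        "tan": abs(math.tan(theta) - theta),
    }


for theta in (0.5, 0.1, 0.01):
    errors = small_angle_errors(theta)
    print(
        f"theta={theta}: "
        f"sin error {errors['sin']:.2e}, "
        f"cos error {errors['cos']:.2e}, "
        f"tan error {errors['tan']:.2e}"
    )
```

Each error shrinks rapidly as $\theta$ decreases, and the error in the approximation for $\cos\theta$ shrinks fastest of the three.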

The approximations for $\sin\theta$ and $\tan\theta$ are also closely related to the shape of their graphs near the origin (though there is potentially some circular reasoning here - no pun intended!):

We have drawn the graphs of $y=x$ (red), $y=\sin x$ (green) and $y=\tan x$ (blue). Near the origin, the three graphs look very similar, so for small $x$, $\sin x\approx x \approx \tan x$.

This also tells us that at the origin, $\frac{d}{dx}(\sin x)$ and $\frac{d}{dx}(\tan x)$ equal $1$.

We can also argue in the opposite direction. If we have already convinced ourselves why the derivative of $\sin x$ is $\cos x$ using a different approach (for example, by using Rotating derivatives), then we can say that for small values of $x$, the graph of $y=\sin x$ is approximated by the tangent to the graph at $x=0$ (see A tangent is… for more on this point). We can calculate the tangent: since $\frac{d}{dx}(\sin x)=\cos x$ giving $\cos 0 = 1$, and $\sin 0 = 0$, the tangent has equation $y=x$. So for small $x$, $\sin x\approx x$.

Dividing fractions

Julian Gilbey — 2018-06-20T08:05:00+01:00

Why is it that

\[\frac{3}{5}\div\frac{2}{3} = \frac{3}{5}\times\frac{3}{2},\]

or as the rule that students are frequently taught: “turn the second fraction upside-down and multiply”?

I’ve been inspired to revisit this question after listening to Ed Southall talking on Mr Barton’s Maths Podcast, where he mentioned this question.

In this post I suggest a teaching sequence which might lead to an understanding of the rule above, as well as a procedural knowledge of how to perform the rule.

Some comments on a familiar approach

I have seen textbooks and websites explain the rule for division of fractions by talking about how many times we can fit $\frac{1}{3}$ into $\frac{4}{5}$, say, but that seems to me to be quite challenging: students have to hold on to several ideas at once, and make sense of diagrammatic representations at the same time as trying to think about what division means. It also becomes very hard as the fractions become more complicated. In my experience, few students develop a solid understanding through this approach: they either get lost in the reasoning or they resort to following a rule.

This problem ties in quite neatly with some things I have recently read, in particular:

James Tanton’s post The Unreasonableness of K-12 Mathematics, in which he gives an idealised description of the development of the concept of number.
Liping Ma’s book “Knowing and teaching elementary mathematics”, in which US and Chinese teachers’ understanding of this rule is compared.
John Mighton, the founder of JUMP Math, wrote The end of ignorance; he observes there that meaningful symbolic manipulation can precede both an attempt to explain an idea or technique in everyday terms, and the development of understanding; moreover, understanding can emerge from the manipulations if examples are well-chosen and students are given the opportunity to reflect.

An overview of the idea

The calculation $8-5$ means “what number $\square$ makes $\square+5=8$ true?” Similarly, when we write $12\div 3$, we mean “what number $\square$ makes $\square\times3=12$ true?” This says that division is the inverse of multiplication. (More precisely, for each non-zero number $c$, dividing by $c$ is the inverse of multiplying by $c$.) The same applies to division of fractions: $\frac{3}{5}\div\frac{2}{3}$ means “what number $\square$ makes $\square\times\frac{2}{3}=\frac{3}{5}$ true?”

Once we notice that $\frac{3}{2}\times\frac{2}{3}=1$, we can then multiply both sides of this equation by $\frac{3}{5}$ to obtain

\[\frac{3}{5}\times\frac{3}{2}\times\frac{2}{3}=\frac{3}{5}.\]

Therefore $\square$ must be $\frac{3}{5}\times\frac{3}{2}$, or

\[\frac{3}{5}\div\frac{2}{3} = \frac{3}{5}\times\frac{3}{2}.\]

This method will work for any fraction division question, and so these steps give us our familiar rule: “turn the divisor upside-down and multiply”.
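The argument can be traced step by step with exact fractions. This is a minimal sketch using Python's standard `fractions` module; the function name `divide_by_inverse` is a hypothetical one chosen for the illustration.

```python
from fractions import Fraction


def divide_by_inverse(dividend, divisor):
    """Find the number that makes  box * divisor == dividend  true.

    Follows the argument in the text: start from reciprocal * divisor == 1,
    then multiply both sides by the dividend.
    """
    reciprocal = Fraction(divisor.denominator, divisor.numerator)
    assert reciprocal * divisor == 1       # e.g. 3/2 * 2/3 == 1
    box = dividend * reciprocal            # e.g. 3/5 * 3/2
    assert box * divisor == dividend       # so box really is dividend / divisor
    return box


answer = divide_by_inverse(Fraction(3, 5), Fraction(2, 3))
print(answer)                                      # 9/10
print(answer == Fraction(3, 5) / Fraction(2, 3))   # True
```

The two assertions inside the function are the two key facts used in the overview: the reciprocal multiplies with the divisor to give 1, and the resulting product satisfies the missing-number statement.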

A possible teaching sequence

What follows is a suggestion for how these ideas could be introduced over a sequence of lessons, which could span several months or even years. This offers students the chance to revisit the ideas again and again, thereby reinforcing them, as well as building up stronger connections and a deeper understanding. In the later steps, I assume that students can multiply fractions.

All of the questions below are available in this Word document.

Step 1: What is subtraction?

We begin by asking students what other number statements they can deduce from $3+5=8$. There are many possible answers (such as $30+50=80$), and here we highlight those obtained by rearranging the numbers. (These could be encouraged by a question such as “Using only the numbers 3, 5 and 8, what other number statements can you get from $3+5=8$?”) Three key statements are:

\[8-3=5; \qquad 8-5=3; \qquad 5+3=8\]

as well as the same statements written the other way round, such as $5=8-3$; we won’t mention these reversed statements again here.

The last of these three statements says that addition is commutative: the order of adding does not matter. The other two say that subtraction is the inverse of addition: the three problems

\[5+\square=8,\qquad \square+5=8 \qquad \text{and} \qquad 8-5=\square\]

are equivalent, as are similar problems about $8-3=\square$. Making this connection explicit would be beneficial, especially in relation to the later parts of this sequence of steps.

Students could then be asked to write statements equivalent to statements such as $10-3=\square$ to reinforce this idea.

This idea may well have already been introduced via a bar model approach or using Cuisenaire rods or suchlike.

It is useful to recognise that it doesn’t matter whether we are working with whole numbers, directed numbers, fractions or whatever: subtraction always has this meaning, so returning to this idea periodically will benefit students’ understanding.

Step 2: And what is division?

This is the parallel of Step 1 for multiplication and division. What can be deduced from $3\times4=12$? This again leads to interesting points such as why $30\times40=120$ is an incorrect statement, whereas $30+50=80$ is correct. But for our current purposes, the key deductions are again those obtained by rearrangement:

\[12\div4=3; \qquad 12\div3=4; \qquad 4\times3=12.\]

As before, we see that multiplication is commutative and that division is the inverse of multiplication. In particular, this means that answering the question $12\div4=\square$ is the same as filling in the missing number in $4\times\square=12$; asking students to make deductions from $12\div4=\square$, as above, will reinforce this idea.

Step 3: 1 divided by a unit fraction

A key part of this approach is to learn about reciprocals of fractions. We start with the reciprocals of unit fractions.

For this missing-number problem, I would suggest asking students to work on this themselves rather than showing them how to do the first one. (I am assuming that they already know enough about fractions to work out the answers to these questions.)

\[\begin{align*} \frac{1}{2}\times \square &= 1 \\ \frac{1}{3}\times \square &= 1 \\ \frac{1}{4}\times \square &= 1 \end{align*}\]

Students should spot the pattern. Following this by asking questions such as $\frac{1}{82}\times\square = 1$ can help them to realise that they can now do some very complicated-sounding questions, even if they can’t imagine what $\frac{1}{82}$ of a cake might look like. (I was reminded of this approach by John Mighton’s book.)

Students can then be asked to connect this back to the earlier steps by rearranging $\frac{1}{2}\times2=1$. This will allow students to (re)discover that $1\div 2=\frac{1}{2}$ (and similarly for the other statements); this can also be used to reinforce the idea that a fraction such as $\frac{1}{2}$ just means “1 divided by 2”. (The division symbol itself suggests this: $\div$ is just a fraction with dots in place of actual numbers.) Another way of rearranging the number statement gives $1\div\frac{1}{2}=2$, which could be related to the “practical” meaning of division: there are 2 halves in a whole.

Step 4: Turning a general fraction into an integer

It might be too big a jump for some students to go straight to finding the reciprocal of a general fraction, so this step provides a structured intermediate stage for when they are developing some confidence with the above idea.

Here is a second sequence of missing-number problems:

\[\begin{align*} \frac{2}{3}\times \square &= 2 \\ \frac{2}{5}\times \square &= 2 \\ \frac{3}{5}\times \square &= 3 \end{align*}\]

Once students have worked out answers to these (perhaps with a few more similar examples added), either ask them to generalise by making up their own similar examples, or ask superficially harder questions such as $\frac{74}{133} \times \square = 74$, so that the structure becomes clear.

Asking students to rearrange these statements once again results in statements like $2\div3 = \frac{2}{3}$ (further reinforcing the division idea) and $2\div \frac{2}{3} = 3$.
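The pattern in this step, and the rearranged statements that follow from it, can be checked with a short Python sketch. This is only an illustration under my own assumptions: the list name `questions` is made up, and the code relies on each fraction being in lowest terms (which `Fraction` enforces automatically, so a fraction such as $\frac{4}{6}$ would be stored as $\frac{2}{3}$).

```python
from fractions import Fraction

# Each pair is (fraction, integer wanted on the right-hand side).
questions = [
    (Fraction(2, 3), 2),
    (Fraction(2, 5), 2),
    (Fraction(3, 5), 3),
    (Fraction(74, 133), 74),
]

answers = []
for frac, target in questions:
    # Conjectured pattern: the missing number is the denominator.
    guess = frac.denominator
    assert frac * guess == target
    # The rearranged statements hold too.
    assert Fraction(target) / guess == frac  # e.g. 2 ÷ 3 = 2/3
    assert target / frac == guess            # e.g. 2 ÷ 2/3 = 3
    answers.append(guess)

print(answers)
```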

Step 5: Finding reciprocals

A useful preparatory question before this step would be something like: “If you know that $96\times 48=4608$, then what is the missing number in $96\times \square = 2304$?” This recalls the idea that we can divide the product by 2 by dividing the multiplicand (or multiplier) by 2. (The use of two-digit numbers is designed to discourage students from doing a division!)

In this step, we replace the integers on the right-hand sides of the previous set of questions with 1:

\[\begin{align*} \frac{2}{3}\times \square &= 1 \\ \frac{2}{5}\times \square &= 1 \\ \frac{3}{5}\times \square &= 1 \end{align*}\]

If students cannot work out how to answer the first question, it would be helpful to remind them of their answer to $\frac{2}{3}\times \square = 2$. Tying this to the preparatory question above should help them get to the answer.
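Written out, the chain of reasoning for the first question might look like this (my own summary of the step, rather than something to show students in advance):

\[\frac{2}{3}\times 3 = 2 \quad\Longrightarrow\quad \frac{2}{3}\times \frac{3}{2} = 1.\]

Halving the multiplier from $3$ to $\frac{3}{2}$ halves the product from $2$ to $1$, just as replacing $48$ by $24$ halved $4608$ to $2304$ in the preparatory question.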

Again, students can be invited to generalise at this point, or to answer a question like the one in the previous step: $\frac{74}{133} \times \square = 1$. Also, it is helpful to then rearrange these results; we have $1\div \frac{2}{3} = \frac{3}{2}$, and we are seeing the first clear case of turning fractions upside-down.

After these, it could be interesting to also revisit unit fractions: following the same pattern that we have seen, how else could the answer to $\frac{1}{3}\times \square = 1$ be written, besides as $3$?

Step 6: Dividing fractions

Before working on the full-blown division of fractions, it would be useful to preface it with another relevant rearranging activity: how can the number statement $2\times 3\times 4=24$ be rearranged, while keeping all of the numbers involved the same? This gives rise to a number of statements, such as:

\[24\div 4 = 2\times 3; \qquad \frac{24}{2\times 3}=4; \qquad 4\times 2\times 3 = 24.\]

This may cause some difficulty and lead to some interesting class discussions.

And now we can build on the ideas developed in Step 5. How could we complete the following statements?

\[\begin{align*} \frac{2}{3}\times \frac{3}{2} \times \square &= \frac{4}{5} \\ \frac{2}{5}\times \frac{5}{2} \times \square &= \frac{1}{3} \\ \frac{3}{5}\times \frac{5}{3} \times \square &= \frac{7}{2} \\ \frac{1}{3}\times \frac{3}{1} \times \square &= \frac{3}{4} \\ 2\times \frac{1}{2} \times \square &= \frac{1}{5} \end{align*}\]

A prompting question, if needed, is “What is $\frac{2}{3}\times \frac{3}{2}$?”

And then what about these, where the two squares should be filled in following the pattern we have just seen?

\[\begin{align*} \frac{3}{4}\times \square \times \square &= \frac{2}{5} \\ \frac{2}{7}\times \square \times \square &= \frac{4}{3} \\ \frac{1}{5}\times \square \times \square &= \frac{3}{2} \\ 4\times \square \times \square &= \frac{3}{2} \end{align*}\]

Once students feel competent at these, ask how they can use these to work out:

\[\begin{align*} \frac{2}{5} &\div \frac{3}{4} \\ \frac{4}{3} &\div \frac{2}{7} \\ \frac{3}{2} &\div \frac{1}{5} \\ \frac{3}{2} &\div 4 \end{align*}\]

And with this, students have reached a point where the rule for dividing by a fraction will make some sense: we multiply the reciprocal of the divisor (so as to get 1 when it is multiplied by the divisor itself) by the dividend, which is our well-known rule.
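As a check on the four divisions above, here is a minimal Python sketch of the rule, again using `fractions.Fraction`. The function name `divide_via_reciprocal` is hypothetical and chosen for illustration; it deliberately avoids Python's own `/` operator so that the answer really does come from "multiply by the reciprocal", and it then confirms each answer using the idea from Step 2 that division is the inverse of multiplication.

```python
from fractions import Fraction

def divide_via_reciprocal(dividend, divisor):
    """Divide by multiplying the dividend by the reciprocal of the divisor."""
    divisor = Fraction(divisor)
    # Turn the divisor upside-down (raises ZeroDivisionError for a zero divisor).
    reciprocal = Fraction(divisor.denominator, divisor.numerator)
    assert divisor * reciprocal == 1  # the property found in Step 5
    return reciprocal * dividend

# The four divisions listed above, as (dividend, divisor).
questions = [
    (Fraction(2, 5), Fraction(3, 4)),
    (Fraction(4, 3), Fraction(2, 7)),
    (Fraction(3, 2), Fraction(1, 5)),
    (Fraction(3, 2), 4),
]

for dividend, divisor in questions:
    answer = divide_via_reciprocal(dividend, divisor)
    # Division is the inverse of multiplication: divisor × answer = dividend.
    assert divisor * answer == dividend
    print(f"{dividend} ÷ {divisor} = {answer}")
```

The final assertion in the loop is the same missing-number idea used throughout: the answer to $\frac{2}{5}\div\frac{3}{4}$ is whatever fills the box in $\frac{3}{4}\times\square=\frac{2}{5}$.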
