<?xml version="1.0" encoding="UTF-8" ?>

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
   
    <title>Julian's Musings</title>
   
   <link>https://blog.d-and-j.net</link>
   <description>I'm a mathematics teaching resource developer working
   on the Underground Mathematics project.</description>
   <language>en-gb</language>
   <managingEditor>Julian Gilbey</managingEditor>
   <atom:link href="rss" rel="self" type="application/rss+xml" />
   
	
		<item>
  <title>The PyTorch add_module() function</title>
  <link>https://blog.d-and-j.org/deep-learning/2021/04/23/pytorch-add_module.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2021-04-23T18:00:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/deep-learning/2021/04/23/pytorch-add_module.html</guid>
  <description><![CDATA[
     <p>I have been building some bespoke PyTorch models, and have just been
stung by a bug: it turns out that using the <code class="language-plaintext highlighter-rouge">add_module()</code> method is
sometimes critical to making a PyTorch model work.  Without it, the
program may crash outright, or it may appear to run but give
completely meaningless results.</p>

<p>Though this is easy to miss in the documentation, PyTorch determines
the layers or other <code class="language-plaintext highlighter-rouge">Module</code>s used in a particular <code class="language-plaintext highlighter-rouge">Module</code> by looking at the type of
object stored in each member of the <code class="language-plaintext highlighter-rouge">Module</code>.  If that object is
not a <code class="language-plaintext highlighter-rouge">Module</code>, PyTorch does not register it: its parameters are not
returned by <code class="language-plaintext highlighter-rouge">parameters()</code>, so the optimizer never updates them, and
calls such as <code class="language-plaintext highlighter-rouge">to(device)</code> and <code class="language-plaintext highlighter-rouge">state_dict()</code> ignore them.</p>
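<p>To make the mechanism concrete, here is a deliberately simplified, pure-Python imitation of it.  The class names are hypothetical and this is not PyTorch’s actual code.  As far as I understand it, though, <code class="language-plaintext highlighter-rouge">nn.Module</code> similarly overrides <code class="language-plaintext highlighter-rouge">__setattr__</code> so that assigning a <code class="language-plaintext highlighter-rouge">Module</code> to an attribute records it in an internal dictionary (<code class="language-plaintext highlighter-rouge">_modules</code> in the PyTorch source), which <code class="language-plaintext highlighter-rouge">parameters()</code>, <code class="language-plaintext highlighter-rouge">to()</code> and <code class="language-plaintext highlighter-rouge">state_dict()</code> then walk.  A list is not a <code class="language-plaintext highlighter-rouge">Module</code>, so nothing inside it gets recorded unless <code class="language-plaintext highlighter-rouge">add_module()</code> is called:</p>

```python
# Toy imitation of submodule registration; ToyModule is hypothetical,
# not PyTorch's implementation.


class ToyModule:
    def __init__(self):
        # Bypass our own __setattr__ so the registry is a plain attribute.
        object.__setattr__(self, "_modules", {})

    def __setattr__(self, name, value):
        # The type check is the whole trick: only ToyModule values are registered.
        if isinstance(value, ToyModule):
            self._modules[name] = value
        object.__setattr__(self, name, value)

    def add_module(self, name, module):
        # Explicit registration, for modules not assigned directly as attributes.
        self._modules[name] = module

    def named_children(self):
        return list(self._modules.items())


class Layer(ToyModule):
    pass


class AsAttributes(ToyModule):
    def __init__(self):
        super().__init__()
        self.flatten = Layer()
        self.stack = Layer()


class AsList(ToyModule):
    def __init__(self):
        super().__init__()
        self.layers = [Layer(), Layer()]  # a list is not a ToyModule: not registered


class AsListRegistered(AsList):
    def __init__(self):
        super().__init__()
        for i, layer in enumerate(self.layers):
            self.add_module(f"layer_{i}", layer)


for cls in (AsAttributes, AsList, AsListRegistered):
    print(cls.__name__, [name for name, _ in cls().named_children()])
# AsAttributes ['flatten', 'stack']
# AsList []
# AsListRegistered ['layer_0', 'layer_1']
```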

<p>Here is an example.  We take the
<a href="https://pytorch.org/tutorials/beginner/basics/quickstart_tutorial.html">Quickstart</a>
from the PyTorch tutorials webpage.  The neural network is defined in
it as follows:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">NeuralNetwork</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">NeuralNetwork</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">flatten</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Flatten</span><span class="p">()</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">linear_relu_stack</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">28</span><span class="o">*</span><span class="mi">28</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">()</span>
        <span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">flatten</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">linear_relu_stack</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">logits</span>
</code></pre></div></div>

<p>Note that the layers, <code class="language-plaintext highlighter-rouge">nn.Flatten()</code> and <code class="language-plaintext highlighter-rouge">nn.Sequential(...)</code>, are
stored as members of the <code class="language-plaintext highlighter-rouge">NeuralNetwork</code> object, by writing
<code class="language-plaintext highlighter-rouge">self.flatten = ...</code> and so on.</p>

<p>Let us now change the code to store these layers in a list instead:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">NeuralNetwork</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">NeuralNetwork</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="n">flatten</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Flatten</span><span class="p">()</span>
        <span class="n">linear_relu_stack</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">28</span><span class="o">*</span><span class="mi">28</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">()</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layers</span> <span class="o">=</span> <span class="p">[</span><span class="n">flatten</span><span class="p">,</span> <span class="n">linear_relu_stack</span><span class="p">]</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">0</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
        <span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">1</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">logits</span>
</code></pre></div></div>

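<p>The <code class="language-plaintext highlighter-rouge">forward()</code> method is also modified to use the appropriate element
of the list of layers.  But now PyTorch ignores the <code class="language-plaintext highlighter-rouge">self.layers</code>
member, as it is a list rather than a <code class="language-plaintext highlighter-rouge">Module</code>, and the rest of the
Quickstart code breaks quite badly: for example, <code class="language-plaintext highlighter-rouge">model.parameters()</code>
no longer yields any of the weights of the layers, so the optimizer has nothing to train.</p>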

<p>The simplest way to fix this, while keeping the layers as a list, is
to inform PyTorch about the existence of these layers using the
<code class="language-plaintext highlighter-rouge">add_module()</code> method, as follows:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">NeuralNetwork</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">NeuralNetwork</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="n">flatten</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Flatten</span><span class="p">()</span>
        <span class="n">linear_relu_stack</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">28</span><span class="o">*</span><span class="mi">28</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">()</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layers</span> <span class="o">=</span> <span class="p">[</span><span class="n">flatten</span><span class="p">,</span> <span class="n">linear_relu_stack</span><span class="p">]</span>
        <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">layer</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">):</span>
            <span class="bp">self</span><span class="p">.</span><span class="n">add_module</span><span class="p">(</span><span class="sa">f</span><span class="s">"layer_</span><span class="si">{</span><span class="n">i</span><span class="si">}</span><span class="s">"</span><span class="p">,</span> <span class="n">layer</span><span class="p">)</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">0</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
        <span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">1</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">logits</span>
</code></pre></div></div>

<p>The first argument of the <code class="language-plaintext highlighter-rouge">add_module()</code> method is a name, which
PyTorch uses to refer to the layer when printing the neural network
model and in the keys of its <code class="language-plaintext highlighter-rouge">state_dict()</code>; the second is the layer itself.  The name can
also be used to access the layer as an attribute of the <code class="language-plaintext highlighter-rouge">Module</code>
object, so the names must be unique within the <code class="language-plaintext highlighter-rouge">Module</code> (adding a second
module under an existing name silently replaces the first), and it helps if they are valid Python
identifiers (though if they are not, the layer can still be accessed using
<code class="language-plaintext highlighter-rouge">getattr()</code>).</p>
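<p>The naming rules can be sketched in the same toy style.  Again this is a hypothetical stand-in rather than PyTorch’s implementation: it accepts plain strings as “layers” to keep it short, and its two checks (rejecting an empty name, or one containing a dot) are ones I believe PyTorch’s <code class="language-plaintext highlighter-rouge">add_module()</code> also makes:</p>

```python
class NamedRegistry:
    """Hypothetical toy: stores named 'modules' and exposes them as attributes."""

    def __init__(self):
        object.__setattr__(self, "_modules", {})

    def add_module(self, name, module):
        # Dots separate path components in state_dict() keys such as
        # "layer_1.0.weight", so they cannot appear in a module name.
        if name == "" or "." in name:
            raise KeyError(f"invalid module name: {name!r}")
        self._modules[name] = module  # an existing name is silently replaced

    def __getattr__(self, name):
        # Only called when ordinary attribute lookup fails.
        modules = object.__getattribute__(self, "_modules")
        if name in modules:
            return modules[name]
        raise AttributeError(name)


reg = NamedRegistry()
reg.add_module("layer_0", "flatten")
reg.add_module("layer-1", "stack")   # not a valid Python identifier
print(reg.layer_0)                   # flatten
print(getattr(reg, "layer-1"))       # stack
reg.add_module("layer_0", "replacement")
print(reg.layer_0)                   # replacement
try:
    reg.add_module("a.b", "oops")
except KeyError as err:
    print("rejected:", err)
```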

<p>And with that addition, the code once again works.</p>

<p>(If you are wondering why we would store the layers in a list in the
first place, I had a use case where the network was constructed with
a variable number of layers passed to <code class="language-plaintext highlighter-rouge">__init__()</code> as a list.)</p>

<p><em>Edits 27 April 2021</em></p>

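<p>A colleague has just pointed out the <code class="language-plaintext highlighter-rouge">nn.ModuleList</code> class to me.  It holds
modules in a list and registers each of them as a submodule (under the names
<code class="language-plaintext highlighter-rouge">0</code>, <code class="language-plaintext highlighter-rouge">1</code>, and so on), so this problem can also be solved more simply as follows:</p>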

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">NeuralNetwork</span><span class="p">(</span><span class="n">nn</span><span class="p">.</span><span class="n">Module</span><span class="p">):</span>
    <span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        <span class="nb">super</span><span class="p">(</span><span class="n">NeuralNetwork</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">__init__</span><span class="p">()</span>
        <span class="n">flatten</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Flatten</span><span class="p">()</span>
        <span class="n">linear_relu_stack</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">Sequential</span><span class="p">(</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">28</span><span class="o">*</span><span class="mi">28</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">512</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">(),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">Linear</span><span class="p">(</span><span class="mi">512</span><span class="p">,</span> <span class="mi">10</span><span class="p">),</span>
            <span class="n">nn</span><span class="p">.</span><span class="n">ReLU</span><span class="p">()</span>
        <span class="p">)</span>
        <span class="bp">self</span><span class="p">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">nn</span><span class="p">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">flatten</span><span class="p">,</span> <span class="n">linear_relu_stack</span><span class="p">])</span>

    <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">0</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
        <span class="n">logits</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">1</span><span class="p">](</span><span class="n">x</span><span class="p">)</span>
        <span class="k">return</span> <span class="n">logits</span>
</code></pre></div></div>

  ]]></description>
</item>

	

	
		<item>
  <title>Installing PyTorch on Debian bullseye (Debian 11)</title>
  <link>https://blog.d-and-j.org/deep-learning/2021/04/12/pytorch-debian-11.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2021-04-12T09:20:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/deep-learning/2021/04/12/pytorch-debian-11.html</guid>
  <description><![CDATA[
     <p>I have just rebuilt PyTorch from source on my Debian bullseye machine
(which will become Debian 11).  This blog post documents how I did
it.</p>

<h2 id="system-setup">System setup</h2>

<p>I have the following key packages/libraries installed.  There are
certainly others that I have overlooked; please let me know of any
that I have missed.</p>

<ul>
  <li>Python: <code class="language-plaintext highlighter-rouge">python3-dev</code> (currently version 3.9.2-2)</li>
  <li>Essential packages (according to the PyTorch installation page) are:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">ninja-build</code></li>
      <li><code class="language-plaintext highlighter-rouge">cmake</code></li>
      <li><code class="language-plaintext highlighter-rouge">libmagma-dev</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-numpy</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-yaml</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-setuptools</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-cffi</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-typing-extensions</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-future</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-six</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-requests</code></li>
    </ul>
  </li>
  <li>
    <p>The <a href="https://github.com/pytorch/pytorch#from-source">PyTorch
website</a> also says
that <code class="language-plaintext highlighter-rouge">mkl</code> should be included, which is probably provided by the
<code class="language-plaintext highlighter-rouge">libmkl-dev</code> package.  But that package provides alternatives to the
<code class="language-plaintext highlighter-rouge">libblas</code> library packages, so it is fine to install one of the open
source BLAS development packages instead (<code class="language-plaintext highlighter-rouge">libblas-dev</code>,
<code class="language-plaintext highlighter-rouge">libblas64-dev</code>, <code class="language-plaintext highlighter-rouge">libopenblas-dev</code> or <code class="language-plaintext highlighter-rouge">libopenblas64-dev</code>).</p>
  </li>
  <li>
    <p>Though the PyTorch page says that <code class="language-plaintext highlighter-rouge">dataclasses</code> is required, this is
only for Python versions before 3.7 (from 3.7 onwards it is part of the
standard library), so there is no need to install it.</p>
  </li>
  <li>
    <p>CUDA: The Debian non-free archive includes the package
<code class="language-plaintext highlighter-rouge">nvidia-cuda-toolkit</code>, which does everything required (<a href="https://blog.d-and-j.net/deep-learning/2021/03/24/tensorflow-debian-11.html">unlike the
case with TensorFlow</a>).</p>
  </li>
  <li>cuDNN: <code class="language-plaintext highlighter-rouge">libcudnn8-dev</code> (version 8.1.0.77-1+cuda11.2; this package
was downloaded directly from the <a href="https://developer.nvidia.com/cudnn">NVIDIA
website</a>)</li>
</ul>
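<p>Before building, it can be worth checking that the Debian packages above really are installed.  Here is a small sketch that asks <code class="language-plaintext highlighter-rouge">dpkg-query</code> for the status of each one.  The helper names are my own, and the package list simply repeats the one above, leaving out the choice of BLAS package:</p>

```python
import subprocess

# Package list copied from the post above (hypothetical constant name).
PACKAGES = [
    "python3-dev", "ninja-build", "cmake", "libmagma-dev",
    "python3-numpy", "python3-yaml", "python3-setuptools", "python3-cffi",
    "python3-typing-extensions", "python3-future", "python3-six",
    "python3-requests", "nvidia-cuda-toolkit", "libcudnn8-dev",
]


def parse_status(output):
    """Map each package name in dpkg-query output to True if fully installed."""
    installed = {}
    for line in output.splitlines():
        name, _, status = line.partition(" ")
        if name:
            installed[name] = status.strip() == "install ok installed"
    return installed


def check_packages(packages):
    # dpkg-query -W prints one line per known package in the given format;
    # names it does not know are reported on stderr and omitted from stdout.
    result = subprocess.run(
        ["dpkg-query", "-W", "--showformat=${Package} ${Status}\\n", *packages],
        capture_output=True, text=True, check=False,
    )
    found = parse_status(result.stdout)
    return {pkg: found.get(pkg, False) for pkg in packages}


if __name__ == "__main__":
    for pkg, ok in check_packages(PACKAGES).items():
        print("ok     " if ok else "MISSING", pkg)
```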

<h2 id="building-pytorch">Building PyTorch</h2>

<p>I started by following the guidance on the PyTorch <a href="https://github.com/pytorch/pytorch#from-source">installing from
source</a> webpage.</p>

<h3 id="downloading-the-pytorch-source-code">Downloading the PyTorch source code</h3>

<p>I followed the instructions as given, and also checked out the latest
release branch, <code class="language-plaintext highlighter-rouge">release/1.8</code>, which at the time of writing corresponds to version 1.8.1.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone --recursive https://github.com/pytorch/pytorch
$ cd pytorch
$ git checkout -t origin/release/1.8
$ git submodule sync
$ git submodule update --init --recursive
</code></pre></div></div>

<p>(The last two commands bring the submodules into line with the
checked-out branch, so they are needed after switching branches or
updating an existing checkout; they do no harm otherwise.)</p>

<h3 id="pre-build">Pre-build</h3>

<p>The version number in the PyTorch sources does not match the official
version number for some reason.  So I “corrected” it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ echo 1.8.1 &gt; version.txt
</code></pre></div></div>

<h3 id="building-the-package">Building the package</h3>

<p>It’s a one-line command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 setup.py install --user
</code></pre></div></div>
<p>This compilation step is pretty long, but that is all that is needed.</p>

<p>To clean the build directory after it had finished, I ran:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 setup.py clean
</code></pre></div></div>

<h2 id="building-torchvision-and-torchaudio">Building torchvision and torchaudio</h2>

<h3 id="installing-dependencies">Installing dependencies</h3>

<p><code class="language-plaintext highlighter-rouge">torchvision</code> looks for certain image and video libraries during the build.
I’m not sure which are strictly required, but here are the packages to
install to ensure that everything is present (in addition to having
PyTorch itself already installed as above):</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">libpng-dev</code></li>
  <li><code class="language-plaintext highlighter-rouge">libpng-tools</code></li>
  <li><code class="language-plaintext highlighter-rouge">libjpeg-dev</code></li>
  <li><code class="language-plaintext highlighter-rouge">ffmpeg</code></li>
  <li><code class="language-plaintext highlighter-rouge">libavcodec-dev</code></li>
  <li><code class="language-plaintext highlighter-rouge">libavformat-dev</code></li>
  <li><code class="language-plaintext highlighter-rouge">libavutil-dev</code></li>
  <li><code class="language-plaintext highlighter-rouge">libswresample-dev</code></li>
  <li><code class="language-plaintext highlighter-rouge">libswscale-dev</code></li>
</ul>

<p>There is also an optional dependency on <code class="language-plaintext highlighter-rouge">scipy</code> (Debian package <code class="language-plaintext highlighter-rouge">python3-scipy</code>) in
the <code class="language-plaintext highlighter-rouge">extras_require</code> option of <code class="language-plaintext highlighter-rouge">setup.py</code>; I am not sure when it is needed.</p>

<p><code class="language-plaintext highlighter-rouge">torchaudio</code> does not have any dependencies beyond PyTorch itself.</p>

<h3 id="downloading-and-building">Downloading and building</h3>

<p>I used <code class="language-plaintext highlighter-rouge">pip</code> (or <code class="language-plaintext highlighter-rouge">pip3</code>) to install rather than directly using
<code class="language-plaintext highlighter-rouge">setup.py</code>, as otherwise the package gets installed in <code class="language-plaintext highlighter-rouge">egg</code> format.</p>

<p>I installed <code class="language-plaintext highlighter-rouge">torchvision</code> as follows:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/pytorch/vision.git
$ cd vision
$ git checkout -t origin/release/0.9
$ echo 0.9.1 &gt; version.txt
$ pip install --user .
</code></pre></div></div>

<p>Cleaning, if needed, apparently still needs to be done via <code class="language-plaintext highlighter-rouge">setup.py</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 setup.py clean
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">torchaudio</code> was only a tiny bit trickier:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone --recursive https://github.com/pytorch/audio.git
$ cd audio
$ git checkout -t origin/release/0.8
$ git submodule sync
$ git submodule update --init --recursive
</code></pre></div></div>

<p>(As with PyTorch, the last two commands bring the submodules into line
with the checked-out branch; they do no harm if nothing has changed.)</p>

<p>Then I edited <code class="language-plaintext highlighter-rouge">setup.py</code>, modifying lines 14-16 to read:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Creating the version file
version = '0.8.1'
sha = 'e4e171a51714b2b2bd79e1aea199c3f658eddf9a'
</code></pre></div></div>
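<p>The <code class="language-plaintext highlighter-rouge">sha</code> value is presumably the commit hash of the checked-out release.  If you are building a different release, a sketch like the following will print the hash to paste in.  It is a hypothetical helper of my own that simply runs <code class="language-plaintext highlighter-rouge">git rev-parse HEAD</code> in the clone:</p>

```python
import subprocess


def current_sha(repo_dir):
    """Return the full commit hash of HEAD in repo_dir, or None on failure."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, repo_dir does not exist, or it is not a checkout
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    print(current_sha("."))  # run from inside the audio checkout
```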

<p>Then a one-line build command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pip install --user .
</code></pre></div></div>
<p>and I was done.</p>
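<p>To check that all three packages import correctly, and to see which CUDA and cuDNN versions PyTorch was built against, something like the following sketch can be run.  The helper name is my own; <code class="language-plaintext highlighter-rouge">torch.version.cuda</code>, <code class="language-plaintext highlighter-rouge">torch.cuda.is_available()</code> and <code class="language-plaintext highlighter-rouge">torch.backends.cudnn.version()</code> are standard PyTorch calls:</p>

```python
import importlib
import importlib.util


def installed_versions(names):
    """Map each module name to its __version__, or None if it is not importable."""
    report = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            report[name] = None
        else:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    print(installed_versions(["torch", "torchvision", "torchaudio"]))
    if importlib.util.find_spec("torch") is not None:
        import torch

        print("built with CUDA:", torch.version.cuda)
        print("CUDA available:", torch.cuda.is_available())
        print("cuDNN version:", torch.backends.cudnn.version())
```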

  ]]></description>
</item>

	

	
		<item>
  <title>Installing TensorFlow and TensorFlow Addons on Debian bullseye (Debian 11)</title>
  <link>https://blog.d-and-j.org/deep-learning/2021/03/24/tensorflow-debian-11.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2021-03-24T11:25:00+00:00</pubDate>
  <guid>https://blog.d-and-j.net/deep-learning/2021/03/24/tensorflow-debian-11.html</guid>
  <description><![CDATA[
     <p>Though I usually use PyTorch for my deep learning work, I have just
been given a piece of code written using TensorFlow, so I needed to
install TensorFlow and TensorFlow Addons on my Debian testing system
(aka Debian bullseye, which will shortly become Debian 11).
Unfortunately, the binaries available on PyPI are only built for
Python 3.6-3.8, but Debian bullseye now runs Python 3.9.</p>
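<p>The Python-version mismatch can be checked programmatically.  This is just a sketch: the set of versions with prebuilt wheels is taken from the paragraph above and will change with future TensorFlow releases.</p>

```python
import sys

# Python versions with prebuilt TensorFlow wheels on PyPI at the time of
# writing, according to this post; this set will change over time.
WHEEL_PYTHONS = {(3, 6), (3, 7), (3, 8)}


def has_prebuilt_wheel(version_info=None):
    """True if the given (or running) Python version has a PyPI wheel."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in WHEEL_PYTHONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if has_prebuilt_wheel():
        print(f"Python {major}.{minor}: a prebuilt wheel should be available")
    else:
        print(f"Python {major}.{minor}: no prebuilt wheel, so build from source")
```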

<p>This blog post documents how I managed to build these packages for my
system (which was far more effort than it probably should have been!).</p>

<h2 id="system-setup">System setup</h2>

<p>I have the following key packages/libraries installed.  There are
certainly others that I have overlooked; please let me know of any
that I have missed.</p>

<ul>
  <li>Python: <code class="language-plaintext highlighter-rouge">python3-dev</code> (currently version 3.9.2-2)</li>
  <li>Essential Python packages (according to the TensorFlow installation
page) are <code class="language-plaintext highlighter-rouge">pip</code>, <code class="language-plaintext highlighter-rouge">numpy</code>, <code class="language-plaintext highlighter-rouge">wheel</code> and <code class="language-plaintext highlighter-rouge">keras_preprocessing</code>.  The
Debian packages providing these are:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">python3-keras-preprocessing</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-numpy</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-pip</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-wheel</code></li>
    </ul>

    <p>though the version of <code class="language-plaintext highlighter-rouge">keras_preprocessing</code> currently in the
Debian archive is older than that required by the TensorFlow
installation, so it may be wiser to install it with</p>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip install -U --user keras_preprocessing --no-deps
</code></pre></div>    </div>
    <p>as explained on the TensorFlow installation page.</p>
  </li>
  <li>Python packages: either install Debian versions of these or let
  <code class="language-plaintext highlighter-rouge">pip</code> install them during the package installation.  The relevant
  packages seem to be:
    <ul>
      <li><code class="language-plaintext highlighter-rouge">python3-flatbuffers</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-google-auto-oauthlib</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-grpcio</code> (*)</li>
      <li><code class="language-plaintext highlighter-rouge">python3-h5py</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-markdown</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-protobuf</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-requests</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-setuptools</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-six</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-termcolor</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-typeguard</code> (*)</li>
      <li><code class="language-plaintext highlighter-rouge">python3-typing-extensions</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-werkzeug</code></li>
      <li><code class="language-plaintext highlighter-rouge">python3-wrapt</code>
  and their dependencies.</li>
    </ul>

    <p>(*): In these cases, the current Debian version is too old
  for TensorFlow 2.4.1.</p>
  </li>
  <li>cuDNN: <code class="language-plaintext highlighter-rouge">libcudnn8-dev</code> (version 8.1.0.77-1+cuda11.2; this package
was downloaded directly from the NVIDIA website)</li>
  <li>Go compiler: <code class="language-plaintext highlighter-rouge">golang-go</code> (this is needed for <code class="language-plaintext highlighter-rouge">bazelisk</code>)</li>
</ul>
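<p>To see which of these packages are too old, the installed versions can be compared with the minimum versions in TensorFlow’s <code class="language-plaintext highlighter-rouge">setup.py</code>.  The sketch below uses <code class="language-plaintext highlighter-rouge">importlib.metadata</code> from the standard library.  The minimum versions in it are placeholders for illustration, to be replaced by those of the release being built, and the version parsing is deliberately crude (it drops suffixes such as <code class="language-plaintext highlighter-rouge">rc1</code>):</p>

```python
import re
from importlib import metadata

# Placeholder minimum versions, for illustration only: replace them with the
# requirements in setup.py of the TensorFlow release you are building.
MINIMUMS = {
    "keras-preprocessing": "1.1.2",
    "grpcio": "1.32.0",
    "typeguard": "2.7",
}


def as_tuple(version):
    """Crudely turn "1.32.0" into (1, 32, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    a, b = as_tuple(installed), as_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.32" and "1.32.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(minimums):
    """Map each distribution to True/False, or None if it is not installed."""
    report = {}
    for dist, minimum in minimums.items():
        try:
            report[dist] = meets_minimum(metadata.version(dist), minimum)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    labels = {True: "ok", False: "too old", None: "not installed"}
    for dist, ok in check(MINIMUMS).items():
        print(dist, labels[ok])
```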

<p>Unfortunately, it turns out that Debian’s CUDA packages do
not work when building TensorFlow; see <a href="https://github.com/tensorflow/tensorflow/issues/40202">this GitHub
issue</a>.  It is
likely that this will not be fixed.  But the
NVIDIA-provided Debian packages don’t seem to work nicely with some of
the other parts of the system either, so I never used them.</p>

<p>To get around this, I downloaded <code class="language-plaintext highlighter-rouge">cuda_11.2.1_460.32.03_linux.run</code>
from the <a href="https://developer.nvidia.com/cuda-downloads?target_os=Linux&amp;target_arch=x86_64&amp;target_distro=Debian&amp;target_version=10&amp;target_type=runfilelocal">NVIDIA
website</a>.
(The current version, though, is 11.2.2, but as my Debian CUDA
packages are 11.2.1, I downloaded the older version from the <a href="https://developer.nvidia.com/cuda-toolkit-archive">Archive
of Previous CUDA
Releases</a>.)</p>

<p>I needed write permission on the parent directory of the
desired target location, so I created a directory
<code class="language-plaintext highlighter-rouge">/usr/local/cuda-11.2.1</code> (as root) and then ran</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ chown jdg:jdg /usr/local/cuda-11.2.1
</code></pre></div></div>
<p>(I could equally have created this directory in some other location
without needing to be root.)  I then unpacked the CUDA package into
it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sh cuda_11.2.1_460.32.03_linux.run --installpath=/usr/local/cuda-11.2.1/cuda
</code></pre></div></div>

<p>After the installation, I tidied up:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mv /usr/local/cuda-11.2.1/cuda/* /usr/local/cuda-11.2.1/
$ rmdir /usr/local/cuda-11.2.1/cuda/
</code></pre></div></div>
<p>so that everything is now directly in <code class="language-plaintext highlighter-rouge">/usr/local/cuda-11.2.1</code>.  (At
the end of the build process, it appears that this entire directory
can be deleted, as long as the relevant Debian CUDA packages are still
present.)</p>

<h2 id="building-tensorflow">Building TensorFlow</h2>

<p>I started by following the guidance on the TensorFlow <a href="https://www.tensorflow.org/install/source">installing from
source</a> webpage.</p>

<h3 id="installing-bazel">Installing Bazel</h3>

<p>There will eventually be a Debian Bazel package, but unfortunately
that is still some way off (there is a team working on this, but
there are currently technical difficulties).  So I installed
Bazelisk, following the instructions on the <a href="https://github.com/bazelbuild/bazelisk">Bazelisk GitHub
page</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ go get github.com/bazelbuild/bazelisk
$ export PATH=$PATH:$(go env GOPATH)/bin
$ (cd $(go env GOPATH)/bin &amp;&amp; ln -s bazelisk bazel)
</code></pre></div></div>
<p>(The <code class="language-plaintext highlighter-rouge">export</code> may well be extraneous, as <code class="language-plaintext highlighter-rouge">PATH</code> is usually already
exported.)</p>

<h3 id="build-directory-setup">Build directory setup</h3>

<p>Since I am building both TensorFlow and TensorFlow Addons, I created a
directory called <code class="language-plaintext highlighter-rouge">~/packages/tensorflow</code> and cloned the git
repositories into that directory; the intention is to build the wheels
in that same directory, so everything is together.</p>

<h3 id="downloading-the-tensorflow-source-code">Downloading the TensorFlow source code</h3>

<p>I followed the instructions as given; I also checked out the latest
release branch, which at the time of writing is <code class="language-plaintext highlighter-rouge">r2.4</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/tensorflow/tensorflow.git
$ cd tensorflow
$ git checkout -t origin/r2.4
</code></pre></div></div>

<h3 id="configuring-the-build">Configuring the build</h3>

<p>Here begins the fun!  Some of these lines have been wrapped to fit
better on the screen.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./configure
You have bazel 3.1.0 installed.
Found possible Python library paths:
  /usr/lib/python3/dist-packages
  /usr/lib/python3.9/dist-packages
  /usr/local/lib/python3.9/dist-packages
  /home/jdg/lib/python
Please input the desired Python library path to use.  Default is [/usr/lib/python3/dist-packages]
/home/jdg/.local/lib/python3.9/site-packages
</code></pre></div></div>

<p>I can’t write to the system directory, and it seems as though this is
where some libraries may be written, so I set it to be my local
repository instead.  I don’t know whether leaving this as
<code class="language-plaintext highlighter-rouge">/usr/lib/...</code> would work equally well.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Do you wish to build TensorFlow with ROCm support? [y/N]: 
No ROCm support will be enabled for TensorFlow.

Do you wish to build TensorFlow with TensorRT support? [y/N]: 
No TensorRT support will be enabled for TensorFlow.
</code></pre></div></div>
<p>I accepted the defaults for both of these; I don’t have TensorRT
installed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Inconsistent CUDA toolkit path: /usr vs /usr/libAsking for detailed CUDA configuration...

Please specify the CUDA SDK version you want to use.
[Leave empty to default to CUDA 10]: 11.2


Please specify the cuDNN version you want to use.
[Leave empty to default to cuDNN 7]: 8.1
</code></pre></div></div>
<p>I have CUDA 11.2.1 installed, so I responded <code class="language-plaintext highlighter-rouge">11.2</code>.  Perhaps this would
be better as just <code class="language-plaintext highlighter-rouge">11</code>, but I’m not sure.  Likewise, I have cuDNN
8.1.0.77 installed, so I responded <code class="language-plaintext highlighter-rouge">8.1</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Please specify the locally installed NCCL version you want to use.
[Leave empty to use http://github.com/nvidia/nccl]: 
</code></pre></div></div>
<p>I left this empty, as I do not have NCCL installed.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Please specify the comma-separated list of base paths to look for CUDA
libraries and headers. [Leave empty to use the default]:
/usr/local/cuda-11.2.1
</code></pre></div></div>
<p>This is the point at which everything goes wrong with the
Debian-packaged version of the CUDA libraries.  The Debian packages
can, it seems, remain installed alongside the NVIDIA libraries during
the build, but TensorFlow cannot be built against them.  So I gave the
path to the unpacked NVIDIA libraries.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Please specify a list of comma-separated CUDA compute capabilities you
want to build with.

You can find the compute capability of your device at:
https://developer.nvidia.com/cuda-gpus.
Each capability can be specified as "x.y" or "compute_xy" to include
both virtual and binary GPU code, or as "sm_xy" to only include the
binary code.

Please note that each additional compute capability significantly
increases your build time and binary size, and that TensorFlow only
supports compute capabilities &gt;= 3.5 [Default is: 3.5,7.0]:
</code></pre></div></div>
<p>I set this to <code class="language-plaintext highlighter-rouge">3.5,7.5</code> based on the information on the webpage
referred to.  Maybe I don’t need the <code class="language-plaintext highlighter-rouge">3.5</code> part, but I’m not sure, so
I left it in.</p>
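<p>To make the three spellings in that prompt concrete, here is a small
Python sketch of how a list such as <code class="language-plaintext highlighter-rouge">3.5,7.5</code> expands under the
rules the prompt quotes.  The helper <code class="language-plaintext highlighter-rouge">expand_capabilities</code> and its
output format are my own illustration, not part of TensorFlow’s
<code class="language-plaintext highlighter-rouge">configure</code> script, which does its own parsing.</p>

```python
def expand_capabilities(spec: str, minimum: float = 3.5) -> list[str]:
    """Expand a configure-style capability list such as "3.5,7.5".

    Hypothetical helper, following the rules quoted in the prompt:
      "x.y" or "compute_xy" -> virtual (compute_xy) and binary (sm_xy) code
      "sm_xy"               -> binary (sm_xy) code only
    """
    targets: list[str] = []
    for item in (s.strip() for s in spec.split(",")):
        if item.startswith("sm_"):
            digits, kinds = item[len("sm_"):], ["sm"]
        elif item.startswith("compute_"):
            digits, kinds = item[len("compute_"):], ["compute", "sm"]
        else:
            major, minor = item.split(".")
            digits, kinds = major + minor, ["compute", "sm"]
        # The last digit is the minor version: "75" means 7.5.
        # The prompt says only capabilities >= 3.5 are supported.
        if float(f"{digits[:-1]}.{digits[-1]}") < minimum:
            raise ValueError(f"compute capability {item} is below {minimum}")
        targets.extend(f"{kind}_{digits}" for kind in kinds)
    return targets


print(expand_capabilities("3.5,7.5"))
# ['compute_35', 'sm_35', 'compute_75', 'sm_75']
```

<p>Each entry adds at least one more set of compiled GPU code, which is
why the prompt warns that every extra capability increases build time
and binary size.</p>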

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Do you want to use clang as CUDA compiler? [y/N]: 

Please specify which gcc should be used by nvcc as the host
compiler. [Default is /usr/bin/gcc]: 

Please specify optimization flags to use during compilation when bazel
option "--config=opt" is specified [Default is -Wno-sign-compare]: 

Would you like to interactively configure ./WORKSPACE for Android
builds? [y/N]:
</code></pre></div></div>
<p>I left all of these with their default settings.</p>

<p>And the configuration is finished!</p>

<h3 id="building-the-pip-package">Building the pip package</h3>

<p>Unfortunately, some of the scripts in the TensorFlow sources call
python using the shebang formulation <code class="language-plaintext highlighter-rouge">#!/usr/bin/env python</code>, which
breaks unless there is a <code class="language-plaintext highlighter-rouge">python</code> on <code class="language-plaintext highlighter-rouge">PATH</code>; this should presumably be
Python 3.x (though I haven’t checked).  In Debian 10, <code class="language-plaintext highlighter-rouge">python</code> was
<code class="language-plaintext highlighter-rouge">python2</code>, but in Debian 11, there is no <code class="language-plaintext highlighter-rouge">python</code> executable by
default (though there is a <code class="language-plaintext highlighter-rouge">python-is-python3</code> package which could be
installed, which creates <code class="language-plaintext highlighter-rouge">/usr/bin/python</code> as a symlink to
<code class="language-plaintext highlighter-rouge">/usr/bin/python3</code>).  Since I don’t want things to break silently, I
didn’t install that package, but instead set up a local symlink for the
purpose of this build.  (Note that the bazel environment variable
<code class="language-plaintext highlighter-rouge">PYTHON_BIN_PATH</code> does not help at all.)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mkdir ../bin
$ ln -s /usr/bin/python3 ../bin/python
$ PATH=$(realpath ../bin):$PATH
</code></pre></div></div>
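<p>Since the build that follows takes a long time, it may be worth
confirming first that <code class="language-plaintext highlighter-rouge">#!/usr/bin/env python</code> will now find the
symlink.  A minimal sketch using Python’s standard <code class="language-plaintext highlighter-rouge">shutil.which</code>,
which searches <code class="language-plaintext highlighter-rouge">PATH</code> in much the same way as <code class="language-plaintext highlighter-rouge">env</code> does; the
function name <code class="language-plaintext highlighter-rouge">python_on_path</code> is hypothetical.</p>

```python
import os
import shutil
from typing import Optional


def python_on_path(search_path: Optional[str] = None) -> Optional[str]:
    """Return the 'python' that '#!/usr/bin/env python' should find, or None.

    Hypothetical helper.  search_path is an os.pathsep-separated list of
    directories; when it is None, shutil.which uses the PATH variable.
    """
    return shutil.which("python", path=search_path)


if __name__ == "__main__":
    found = python_on_path()
    if found is None:
        print("no 'python' on PATH: '#!/usr/bin/env python' scripts will fail")
    else:
        # realpath follows symlinks such as ../bin/python -> /usr/bin/python3
        print(f"'python' is {found} -> {os.path.realpath(found)}")
```

<p>Run it with <code class="language-plaintext highlighter-rouge">python3</code> from the same shell in which <code class="language-plaintext highlighter-rouge">PATH</code> was
changed; it should report the symlink pointing at <code class="language-plaintext highlighter-rouge">/usr/bin/python3</code>.</p>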

<p>I then ran the bazel command:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ bazel build --config=cuda //tensorflow/tools/pip_package:build_pip_package
</code></pre></div></div>
<p>This compilation step is pretty long.</p>

<h3 id="building-and-installing-the-pip-package">Building and installing the pip package</h3>

<p>Following the instructions, I ran:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package ..
</code></pre></div></div>
<p>so that the resulting wheel ended up in the parent directory.</p>

<p>I then installed the wheel; in my case, the command line was:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pip install --user ../tensorflow-2.4.1-cp39-cp39-linux_x86_64.whl
</code></pre></div></div>
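<p>The wheel’s filename records what it can be installed into.  Wheel
names follow the pattern
<code class="language-plaintext highlighter-rouge">{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl</code>
from PEP 427, and the sketch below simply splits a name along those
lines.  <code class="language-plaintext highlighter-rouge">parse_wheel_name</code> is an illustrative helper of my own, not a
pip API; for real use, the <code class="language-plaintext highlighter-rouge">packaging</code> library’s
<code class="language-plaintext highlighter-rouge">parse_wheel_filename</code> is more thorough.</p>

```python
import os
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]  # optional build tag, usually absent
    python_tag: str       # e.g. cp39 = CPython 3.9
    abi_tag: str          # e.g. cp39 = the CPython 3.9 ABI
    platform_tag: str     # e.g. linux_x86_64


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel filename into its PEP 427 components (illustrative).

    Hyphens inside components are escaped as underscores in wheel names,
    so splitting on '-' is safe.
    """
    name = os.path.basename(filename)
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


print(parse_wheel_name("../tensorflow-2.4.1-cp39-cp39-linux_x86_64.whl"))
```

<p>The <code class="language-plaintext highlighter-rouge">cp39</code> tags mean that this wheel targets CPython 3.9 only, so it
would need rebuilding for a different Python version.</p>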

<h2 id="installing-tensorflow-addons">Installing TensorFlow Addons</h2>

<p>The instructions for this are on the TensorFlow website
<a href="https://www.tensorflow.org/addons/overview">here</a>.</p>

<p>I first cloned the repository; again, I started in the directory
<code class="language-plaintext highlighter-rouge">~/packages/tensorflow</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git clone https://github.com/tensorflow/addons.git
$ cd addons
$ git checkout -t origin/r0.12
</code></pre></div></div>

<p>The exports, though, are not quite as described.  They should instead
be the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ export TF_NEED_CUDA=1
$ export CUDA_TOOLKIT_PATH=/usr/local/cuda-11.2.1
</code></pre></div></div>
<p>(I have just reported this, so it may well be fixed very soon.)
The rest ran smoothly:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ python3 ./configure.py
$ bazel build build_pip_pkg
$ bazel-bin/build_pip_pkg ..
$ pip install ../tensorflow_addons-0.12.2-cp39-cp39-linux_x86_64.whl
</code></pre></div></div>

<p>And after this, I removed (actually just temporarily renamed)
<code class="language-plaintext highlighter-rouge">/usr/local/cuda-11.2.1</code>, and everything still works.</p>

  ]]></description>
</item>

	

	
		<item>
  <title>What is mathematics, really?</title>
  <link>https://blog.d-and-j.org/mathematics/teaching/2019/09/06/what-is-mathematics-really.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2019-09-06T09:40:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/mathematics/teaching/2019/09/06/what-is-mathematics-really.html</guid>
  <description><![CDATA[
     <p>Greg Ashman recently published two provocative
posts (the <a href="https://gregashman.wordpress.com/2019/08/31/the-beauty-of-maths-is-that-its-right-or-wrong/">first</a>
and the
<a href="https://gregashman.wordpress.com/2019/09/02/a-lot-of-people-dont-seem-to-know-what-mathematics-is/">second</a>)
in response to <a href="https://blog.mrmeyer.com/2019/humanizing-math-class-means-teaching-math-like-the-humanities/">Dan Meyer’s
post</a>,
claiming that “A lot of people don’t seem to understand what
mathematics is”.  Dan Meyer’s statement:</p>

<blockquote>
  <p>“Math is only objective, inarguable, and abstract for questions
defined so narrowly they’re almost useless to students, teachers,
and the world itself.”</p>
</blockquote>

<p>formed the starting point of the exchange.  Greg’s central thesis is that
mathematics uses deductive reasoning, as opposed to other subjects
which use inductive reasoning.  From this follows the argument that
calculating things like p-values is mathematics, whereas evaluating
their meaning or usefulness is science.  (I encourage you to read the
full posts; I have just picked a couple of points out.)</p>

<p>But this seems to be quite a strange argument.  Let us consider a
metaphor: what is Art?  Paintings and sculptures are examples of
Art (though I would not like to begin defining “Art” itself).  We
might admire them, study them, and so forth, but that would not be
“doing Art”.  An artist may well do this to gain inspiration, as well
as studying art techniques so that they can produce their own novel
art.</p>

<p>Mathematics is similar.  Let us focus first on pure mathematics.</p>

<h3 id="doing-pure-mathematics">Doing “pure” mathematics</h3>

<p>A beautifully presented piece of deductive reasoning is a piece of
mathematics.  Mathematicians will study existing such arguments to
learn ideas and techniques, but one of their main focuses is to
<em>generate</em> novel such arguments, showing that some result is true.
This is always a creative process, and sometimes hugely so.  The
process is messy, exploratory, and full of inductive reasoning.  It
often takes the form of “I’ve noticed that in all these cases I’ve
tried, such-and-such seems to happen.  Maybe I can prove that, and
that will help me to show that the main result is true.”  The
deductive argument which results is a product of doing mathematics,
just as the painting is the result of the artist’s messy exploration,
trial and error, and so on.</p>

<p>And even having a deductive argument is not the end of the story.
When presented with a deductive argument, how does one check that it
is correct?  Perhaps one could do so just by following the given
argument and checking every step?  Unfortunately, the challenge of
justifying any but the simplest theorems using automated theorem
provers shows that this is generally far more involved and complicated
than it may superficially seem.  The skill of seeking counterexamples
to arguments, of finding errors in proofs and ambiguities in
terminology, is a highly creative one, not a logical-deductive one.  So
even in the seemingly purely logical world of deductive mathematical
arguments lies a huge amount of exploration, insight and perhaps even
induction.</p>

<p>And those questions which have an objectively correct answer at school
level are generally calculations.  And even there, we can start asking
interesting (and valuable) non-objective questions such as: Is this
the best approach?  How do these two different methods compare?  When
would you use one rather than the other?  When would we want/need to
perform such a calculation?  These questions are as important as the
calculations themselves or, given the ability of calculators and
computers to do the grunt-work, arguably more important.</p>

<p>Another vital aspect of (pure) mathematicians’ work is the generation
of conjectures.  This is what spurs others on to try to find arguments
(proofs) for these conjectures being true, or to show that they are
false.  At the extreme lie the Millennium Problems, which are very
important conjectures which had defied all attempts at solving them.
(One has since been solved.)  There are also many smaller (but still
significant) conjectures, which are some of what continues to drive
mathematics.  But how did mathematicians come up with them?  By doing
experiments, and lots of … you guessed it … inductive reasoning.</p>

<p>I am not questioning the central importance of teaching and learning
calculation techniques (in the very broadest sense), for it is
impossible to do mathematics without them, and I am not arguing for or
against any particular pedagogical approach.  I likewise strongly
agree with Greg that we must teach deductive reasoning in mathematics,
because it is so central to the subject and one of the unique
qualities of school-level mathematics in comparison to other subjects.
(Indeed, I am in the middle of writing a book about this.)  But I do
claim that just performing calculations, such as calculating p-values
or whatever, is not “doing mathematics” - it is simply doing
calculations.  And likewise, reading other people’s deductive
arguments is not “doing mathematics” either (though it is a
prerequisite to creating one’s own arguments).  “Doing mathematics”
must involve some form of exploration, generation of conjectures, and
justification (to paraphrase John Mason), and that must take place in
the classroom, just as school art lessons are not only about learning
specific techniques.</p>

<h3 id="applied-mathematics">Applied mathematics</h3>

<p>So much for pure mathematics being deductive; where does that leave us
when we consider applied mathematics?  We must first recognise that
the boundary between Mathematics and Non-Mathematics is quite fuzzy; a
cosmologist or fluid dynamicist might equally find themselves in a
Physics Department or an Applied Mathematics Department, depending
upon the institution.  (Likewise, a pure mathematician might end up in
a Philosophy Department or Computer Science Department if they are
studying logic or category theory.)</p>

<p>But returning to the statistical examples with which we began, how do
we handle those?  I would contend that calculating a p-value is just
performing a calculation; without thinking about its meaning or
significance, that cannot be considered “doing mathematics”.  Is
considering the meaning of such calculations actually part of a
different discipline, such as Statistics or Physics?  That is
returning to the question of where one draws the dividing lines, but
what seems clear is that we should not be teaching the calculation of
something without also teaching the meaning of the calculation.
Whether this takes place in a maths lesson or a physics lesson or a
statistics lesson is immaterial.</p>

<p>So how does applied mathematics fit into the mathematical family, and
should we consider statistics as part of mathematics?  That itself is
a long discussion, and I am sure that disagreements abound.  So a brief
thought will have to suffice here: applied mathematicians use their
mathematical reasoning powers to apply mathematical tools to
interesting and novel problems in the “real world”.  And that is, in
some sense, very similar to pure mathematicians using their
mathematical reasoning powers to apply mathematical tools to
interesting and novel problems in the “mathematical world”.  So
perhaps they are not so different, after all.</p>

<h3 id="finally-a-key-educational-question">Finally, a key educational question</h3>

<p>Perhaps the argument between Dan Meyer and Greg Ashman comes down to
the question of what we want to teach in our classrooms.  Do we want
to just teach pure calculation?  Or do we want to teach the creativity
and messiness of mathematics, alongside the deductive aspects?</p>

<p>Some students are very well-served by having a subject in which they
can be objectively right or wrong: it gives them a stability and
safety which is lacking in many other subjects.  But for other
students, who could end up becoming excellent and creative
mathematicians, this objectivity is stifling and off-putting, and they
will never see mathematics for what it truly is.  In the world today,
we don’t need people who can just perform calculations: computers are
so much better than humans at that.  We rather need people who can
think and ask good questions, and as a small part of that, who can
decide what calculations need to be performed.</p>

<p>The debate about how we educate our students is hugely important.
There is no “right” answer.  It is, instead, a question of what we
believe a mathematics education is aiming to achieve, and there it
seems that Dan and Greg fundamentally disagree.</p>

<h3 id="further-reading">Further reading</h3>

<p>A relatively readable and very interesting book on the nature of proof
and deductive mathematics is Imre Lakatos, <em>Proofs and Refutations</em>,
and several of the ideas in this post have been inspired by it.  It is
quite old, but it highlights how complex the Philosophy of Mathematics
actually is.</p>

  ]]></description>
</item>

	

	
		<item>
  <title>Increasing functions and functions increasing</title>
  <link>https://blog.d-and-j.org/mathematics/teaching/ks5/2018/10/07/increasing-functions.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2018-10-07T20:00:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/mathematics/teaching/ks5/2018/10/07/increasing-functions.html</guid>
  <description><![CDATA[
     <p>Here’s the graph of $y=-\dfrac{1}{x}$ for $x\ne0$.</p>

<p><img src="https://blog.d-and-j.net/assets/increasing-functions/reciprocal.svg" alt="" class="center-image" /></p>

<p>Where is this function increasing?  Is it an increasing function?</p>

<p>Looking at various recent examination papers, it has become clear to
me that there is significant confusion between these two questions.
This post is intended to bring some clarity to the situation.</p>

<p>At the start of this post, I will give an example of the confusion as
it appears in exam questions (and probably elsewhere), and clarify
what the two different phrases mean using the above example.  I will
then delve more deeply into the mathematics of these two things, going
beyond A-level content, and use some undergraduate analysis to find
equivalent conditions for them in terms of the derivatives of the
functions.  It is fine to skip over the technical stuff and just look
at the results (theorems)!</p>

<p>(Exactly the same applies to the use of the term “decreasing”, but for
simplicity we will focus on increasing functions in this post.)</p>

<p>Here is an example of a question (based on a real exam question) which
typifies the confusion.</p>

<blockquote>
  <p>The equation of a curve is $y=x^3+4x^2-5x$.</p>

  <p>Find the set of values of $x$ for which $y$ is an increasing
function of $x$.</p>
</blockquote>

<p>If we replace “increasing function” by another familiar A-level term
describing functions, “one-to-one function”, the question becomes:</p>

<blockquote>
  <p>A function is given by $f(x)=x^3+4x^2-5x$.</p>

  <p>Find the set of values of $x$ for which $f(x)$ is a one-to-one
function of $x$.</p>
</blockquote>

<p>This is clearly nonsensical, because whether a function is one-to-one
or not is a property of the function <em>as a whole</em>, not a property of
the function values at any particular input value.</p>

<p>Likewise, a function either is or is not an <em>increasing function</em>; it
is a property of the function <em>as a whole</em>.</p>

<p>Informally (and not quite correctly), we can describe the difference
as follows:</p>

<ul>
  <li>A function is an <em>increasing function</em> if larger input values give
  larger output values.</li>
  <li>A function is <em>increasing at a point</em> if at that point, the function
  has a positive gradient.</li>
</ul>

<p>An example which shows that these are not the same is the function
$f(x)=-\dfrac{1}{x}$ for $x\ne0$ shown above.  This function is
increasing at every value of $x\ne0$, as the gradient is always
positive.  However, it is not an increasing function, because
$f(1)&lt;f(-1)$.  If, though, we restricted the domain of the function to
$x&gt;0$, then it would be an increasing function.</p>

<p>So the above-quoted exam question does not make any sense, just as the
modified version did not: either $y$ is an increasing function of $x$
or it is not.  If the question had instead asked “Find the set of
values of $x$ at which $y$ is increasing,” it would have been fine.</p>
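<p>For the reworded question, here is a worked sketch using the informal
meaning of “increasing at a point” given above (a positive gradient):</p>

```latex
\begin{aligned}
y &= x^3 + 4x^2 - 5x, \qquad
  \frac{\mathrm{d}y}{\mathrm{d}x} = 3x^2 + 8x - 5,\\
3x^2 + 8x - 5 = 0
  &\iff x = \frac{-8 \pm \sqrt{8^2 - 4\cdot 3\cdot(-5)}}{2\cdot 3}
        = \frac{-4 \pm \sqrt{31}}{3},\\
\frac{\mathrm{d}y}{\mathrm{d}x} > 0
  &\iff x < \frac{-4-\sqrt{31}}{3} \approx -3.19
  \quad\text{or}\quad x > \frac{-4+\sqrt{31}}{3} \approx 0.52.
\end{aligned}
```

<p>At the two roots the gradient is zero; they are a local maximum and a
local minimum respectively, so $y$ is not increasing at either of them.</p>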

<p>Incidentally, the idea of increasing and decreasing functions connects
very well with the issue of rearranging inequalities (increasing the
depth of connections within the subject): a function can be applied to
both sides of an inequality without changing the direction of the
inequality if the function is (strictly) increasing; it can be applied
but with a change in the direction of the inequality if the function
is (strictly) decreasing; and if the function is neither, then it
cannot, in general, be applied to the inequality.  So we cannot square
both sides of an inequality unless both sides are restricted to
non-negative values, and we cannot take reciprocals of both sides
unless both sides have the same sign and are non-zero (and in that
case, we must also reverse the direction of the inequality).</p>
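<p>A quick numerical illustration of these claims, as a Python sketch
with arbitrarily chosen values: squaring preserves the order of
non-negative numbers but not of negative ones, and taking reciprocals
of two positive numbers reverses their order.  The helper
<code class="language-plaintext highlighter-rouge">preserves_order</code> is my own illustrative name.</p>

```python
from typing import Callable


def preserves_order(f: Callable[[float], float], a: float, b: float) -> bool:
    """Illustrative helper: given a < b, report whether f(a) < f(b)."""
    assert a < b
    return f(a) < f(b)


def square(x: float) -> float:
    return x * x


def reciprocal(x: float) -> float:
    return 1 / x


print(preserves_order(square, 2, 3))       # True: 4 < 9
print(preserves_order(square, -3, 2))      # False: -3 < 2, yet 9 > 4
print(preserves_order(reciprocal, 2, 3))   # False: 1/2 > 1/3
print(preserves_order(reciprocal, -1, 2))  # True: -1 < 1/2
```

<p>The last two lines together show that, over all non-zero reals, the
reciprocal function is neither increasing nor decreasing.</p>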

<p>It seems reasonable to assert that if a function is an increasing
function, then it will be increasing at every point.  This turns out
to be more subtle than it looks, as we now explore in a little more
depth.</p>

<h3 id="a-formal-definition-of-increasing">A formal definition of increasing</h3>

<p>We can give a formal definition of an increasing function.  For
example, this definition is from Apostol, <em>Mathematical Analysis</em>, 2nd
ed, p94, and identical definitions appear on the internet:</p>

<blockquote>
  <p>Definition 1: Let $f$ be a real-valued function whose domain is a
subset $S$ of $\mathbb{R}$.
Then $f$ is said to be an <em>increasing</em> (or <em>nondecreasing</em>)
function if for every pair of points $x$ and $y$ in $S$,
$x&lt;y$ implies $f(x)\le f(y)$.
If $x&lt;y$ implies $f(x)&lt;f(y)$, then $f$ is said to be a <em>strictly
increasing</em> function.  (Decreasing functions are similarly defined.)</p>
</blockquote>
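<p>When the domain $S$ is finite, Definition 1 can be checked exactly:
because the order relations are transitive, it is enough to sort $S$
and compare neighbouring points.  Here is a Python sketch (the function
names are my own); for an infinite domain such as an interval it can
only test a finite sample, so it can show that a function is <em>not</em>
increasing but never prove that it is.</p>

```python
from typing import Callable, Iterable


def is_increasing(f: Callable[[float], float], S: Iterable[float],
                  strict: bool = False) -> bool:
    """Illustrative check of Definition 1 on a *finite* domain S.

    x < y must imply f(x) <= f(y), or f(x) < f(y) when strict=True.
    After sorting S it suffices to compare consecutive points.
    """
    points = sorted(set(S))
    values = [f(x) for x in points]
    for fx, fy in zip(values, values[1:]):
        if fx > fy or (strict and fx == fy):
            return False
    return True


def neg_reciprocal(x: float) -> float:
    return -1 / x


print(is_increasing(neg_reciprocal, [1, 2, 3]))   # True
print(is_increasing(neg_reciprocal, [-1, 1]))     # False, since f(1) < f(-1)
print(is_increasing(lambda x: 0, range(-3, 4)))   # True
print(is_increasing(lambda x: 0, range(-3, 4), strict=True))     # False
print(is_increasing(lambda x: x**3, range(-3, 4), strict=True))  # True
```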

<p>Note the distinction between increasing and strictly increasing here:
a constant function such as $f(x)=0$ for $x\in\mathbb{R}$ is both an
increasing and a decreasing function, though it is not a strictly
increasing function.</p>

<p>We could also try to come up with a definition of increasing at a
point.  There are no standard definitions of this idea, and the
following proposed definition is certainly beyond A-level in its
formality.  It is based on the definition of continuity, which is
about the behaviour of a function “near” to a point.</p>

<blockquote>
  <p>Definition 2: Let $f$ be a real-valued function whose domain is a
subset $S$ of $\mathbb{R}$.  Then $f$ is said to be <em>increasing at
the point</em> $x$ in $S$ if there is some $\delta&gt;0$ such that:</p>

  <p>for every $y$ in $S$ with $x&lt;y&lt;x+\delta$, $f(x)\le f(y)$, and
for every $y$ in $S$ with $x-\delta&lt;y&lt;x$, $f(y)\le f(x)$.</p>

  <p>If the $\le$ signs are replaced by $&lt;$ signs in these two
inequalities, then $f$ is said to be <em>strictly increasing at</em> $x$.</p>
</blockquote>
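<p>Definition 2 asks for <em>some</em> $\delta$ and then quantifies over
<em>every</em> $y$ near $x$, so it cannot be checked by computer in general.
The Python sketch below merely samples points in $(x-\delta, x+\delta)$
for one fixed $\delta$; the default $\delta$ and sample size are
arbitrary assumptions, and <code class="language-plaintext highlighter-rouge">looks_increasing_at</code> is my own name.  A
<code class="language-plaintext highlighter-rouge">True</code> result is only evidence, and a <code class="language-plaintext highlighter-rouge">False</code> result only shows that this
particular $\delta$ does not work.</p>

```python
from typing import Callable


def looks_increasing_at(f: Callable[[float], float], x: float,
                        delta: float = 1e-3, samples: int = 1000,
                        strict: bool = False) -> bool:
    """Illustrative sampling of Definition 2 at x for one fixed delta.

    Checks f(y) <= f(x) for sampled y in (x - delta, x) and
    f(x) <= f(y) for sampled y in (x, x + delta), with strict
    inequalities when strict=True.  Assumes f is defined on the
    whole interval (x - delta, x + delta).
    """
    fx = f(x)
    for k in range(1, samples + 1):
        h = delta * k / (samples + 1)  # 0 < h < delta
        left, right = f(x - h), f(x + h)
        if strict:
            if not (left < fx < right):
                return False
        elif not (left <= fx <= right):
            return False
    return True


print(looks_increasing_at(lambda t: t**3, 0.0, strict=True))  # True
print(looks_increasing_at(lambda t: -t**3, 0.0))              # False
print(looks_increasing_at(lambda t: t**2, 0.0))               # False
```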

<p>With this definition, the above exam question (reworded) makes sense,
and the correct final answer is what the examiner would expect.  (One
might wonder whether one could make such a local definition of
one-to-one, and indeed, this is done when considering the Inverse
Function and Implicit Function theorems.  But that is a story for
another day.)</p>

<h3 id="using-calculus">Using calculus</h3>

<p>So far, no calculus has appeared, yet we typically teach our students
to determine whether a function is an increasing function or to find
where it is increasing by differentiating the function.  So let us now
consider how we could use calculus to help us.</p>

<p>For us to be able to use calculus, we need to assume that our function
is differentiable throughout $S$.  We could then propose the following
theorem:</p>

<blockquote>
  <p>Theorem 1 (incorrect attempt): Let $f$ be a real-valued continuous
function whose domain is a subset $S$ of $\mathbb{R}$ and which is
differentiable at every (interior) point of $S$.  Then $f$ is an
increasing function if and only if $f’(x)&gt;0$ for all $x$ in (the
interior of) $S$.</p>
</blockquote>

<p>(The use of “interior” is to avoid certain technical complications.)</p>

<p>Unfortunately this fails immediately: the constant function $f(x)=0$
for $x\in\mathbb{R}$ is increasing, yet $f’(x)=0$.</p>

<p>We could try changing this to say that $f$ is a <em>strictly</em> increasing
function, but that fails if the function has a stationary point of inflection.
For example, $f(x)=x^3$ is a strictly increasing function, even though
its derivative is zero at $x=0$.</p>

<p><img src="https://blog.d-and-j.net/assets/increasing-functions/cubic.svg" alt="" class="center-image" /></p>

<p>We could also try changing the condition to say that $f’(x)\ge0$ for
all $x$ in $S$.  However, this also fails: if the graph has a
discontinuity, such as the function $f(x)=-\dfrac{1}{x}$ for $x\ne0$
that we looked at before, then it might have $f’(x)&gt;0$ for all $x$ in
$S$, yet not be an increasing function.</p>

<p>This feels more hopeful, though: after all, the only problem now is
the “hole” in the domain $S$.  And it turns out that if we restrict
the domain to be an interval (that is, a subset of the reals with no
“holes”), then it will work:</p>

<blockquote>
  <p>Theorem 1 (correct version): Let $f$ be a real-valued continuous
function whose domain is an interval $I$ of $\mathbb{R}$ and which is
differentiable at every point in (the interior of) $I$.  Then $f$ is
an increasing function if and only if $f’(x)\ge 0$ for all $x$ in
(the interior of) $I$.</p>
</blockquote>

<p>The formal proof of this is found below, and though it is quite
technical, the theorem itself seems clearly true, and school students
could probably be convinced to believe it (at least once it is written
in more student-friendly language).</p>

<p>What can we say, though, about whether a (differentiable) function is
increasing at a point?  Using Definition 2 above, we get the
corresponding theorem:</p>

<blockquote>
  <p>Theorem 2: Let $f$ be a real-valued continuous function whose domain
is an interval $I$ of $\mathbb{R}$ and which is differentiable at
every (interior) point of $I$.  Then $f$ is increasing at the
point $x$ in $I$ if and only if there is some $\delta&gt;0$ for which
$f’(y)\ge0$ for all $y$ in (the interior of) $I$ with
$x-\delta&lt;y&lt;x+\delta$.</p>
</blockquote>

<p>Why is it not sufficient to just require $f’(x)\ge0$?  Well, consider
the functions $f(x)=x^3$ and $f(x)=-x^3$.  They both have $f’(0)=0$,
yet the first is increasing (indeed, even strictly increasing) at
$x=0$, while the second is decreasing at $x=0$.  And a function such
as $f(x)=x^2$ is neither increasing nor decreasing at $x=0$.  So we
really do need to consider a small interval around the point of
interest.</p>

<p><img src="https://blog.d-and-j.net/assets/increasing-functions/cubics.svg" alt="" class="center-image" /></p>

<p><img src="https://blog.d-and-j.net/assets/increasing-functions/quadratic.svg" alt="" class="center-image" /></p>
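<p>Applying Theorem 2 (and its “decreasing” counterpart) at $x=0$ to the
three examples above, where in each line $\delta&gt;0$ is arbitrary:</p>

```latex
\begin{aligned}
f(x) = x^3: &\quad f'(y) = 3y^2 \ge 0 \text{ for all } y
  &&\Rightarrow f \text{ is increasing at } 0,\\
f(x) = -x^3: &\quad f'(y) = -3y^2 \le 0 \text{ for all } y
  &&\Rightarrow f \text{ is decreasing at } 0,\\
f(x) = x^2: &\quad f'(y) = 2y < 0 \text{ for } -\delta < y < 0
  &&\Rightarrow f \text{ is not increasing at } 0,\\
 &\quad f'(y) = 2y > 0 \text{ for } 0 < y < \delta
  &&\Rightarrow f \text{ is not decreasing at } 0.
\end{aligned}
```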

<p>(Theorem 2 could be extended, with care, to more general subsets of
$\mathbb{R}$, as we are only discussing a local property of the
function.  But it is not particularly interesting to do so.)</p>

<p>So the question of determining at which points a function is
increasing (or decreasing) is more subtle than it appears: not only
does one have to find where the function has derivative $\ge0$ (and
not just $&gt;0$), but one also has to determine what is happening at
those points where the derivative is zero, as there are different
types of stationary points.  (At those points where the derivative is
strictly positive, the function is certainly strictly increasing,
which follows from Theorem 4 below.)</p>

<p>Things get more complicated if we now wish to consider strictly
increasing (or decreasing) functions.  There is a relatively weak
theorem which will suffice much of the time:</p>

<blockquote>
  <p>Theorem 3: Let $f$ be a continuous real-valued function whose domain
is an interval $I$ of $\mathbb{R}$ and which is differentiable at
every (interior) point of $I$.  Then if $f’(x)&gt;0$ throughout the interior of $I$,
$f$ is a strictly increasing function.</p>
</blockquote>

<p>Note that this is a one-directional theorem; $f(x)=x^3$ for
$x\in\mathbb{R}$ is our standard example of a strictly increasing
function which does not have $f’(x)&gt;0$ throughout the domain because
of the point of inflection at the origin.  The proof of Theorem 3 is
the same as the second half of the proof of Theorem 1, with strict
inequalities in place of non-strict ones.</p>

<p>An easy corollary of this is the following (local) theorem:</p>

<blockquote>
  <p>Theorem 4: Let $f$ be a continuous real-valued function whose domain
is a subset $S$ of $\mathbb{R}$, and let $x$ be a point in the interior
of $S$.  If $f$ is differentiable throughout some open interval around
$x$, $f’$ is continuous at $x$ and $f’(x)&gt;0$, then $f$ is strictly
increasing at $x$.</p>
</blockquote>

<p>This is the theorem which is typically used when answering A-level
exam questions such as the one above.  Unfortunately, as we see from
our example of $f(x)=x^3$, this too is a one-directional theorem:
every point at which $f’(x)&gt;0$ is a point at which the function is
strictly increasing, but there may be other points where this is the
case but where $f’(x)=0$.  (If $f’(x)&lt;0$, then the function is
strictly decreasing at this point, so it cannot be increasing.)  The
question of using calculus to determine where a function is
increasing, rather than strictly increasing, is somewhat more
complicated, as we see from Theorem 2 above.  But at A-level, the
functions are always nice enough that the only difficulties will be at
the stationary points.</p>

<p>There is actually a necessary and sufficient condition for a function
to be strictly increasing, but this is more subtle:</p>

<blockquote>
  <p>Theorem 5: Let $f$ be a continuous real-valued function whose domain
is an interval $I$ of $\mathbb{R}$ and which is differentiable at
every interior point of $I$.  Then $f$ is strictly increasing on $I$
if and only if $f’(x)\ge0$ throughout $I$ and there is no
non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the
interior of $J$.</p>
</blockquote>

<p>The proof can be found below.</p>

<h3 id="teaching-this-topic-at-a-level">Teaching this topic at A-level</h3>

<p>Putting this all together, we see that Theorem 4 is the crucial
theorem for school use.  Teaching the meaning of the term “increasing
function” (Definition 1) and a simplified explanation of “increasing
at a point” (Definition 2), along with Theorem 4 should give a good
grounding.  It would also be wise to caution that it is a one-way
theorem by comparing and contrasting examples such as $f(x)=x^2$ and
$f(x)=x^3$.</p>

<hr />

<h3 id="proofs-of-theorems-1-and-5">Proofs of Theorems 1 and 5</h3>

<p>This technical appendix uses tools from undergraduate analysis.  The
proofs of the other three theorems are very similar to these or they
follow immediately from these.</p>

<h4 id="theorem-1">Theorem 1</h4>

<blockquote>
  <p>Let $f$ be a real-valued continuous function whose domain is an
interval $I$ of $\mathbb{R}$ and which is differentiable at every point in
the interior of $I$.  Then $f$ is an increasing function if and only
if $f’(x)\ge 0$ for all $x$ in the interior of $I$.</p>
</blockquote>

<p><strong>Proof</strong></p>

<p>We show first that if $f$ is an increasing function, then $f’(x)\ge0$
for all $x$ in the interior of $I$, and we argue by contradiction.
Assume that $f’(x_0)&lt;0$ for some $x_0$ in the interior of $I$.  Using
the definition of derivative, this means that
$\lim\limits_{\substack{x\to x_0\\ x\in
I}}\dfrac{f(x)-f(x_0)}{x-x_0}&lt;0$.  So there is some $x_1\in I$ (where
$x_1\ne x_0$) with $\dfrac{f(x_1)-f(x_0)}{x_1-x_0}&lt;0$ (otherwise the limit
would be $\ge0$).  If $x_1&gt;x_0$, then multiplying by $x_1-x_0$ gives
$f(x_1)-f(x_0)&lt;0$, so $f(x_1)&lt;f(x_0)$.  If $x_1&lt;x_0$, then multiplying
by $x_1-x_0$ gives $f(x_1)-f(x_0)&gt;0$, so $f(x_1)&gt;f(x_0)$.  Either way,
this shows that the function is not increasing on $I$, and we have our
desired contradiction.  Thus if $f$ is an increasing function, we must
have $f’(x)\ge0$ for all $x$ in the interior of $I$.</p>

<p>Conversely, if $f’(x)\ge0$ for all $x$ in the interior of $I$, then
let $x&lt;y$ be any two points in $I$.  Then by the mean-value theorem,
there is some $z$ with $x&lt;z&lt;y$ for which $f(y)-f(x)=f’(z)(y-x)$ (and
note that $z$ lies in the interior of $I$ as $I$ is an interval).
Since $f’(z)\ge0$ by assumption, and $y-x&gt;0$, it follows that
$f(y)-f(x)\ge0$, so $f(x)\le f(y)$.  Therefore $f$ is an increasing
function.</p>

<h4 id="theorem-5">Theorem 5</h4>

<blockquote>
  <p>Let $f$ be a continuous real-valued function whose domain is an
interval $I$ of $\mathbb{R}$ and which is differentiable at every
interior point of $I$.  Then $f$ is strictly increasing on $I$ if
and only if $f’(x)\ge0$ throughout $I$ and there is no non-trivial
subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of
$J$.</p>
</blockquote>

<p><strong>Proof</strong></p>

<p>We first prove that if the derivative condition is not met, then $f$
is not strictly increasing on $I$.  If $f’(x)&lt;0$ at any point in $I$,
then $f$ is not increasing (by Theorem 1), so it is certainly not
strictly increasing.  If $f’(x)\ge0$ throughout $I$ but there is a
non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the
interior of $J$, then $f$ is constant throughout $J$ (by the
mean-value theorem).  In particular, there are $y&lt;z$ in $J$ with
$f(y)=f(z)$, showing that $f$ is not strictly increasing.</p>

<p>Conversely, suppose that $f’(x)\ge0$ throughout $I$, so that $f$ is
increasing by Theorem 1, and that there is no non-trivial subinterval
$J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of $J$.  If $f$
were <em>not</em> strictly increasing, then there would be $y&lt;z$ in $I$ with
$f(y)\ge f(z)$; since $f$ is increasing, this forces $f(y)=f(z)$.  So
$f(x)$ is constant on the interval $y&lt;x&lt;z$.  (For if
$f(y)&lt;f(x)$ for some $x$ in this interval, we would have $f(x)&gt;f(z)$,
contradicting $f$ increasing.)  Therefore $f’(x)=0$ throughout this
interval, contradicting our assumption.  So $f$ must be strictly
increasing.</p>

  ]]></description>
</item>

	

	
		<item>
  <title>A visit to Michaela</title>
  <link>https://blog.d-and-j.org/teaching/2018/07/20/michaela-visit.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2018-07-20T15:00:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/teaching/2018/07/20/michaela-visit.html</guid>
  <description><![CDATA[
     <p>Having recently listened to about 5.5 hours of <a href="http://mrbartonmaths.com/">Craig
Barton</a> interviewing Dani Quinn (<a href="http://www.mrbartonmaths.com/blog/dani-quinn-part-1-michaela-school-planning-lessons-low-stakes-tests/">part
1</a>
and <a href="http://www.mrbartonmaths.com/blog/dani-quinn-part-2-michaela-school-behaviour-drills-culture/">part
2</a>),
the Head of Mathematics at <a href="https://mcsbrent.co.uk/">Michaela Community
School</a>, I decided that it was worth visiting
the school to see their principles in action for myself, so last week,
I took to the buses to visit Wembley.</p>

<p><img src="https://blog.d-and-j.net/assets/michaela-visit/head.jpg" alt="" class="center-image" /></p>

<p>Though my main interest was the maths teaching, I was fascinated by
the whole experience, so that is what I will focus most of my
attention on here.  I used to teach in a school (“W”) with a broadly
similar type of intake: it was in an area with many students from
ethnic minorities and many students on free school meals; that school
was also in an area in which there was a grammar school system, so
many of the highest-attaining students in the catchment area attended
the more selective local schools.  This gave me an interesting basis
for comparison.</p>

<p>The most obvious thing which struck me was the atmosphere that
the headteacher, Katharine Birbalsingh, and her staff have established
in the school.  It was very purposeful, and the students I met
generally seemed happy and appeared to like
the school.  They were polite to me, and some were genuinely
interested in talking to me.  (Or at least they gave the convincing
impression that they were!)  Some students were immensely proud of
what they were doing and showed off their work to me (without my even
asking).</p>

<p>Many have written about the very strictly enforced behaviour policies.
But what I had not expected was the huge warmth pouring forth from the
staff to the students in their lessons, and the humanity pervading the
school.  Whilst demerits were regularly given for infringements of the
school’s very strict behaviour policies - generally accompanied by
just a few seconds’ calm explanation of the positive benefits of doing
what was expected or the negative impact the behaviour was having on
others - merits were given even more liberally (and fairly
consistently between lessons) for behaviours the school wants to
encourage, such as good vocal projection when answering a question,
asking good questions and giving good explanations.  And these were
always accompanied by brief warm words.  This contrasts so
dramatically with my experience at “W”, where though some teachers
managed their classes well, there wasn’t anything close to a
consistent school-wide system at this level of detail.  There is
clearly a benefit to be gained from having such a consistently
enforced system throughout the school, though it is tough for
teachers.  (Mind you, it is not as tough as teaching in a school where
students throw things at teachers on a semi-regular basis.)</p>

<p>The most challenging class I saw was a small bottom-set year 10 class,
several of whom had already been permanently excluded from one or more
other schools.  Yet there they were, behaving and mostly participating
in the lesson, learning and targeting a grade 4 or 5 at GCSE
Mathematics.  Wow.  At “W”, lower-middle sets were only targeting a
grade D (on the old system, the equivalent of a grade 3 on the new
system), and most of them did not achieve even that.  The contrast
could not be greater.</p>

<p>One thing struck me immediately, without my even entering a
classroom: the immaculate state of the building, with not a speck of
litter to be seen during the course of the
day.  This is in stark contrast to most of the schools I’ve taught in
and visited over the years, and vastly different from “W”.  The
students have clearly been taught to respect their environment.</p>

<p>Is the school’s approach a good thing?  This is a difficult question
for me to answer.  I certainly had a sense that the school was
infusing students with British culture (whatever that means), and yet
for students living in the UK, is this not a good thing?  It will give
them significant (British) cultural capital on which they will be able
to draw in later life, and which they might well otherwise not gain.</p>

<p>On the other hand, students are constantly being watched, as are
staff: for example, visitors (with DBS certificates) are permitted to
just walk into any lesson, and the teachers and students generally
didn’t bat an eyelid when I quietly walked in.  But this significantly
reduces the chances of bullying and destructive behaviour: there are
no “safe spaces” within the school for bullying or other damaging
behaviour to take place without a teacher seeing.</p>

<p>I observed parts of about eight maths lessons during the day (as well
as a smattering of other subjects).  My concern was that they would be
very procedural in nature, given the rigidity of the school system.
However, I was pleasantly surprised: while they were clearly
teacher-led, the questioning did include a good mix of knowledge and
deeper understanding questions.  For example, in a lesson on
Pythagoras’s Theorem, there were early questions designed to ensure
that the students knew which side was the hypotenuse, and later
questions which required more thought, such as “If I have a triangle
with side lengths 6, 7, 8, can I draw a right-angle here?”  (I am not
overly concerned with Year 8 students not clearly distinguishing
between Pythagoras’s Theorem and its converse.  Students were spending
enough effort getting to grips with what the question meant.)  Some
time was spent working on questions from their workbooks, but this was
far from the majority of the time.</p>

<p>During one of the lessons, students were asked to read out from their
workbooks.  I was surprised - though I probably should not have been -
at how difficult they found it to read technical vocabulary; how often
do we ask our students to read a piece of technical material?</p>

<p>There were also opportunities for discussion in pairs; these were
short and effective, and the students were continually encouraged to
use the time productively, as anyone could be picked on to answer a
question after the discussion time.</p>

<p>Dani spoke much more about the planning process and lesson structure
at Michaela in her podcast, which was fascinating, so I won’t say more
about it here.</p>

<hr />

<p>Returning home and reflecting, I have two big questions about this
model, and in particular with regard to maths.  The first (somewhat
mathematics-specific) question is whether students get enough
opportunity to think about any (mathematics) problem for a protracted
period of time.  Hearing about other countries’ approaches, I wonder
whether this is a potentially missed opportunity, especially once
behaviour is so well-managed that there is a good learning atmosphere.</p>

<p>The second, more pervasive, question is about the use of streaming
within the school.  (“Streaming” means that students are put in groups
which are dependent upon their academic performance across a number
of subjects.  They remain in these groups for all of their subjects.
As far as I could tell, it is used in Years 7-9 and possibly in Year
10 as well.)  It is very effective for behaviour management, as the
entire class is together the whole time, including at lesson
changeover time.  However, I am very unconvinced that it is good for
equity, which is part of the school’s mission.  Hearing of the
experiences of primary and secondary schools which have moved away
from ability (or better: attainment) grouping to mixed-attainment
grouping, one has to ask whether this would be better for the majority
of the students within the school, certainly at Key Stage 3 (11-14
year olds) and possibly older too.  Teachers’ academic expectations of
students are lower when they are teaching lower-attaining groups, and
I strongly doubt that Michaela’s excellent teachers are any less
affected by this.</p>

<p>And would I consider teaching at Michaela?  I’m not sure it would be
the “right” school for me, but I would take it over “W” any day.</p>

<hr />

<p>Finally, the “family lunch”.  The initial poetry reading was like
being at a summer youth camp: the energy, enthusiasm and fun were
palpable.  There was quite a buzz in the room during this!  And the
discussion over lunch - this time about volunteering, in light of the
outstanding work of volunteers in the Thailand cave rescue - was
fascinating.</p>

<p>It was a pleasure to visit the school, and I thank the staff for being
so open and welcoming.  I look forward to hearing of their results,
both academic and beyond, in the years to come.</p>

  ]]></description>
</item>

	

	
		<item>
  <title>Small angle approximations - an application</title>
  <link>https://blog.d-and-j.org/mathematics/teaching/ks5/2018/07/08/small-angles-application.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2018-07-08T22:00:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/mathematics/teaching/ks5/2018/07/08/small-angles-application.html</guid>
  <description><![CDATA[
     <p>I thought a bit more about my <a href="https://blog.d-and-j.net/mathematics/teaching/ks5/2018/07/05/small-angles.html">previous
post</a> on small
angle approximations, and decided it might be helpful to describe an
application of the small angle approximations.  While this example
contains non-examinable aspects (at least in single maths A-level),
the context should be fairly familiar (or can easily be demonstrated),
and the mathematics is accessible to single maths students (at least
as a demonstration).  It also ties together ideas from mechanics and
pure maths, so is helpful in this regard.</p>

<p>The question is: what is the period of a pendulum?</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles-application/old-clock.jpg" alt="" class="center-image" /></p>

<p>We can model the pendulum as a thin rod (inextensible and rigid) of
length $L$, freely pivoted about a point $O$, with a single point mass
$P$ of mass $m$ on the end of the rod, as shown here (where $T$ is the
tension in the rod):</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles-application/pendulum-forces.svg" alt="" class="center-image" /></p>

<p>The velocity and acceleration of $P$ are as follows, where
$\dot\theta$ means $\dfrac{d\theta}{dt}$ and $\ddot\theta$ means
$\dfrac{d^2\theta}{dt^2}$; a derivation of these can be found at the
end:</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles-application/pendulum-velocity.svg" alt="" class="center-image" /></p>

<p>We can now apply Newton’s second law (“$F=ma$”) to the situation:
working perpendicular to the rod, this gives
$-mg\sin\theta=mL\ddot\theta$ (the minus sign is because the component
of the force $mg$ is in the opposite direction to the $L\ddot\theta$
on our diagram).  Rearranging this, we get the differential equation:</p>

\[\dfrac{d^2\theta}{dt^2}=-\dfrac{g}{L}\sin\theta.\]

<p>Unfortunately, this equation is impossible to solve in terms of simple
functions.  But if we <strong>assume that the swing of the pendulum is
small</strong>, so that $\theta$ is small, then we can approximate
$\sin\theta$ by $\theta$, and our differential equation becomes</p>

\[\dfrac{d^2\theta}{dt^2}=-\dfrac{g}{L}\theta.\]

<p>This differential equation (an example of simple harmonic motion) has
a solution</p>

\[\theta=A \sin\left(\sqrt{\dfrac{g}{L}}\,t\right)\]

<p>(which is easy to check), where $A$ is the amplitude (maximum angle)
of the swing.  The period of this swing is $2\pi\sqrt{\dfrac{L}{g}}$,
which is independent of the amplitude and the mass at the end of the
rod!  So as long as the swing is relatively small, the period is only
dependent upon the length of the pendulum (and the acceleration due to
gravity), which is likely to be a surprising result the first time it
is met.  This would have had great significance for clock-makers in
times gone by.</p>
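<p>To see how good the small-angle approximation is, one can integrate the full equation $\ddot\theta=-\dfrac{g}{L}\sin\theta$ numerically and compare the resulting period with $2\pi\sqrt{L/g}$.  The sketch below uses a fixed-step fourth-order Runge-Kutta method (a standard numerical technique, not part of the argument above), with the illustrative values $g=9.81\,\mathrm{m\,s^{-2}}$ and $L=1\,\mathrm{m}$.  The function name <code>pendulum_period</code> is hypothetical:</p>

```python
import math


def pendulum_period(amplitude, g=9.81, L=1.0, dt=1e-4):
    """Estimate the period of theta'' = -(g/L) sin(theta) for a pendulum
    released from rest at angle `amplitude` (radians, 0 < amplitude < pi).

    Uses fixed-step classical Runge-Kutta (RK4).  Starting from rest at the
    maximum angle, theta first reaches 0 after a quarter of a period, so we
    time that quarter swing and multiply by 4.
    """
    if not 0 < amplitude < math.pi:
        raise ValueError("amplitude must lie strictly between 0 and pi")

    def accel(theta):
        return -(g / L) * math.sin(theta)

    theta, omega, t = amplitude, 0.0, 0.0
    while True:
        # One RK4 step for the system theta' = omega, omega' = accel(theta)
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
        k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        new_theta = theta + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        new_omega = omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        if new_theta <= 0.0:
            # theta crossed zero during this step: interpolate linearly
            fraction = theta / (theta - new_theta)
            return 4 * (t + fraction * dt)
        theta, omega, t = new_theta, new_omega, t + dt


g, L = 9.81, 1.0  # illustrative values, in m/s^2 and m
small_angle_period = 2 * math.pi * math.sqrt(L / g)
for degrees in (5, 20, 45, 90):
    period = pendulum_period(math.radians(degrees), g, L)
    print(f"amplitude {degrees:>2} degrees: simulated period {period:.4f} s, "
          f"small-angle formula {small_angle_period:.4f} s")
```

<p>For small amplitudes the simulated period agrees closely with $2\pi\sqrt{L/g}\approx2.006$ s.  For larger amplitudes it comes out noticeably longer, which is why the assumption of a small swing matters.</p>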

<hr />

<h3 id="deriving-the-formulae-for-the-velocity-and-acceleration-of-p">Deriving the formulae for the velocity and acceleration of $P$</h3>

<p>We can work out the velocity and acceleration of $P$ in several
different ways.  One way is to use coordinates, where $O$ is the
origin, and the vertical line is the $y$-axis.  Then when $P$ is at an
angle of $\theta$, it has a position vector of</p>

\[\mathbf{r}=\begin{pmatrix} L\sin\theta \\
-L\cos\theta\end{pmatrix}.\]

<p>A unit vector in the direction of $\overrightarrow{OP}$ is</p>

\[\mathbf{e}_r=\begin{pmatrix}\sin\theta\\ -\cos\theta\end{pmatrix},\]

<p>and a unit vector perpendicular to this in the direction of increasing
$\theta$ is</p>

\[\mathbf{e}_\theta=\begin{pmatrix}\cos\theta \\
\sin\theta\end{pmatrix},\]

<p>as shown in this diagram:</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles-application/pendulum-components.svg" alt="" class="center-image" /></p>

<p>The velocity of $P$ can be found by differentiating $\mathbf{r}$ with
respect to time, giving:</p>

\[\dot{\mathbf{r}}=\begin{pmatrix} L\cos\theta.\dot\theta \\
L\sin\theta.\dot\theta\end{pmatrix}
= L\dot\theta\mathbf{e}_\theta\]

<p>Then the acceleration can be found by differentiating again (using the
product rule on both of the components of $\dot{\mathbf{r}}$) to
obtain:</p>

\[\ddot{\mathbf{r}}=\begin{pmatrix} -L\sin\theta.\dot\theta^2 +
L\cos\theta.\ddot\theta \\ L\cos\theta.\dot\theta^2 +
L\sin\theta. \ddot\theta \end{pmatrix}
= -L\dot\theta^2 \mathbf{e}_r + L\ddot\theta \mathbf{e}_\theta.\]

<p>These are the components of the velocity and acceleration shown
above.</p>
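<p>The formulae $\dot{\mathbf{r}}=L\dot\theta\mathbf{e}_\theta$ and $\ddot{\mathbf{r}}=-L\dot\theta^2\mathbf{e}_r+L\ddot\theta\mathbf{e}_\theta$ can also be checked numerically: pick any smooth test function $\theta(t)$, differentiate $\mathbf{r}(t)$ by central finite differences, and compare the results with the formulae.  In this sketch, the test function $\theta(t)=0.3\sin 2t+0.1t$, the time $t=0.7$ and the step sizes are arbitrary, hypothetical choices:</p>

```python
import math

L = 1.0  # rod length (any positive value will do)


# Hypothetical smooth test motion theta(t), with its exact derivatives
def theta(t):
    return 0.3 * math.sin(2 * t) + 0.1 * t


def theta_dot(t):
    return 0.6 * math.cos(2 * t) + 0.1


def theta_ddot(t):
    return -1.2 * math.sin(2 * t)


def r(t):
    """Position vector of P: (L sin(theta), -L cos(theta))."""
    th = theta(t)
    return (L * math.sin(th), -L * math.cos(th))


def e_r(th):
    return (math.sin(th), -math.cos(th))


def e_theta(th):
    return (math.cos(th), math.sin(th))


def velocity_fd(t, h=1e-5):
    """Central-difference approximation to dr/dt."""
    a, b = r(t + h), r(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(a, b))


def acceleration_fd(t, h=1e-3):
    """Central-difference approximation to d^2r/dt^2."""
    a, m, b = r(t + h), r(t), r(t - h)
    return tuple((p - 2 * q + s) / h ** 2 for p, q, s in zip(a, m, b))


def velocity_formula(t):
    return tuple(L * theta_dot(t) * c for c in e_theta(theta(t)))


def acceleration_formula(t):
    th, w, alpha = theta(t), theta_dot(t), theta_ddot(t)
    return tuple(-L * w ** 2 * cr + L * alpha * ct
                 for cr, ct in zip(e_r(th), e_theta(th)))


t = 0.7
print("velocity:    ", velocity_fd(t), velocity_formula(t))
print("acceleration:", acceleration_fd(t), acceleration_formula(t))
```

<p>The two columns agree to within the small error of the finite-difference approximations, whatever smooth $\theta(t)$ is chosen.</p>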

<p>Without as much rigour, one could observe that the distance of $P$
along the circumference of the circle is given by $L\theta$, so it is
reasonable to suggest that its speed is $L\dot\theta$ (as $L$ is a
constant).  Then the acceleration in this direction is plausibly
$L\ddot\theta$, while the radial acceleration - which we are not
interested in for this application - is a result of the velocity
changing direction.</p>

  ]]></description>
</item>

	

	
		<item>
  <title>Small angle approximations</title>
  <link>https://blog.d-and-j.org/mathematics/teaching/ks5/2018/07/05/small-angles.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2018-07-05T22:10:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/mathematics/teaching/ks5/2018/07/05/small-angles.html</guid>
  <description><![CDATA[
     <p>At a conference run by the <a href="https://bbomathshub.org.uk/">BBO Maths
Hub</a> today, <a href="http://www.resourceaholic.com/">Jo
Morgan</a> mentioned that small angle
approximations are a topic recently (re)introduced to the single maths
A-level course, and many teachers may be unfamiliar with it.</p>

<p>During the day and on my journey home, I thought about this topic and
its connections with other areas of the syllabus.  So here are a few
quick thoughts on ways we could approach it.  I hope that this post
offers some different perspectives on the topic.</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles/circle.svg" alt="" class="center-image" /></p>

<p>This is a diagram probably familiar from most A-level textbooks (I
don’t have one to hand, unfortunately).  We have our familiar unit
circle, and draw a right-angled triangle with angle $\theta$, opposite
$\sin\theta$ and adjacent $\cos\theta$.  We also see that the arc
length subtended by the angle $\theta$ is $r\theta=\theta$ as the
radius is 1.  (We must be working in radians for this to be correct!)
Already in this diagram, $\sin\theta$ and $\theta$ do not look very
different, so $\sin\theta\approx\theta$.  On the other hand,
$\cos\theta$ looks pretty close to $1$, so we have
$\cos\theta\approx1$.  Visually, say using GeoGebra, we see that these
approximations get better as $\theta$ gets smaller: the arc and the
half-chord become closer and closer to each other.  We can then work
out $\tan\theta=\dfrac{\sin\theta}{\cos\theta}\approx
\dfrac{\theta}{1}=\theta$.</p>
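<p>A quick numerical check of these approximations is to watch the ratios $\dfrac{\sin\theta}{\theta}$ and $\dfrac{\tan\theta}{\theta}$ approach $1$ as $\theta$ shrinks.  This works in radians, as Python's <code>math</code> functions do.  It is only a sketch, and the sample angles and the helper name <code>ratios</code> are arbitrary choices:</p>

```python
import math


def ratios(theta):
    """Return (sin(theta)/theta, tan(theta)/theta); theta in radians, non-zero."""
    return math.sin(theta) / theta, math.tan(theta) / theta


for theta in (0.5, 0.1, 0.01, 0.001):
    s, t = ratios(theta)
    print(f"theta = {theta:<6}  sin/theta = {s:.8f}  tan/theta = {t:.8f}")
```

<p>For positive $\theta$, the sine ratio rises towards $1$ from below and the tangent ratio falls towards $1$ from above.  This matches the diagrams: the half-chord is shorter than the arc, and the tangent segment is longer.</p>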

<p>Another way of seeing this approximation to $\tan\theta$ is to draw
the triangle with adjacent equal to $1$:</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles/circletan.svg" alt="" class="center-image" /></p>

<p>If we take $\sin\theta\approx\theta$, then we can work out a better
approximation for $\cos\theta$ using the binomial theorem.  We have,
for small $\theta$ (positive or negative):</p>

\[\begin{align*}
  \cos\theta &amp;= \sqrt{1-\sin^2\theta} \\
  &amp;\approx \sqrt{1-\theta^2} \\
  &amp;= (1-\theta^2)^{\frac{1}{2}} \\
  &amp;= 1 - \tfrac{1}{2}\theta^2 + \cdots
\end{align*}\]

<p>where we have used the first two terms of the binomial expansion on
the last line.  So $\cos\theta\approx 1-\frac{1}{2}\theta^2$.</p>
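<p>To see that $1-\frac{1}{2}\theta^2$ really is a better approximation than $1$, one can compare the two errors numerically.  In this sketch, angles are in radians and the helper name is hypothetical.  Halving $\theta$ roughly quarters the error in $\cos\theta\approx1$, but divides the error in $\cos\theta\approx1-\frac{1}{2}\theta^2$ by about $16$:</p>

```python
import math


def cos_errors(theta):
    """Errors of the approximations cos(theta) ~ 1 and cos(theta) ~ 1 - theta^2/2."""
    exact = math.cos(theta)
    return abs(exact - 1.0), abs(exact - (1.0 - theta ** 2 / 2))


for theta in (0.2, 0.1, 0.05):
    crude, better = cos_errors(theta)
    print(f"theta = {theta:<5}  error(1) = {crude:.3e}  "
          f"error(1 - theta^2/2) = {better:.3e}")
```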

<p>Another way of obtaining the approximation for $\cos\theta$ is to
relate cos and sin using a double-angle formula:</p>

\[\cos 2\theta = 1 - 2\sin^2\theta\]

<p>so</p>

\[\begin{align*}
  \cos\theta &amp;= 1 - 2 \sin^2 \tfrac{1}{2}\theta \\
  &amp;\approx 1 - 2 \bigl(\tfrac{1}{2}\theta\bigr)^2 \\
  &amp;= 1 - \tfrac{1}{2}\theta^2
\end{align*}\]

<p>where we have used $\sin\tfrac{1}{2}\theta\approx\tfrac{1}{2}\theta$
on the second line.</p>

<p>The approximations for $\sin\theta$ and $\tan\theta$ are also closely
related to the shape of their graphs near the origin (though there is
potentially some circular reasoning here - no pun intended!):</p>

<p><img src="https://blog.d-and-j.net/assets/small-angles/sintan.svg" alt="" class="center-image" /></p>

<p>We have drawn the graphs of $y=x$ (red), $y=\sin x$ (green) and
$y=\tan x$ (blue).  Near the origin, the three graphs look very
similar, so for small $x$, $\sin x\approx x \approx \tan x$.</p>

<p>This also tells us that at the origin, $\frac{d}{dx}(\sin x)$ and
$\frac{d}{dx}(\tan x)$ equal $1$.</p>

<p>We can also argue in the opposite direction.  If we have already
convinced ourselves why the derivative of $\sin x$ is $\cos x$ using a
different approach (for example, by using <a href="https://undergroundmathematics.org/calculus-trig-log/rotating-derivatives">Rotating
derivatives</a>),
then we can say that for small values of $x$, the graph of $y=\sin x$
is approximated by the tangent to the graph at $x=0$ (see <a href="https://undergroundmathematics.org/introducing-calculus/a-tangent-is">A tangent
is…</a>
for more on this point).  We can calculate the tangent: since
$\frac{d}{dx}(\sin x)=\cos x$ giving $\cos 0 = 1$, and $\sin 0 = 0$, the
tangent has equation $y=x$.  So for small $x$, $\sin x\approx x$.</p>

  ]]></description>
</item>

	

	
		<item>
  <title>Dividing fractions</title>
  <link>https://blog.d-and-j.org/mathematics/teaching/ks2/ks3/2018/06/20/dividing-fractions.html</link>
  <author>Julian Gilbey</author>
  <pubDate>2018-06-20T08:05:00+01:00</pubDate>
  <guid>https://blog.d-and-j.net/mathematics/teaching/ks2/ks3/2018/06/20/dividing-fractions.html</guid>
  <description><![CDATA[
     <p>Why is it that</p>

\[\frac{3}{5}\div\frac{2}{3} = \frac{3}{5}\times\frac{3}{2},\]

<p>or as the rule that students are frequently taught: “turn the second
fraction upside-down and multiply”?</p>

<p>I’ve been inspired to revisit this question after listening to Ed
Southall talking on
<a href="http://www.mrbartonmaths.com/blog/ed-southall-solvemymaths-and-mathematics-pgce-tutor/">Mr Barton’s Maths Podcast</a>,
where he mentioned this question.</p>

<p>In this post I suggest a teaching sequence which might lead to an
understanding of the rule above, as well as the procedural knowledge
of how to apply it.</p>

<h2 id="some-comments-on-a-familiar-approach">Some comments on a familiar approach</h2>

<p>I have seen textbooks and websites explain the rule for division of
fractions by talking about how many times we can fit $\frac{1}{3}$
into $\frac{4}{5}$, say, but that seems to me to be quite challenging:
students have to hold on to several ideas at once, and make sense of
diagrammatic representations at the same time as trying to think about
what division means.  It also becomes very hard as the fractions
become more complicated.  In my experience, few students develop a
solid understanding through this approach: they either get lost in the
reasoning or they resort to following a rule.</p>

<p>This problem ties in quite neatly with some things I have recently
read, in particular:</p>

<ul>
  <li><a href="http://www.jamestanton.com/">James Tanton’s</a> post
  <a href="https://medium.com/q-e-d/the-unreasonableness-of-k-12-mathematics-7fd234f25135">The Unreasonableness of K-12 Mathematics</a>,
  in which he gives an idealised description of the development of
  the concept of number.</li>
  <li>Liping Ma’s book “Knowing and teaching elementary mathematics”, in
  which US and Chinese teachers’ understanding of this rule is
  compared.</li>
  <li>John Mighton, the founder of <a href="https://www.jumpmath.org/">JUMP Math</a>,
  wrote
  <a href="https://www.jumpmath.org/jump/en/philosophy">The end of ignorance</a>;
  he observes there that <em>meaningful</em> symbolic manipulation can
  precede both an attempt to explain an idea or technique in
  everyday terms, and the development of understanding; moreover,
  understanding can emerge <em>from</em> the manipulations if examples are
  well-chosen and students are given the opportunity to reflect.</li>
</ul>

<h2 id="an-overview-of-the-idea">An overview of the idea</h2>

<p>The calculation $8-5$ means “what number $\square$ makes $\square+5=8$
true?”  Similarly, when we write $12\div 3$, we mean “what number
$\square$ makes $\square\times3=12$ true?”  This says that division is
the inverse of multiplication.  (More precisely, for each non-zero
number $c$, dividing by $c$ is the inverse of multiplying by $c$.)
The same applies to division of fractions:
$\frac{3}{5}\div\frac{2}{3}$ means “what number $\square$ makes
$\square\times\frac{2}{3}=\frac{3}{5}$ true?”</p>

<p>Once we notice that $\frac{3}{2}\times\frac{2}{3}=1$, we can then
multiply both sides of this equation by $\frac{3}{5}$ to obtain</p>

\[\frac{3}{5}\times\frac{3}{2}\times\frac{2}{3}=\frac{3}{5}.\]

<p>Therefore $\square$ must be $\frac{3}{5}\times\frac{3}{2}$, or</p>

\[\frac{3}{5}\div\frac{2}{3} = \frac{3}{5}\times\frac{3}{2}.\]

<p>This method works for division by any non-zero fraction, and so these
steps give us our familiar rule: “turn the divisor upside-down and
multiply”.</p>
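<p>Python's standard <code>fractions</code> module makes it easy to check this reasoning with exact arithmetic.  The sketch below applies “turn the divisor upside-down and multiply” (the function name <code>divide</code> is hypothetical).  It then confirms the defining property of division: multiplying the answer by $\frac{2}{3}$ gives back $\frac{3}{5}$.</p>

```python
from fractions import Fraction


def divide(a: Fraction, b: Fraction) -> Fraction:
    """Compute a ÷ b by 'turning the divisor upside-down and multiplying'."""
    if b == 0:
        raise ZeroDivisionError("cannot divide by zero")
    reciprocal = Fraction(b.denominator, b.numerator)  # e.g. 2/3 -> 3/2
    return a * reciprocal


a, b = Fraction(3, 5), Fraction(2, 3)
answer = divide(a, b)
print(answer)           # 9/10
# Division is the inverse of multiplication: answer x b must give back a
print(answer * b == a)  # True
print(answer == a / b)  # True: agrees with Fraction's own division
```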

<h2 id="a-possible-teaching-sequence">A possible teaching sequence</h2>

<p>What follows is a suggestion for how these ideas could be introduced
over a sequence of lessons, which could span several months or even
years.  This offers students the chance to revisit the ideas again and
again, thereby reinforcing them, as well as building up stronger
connections and a deeper understanding.  In the later steps, I assume
that students can multiply fractions.</p>

<p>All of the questions below are available in
<a href="https://blog.d-and-j.net/assets/dividing-fractions/dividing-fractions.docx">this Word document</a>.</p>

<h3 id="step-1-what-is-subtraction">Step 1: What is subtraction?</h3>

<p>We begin by asking students what other number statements they can
deduce from $3+5=8$.  There are many possible answers (such as
$30+50=80$), and here we highlight those obtained by rearranging the
numbers.  (These could be encouraged by a question such as “Using only
the numbers 3, 5 and 8, what other number statements can you get from
$3+5=8$?”)  Three key statements are:</p>

\[8-3=5; \qquad 8-5=3; \qquad 5+3=8\]

<p>as well as the same statements written the other way round, such as
$5=8-3$; we won’t mention these reversed statements again here.</p>

<p>The last of these three statements says that addition is
<em>commutative</em>: the order of adding does not matter.  The other two say
that subtraction is the inverse of addition: the three problems</p>

\[5+\square=8,\qquad \square+5=8 \qquad \text{and} \qquad
8-5=\square\]

<p>are equivalent, as are similiar problems about $8-3=\square$.  Making
this connection explicit would be beneficial, especially in relation
to the later parts of this sequence of steps.</p>

<p>Students could then be asked to write statements equivalent to
statements such as $10-3=\square$ to reinforce this idea.</p>

<p>This idea may well have already been introduced via a bar model
approach or using Cuisenaire rods or suchlike.</p>

<p>It is useful to recognise that it doesn’t matter whether we are
working with whole numbers, directed numbers, fractions or whatever:
subtraction always has this meaning, so returning to this idea
periodically will benefit students’ understanding.</p>

<h3 id="step-2-and-what-is-division">Step 2: And what is division?</h3>

<p>This is the parallel of Step 1 for multiplication and division.  What
can be deduced from $3\times4=12$?  This again leads to interesting
points such as why $30\times40=120$ is an incorrect statement, whereas
$30+50=80$ is correct.  But for our current purposes, the key
deductions are again those obtained by rearrangement:</p>

\[12\div4=3; \qquad 12\div3=4; \qquad 4\times3=12.\]

<p>As before, we see that multiplication is commutative and that division
is the inverse of multiplication.  In particular, this means that
answering the question $12\div4=\square$ is the same as filling in the
missing number in $4\times\square=12$; asking students to make
deductions from $12\div4=\square$, as above, will reinforce this idea.</p>
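<p>As with subtraction, a possible set of rearrangements of $12\div4=\square$ is given below, with the missing number 3 in each:</p>

\[4\times\square=12; \qquad \square\times4=12; \qquad 12\div\square=4\]

<p>Returning to the aside about $30\times40$, one way to see why $30\times40=120$ fails while $30+50=80$ holds is to write out the tens explicitly.  When adding, we are counting tens.  When multiplying, each factor contributes its own factor of 10:</p>

\[\begin{align*}
30+50 &amp;= 3\text{ tens}+5\text{ tens} = 8\text{ tens} = 80 \\
30\times40 &amp;= 3\times10\times4\times10 = 12\times100 = 1200
\end{align*}\]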

<h3 id="step-3-1-divided-by-a-unit-fraction">Step 3: 1 divided by a unit fraction</h3>

<p>A key part of this approach is to learn about reciprocals of
fractions.  We start with the reciprocals of unit fractions.</p>

<p>For these missing-number problems, I would suggest asking students to
work on them themselves rather than showing them how to do the first
one.  (I am assuming that they already know enough about fractions to
work out the answers to these questions.)</p>

\[\begin{align*}
\frac{1}{2}\times \square &amp;= 1 \\
\frac{1}{3}\times \square &amp;= 1 \\
\frac{1}{4}\times \square &amp;= 1
\end{align*}\]

<p>Students should spot the pattern.  Following this by asking questions
such as $\frac{1}{82}\times\square = 1$ can help them to realise that
they can now do some very complicated-sounding questions, even if they
can’t imagine what $\frac{1}{82}$ of a cake might look like.  (I was
reminded of this approach by John Mighton’s book.)</p>
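<p>For reference, the completed statements, including the $\frac{1}{82}$ example, might be written like this.  In each case, $n$ lots of $\frac{1}{n}$ make one whole:</p>

\[\begin{align*}
\frac{1}{2}\times 2 &amp;= \frac{2}{2} = 1 \\
\frac{1}{3}\times 3 &amp;= \frac{3}{3} = 1 \\
\frac{1}{4}\times 4 &amp;= \frac{4}{4} = 1 \\
\frac{1}{82}\times 82 &amp;= \frac{82}{82} = 1
\end{align*}\]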

<p>Students can then connect this back to the earlier steps if they are
asked to rearrange $\frac{1}{2}\times2=1$.  This will allow students to
(re)discover that $1\div 2=\frac{1}{2}$ (and similarly for the other
statements); this can also be used to reinforce the idea that a
fraction such as $\frac{1}{2}$ just means “1 divided by 2”.  (The
division symbol itself suggests this: $\div$ is just a fraction with
dots in place of actual numbers.)  Another way of rearranging the
number statement gives $1\div\frac{1}{2}=2$, which could be related to
the “practical” meaning of division: there are 2 halves in a whole.</p>
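<p>As a further example, rearranging $\frac{1}{3}\times3=1$ in the same way gives the statements below.  The second of these says that there are 3 thirds in a whole:</p>

\[1\div3=\frac{1}{3}; \qquad 1\div\frac{1}{3}=3; \qquad 3\times\frac{1}{3}=1\]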

<h3 id="step-4-turning-a-general-fraction-into-an-integer">Step 4: Turning a general fraction into an integer</h3>

<p>It might be too big a jump for some students to go straight to
finding the reciprocal of a general fraction, so this step provides a
structured intermediate step, once they are developing some confidence
with the above idea.</p>

<p>Here is a second sequence of missing-number problems:</p>

\[\begin{align*}
\frac{2}{3}\times \square &amp;= 2 \\
\frac{2}{5}\times \square &amp;= 2 \\
\frac{3}{5}\times \square &amp;= 3
\end{align*}\]

<p>Once students have worked out answers to these (and perhaps to a
few more similar examples), either ask them to generalise by making up
their own similar examples, or ask superficially harder questions such
as $\frac{74}{133} \times \square = 74$, so that the structure becomes
clear.</p>
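<p>In each case the missing number is the denominator, because multiplying by it undoes the division by it.  One way of writing out the completed statements, including the larger example, is:</p>

\[\begin{align*}
\frac{2}{3}\times 3 &amp;= \frac{2\times3}{3} = 2 \\
\frac{2}{5}\times 5 &amp;= \frac{2\times5}{5} = 2 \\
\frac{3}{5}\times 5 &amp;= \frac{3\times5}{5} = 3 \\
\frac{74}{133}\times 133 &amp;= \frac{74\times133}{133} = 74
\end{align*}\]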

<p>Asking students to rearrange these statements once again results in
statements like $2\div3 = \frac{2}{3}$ (further reinforcing the
division idea) and $2\div \frac{2}{3} = 3$.</p>
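<p>For instance, rearranging $\frac{3}{5}\times5=3$ in the same way would give the statements below.  The middle one can again be read practically: there are 5 lots of $\frac{3}{5}$ in 3.</p>

\[3\div5=\frac{3}{5}; \qquad 3\div\frac{3}{5}=5; \qquad 5\times\frac{3}{5}=3\]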

<h3 id="step-5-finding-reciprocals">Step 5: Finding reciprocals</h3>

<p>A useful preparatory question before this step would be something
like: “If you know that $96\times 48=4608$, then what is the missing
number in $96\times \square = 2304$?”  This recalls the idea that we
can divide the product by 2 by dividing the multiplicand (or
multiplier) by 2.  (The use of two-digit numbers is designed to
discourage students from doing a division!)</p>
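<p>The intended reasoning for the preparatory question runs roughly as follows: $2304$ is half of $4608$, so the missing number is half of $48$:</p>

\[96\times 24 = 96\times\frac{48}{2} = \frac{96\times48}{2} = \frac{4608}{2} = 2304\]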

<p>In this step, we replace the integers on the right-hand sides of the
previous set of questions with 1:</p>

\[\begin{align*}
\frac{2}{3}\times \square &amp;= 1 \\
\frac{2}{5}\times \square &amp;= 1 \\
\frac{3}{5}\times \square &amp;= 1
\end{align*}\]

<p>If students cannot work out how to answer the first question, it would
be helpful to remind them of their answer to $\frac{2}{3}\times
\square = 2$.  Tying this to the preparatory question above should
help them get to the answer.</p>
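<p>Put together, the argument for each question might look like this.  Start from the Step 4 answer, then divide the multiplier by the right-hand side so that the product becomes 1:</p>

\[\begin{align*}
\frac{2}{3}\times 3 = 2 &amp;\quad\Longrightarrow\quad \frac{2}{3}\times\frac{3}{2} = 2\div2 = 1 \\
\frac{2}{5}\times 5 = 2 &amp;\quad\Longrightarrow\quad \frac{2}{5}\times\frac{5}{2} = 2\div2 = 1 \\
\frac{3}{5}\times 5 = 3 &amp;\quad\Longrightarrow\quad \frac{3}{5}\times\frac{5}{3} = 3\div3 = 1
\end{align*}\]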

<p>Again, students can be invited to generalise at this point, or to
answer a question like the one in the previous step: $\frac{74}{133}
\times \square = 1$.  It is also helpful to rearrange these
results; we have $1\div \frac{2}{3} = \frac{3}{2}$, and we are seeing
the first clear case of turning fractions upside-down.</p>
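<p>For the larger example, and for the rearrangements of the other two statements, one possible write-up is:</p>

\[\begin{align*}
\frac{74}{133}\times\frac{133}{74} &amp;= \frac{74\times133}{133\times74} = 1 \\
1\div\frac{2}{5} &amp;= \frac{5}{2} \\
1\div\frac{3}{5} &amp;= \frac{5}{3}
\end{align*}\]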

<p>After these, it could also be interesting to revisit unit fractions:
following the same pattern that we have seen, how else could the
answer to $\frac{1}{3}\times \square = 1$ be written, besides as $3$?</p>
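<p>One answer that fits the pattern is to write $3$ as $\frac{3}{1}$, which is $\frac{1}{3}$ turned upside-down:</p>

\[\frac{1}{3}\times\frac{3}{1} = \frac{3}{3} = 1\]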

<h3 id="step-6-dividing-fractions">Step 6: Dividing fractions</h3>

<p>Before working on the full-blown division of fractions, it would be
useful to preface it with another relevant rearranging activity: how can
the number statement $2\times 3\times 4=24$ be rearranged, while
keeping all of the numbers involved the same?  This gives rise to a
number of statements, such as:</p>

\[24\div 4 = 2\times 3; \qquad \frac{24}{2\times 3}=4; \qquad
4\times 2\times 3 = 24.\]

<p>This may cause some difficulty and lead to some interesting class
discussions.</p>
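<p>One point that may come up in such discussions is that dividing by $2\times3$ is the same as dividing by 2 and then by 3.  A quick numerical check is:</p>

\[\frac{24}{2\times3} = \frac{24}{6} = 4; \qquad 24\div2\div3 = 12\div3 = 4\]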

<p>And now we can build on the ideas developed in Step 5.  How could we
complete the following statements?</p>

\[\begin{align*}
\frac{2}{3}\times \frac{3}{2} \times \square &amp;= \frac{4}{5} \\
\frac{2}{5}\times \frac{5}{2} \times \square &amp;= \frac{1}{3} \\
\frac{3}{5}\times \frac{5}{3} \times \square &amp;= \frac{7}{2} \\
\frac{1}{3}\times \frac{3}{1} \times \square &amp;= \frac{3}{4} \\
2\times \frac{1}{2} \times \square &amp;= \frac{1}{5}
\end{align*}\]

<p>A prompting question, if needed, is “What is $\frac{2}{3}\times
\frac{3}{2}$?”</p>
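<p>Since each of the first two factors is the reciprocal of the other, their product is 1.  Each statement therefore reduces to $1\times\square$, and the missing number is simply the right-hand side.  For the first and last statements:</p>

\[\begin{align*}
\frac{2}{3}\times \frac{3}{2} \times \frac{4}{5} &amp;= 1\times\frac{4}{5} = \frac{4}{5} \\
2\times \frac{1}{2} \times \frac{1}{5} &amp;= 1\times\frac{1}{5} = \frac{1}{5}
\end{align*}\]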

<p>And then what about these, where the two squares should be filled in
following the pattern we have just seen?</p>

\[\begin{align*}
\frac{3}{4}\times \square \times \square &amp;= \frac{2}{5} \\
\frac{2}{7}\times \square \times \square &amp;= \frac{4}{3} \\
\frac{1}{5}\times \square \times \square &amp;= \frac{3}{2} \\
4\times \square \times \square &amp;= \frac{3}{2}
\end{align*}\]
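<p>Following the pattern, the first square holds the reciprocal of the given number and the second holds the right-hand side.  One possible set of answers is:</p>

\[\begin{align*}
\frac{3}{4}\times \frac{4}{3} \times \frac{2}{5} &amp;= \frac{2}{5} \\
\frac{2}{7}\times \frac{7}{2} \times \frac{4}{3} &amp;= \frac{4}{3} \\
\frac{1}{5}\times 5 \times \frac{3}{2} &amp;= \frac{3}{2} \\
4\times \frac{1}{4} \times \frac{3}{2} &amp;= \frac{3}{2}
\end{align*}\]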

<p>Once students feel competent at these, ask how they can use these
to work out:</p>

\[\begin{align*}
\frac{2}{5} &amp;\div \frac{3}{4} \\
\frac{4}{3} &amp;\div \frac{2}{7} \\
\frac{3}{2} &amp;\div \frac{1}{5} \\
\frac{3}{2} &amp;\div 4
\end{align*}\]
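<p>Take the first division as an example.  The previous statements give $\frac{3}{4}\times\left(\frac{4}{3}\times\frac{2}{5}\right)=\frac{2}{5}$, and rearranging this as in Step 2 gives $\frac{2}{5}\div\frac{3}{4}=\frac{4}{3}\times\frac{2}{5}$.  Applying the same argument to each of the four divisions might give:</p>

\[\begin{align*}
\frac{2}{5} &amp;\div \frac{3}{4} = \frac{4}{3}\times\frac{2}{5} = \frac{8}{15} \\
\frac{4}{3} &amp;\div \frac{2}{7} = \frac{7}{2}\times\frac{4}{3} = \frac{28}{6} = \frac{14}{3} \\
\frac{3}{2} &amp;\div \frac{1}{5} = 5\times\frac{3}{2} = \frac{15}{2} \\
\frac{3}{2} &amp;\div 4 = \frac{1}{4}\times\frac{3}{2} = \frac{3}{8}
\end{align*}\]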

<p>And with this, students have reached a point where the rule for
dividing by a fraction will make some sense: we multiply the
reciprocal of the divisor (the number which gives 1 when multiplied by
the divisor itself) by the dividend, and this is precisely the
well-known rule.</p>
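<p>As a summary for the teacher, the same argument can be written in general.  Here $a$, $b$, $c$ and $d$ are whole numbers, with $b$, $c$ and $d$ not zero; these letters are introduced only for this summary and are not used with students in the steps above:</p>

\[\frac{c}{d}\times\frac{d}{c}\times\frac{a}{b} = \frac{a}{b}
\qquad\text{so}\qquad
\frac{a}{b}\div\frac{c}{d} = \frac{d}{c}\times\frac{a}{b}\]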

  ]]></description>
</item>

	

	

	

	

	

</channel>
</rss>
