In the last episode we walked through the code that initializes a neural network in Karmen Blake's neural_network_elixir library. Today we're going to see how the network is trained. Let's get to it.

Project

Again, we're starting with a function call in our or mix task:

vim lib/mix/tasks/or.ex

After initializing the network with our desired topology, we pass it some training data, tell it how many epochs to train for, and tell it how frequently to log its progress.

    Trainer.train(network_pid, training_data(), %{epochs: epoch_count, log_freqs: 1000})
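
We haven't looked at training_data/0 for this task yet, but given the xor version we'll build later in this episode, it presumably returns a list of maps pairing each input with its expected output. Something like this (an assumption, not code copied from the task):

# Presumed shape of training_data/0 for the or task; the xor task later
# in this episode defines the equivalent function explicitly.
def training_data do
  [
    %{input: [0,0], output: [0]},
    %{input: [0,1], output: [1]},
    %{input: [1,0], output: [1]},
    %{input: [1,1], output: [1]}
  ]
end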

What does Trainer.train do?

defmodule NeuralNetwork.Trainer do
  @moduledoc """
  Runs a network as classified by data it's given.
  """

  alias NeuralNetwork.{Network}

  def train(network_pid, data, options \\ %{}) do
    epochs      = options.epochs
    log_freqs   = options.log_freqs
    data_length = length(data)

    for epoch <- 0..epochs do
      average_error = Enum.reduce(data, 0, fn sample, sum ->
        # sum weighted inputs to produce output value of network
        # that output will be compared with target output to find the delta
        network_pid |> Network.get |> Network.activate(sample.input)

        # Backpropagation
        network_pid |> Network.get |> Network.train(sample.output)

        sum + Network.get(network_pid).error/data_length
      end)

      if rem(epoch, log_freqs) == 0 || epoch + 1 == epochs do
        IO.puts "Epoch: #{epoch}   Error: #{unexponential(average_error)}"
      end
    end
  end

  defp unexponential(average_error) do
    :erlang.float_to_binary(average_error, [{:decimals, 19}, :compact])
  end
end

Here, for every epoch we reduce across all of the training data, activating the network on each sample and then training it toward the expected output. Each sample contributes its share of the overall average error: its error divided by the number of samples. We then print the average error if this is an epoch we're meant to log.
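
To make that error bookkeeping concrete, here is the same reduce in miniature with four made-up per-sample error values (not numbers from the library):

# Hypothetical per-sample errors observed after each training step
errors = [0.16, 0.12, 0.14, 0.10]
data_length = length(errors)

average_error =
  Enum.reduce(errors, 0, fn error, sum -> sum + error / data_length end)
# ≈ 0.13, the mean of the four per-sample errors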

Let's look at the two things that happen for each sample:

  • We activate the network with the sample data's inputs.
  • We train the network with the sample data's expected outputs.

Activating the network with the sample inputs

To activate the network, we call Network.activate(network, sample.input). What does that look like?

defmodule NeuralNetwork.Network do
  # ...
  @doc """
  Activate the network given list of input values.
  """
  def activate(network, input_values) do
    network.input_layer |> Layer.activate(input_values)

    Enum.map(network.hidden_layers, fn hidden_layer ->
      hidden_layer |> Layer.activate
    end)

    network.output_layer |> Layer.activate
  end
  # ...
end

For the input layer, Layer.activate(input_values) is called. Then each hidden layer is activated. Finally, the output layer is activated.

Activating a Layer

What does activating a layer do? Let's look:

defmodule NeuralNetwork.Layer do
  # ...
  @doc """
  Activate all neurons in the layer with a list of values.
  """
  def activate(layer_pid, values \\ nil) do
    layer  = get(layer_pid)
    values = List.wrap(values) # coerce to [] if nil

    layer.neurons
    |> Stream.with_index
    |> Enum.each(fn(tuple) ->
         {neuron, index} = tuple
         neuron |> Neuron.activate(Enum.at(values, index))
       end)
  end
end

The values are coerced to a list (an empty one if nil was passed), and then each neuron in the layer is activated with the value at its own index in that list.
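
One detail worth noting: the hidden and output layers are activated without any values, and a layer can have more neurons than values (a bias neuron, for instance). In both cases Enum.at simply returns nil, and as we'll see next, a neuron activated with nil falls back to summing its incoming connections. A quick iex check of that behaviour:

iex> List.wrap(nil)
[]
iex> Enum.at([], 0)
nil
iex> Enum.at([0, 1], 2) # index past the end of the values list
nil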

Activating a Neuron

What does activating a neuron look like?

defmodule NeuralNetwork.Neuron do
  # ...
  @doc """
  Activate a neuron. Set the input value and compute the output
  Input neuron: output will always equal their input value
  Bias neuron: output is always 1.
  Other neurons: will squash their input value to compute output
  """
  def activate(neuron_pid, value \\ nil) do
    neuron = get(neuron_pid) # just to make sure we are not getting a stale agent

    fields = if neuron.bias? do
      %{output: 1}
    else
      input = value || Enum.reduce(neuron.incoming, 0, sumf())
      %{input: input, output: activation_function(input)}
    end

    neuron_pid |> update(fields)
  end
  # ...
end

Ignoring bias neurons, which always output 1, activating a neuron means either taking the value that was passed in (for input neurons) or reducing across all of the incoming connections with the sumf function. The output is then set to the result of the activation_function applied to that input.
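
One Elixir detail makes the value || Enum.reduce(...) line safe: only nil and false are falsy in Elixir, so an input value of 0 is still used directly rather than falling through to the weighted sum. Only the nil passed to hidden and output neurons triggers the fallback:

iex> 0 || :weighted_sum
0
iex> nil || :weighted_sum
:weighted_sum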

sumf

What does sumf do?

defmodule NeuralNetwork.Neuron do
  # ...
  defp sumf do
    fn(connection_pid, sum) ->
      connection = Connection.get(connection_pid)
      sum + get(connection.source_pid).output * connection.weight
    end
  end
  # ...
end

It gets the connection's source neuron's output and multiplies it by the connection's weight. Remember, these products are summed across all of the neuron's incoming connections.
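
To make that concrete, here's the same weighted sum written out for a hypothetical neuron with two incoming connections (made-up outputs and weights, not values from the library):

# {source neuron's output, connection weight}
incoming = [{1.0, 0.4}, {0.0, 0.3}]

input =
  Enum.reduce(incoming, 0, fn {output, weight}, sum ->
    sum + output * weight
  end)
# => 0.4

# The neuron's output would then be activation_function(0.4), roughly 0.599.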

activation_function

What does the activation_function do?

defmodule NeuralNetwork.Neuron do
  # ...
  @doc """
  Sigmoid function. See more at: https://en.wikipedia.org/wiki/Sigmoid_function

  ## Example

      iex> NeuralNetwork.Neuron.activation_function(1)
      0.7310585786300049
  """
  def activation_function(input) do
    1 / (1 + :math.exp(-input))
  end
  # ...
end

It's a sigmoid function: in general, an S-shaped function that increases monotonically. There are a few options in this family; this library uses the logistic function. There's a note on the Wikipedia page for this activation function that another sigmoid can be used for faster convergence with backpropagation, which I find intriguing...
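
A couple of quick sanity checks against that formula: an input of 0 lands exactly on the curve's midpoint, and negative inputs mirror positive ones around 0.5.

iex> NeuralNetwork.Neuron.activation_function(0)
0.5
iex> NeuralNetwork.Neuron.activation_function(1)
0.7310585786300049
iex> NeuralNetwork.Neuron.activation_function(-1)
0.2689414213699951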

There's still a lot to unpack here, but at this point we've finished activating the network. Next, we'll move on to training it.

Training the network on the expected outputs

To train the network, we call Network.train(network, sample.output). What does that do?

defmodule NeuralNetwork.Network do
  # ...
  @doc """
  Set the network error and the output layer's deltas, then propagate them
  backward through the network. (Backpropagation!)

  The input layer is skipped (no use for deltas).
  """
  def train(network, target_outputs) do
    network.output_layer |> Layer.get |> Layer.train(target_outputs)
    network.pid |> update(%{error: error_function(network, target_outputs)})

    network.hidden_layers
    |> Enum.reverse
    |> Enum.each(fn layer_pid ->
      Layer.get(layer_pid) |> Layer.train(target_outputs)
    end)

    network.input_layer |> Layer.get |> Layer.train(target_outputs)
  end
  # ...
end

This trains the layers in reverse order: the output layer first, then the hidden layers, and finally the input layer. That backward pass is the heart of backpropagation. I think there's a little problem with the way backpropagation is implemented in the library right now, though: notice that every layer, hidden layers included, is trained against the same target_outputs.
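
For comparison, in textbook backpropagation only the output layer's deltas are computed directly from the targets; each hidden neuron's delta comes from the deltas of the layer after it, weighted by the connecting weights. Roughly, for sigmoid neurons (a sketch of the usual formulas, not this library's code):

# Output neuron:  delta = (output - target) * output * (1 - output)
# Hidden neuron:  delta = output * (1 - output) * sum of (weight_k * delta_k)
#                 over the neuron's outgoing connections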

I can demonstrate the issue by creating an xor network. A network with no hidden layer can't learn xor at all, but adding a hidden layer with two neurons should make the problem solvable. Let's try it:

vim lib/mix/tasks/xor.ex

defmodule Mix.Tasks.Xor do
  @moduledoc """
  Mix task to run the neural network for xor.
  """

  use Mix.Task

  alias NeuralNetwork.{Network, Trainer, Layer, Neuron}
  @shortdoc "Run the neural network app"

  def run(_args) do
    epoch_count = 10_000

    IO.puts ""

    {:ok, network_pid} = Network.start_link([2,2,1])
    Trainer.train(network_pid, training_data(), %{epochs: epoch_count, log_freqs: 1000})
    IO.puts "**************************************************************"
    IO.puts ""
    IO.puts "== XOR =="
    examine(network_pid, [0, 0])
    examine(network_pid, [0, 1])
    examine(network_pid, [1, 0])
    examine(network_pid, [1, 1])
  end

  def training_data do
    [
      %{input: [0,0], output: [0]},
      %{input: [0,1], output: [1]},
      %{input: [1,0], output: [1]},
      %{input: [1,1], output: [0]}
    ]
  end

  defp examine(network_pid, inputs) do
    network_pid |> Network.get |> Network.activate(inputs)
    network = Network.get(network_pid)
    IO.inspect network
    network.hidden_layers
    |> Enum.map(&(Layer.get(&1)))
    |> Enum.flat_map(&(&1.neurons))
    |> Enum.map(&(Neuron.get(&1).output))
    |> IO.inspect
    output_layer = Layer.get(network.output_layer)
    outputs =
      output_layer.neurons
      |> Enum.map(&(Neuron.get(&1).output))
    IO.puts "#{inspect inputs} => #{inspect outputs}"
  end
end

We'll try to run it:

λ mix xor

Epoch: 0   Error: 0.1605960033461333869
Epoch: 1000   Error: 0.1347009580731972356
Epoch: 2000   Error: 0.134700648768427661
Epoch: 3000   Error: 0.1347026881121198005
Epoch: 4000   Error: 0.1347048670963744699
Epoch: 5000   Error: 0.134706362829301457
Epoch: 6000   Error: 0.1347067957297368168
Epoch: 7000   Error: 0.1347063703705637105
Epoch: 8000   Error: 0.1347056149922223889
Epoch: 9000   Error: 0.1347049577197416692
Epoch: 9999   Error: 0.1347045567174682545
Epoch: 10000   Error: 0.1347045564423462183
**************************************************************

== XOR ==
%NeuralNetwork.Network{error: 0.1451653215641648,
 hidden_layers: [#PID<0.137.0>], input_layer: #PID<0.134.0>,
 output_layer: #PID<0.140.0>, pid: #PID<0.133.0>}
[0.002208571680283131, 0.002208571680283131, 1]
[0, 0] => [0.4984856452195869]
%NeuralNetwork.Network{error: 0.1451653215641648,
 hidden_layers: [#PID<0.137.0>], input_layer: #PID<0.134.0>,
 output_layer: #PID<0.140.0>, pid: #PID<0.133.0>}
[0.001048934610067151, 0.001048934610067151, 1]
[0, 1] => [0.4984903701709742]
%NeuralNetwork.Network{error: 0.1451653215641648,
 hidden_layers: [#PID<0.137.0>], input_layer: #PID<0.134.0>,
 output_layer: #PID<0.140.0>, pid: #PID<0.133.0>}
[0.00249330879934496, 0.00249330879934496, 1]
[1, 0] => [0.4984844850557456]
%NeuralNetwork.Network{error: 0.1451653215641648,
 hidden_layers: [#PID<0.137.0>], input_layer: #PID<0.134.0>,
 output_layer: #PID<0.140.0>, pid: #PID<0.133.0>}
[0.0011843445439997438, 0.0011843445439997438, 1]
[1, 1] => [0.49848981844199985]

Here we can see that the network never converges: the error plateaus around 0.1347, and every input produces roughly the same output near 0.498. This was my first hint that there was something fishy with the hidden layers. I'm going to look into the issue, but I've also included reading material in the resources section if someone else wants to get their hands dirty and track it down.

Summary

This week we had a brief overview of neural networks in Elixir, and we ended up with a library that's almost there but doesn't quite seem finished. Elixir and Erlang are great languages for neural networks in general, and there is a lot more material out there. I've included some links that I came across while researching the topic in the resources section below, if you want to dig deeper.

See you soon!

Resources