So I was watching Robert Virding's talk, the Hitchhiker's Tour of the BEAM, and he had a slide listing the things that can crash the BEAM. I thought it might make a useful episode to trigger all of these failures so we'd become familiar with our failure modes. He lists a few that should be fairly easy to trigger:

  • Filling the atom table
  • Overflowing binary space
  • Uncontrolled process heap growth
  • Errors in NIFs and linked-in port drivers

There's also the obvious 'too many processes' failure mode, which he didn't mention.

Let's trigger each of these except the NIF and port driver cases, since we haven't covered those yet, and then finish with too many processes. Let's get started.

Project

We'll kick off a new project with an examples directory to hold a script for each failure mode.

mix new crashing_the_beam
cd crashing_the_beam
mkdir examples

Filling the Atom Table

We'll start by filling the atom table.

vim examples/filling_the_atom_table.exs
defmodule FillerUp do
  # We'll just loop, making as many atoms as we can.
  def start do
    loop(0)
  end

  def loop(n) do
    # And let's output the number we're on so we get a good count
    IO.puts n
    :"#{n}"
    loop(n+1)
  end
end

FillerUp.start

Right then, let's see this failure mode in action.

mix run examples/filling_the_atom_table.exs
...
1036040
1036041
1036042
1036043

Crash dump is being written to: erl_crash.dump...*** stack smashing detected ***: /home/jadams/erlang/erlangs/18.0/erts-7.0/bin/beam.smp terminated
Aborted (core dumped)

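OK, so the BEAM ran out of room in its atom table. The default limit is 1,048,576 atoms (you can change it with the +t emulator flag). Our loop made it to just over a million because the VM and Elixir had already created some atoms of their own. The "stack smashing detected" line is a bit of a red herring. It comes from the C library's stack protector, which apparently tripped while the VM was still writing the crash dump, so it isn't the name of this failure mode. Neat, we've got our first failure mode. Let's move on to overflowing binary space.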
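Before we do, here's a quick way to watch the atom table fill up without crashing anything. This is a minimal sketch that assumes OTP 20 or later, where :erlang.system_info/1 accepts :atom_count and :atom_limit. Those keys aren't available on the Erlang 18.0 install used above. The atom_probe_ prefix is just an arbitrary name for illustration.

```elixir
# Assumes OTP 20 or later: :atom_count and :atom_limit don't exist
# on the Erlang 18.0 install used for the crash above.
limit = :erlang.system_info(:atom_limit)
before = :erlang.system_info(:atom_count)

# Create 1,000 brand-new atoms, the same way FillerUp does.
Enum.each(1..1_000, fn n -> :"atom_probe_#{n}" end)

after_count = :erlang.system_info(:atom_count)

IO.puts("atom table limit: #{limit}")
IO.puts("atoms in use: #{after_count}")
IO.puts("atoms added by this script: #{after_count - before}")
```

The "added" count may come out a little above 1,000, because loading a module for the first time also creates atoms. Atoms are never garbage collected, so that count only ever goes up.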

Overflowing Binary Space

To try to overflow binary space, we'll use Erlang's :binary.copy function to make new binaries and keep pushing them onto the front of a list. The list still references every binary, so none of them can be garbage collected. Using copy guarantees that each one is a genuinely new binary rather than a reference into a shared one that Erlang might hand back as an optimization. Each copy is 12,000 bytes, well over the 64-byte cutoff, so it lives off the process heap as a reference-counted binary. That off-heap storage is exactly the binary space we're trying to fill.

vim examples/overflowing_binary_space.exs
defmodule Overflower do
  def start do
    # We'll just loop on some example binary and a list to accumulate into
    loop(0, "samplebinary", [])
  end

  def loop(n, bin, acc) do
    # Each iteration builds one new 12,000-byte binary (1,000 copies of bin)
    # and pushes it onto the front of the list, so it's never garbage collected.
    IO.puts n
    loop(n+1, bin, [:binary.copy(bin, 1_000)|acc])
  end
end

Overflower.start

Now, I'm not actually going to run this one, because in my tests the script brought my whole system to its knees before it crashed the BEAM itself. I have 22 gigs of RAM, and short of adding a crazy amount more, I'm not sure how to make this a successful failure test. When I ran it, Linux's OOM killer didn't manage to stop the process before my system became unresponsive. If you can figure out how to trigger this failure mode, let me know. So this one is a failure of sorts, though we did at least see how to break the underlying OS :)
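If you'd rather watch binary space grow than take your machine down, here's a bounded variant of Overflower. It's a sketch, not a crash test. BinaryWatcher is a hypothetical module name, and the script stops after a fixed number of iterations. Along the way it prints :erlang.memory(:binary), the VM's count of bytes currently allocated for binaries.

```elixir
defmodule BinaryWatcher do
  # Hypothetical bounded variant of Overflower: run a fixed number of
  # iterations and report binary memory instead of looping forever.
  def run(iterations, report_every) do
    loop(0, iterations, report_every, "samplebinary", [])
  end

  # Stop once the counter reaches the requested number of iterations.
  defp loop(max, max, _report_every, _bin, acc), do: acc

  defp loop(n, max, report_every, bin, acc) do
    if rem(n, report_every) == 0 do
      IO.puts("#{n}: #{:erlang.memory(:binary)} bytes allocated for binaries")
    end

    # Same allocation as Overflower: a fresh 12,000-byte binary held in acc.
    loop(n + 1, max, report_every, bin, [:binary.copy(bin, 1_000) | acc])
  end
end

before = :erlang.memory(:binary)
acc = BinaryWatcher.run(1_000, 100)
growth = :erlang.memory(:binary) - before
IO.puts("holding #{length(acc)} binaries; binary memory grew by #{growth} bytes")
```

With 1,000 iterations you should see growth on the order of 12 MB. Raising the iteration count is an easy way to push closer to your machine's limits in a controlled way.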

Uncontrolled Process Heap Growth

I think the easiest way to trigger uncontrolled process heap growth is to send a process lots of messages that it never matches on. Eventually our unbounded mailbox should get so backed up that we see a crash. Let's try it out:

vim examples/uncontrolled_process_heap_growth.exs
defmodule Heaper do
  def start do
    pid = spawn(&receiver/0)
    loop(0, pid)
  end

  def loop(n, pid) do
    IO.puts n
    send(pid, :something)
    loop(n+1, pid)
  end

  def receiver do
    receive do
      :nope -> :ok
    end
  end
end

Heaper.start

This is another one that I can't really run to show you the problem, but you're welcome to run it yourself. It will gradually eat memory until your system has none left. Erlang process mailboxes are unbounded, the VM won't throw messages away on its own, and our receiver never takes any of them out of its mailbox. You can cause this by accident with any receive loop that never consumes unexpected messages. Here we do it very explicitly, but it's a good argument for a catch-all receive clause that discards messages you don't care about. OTP behaviours take care of that part for you. A GenServer pulls every message out of its mailbox and hands anything that isn't a call or cast to handle_info, so unexpected messages don't pile up. A GenServer can still fall behind if messages arrive faster than it can handle them, though.
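For comparison, here's a sketch that floods two processes with the same junk. One uses Heaper's never-matching receive and the other has a catch-all clause. MailboxDemo and its {:ping, from} message are hypothetical names for illustration. Process.info/2 with :message_queue_len reports how many messages are waiting in a mailbox.

```elixir
defmodule MailboxDemo do
  # Same shape as Heaper.receiver/0: nothing ever matches, so messages pile up.
  def never_matching do
    receive do
      :nope -> :ok
    end
  end

  # Catch-all loop: answer the one message we care about, drop everything else.
  def discarding do
    receive do
      {:ping, from} ->
        send(from, :pong)
        discarding()

      _other ->
        discarding()
    end
  end

  # Send `count` junk messages to `pid`, then return its mailbox length.
  def flood(pid, count) do
    Enum.each(1..count, fn _ -> send(pid, :something) end)
    {:message_queue_len, len} = Process.info(pid, :message_queue_len)
    len
  end
end

stuck = spawn(&MailboxDemo.never_matching/0)
draining = spawn(&MailboxDemo.discarding/0)

stuck_len = MailboxDemo.flood(stuck, 10_000)

MailboxDemo.flood(draining, 10_000)
# Messages between two processes arrive in order, so once :pong comes back,
# every :something sent before the ping has already been pulled off the queue.
send(draining, {:ping, self()})

receive do
  :pong -> :ok
end

{:message_queue_len, draining_len} = Process.info(draining, :message_queue_len)

IO.puts("never-matching receiver: #{stuck_len} queued messages")
IO.puts("catch-all receiver: #{draining_len} queued messages")
```

The never-matching receiver holds on to what it was sent, while the catch-all receiver ends up with an empty mailbox. The ping exists only so the script knows the second receiver has caught up before checking.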

Too Many Processes

As our final failure mode, let's create more processes than the BEAM allows. Each one will block forever in the same never-matching receive we used in Heaper, and we'll spawn a new one on every pass through the loop.

vim examples/too_many_processes.exs
defmodule Processor do
  def start do
    loop(0)
  end

  def loop(n) do
    IO.puts n
    spawn(&receiver/0)
    loop(n+1)
  end

  def receiver do
    receive do
      :nope -> :ok
    end
  end
end

Processor.start

Let's run it:

mix run examples/too_many_processes.exs
...
262094
262095
262096
262097
262098
262099

07:10:24.572 [error] Too many processes


** (SystemLimitError) a system limit has been reached
    :erlang.spawn(:erlang, :apply, [#Function<0.17877128/0 in Processor.loop/1>, []])
    :erlang.spawn/1
    examples/too_many_processes.exs:8: Processor.loop/1
    (elixir) lib/code.ex:307: Code.require_file/2
    (mix) lib/mix/tasks/run.ex:61: Mix.Tasks.Run.run/1

So here the script spawned 262,099 processes before hitting the limit. The last number printed, 262099, belongs to the spawn that failed. The actual default limit, per man erl, is 262,144, so roughly 45 processes were already running. Those are the ones the BEAM, Elixir, and Mix start at boot, plus the process running our script. You can tune this limit with the +P emulator flag when starting your script. Let's look at that briefly:

ELIXIR_ERL_OPTIONS="+P 2000" mix run examples/too_many_processes.exs

Here I've lowered the limit so we still see the same error, just much sooner.
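To confirm what limit the VM actually picked, you can ask it directly. Here's a minimal sketch you could save under a hypothetical name such as examples/process_limit.exs and run with the same ELIXIR_ERL_OPTIONS setting. The erl docs note that the maximum actually chosen can be larger than the number you pass to +P (often a power of two), so don't be surprised if it doesn't read exactly 2000.

```elixir
# :process_limit reflects +P (the VM may round the requested value up);
# :process_count is how many processes exist right now.
limit = :erlang.system_info(:process_limit)
count = :erlang.system_info(:process_count)

IO.puts("process limit: #{limit}")
IO.puts("processes alive: #{count}")
IO.puts("room for about #{limit - count} more spawns")
```

The difference between those two numbers is roughly how far a loop like Processor's can get before it hits the SystemLimitError.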

Summary

Today we just had a look at a few different failure modes your Elixir programs can run into. I'd love to see more examples of making the BEAM fall over. Leave them in the comments if you have some good examples. See you soon!

Resources