NOTE: In Elixir v0.12.5 and later, the % prefix for sigils is deprecated in favor of ~. Anywhere %r appears, enter ~r instead, for example: Regex.split(~r/ /, stripped, trim: true)

Hello again, and welcome to ElixirSips Episode 16 - Pipe Operator.

In today's episode, I'm going to cover the pipe operator. Let's get started.

What is the Pipe Operator?

The pipe operator, |>, lets you quickly and easily chain functions into a pipeline. Pipelines are an extremely common pattern in programming.

In Unix, a pipeline might look something like this:

ps ax|grep vim|awk '{ print $1 }'

This would list the running processes, grep the resulting lines for the string vim, and print the first column of each match, which is the PID. There are substantially better ways to accomplish this, but it serves as a pretty comprehensible example.

The reason Unix has a means of piping the output of one program into the input of another is that it's an exceedingly useful way of composing small operations into large, useful expressions.

It turns out that that's most of what functional programming is, so you'd expect the pipe operator to be very useful, and you'd be right. I'm still coming to grips with it, but I figure this is as useful a place as any to get started with an example.

In Elixir, the pipe operator takes the value of the expression on its left and passes it as the first argument to the function call on its right. Let's see what that looks like:
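Before we get to the example app, here's a minimal sketch of that rule using two standard library functions. The input string and the variable names are just illustrative, not part of the episode's code:

```elixir
# Nested calls read inside-out.
nested = String.split(String.upcase("hello pipe operator"), " ")

# The same computation as a pipeline reads left to right. Each |> passes the
# value on its left as the first argument of the call on its right.
piped =
  "hello pipe operator"
  |> String.upcase()
  |> String.split(" ")

IO.inspect(nested == piped)
# prints: true
IO.inspect(piped)
# prints: ["HELLO", "PIPE", "OPERATOR"]
```

Note that the " " passed to String.split in the pipeline ends up as the second argument, because the piped value always takes the first slot.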

Our example application

For our example app, we'll take the output of our ps command as input, and we'll write functions that stand in for grep and awk so we can build out the same pipeline.

Go ahead and generate a new application:

mix new pipe_operator_playground
cd pipe_operator_playground

Now let's go modify the existing test:

defmodule PipeOperatorPlaygroundTest do
  use ExUnit.Case

  test "ps_ax outputs some processes" do
  end

  test "grep(thing) returns lines that match 'thing'" do
  end

  test "awk(1) splits on whitespace and returns the first column" do
  end

  test "the whole pipeline works" do
  end
end

So there's our rough shell. Let's go ahead and start filling it in.

  test "ps_ax outputs some processes" do
    output = """
  PID TTY      STAT   TIME COMMAND
 8544 ?        S      0:00 [kworker/u:1]
10919 pts/4    Sl+    0:14 vim 016_Pipe_Operator.markdown
10941 pts/5    Ss     0:00 -bash
13936 pts/5    Sl+    0:00 vim test/pipe_operator_playground_test.exs
14422 ?        S      0:00 sleep 3
    """
    assert Unix.ps_ax == output
  end

Here we're just getting some basic test data in for our pipeline. I just captured the output of the ps command and stripped most of it out. We just want to assert that we have a function, Unix.ps_ax, that outputs this data to get us started. Let's go ahead and make the module and corresponding function:

defmodule Unix do
  def ps_ax do
    """
  PID TTY      STAT   TIME COMMAND
 8544 ?        S      0:00 [kworker/u:1]
10919 pts/4    Sl+    0:14 vim 016_Pipe_Operator.markdown
10941 pts/5    Ss     0:00 -bash
13936 pts/5    Sl+    0:00 vim test/pipe_operator_playground_test.exs
14422 ?        S      0:00 sleep 3
    """
  end
end

Run the tests, and since we just copy-pasted the output of a function, I'd be flabbergasted if they didn't succeed.

Now that that's done with, let's fill in the next test:

  test "grep(thing) returns lines that match 'thing'" do
    input = """
    foo
    bar
    thing foo
    baz
    thing qux
    """
    output = ["thing foo", "thing qux"]
    assert Unix.grep(input, 'thing') == output
  end

Alright, here we're expecting to define a grep command that takes a regex of some sort to filter its input. Let's see what that looks like:

  def grep(input, match) do
    lines = String.split(input, "\n")
    Enum.filter(lines, fn(line) -> Regex.match?(~r/#{match}/, line) end)
  end

This would, hilariously, be a bit shorter if we were to use a pipeline for it, but we can always come back to that. I'm going to try to be laser-focused on the actual pipeline we're building, for now, for illustrative purposes.
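One thing worth noting about grep as written: because match is interpolated into the regex, the second argument is treated as a regular expression, not as a literal string. Here's a minimal sketch of that behavior, assuming a recent Elixir with the ~r sigil. GrepSketch is a hypothetical module name used only so the snippet runs on its own:

```elixir
defmodule GrepSketch do
  # Same logic as the grep function above, wrapped in a stand-alone module.
  def grep(input, match) do
    lines = String.split(input, "\n")
    Enum.filter(lines, fn line -> Regex.match?(~r/#{match}/, line) end)
  end
end

input = "foo\nbar\nthing foo\nbaz\nthing qux\n"

# A plain word matches any line containing it.
IO.inspect(GrepSketch.grep(input, "thing"))
# prints: ["thing foo", "thing qux"]

# Regex syntax works too: ^ba matches lines that start with "ba".
IO.inspect(GrepSketch.grep(input, "^ba"))
# prints: ["bar", "baz"]
```

If you ever wanted literal matching instead, passing the argument through Regex.escape/1 before interpolating would be one option, but that goes beyond what this episode needs.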

Let's see the next test:

  test "awk(1) splits on whitespace and returns the first column" do
    input = ["foo bar", "  baz    qux "]
    output = ["foo", "baz"]
    assert Unix.awk(input, 1) == output
  end

Alright, so here we're expecting to split on whitespace of some sort, and return the column we ask for. Easy enough, let's go ahead and implement it:

  def awk(lines, column) do
    Enum.map(lines, fn(line) ->
      stripped = String.strip(line)
      columns = Regex.split(~r/ /, stripped, trim: true)
      Enum.at(columns, column-1)
    end)
  end
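To see why the trim: true option matters in awk, here's a small sketch that runs the same steps on the second test line. It assumes a recent Elixir, so it uses the ~r sigil and String.trim/1 (the current name for String.strip/1). The variable names are just for illustration:

```elixir
line = "  baz    qux "
stripped = String.trim(line)

# Without trim: true, each extra space between columns yields an empty string.
untrimmed = Regex.split(~r/ /, stripped)
IO.inspect(untrimmed)
# prints: ["baz", "", "", "", "qux"]

# With trim: true, the empty strings are dropped and only the columns remain.
trimmed = Regex.split(~r/ /, stripped, trim: true)
IO.inspect(trimmed)
# prints: ["baz", "qux"]

# Columns are 1-based for the caller, so column 1 is index 0.
IO.inspect(Enum.at(trimmed, 1 - 1))
# prints: "baz"
```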

Run the tests, they should all pass. Finally, let's see how we can glue these pieces together with the pipe operator, which was the whole flipping point of this exercise, right?

  test "the whole pipeline works" do
    assert (Unix.ps_ax |> Unix.grep('vim') |> Unix.awk(1)) == ["10919", "13936"]
  end

Go ahead and run the test, and it should just work. That's what a pipeline's good for - turning a whole lot of work into one small, easy-to-read piece. Without the pipeline, that would have been written:

Unix.awk(Unix.grep(Unix.ps_ax, 'vim'), 1)

Which is just awful to read, especially if you look at the distance between 'awk' and its argument specifying the column.

Anyway, now that we've played with the pipe a bit, let's clean up the rest of the pieces of this code that could benefit from pipelines:

  def grep(input, match) do
    String.split(input, "\n")
      |> Enum.filter(fn(line) -> Regex.match?(~r/#{match}/, line) end)
  end

  def awk(lines, column) do
    Enum.map(lines, fn(line) ->
      stripped = String.strip(line)
      Regex.split(~r/ /, stripped, trim: true)
        |> Enum.at(column-1)
    end)
  end

You can see that Regex.split doesn't fit neatly into the pipeline. Strictly speaking, we can pipe into it, but since it takes the regex rather than the input string as its first argument, the result looks exceedingly unwieldy. Just to show you what this looks like, here's a way I would not suggest:

  def awk(lines, column) do
    Enum.map(lines, fn(line) ->
      String.strip(line)
        |> (&Regex.split(~r/ /, &1, trim: true)).()
        |> Enum.at(column-1)
    end)
  end

So yeah, that's technically a thing you can do, but don't.
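If you do want awk as a single pipeline, there are a couple of tidier options on recent Elixir versions. Neither is from the episode, so treat this as a sketch. AwkSketch is a hypothetical module name, String.split/1 with no pattern splits on any whitespace (not just single spaces), and then/2 requires Elixir 1.12 or later:

```elixir
defmodule AwkSketch do
  # Variant 1: String.split/1 takes the string first, splits on whitespace,
  # and ignores leading and trailing whitespace, so it pipes cleanly.
  def awk(lines, column) do
    Enum.map(lines, fn line ->
      line
      |> String.split()
      |> Enum.at(column - 1)
    end)
  end

  # Variant 2: keep Regex.split/3, but use then/2 to place the piped value
  # in the second argument position.
  def awk_with_then(lines, column) do
    Enum.map(lines, fn line ->
      line
      |> String.trim()
      |> then(&Regex.split(~r/ /, &1, trim: true))
      |> Enum.at(column - 1)
    end)
  end
end

IO.inspect(AwkSketch.awk(["foo bar", "  baz    qux "], 1))
# prints: ["foo", "baz"]
IO.inspect(AwkSketch.awk_with_then(["foo bar", "  baz    qux "], 2))
# prints: ["bar", "qux"]
```

The first variant is shorter, but it changes the splitting rule slightly, since tabs would also count as separators. The second keeps the original regex and only changes how the value is threaded through.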

Summary

In today's episode, we learned how to use the pipe operator to chain functions together into a larger, composed function that does a bigger task out of small, lego-style functional pieces. See you soon!

Links