The CSV format comes up all the time. It's a convenient and time-honored way to transfer tabular data textually. If you're going to have to deal with this sort of data, what's the best way to do it? I'm going to propose a particular library, and I'm suggesting it because of the overwhelming level of care that went into its code.

Upon reading the code for the Elixir CSV parsing library, I felt a sense of awe. The person who wrote this code is unbelievably serious about parsing CSV well, and we're all better off for it. I'm not going to dig into the code in this episode, but I can't stress enough how nice it is to read.

Project

We'll start a new project.

mix new csv_playground
cd csv_playground

I'm pulling in two dependencies - the csv library itself, and the Faker library for generating fake data.

vim mix.exs
  defp deps do
    [
      {:faker, "~> 0.5"},
      {:csv, "~> 1.2.0"}
    ]
  end
mix deps.get

We're just going to decode and encode some CSV data here. We'll start by making a CSV file with some records in it.

mkdir scripts
vim scripts/make_data.exs
Faker.start

num_records = 10_000

headers = ~w(first_name last_name company)
data =
  1..num_records
  |> Enum.map(fn(_) ->
    [Faker.Name.first_name, Faker.Name.last_name, Faker.Company.name]
  end)

{:ok, file} = File.open("sample_data.csv", [:write])

[headers | data]
|> CSV.encode
|> Enum.each(&IO.write(file, &1))

:ok = File.close(file)

Now let's run it.

mix run scripts/make_data.exs

OK, so that generates a sample data file for us. Something to notice is that the CSV library operates over streams. We passed it a list and it handled that just fine, but the same code works for a lazy stream of data. Similarly, the output of CSV.encode is a stream, so we then stream that into calls to IO.write.
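
To make the streaming point concrete, here's a sketch (not from the script above - the filename and row count are arbitrary) that feeds CSV.encode a lazy stream instead of a list, so the full set of rows is never built up in memory:

```elixir
Faker.start

# Rows are generated on demand; Stream.take caps how many we produce.
rows =
  Stream.repeatedly(fn ->
    [Faker.Name.first_name, Faker.Name.last_name, Faker.Company.name]
  end)
  |> Stream.take(1_000)

# File.open!/3 with a function closes the file for us when it returns.
File.open!("streamed_sample.csv", [:write], fn file ->
  rows
  |> CSV.encode
  |> Enum.each(&IO.write(file, &1))
end)
```

Nothing else changes - CSV.encode doesn't care whether the enumerable behind it is eager or lazy.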

Now to play with decoding, we'll want to put that into a test fixture.

mkdir test/fixtures
mv sample_data.csv test/fixtures
vim test/csv_playground_test.exs

Alright, so the simplest thing to do with CSV data is to read it as a list.

defmodule CsvPlaygroundTest do
  use ExUnit.Case

  # We'll get the file path into a module attribute. Note that @file is a
  # reserved attribute in Elixir, so we pick a different name.
  @sample_file "test/fixtures/sample_data.csv"
  # This CSV module only operates on streams, so we need to read the file as a
  # stream.
  @data File.stream! @sample_file

  test "reading CSV as a list" do
    # Now we'll just verify that we can read the data as a list.
    list =
      @data
      |> CSV.decode
      |> Enum.to_list

    assert hd(list) == ["first_name", "last_name", "company"]
  end
end

OK, so any CSV library can read data into a list, but the streaming is the nice part: decoding gives you a stream of rows, which becomes essential when the file is too large to hold in memory.
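
As a sketch of what that buys you (the "Mary" filter here is arbitrary, not from the episode): if we stick to Stream functions until the final Enum call, only enough of the file is read to satisfy the result.

```elixir
# Lazily find the first five people with a given first name. The file
# is read line by line; once five matches are found, reading stops.
"test/fixtures/sample_data.csv"
|> File.stream!
|> CSV.decode
|> Stream.drop(1)                                     # skip the header row
|> Stream.filter(fn [first, _last, _company] -> first == "Mary" end)
|> Enum.take(5)
```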

My favorite thing to do with CSV is to read it in as a map. It isn't strictly necessary, but it's a lot nicer not to have to count column positions in your CSV file. Let's look at that:

defmodule CsvPlaygroundTest do
  #...
  test "reading CSV as a map" do
    list =
      @data
      |> CSV.decode(headers: true)
      |> Enum.to_list

    sorted_keys =
      list
      |> hd
      |> Map.keys
      |> Enum.sort

    assert sorted_keys == Enum.sort(["first_name", "last_name", "company"])
  end
end

So it's really easy - you just pass headers: true. You can also pass headers as a list, which lets you get maps out of a CSV file that has no header row.
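
A quick sketch of that second form, using made-up inline data so it stands alone: the keys we pass are zipped against each row, so even a headerless file decodes into maps.

```elixir
# Two rows, no header line; we supply the keys ourselves.
["jane,doe,Acme\n", "john,roe,Initech\n"]
|> CSV.decode(headers: [:first_name, :last_name, :company])
|> Enum.to_list
# each row comes back as a map keyed by those atoms
```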

Resources

So that's a quick overview of my new favorite CSV library. You can also specify the separator, if for instance some jerk gave you a tab-separated file. We also had a quick look at Faker for generating fake data. I hope you enjoyed it. See you soon!
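
For completeness, here's a small sketch of that separator option with inline data: the separator is given as a codepoint, so tabs are ?\t.

```elixir
# Decode tab-separated data instead of comma-separated.
["first\tlast\n", "ada\tlovelace\n"]
|> CSV.decode(separator: ?\t)
|> Enum.to_list
```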