Exercises

  • What happens when we do df[c(TRUE,FALSE),] with the data.frame we created earlier?
  • What other ways can we subset or extract from a data.frame?
  • data.frames are lists of lists with each column as a list. Can the cells be lists? Prove your answer (bonus points).
  • What other languages have decided to emulate the data.frame? What reasons did they cite for emulating the data.frame?

Exercise Answers

  • What happens when we do df[c(TRUE,FALSE),] with the data.frame we created earlier?

    We get every odd-numbered row. R recycles a logical index that is shorter than the number of rows, so the TRUE, FALSE pair is repeated automatically. TRUE (include) and FALSE (exclude) therefore act as a pattern that is applied until the last row of the data.frame.

      > df <- data.frame(letter=letters, number=1:26, stringsAsFactors = FALSE)
      > df[c(TRUE,FALSE),]
         letter number
      1       a      1
      3       c      3
      5       e      5
      7       g      7
      9       i      9
      11      k     11
      13      m     13
      15      o     15
      17      q     17
      19      s     19
      21      u     21
      23      w     23
      25      y     25
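
    To make the recycling visible, here is a minimal sketch that spells out the repeated pattern with rep_len and compares it to the short index. The names short_index and full_index are only illustrative.

      ```r
      # Same data.frame as in the answer above
      df <- data.frame(letter = letters, number = 1:26, stringsAsFactors = FALSE)

      # Logical index as written: two values for 26 rows
      short_index <- df[c(TRUE, FALSE), ]

      # The same index with the recycling written out to one value per row
      full_index <- df[rep_len(c(TRUE, FALSE), nrow(df)), ]

      # Compare the two selections and look at the row numbers that were kept
      identical(short_index, full_index)
      short_index$number
      ```

    Swapping the pattern to c(FALSE, TRUE) should select the even-numbered rows instead.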

  • What other ways can we subset or extract from a data.frame?

    There are two other common ways to select portions of a data.frame. We can use [[ ]], which selects the column matching the number provided as the argument. Remember data.frames are lists of lists, so [[ ]] behaves exactly as we expect from a list. In addition, the base R function subset selects rows that match a set of conditions. Here are examples of both.

      > df[[2]]
       [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
      [24] 24 25 26
      > subset(df, number > 20)
         letter number
      21      u     21
      22      v     22
      23      w     23
      24      x     24
      25      y     25
      26      z     26
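
    A few equivalent spellings may also be useful. The sketch below assumes the same df as above and uses illustrative variable names: it extracts the number column by name as well as by position, and filters rows with a logical condition inside [ ] rather than with subset.

      ```r
      df <- data.frame(letter = letters, number = 1:26, stringsAsFactors = FALSE)

      # Extract one column as a vector: by position, by name, and with $
      by_position <- df[[2]]
      by_name     <- df[["number"]]
      by_dollar   <- df$number

      # Filter rows with a logical condition in the row position of [ ]
      filtered <- df[df$number > 20, ]
      filtered
      ```

    Unlike subset, the [ ] form needs the column written as df$number, because the condition is evaluated as an ordinary expression rather than inside the data.frame.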

    In the future we will explore some additional optimized functionality provided by the data.table and dplyr packages.

  • data.frames are lists of lists with each column as a list. Can the cells be lists? Prove your answer (bonus points).

    Yes, cells can be lists, but assigning lists to data.frame cells is a little cumbersome. The packages tibble and dplyr make nesting lists in cells easy and actually exploit this functionality.

      # make a data.frame
      > df <- data.frame(l=1)
      # use [[ ]] to be able to assign lists to cells
      #  note the common constructors and $ and []
      #  will automatically flatten the lists
      #  and not work as expected
      > df[[1,1]] <- list(list(1,2,3))
      > df
              l
      1 1, 2, 3
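
    Another route in base R is to wrap the list in I(), which asks data.frame() to keep it as a single column instead of splitting it into several. This is a small sketch with made-up column names (id, l), not the only way to build such a column.

      ```r
      # I() marks the list "as is", so data.frame() keeps it as one column
      nested <- data.frame(id = 1:2, l = I(list(1:3, c("a", "b"))))

      is.list(nested$l)      # the column itself is a list
      nested$l[[1]]          # the first cell holds a whole integer vector
      length(nested$l[[2]])  # the second cell holds a character vector
      ```

    Each cell of l is then a full R object, which is the behaviour that nested columns in tibble and dplyr build on.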

  • What other languages have decided to emulate the data.frame? What reasons did they cite for emulating the data.frame?

    The most notable emulation of R's data.frame is the DataFrame in the Python pandas package. See this comparison for more details.

    ## Resources