Linked-lists, pattern matching, and recursion in Elixir
Deep dive into the livebook source code to understand pattern matching, recursion, and working with linked lists in Elixir.
For Jupyteach development I have been studying the excellent Livebook project's source code. As I am somewhat new to Elixir development, I have been learning a great deal from the pros. Here's what I learned today...
Linked Lists
In Elixir a list ([a, b, c]
) is implemented as a linked list. One implication of the linked-list structure is that prepend operations are O(1), but append or insert operations are O(n).
To encourage developers to work efficiently with this data structure, Elixir's pattern matching supports some special syntax exists that makes it easy to take the head of a list (first element) and the rest of the list like so [x | rest]
. In this example x
will be the first item of the list and rest
will be a list with everything else.
It is also very convenient to prepend a single element to the front of a list by doing new_list = [x | old_list]
, which will be an efficient O(1) operation.
Livebook Notebooks
In livebook's source code a notebook struct organizes sections and cells as follows: notebook = %{sections: [%{cells: []}]}
.
That is, a notebook has a list of sections, each of which has a list of cells.
In livebook, you are allowed to move cells up and down by a given offset. This offset might cause the cell to cross section boundaries. The function move_cell/3
is responsible for handling this operation. Here's the code:
def move_cell(notebook, cell_id, offset) do
# We firstly create a flat list of cells interspersed with `:separator`
# at section boundaries. Then we move the given cell by the given offset.
# Finally we split the flat list back into cell lists
# and put them in the corresponding sections.
separated_cells =
notebook.sections
|> Enum.map_intersperse(:separator, & &1.cells)
|> List.flatten()
idx =
Enum.find_index(separated_cells, fn
:separator -> false
cell -> cell.id == cell_id
end)
new_idx = (idx + offset) |> clamp_index(separated_cells)
{cell, separated_cells} = List.pop_at(separated_cells, idx)
separated_cells = List.insert_at(separated_cells, new_idx, cell)
cell_groups = group_cells(separated_cells)
sections =
notebook.sections
|> Enum.zip(cell_groups)
|> Enum.map(fn {section, cells} -> %{section | cells: cells} end)
%{notebook | sections: sections}
end
Comments:
separated_cells
is a flat list of cells, where in between each section's cells is the atom:separator
. It looks like[cell1, cell2, :separator, cell3, cell4, cell5, ..]
where cells 1 and 2 are in the first section and cells 3 to 5 are in the second section.idx
is the original index of the cell we want to move (found usingcell_id
)new_idx
is the new index where we would like the cell to endList.pop_at
andList.insert_at
are used to remove the cell fromidx
and put it back in atnew_idx
cell_groups
will returnlist(list(Cell.t()))
, for the cells that belong to each section- Finally the last two chunks of code update the list of cells in each section and then the list of sections in the notebook
Regrouping cells
The part I want to focus on is group_cells/1
. The implementation is a master class in pattern matching and efficient list operations in Elixir. Here's the code
defp group_cells(separated_cells) do
separated_cells
|> Enum.reverse()
|> do_group_cells([])
end
defp do_group_cells([], groups), do: groups
defp do_group_cells([:separator | separated_cells], []) do
do_group_cells(separated_cells, [[], []])
end
defp do_group_cells([:separator | separated_cells], groups) do
do_group_cells(separated_cells, [[] | groups])
end
defp do_group_cells([cell | separated_cells], []) do
do_group_cells(separated_cells, [[cell]])
end
defp do_group_cells([cell | separated_cells], [group | groups]) do
do_group_cells(separated_cells, [[cell | group] | groups])
end
We enter with group_cells
and pass in separated_cells
, which is the re-ordered list(Cell.t() | :separator)
.
The code then reverses that list and begins a series of recursive calls into the do_group_cells/2
function. In all clauses of this function the first argument is a flat list of cells and the second argument is a list of lists of cells. The outer list is for sections. Each inner list is a group of cells in that section.
We begin our time in do_group_cells/2
by passing the reversed separated_cells
list as the first argument and an empty list ([]
) as the second.
Note: I originally started this deep dive by trying to figure out why the list of cells was being reversed and how it was ever "un-reversed".
Recursion and Pattern Matching
This will match either the 2nd or 4th clause (due to the empty list as the second argument).
If our reversed list happened to begin with a :separator
, we go to the 2nd clause. Here we throw away the separator (by keeping only the tail of the first argument) and then make a recursive call updating the second argument to be [[], []]
.
Why [[], []]
? To see we need to follow the recursive call into the 5th clause (why 5th and not 3rd? because we won't have two :separator
atoms in a row).
The 5th clause is the work horse. It uses pattern matching to split the first argument into head (called cell
) and tail (called separated_cells
) and splits the second argument into head group
(list of cells in current section) and tail groups
(the other sections and their cells). The only line in the body of this function is to make another recursive call to do_group_cells/2
passing separated_cells
as the first argument and [[cell | group] | groups]
as the second. This second argument does two things: (1) puts the single cell at the head of the current group and (2) puts this updated group ahead of existing groups.
Comments:
- By always peeling
cell
of the head of the list of cells and then adding it to the front of the current group (updated group to the front of the list of groups), we end up undoing theList.reverse
fromgroup_cells/1
- By always passing
separated_cells
as the first argument in the recursive call that list shrinks by one each time until it is empty. When it is empty we fall into the first clause and recursion ends.
To get here we followed the 2nd clause. Suppose, on the other hand that we end up in the 4th clause to begin the recursion. This happens whenever the reversed list of separated cells does not begin with :separator
. In this clause we match the head of our cells to cell
and the tail to separated_cells
. We then do do a recursive call into do_group_cells/2
separated_cells
as the first argument and the second argument as [[cell]]
. This will match the 5th (or 3rd if the new head of separated_cells
is :separator
) clause and we go to work as described above!
The final piece of the puzzle is the 3rd clause. This matches when the current list of reversed cells begins with :separator
. Here the recursive call will throw away :separator
, pass the tail of the cell list in the first position and then [[] | groups]
as the second. This sets up a new empty cell list for the current group and we recurse back into the 5th clause.
Learning from the best
Functional programming is a powerful approach to modeling and manipulating data. Elixir's implementation allows for concise and expressive code, but it can seem foreign to someone like me with a stronger background in languages like Python, Julia, Javascript, and Go.
Thankfully the team behind livebook are experts and have made their code available and open source. It has been exhilarating to study their code and feel like I'm taking a class from the best in class. Open source for the win!
Postscript
I made use of the Cody tool from Sourcegraph while I was studying the livebook source code. While not not perfect, Cody was an excellent study buddy as I dug into this new library.