A Short Visit to Common Data Structures

Won’t be too long, promised!

Chances are you now understand the functional subset of Erlang pretty well and could read many programs without a problem. However, I could bet it’s still a bit hard to think about how to build a real useful program even though the last chapter was about solving problems in a functional manner. I’m saying this because it’s how I felt like at about that point in my learning, but if you’re doing better, congratulations!

Anyway, the point I’m coming to is that we’ve seen a bunch of things: most basic data types, the shell, how to write modules and functions (with recursion), different ways to compile, control the flow of the program, handle exceptions, abstract away some common operations, etc. We’ve also seen how to store data with tuples, lists and an incomplete implementation of a binary search tree. What we haven’t seen is the other data structures provided to the programmer in the Erlang standard library.

a phonograph

Records

Records are, first of all, a hack. They are more or less an afterthought to the language and can have their share of inconveniences. I’ll cover that later. They’re still pretty useful whenever you have a small data structure where you want to access the attributes by name directly. As such, Erlang records are a lot like structs in C (if you know C.)

They’re declared as module attributes in the following manner:

-module(records).
-compile(export_all).
-record(robot, {name,
type=industrial,
hobbies,
details=[]}).

So here we have a record representing robots with 4 fields: name, type, hobbies and details. There is also a default value for type and details, industrial and [], respectively. Here’s how to declare a record in the module records:

first_robot() ->
#robot{name="Mechatron",
type=handmade,
details=["Moved by a small man inside"]}.

And running the code:

1> c(records).
{ok,records}
2> records:first_robot().
{robot,"Mechatron",handmade,undefined,
["Moved by a small man inside"]}

Woops! Here comes the hack! Erlang records are just syntactic sugar on top of tuples. Fortunately, there’s a way to make it better. The Erlang shell has a command rr(Module) that lets you load record definitions from Module:

3> rr(records).
[robot]
4> records:first_robot().
#robot{name = "Mechatron",type = handmade,
hobbies = undefined,
details = ["Moved by a small man inside"]}

Ah there! This makes it much easier to work with records that way. You’ll notice that in first_robot/0, we had not defined the hobbiesfield and it had no default value in its declaration. Erlang, by defaults, sets the value to undefined for you.

To see the behavior of the defaults we set in the robot definition, let’s compile the following function:

car_factory(CorpName) ->
#robot{name=CorpName, hobbies="building cars"}.

And run it:

5> c(records).
{ok,records}
6> records:car_factory("Jokeswagen").
#robot{name = "Jokeswagen",type = industrial,
hobbies = "building cars",details = []}

And we have an industrial robot that likes to spend time building cars.

Note: The function rr() can take more than a module name: it can take a wildcard (like rr("*")) and also a list as a second argument to specify which records to load.

There are a few other functions to deal with records in the shell: rd(Name, Definition) lets you define a record in a manner similar to the -record(Name, Definition) used in our module. You can use rf() to ‘unload’ all records, or rf(Name) or rf([Names]) to get rid of specific definitions.

You can use rl() to print all record definitions in a way you could copy-paste into the module or use rl(Name) or rl([Names]) to restrict it to specific records.

Finally, rp(Term) lets you convert a tuple to a record (given the definition exists).

Writing records alone won’t do much. We need a way to extract values from them. There are basically two ways to do this. The first one is with a special ‘dot syntax’. Assuming you have the record definition for robots loaded:

5> Crusher = #robot{name="Crusher"hobbies=["Crushing people","petting cats"]}.
#robot{name = "Crusher",type = industrial,
hobbies = ["Crushing people","petting cats"],
details = []}
6> Crusher#robot.hobbies.
["Crushing people","petting cats"]

Ugh, not a pretty syntax. This is due to the nature of records as tuples. Because they’re just some kind of compiler trick, you have to keep keywords around defining what record goes with what variable, hence the #robot part of Crusher#robot.hobbies. It’s sad, but there’s no way out of it. Worse than that, nested records get pretty ugly:

7> NestedBot = #robot{details=#robot{name="erNest"}}.
#robot{name = undefined,type = industrial,
hobbies = undefined,
details = #robot{name = "erNest",type = industrial,
hobbies = undefined,details = []}}
8> (NestedBot#robot.details)#robot.name.
"erNest"

And yes, the parentheses are mandatory.

Update:
Starting with revision R14A, it is now possible to nest records without the parentheses. The NestedBot example above could also be written as NestedRobot#robot.details#robot.name and work the same.

To further show the dependence of records on tuples, see the following:

9> #robot.type.
3

What this outputs is which element of the underlying tuple it is.

One saving feature of records is the possibility to use them in function heads to pattern match and also in guards. Declare a new record as follows on top of the file, and then add the functions under:

-record(user, {idnamegroupage}).
%% use pattern matching to filter
admin_panel(#user{name=Name, group=admin}) ->
Name ++ " is allowed!";
admin_panel(#user{name=Name}) ->
Name ++ " is not allowed".
%% can extend user without problem
adult_section(U = #user{}) when  U#user.age >= 18  ->
%% Show stuff that can't be written in such a text
allowed;
adult_section(_->
%% redirect to sesame street site
forbidden.

The syntax to bind a variable to any field of a record is demonstrated in the admin_panel/1 function (it’s possible to bind variables to more than one field). An important thing to note about the adult_section/1 function is that you need to do SomeVar = #some_record{} in order to bind the whole record to a variable. Then we do the compiling as usual:

10> c(records).
{ok,records}
11> rr(records).
[robot,user]
12> records:admin_panel(#user{id=1name="ferd",group=adminage=96}).
"ferd is allowed!"
13> records:admin_panel(#user{id=2name="you",group=usersage=66}).
"you is not allowed"
14> records:adult_section(#user{id=21name="Bill",group=usersage=72}).
allowed
15> records:adult_section(#user{id=22name="Noah",group=usersage=13}).
forbidden

What this lets us see is how it is not necessary to match on all parts of the tuple or even know how many there are when writing the function: we can only match on the age or the group if that’s what’s needed and forget about all the rest of the structure. If we were to use a normal tuple, the function definition might need to look a bit like function({record, _, _, ICareAboutThis, _, _) -> .... Then, whenever someone decides to add an element to the tuple, someone else (probably angry about it all) would need to go around and update all functions where that tuple is used.

The following function illustrates how to update a record (they wouldn’t be very useful otherwise):

repairman(Rob) ->
Details = Rob#robot.details,
NewRob = Rob#robot{details=["Repaired by repairman"|Details]},
{repaired, NewRob}.

And then:

16> c(records).
{ok,records}
17> records:repairman(#robot{name="Ulbert",hobbies=["trying to have feelings"]}).
{repaired,#robot{name = "Ulbert",type = industrial,
hobbies = ["trying to have feelings"],
details = ["Repaired by repairman"]}}

And you can see my robot has been repaired. The syntax to update records is a bit special here. It looks like we’re updating the record in place (Rob#robot{Field=NewValue}) but it’s all compiler trickery to call the underlying erlang:setelement/3 function.

One last thing about records. Because they’re pretty useful and code duplication is annoying, Erlang programmers frequently share records across modules with the help of header files. Erlang header files are pretty similar to their C counter-part: they’re nothing but a snippet of code that gets added to the module as if it were written there in the first place. Create a file named records.hrl with the following content:

%% this is a .hrl (header) file.
-record(included, {some_field,
some_default = "yeah!",
unimaginative_name}).

To include it in records.erl, just add the following line to the module:

-include("records.hrl").

And then the following function to try it:

included() ->  #included{some_field="Some value"}.

Now, try it as usual:

18> c(records).
{ok,records}
19> rr(records).
[included,robot,user]
20> records:included().
#included{some_field = "Some value",some_default = "yeah!",
unimaginative_name = undefined}

Hooray! That’s about it for records; they’re ugly but useful. Their syntax is not pretty, they’re not much but a hack, but they’re relatively important for the maintainability of your code.

Key-Value Stores

key and keyhole, another terrible pun

I’ve had you build a tree back a few chapters, and the use was to use it as a key-value store for an address book. That book sucked: we couldn’t delete or convert it to anything useful. It was a good demonstration of recursion, but not much more. Now is the time to introduce you to a bunch of useful data structures and modules to store data under a certain key. I won’t define what every function does nor go through all the modules. I will simply link to the doc pages. Consider me as someone responsible about ‘raising awareness about key-value stores in Erlang’ or something. Sounds like a good title. I just need one of these ribbons.

For small amounts of data, there are basically two data structures that can be used. The first one is called a proplist. A proplist is any list of tuples of the form [{Key,Value}]. They’re a weird kind of structure because there is no other rule than that. In fact the rules are so relaxed that the list can also contain boolean values, integers and whatever you want. We’re rather interested by the idea of a tuple with a key and a value in a list here, though. To work with proplists, you can use the proplists module. It contains functions such as proplists:delete/2proplists:get_value/2proplists:get_all_values/2proplists:lookup/2 and proplists:lookup_all/2.

You’ll notice there is no function to add or update an element of the list. This shows how loosely defined proplists are as a data structure. To get these functionalities, you must cons your element manually ([NewElement|OldList]) and use functions such as lists:keyreplace/4. Using two modules for one small data structure is not the cleanest thing, but because proplists are so loosely defined, they’re often used to deal with configuration lists, and general description of a given item. Proplists are not exactly complete data structures. They’re more of a common pattern that appears when using lists and tuples to represent some object or item; the proplists module is a bit of a toolbox over such a pattern.

If you do want a more complete key-value store for small amounts of data, the orddict module is what you need. Orddicts (ordered dictionaries) are proplists with a taste for formality. Each key can be there once, the whole list is sorted for faster average lookup, etc. Common functions for the CRUD usage include orddict:store/3,orddict:find/2 (when you do not know whether the key is in the dictionaries), orddict:fetch/2 (when you know it is there or that it must be there) and orddict:erase/2.

A dictionary with the definition of 'Awesome' being 'it's you!'

Orddicts are a generally good compromise between complexity and efficiency up to about 75 elements (see my benchmark). After that amount, you should switch to different key-value stores.

There are basically two key-value structures/modules to deal with larger amounts of data: dicts and gb_trees. Dictionaries have the same interface as orddicts: dict:store/3dict:find/2dict:fetch/2dict:erase/2 and every other function, such as dict:map/2 and dict:fold/2 (pretty useful to work on the whole data structure!) Dicts are thus very good choices to scale orddicts up whenever it is needed.

General Balanced Trees, on the other hand, have a bunch more functions leaving you more direct control over how the structure is to be used. There are basically two modes for gb_trees: the mode where you know your structure in and out (I call this the ‘smart mode’, and the mode where you can’t assume much about it (I call this one the ‘naive mode’). In naive mode, the functions are gb_trees:enter/2gb_trees:lookup/2 and gb_trees:delete_any/2. The related smart functions are gb_trees:insert/3gb_trees:get/2gb_trees:update/3 and gb_trees:delete/2. There is also gb_trees:map/2, which is always a nice thing when you need it.

The disadvantage of ‘naive’ functions over ‘smart’ ones is that because gb_trees are balanced trees, whenever you insert a new element (or delete a bunch), it might be possible that the tree will need to balance itself. This can take time and memory (even in useless checks just to make sure). The ‘smart’ function all assume that the key is present in the tree: this lets you skip all the safety checks and results in faster times.

When should you use gb_trees over dicts? Well, it’s not a clear decision. As the benchmark module I have written will show, gb_trees and dicts have somewhat similar performances in many respects. However, the benchmark demonstrates that dicts have the best read speeds while the gb_trees tend to be a little quicker on other operations. You can judge based on your own needs which one would be the best.

Oh and also note that while dicts have a fold function, gb_trees don’t: they instead have an iterator function, which returns a bit of the tree on which you can call gb_trees:next(Iterator) to get the following values in order. What this means is that you need to write your own recursive functions on top of gb_trees rather than use a generic fold. On the other hand, gb_trees let you have quick access to the smallest and largest elements of the structure with gb_trees:smallest/1 and gb_trees:largest/1.

I would therefore say that your application’s needs is what should govern which key-value store to choose. Different factors such as how much data you’ve got to store, what you need to do with it and whatnot all have their importance. Measure, profile and benchmark to make sure.

Note: some special key-value stores exist to deal with resources of different size. Such stores are ETS tables,DETS tables and the mnesia database. However, their use is strongly related to the concepts of multiple processes and distribution. Because of this, they’ll only be approached later on. I’m leaving this as a reference to pique your curiosity and for those interested.

Arrays

But what about code that requires data structures with nothing but numeric keys? Well for that, there are arrays. They allow you to access elements with numerical indices and to fold over the whole structure while possibly ignoring undefined slots.

Don’t drink too much kool-aid:
Erlang arrays, at the opposite of their imperative counterparts, are not able to have such things as constant-time insertion or lookup. Because they’re usually slower than those in languages which support destructive assignment and that the style of programming done with Erlang doesn’t necessary lend itself too well to arrays and matrices, they are rarely used in practice.

Generally, Erlang programmers who need to do matrix manipulations and other uses requiring arrays tend to use concepts called Ports to let other languages do the heavy lifting, or C-NodesLinked in drivers and NIFs(Experimental, R13B03+).

Arrays are also weird in the sense that they’re one of the few data structures to be 0-indexed (at the opposite of tuples or lists), along with indexing in the regular expressions module. Be careful with them.

A Set of Sets

a swingSET

If you’ve ever studied set theory in whatever mathematics class you have an idea about what sets can do. If you haven’t, you might want to skip over this. However, I’ll just say that sets are groups of unique elements that you can compare and operate on: find which elements are in two groups, in none of them, only in one or the other, etc. There are more advanced operations letting you define relations and operate on these relations and much more. I’m not going to dive into the theory (again, it’s out of the scope of this book) so I’ll just describe them as it is.

There are 4 main modules to deal with sets in Erlang. This is a bit weird at first, but it makes more sense once you realize that it’s because it was agreed by implementers that there was no ‘best’ way to build a set. The four modules are ordsetssetsgb_sets and sofs(sets of sets):

ordsets
Ordsets are implemented as a sorted list. They’re mainly useful for small sets, are the slowest kind of set, but they have the simplest and most readable representation of all sets. There are standard functions for them such as ordsets:new/0ordsets:is_element/2ordsets:add_element/2ordsets:del_element/2ordsets:union/1,ordsets:intersection/1, and a bunch more.
sets
Sets (the module) is implemented on top of a structure really similar to the one used in dict. They implement the same interface as ordsets, but they’re going to scale much better. Like dictionaries, they’re especially good for read-intensive manipulations, like checking whether some element is part of the set or not.
gb_sets
Gb_sets themselves are constructed above a General Balanced Tree structure similar to the one used in the gb_trees module. gb_sets are to sets what gb_tree is to dict; an implementation that is faster when considering operations different than reading, leaving you with more control. While gb_sets implement the same interface as sets and ordsets, they also add more functions. Like gb_trees, you have smart vs. naive functions, iterators, quick access to the smallest and largest values, etc.
sofs
Sets of sets (sofs) are implemented with sorted lists, stuck inside a tuple with some metadata. They’re the module to use if you want to have full control over relationships between sets, families, enforce set types, etc. They’re really what you want if you need mathematics concept rather than ‘just’ groups of unique elements.

Don’t drink too much kool-aid:
While such a variety can be seen as something great, some implementation details can be downright frustrating. As an example, gb_sets, ordsets and sofs all use the == operator to compare values: if you have the numbers 2 and 2.0, they’ll both end up seen as the same one.

However, sets (the module) uses the =:= operator, which means you can’t necessarily switch over every implementation as you wish. There are cases where you need one precise behavior and at that point, you might lose the benefit of having multiple implementations.

It’s a bit confusing to have that many options available. Björn Gustavsson, from the Erlang/OTP team and programmer of  Wings3D mainly suggests using gb_sets in most circumstances, using ordset when you need a clear representation that you want to process with your own code and ‘sets’ when you need the =:= operator (source.)

In any case, like for key-value stores, the best solution is usually to benchmark and see what fits your application better.

Directed Graphs

There is one other data structure that I want to mention here (not that there are not more than what’s mentioned in this chapter, on the contrary): directed graphs. Again, this data structure is more for readers who already know the mathematical theory that goes with it.

Directed graphs in Erlang are implemented as two modules, digraph and digraph_utils. The digraph module basically allows the construction and modification of a directed graph: manipulating edges and vertices, finding paths and cycles, etc. On the other hand, digraph_utils allows you to navigate a graph (postorder, preorder), testing for cycles, arborescences or trees, finding neighbors, and so on.

Because directed graphs are closely related to set theory, the ‘sofs’ module contains a few functions letting you convert families to digraphs and digraphs to families.

Queues

The queue module implements a double-ended FIFO (First In, First Out) queue:

Drawing representing the implementation of a functional queue

They’re implemented a bit as illustrated above: two lists (in this context, stacks) that allow to both append and prepend elements rapidly.

The queue module basically has different functions in a mental separation into 3 interfaces (or APIs) of varying complexity, called ‘Original API’, ‘Extended API’ and ‘Okasaki API’:

Original API
The original API contains the functions at the base of the queue concept, including: new/0, for creating empty queues, in/2, for inserting new elements, out/1, for removing elements, and then functions to convert to lists, reverse the queue, look if a particular value is part of it, etc.
Extended API
The extended API mainly adds some introspection power and flexibility: it lets you do things such as looking at the front of the queue without removing the first element (see get/1 or peek/1), removing elements without caring about them (drop/1), etc. These functions are not essential to the concept of queues, but they’re still useful in general.
Okasaki API
The Okasaki API is a bit weird. It’s derived from Chris Okasaki’s Purely Functional Data Structures. The API provides operations similar to what was available in the two previous APIs, but some of the function names are written backwards and the whole thing is relatively peculiar. Unless you do know you want this API, I wouldn’t bother with it.

You’ll generally want to use queues when you’ll need to ensure that the first item ordered is indeed the first one processed. So far, the examples I’ve shown mainly used lists as a accumulators that would then be reversed. In cases where you can’t just do all the reversing at once and elements are frequently added, the queue module is what you want (well, you should test and measure first! Always test and measure first!)

End of the short visit

That’s about it for the data structures trip of Erlang. Thank you for having kept your arms inside the vehicles the whole time. Of course, there are a few more data structures available than that to solve different problems. I’ve only covered those that you’re likely to encounter or need the most given the strengths of general use cases of Erlang. I encourage you to explore the standard library and the extended one too to find more information.

You might be glad to learn that this completes our trip into sequential (functional) Erlang. I know a lot of people get in Erlang to see all the concurrency and processes and whatnot. It’s understandable, given it’s really where Erlang shines. Supervision trees, fancy error management, distribution, and more. I know I’ve been very impatient to write about these subjects, so I guess some readers were very impatient to read about them.

However, I judged it made more sense to be comfortable with functional Erlang before moving on to concurrent Erlang. It will be easier to move on afterwards and focus on all the new concepts. Here we go!

<<Add new type: Maps>>

A weird road-runner representing EEP-43

EEP, EEP!

The road that led to maps being added to the language has been long and tortuous. For years, people have been complaining about records and key/value data structures on multiple fronts in the Erlang community:

  • People don’t want to put the name of the record every time when accessing it and want dynamic access
  • There is no good canonical arbitrary key/value store to target and use in pattern matches
  • Something faster than dicts and gb_trees would be nice
  • Converting to and from key/value stores is difficult to do well, and native data types would help with this
  • Pattern matching is a big one, again!
  • Records are a preprocessor hack and problematic at run time (they are tuples)
  • Upgrades with records are painful
  • Records are bound to a single module
  • And so much more!

There ended up being two competing proposals. The first is Richard O’Keefe’s frames, a really deeply thought out proposal on how to replace records, and Joe Armstrong’s Structs (O’Keefe’s proposal describes them briefly and compares them to frames).

Later on, the OTP team came up with Erlang Enhancement Proposal 43 (EEP-43), their own proposal for a “similar” data type. However, much like with community complaints, the two proposals were working on entirely different issues: maps are intended to be very flexible hash maps to replace dicts andgb_trees, and frames are intended to replace records. Maps would also ideally be able to replace records in some use cases, but not all of them while preserving the positive characteristics of records (as we’ll see in Mexican Standoff).

Maps were the one picked to be implemented by the OTP team, and frames are still hanging with their own proposal, somewhere.

The proposal for maps is comprehensive, covers many cases, and highlights many of the design issues in trying to have them fit the language, and for these reasons, their implementation will be gradual. The specification is described in What Maps Shall Be, while the temporary implementation is described in Stubby Legs for Early Releases.

What Maps Shall Be

The Module

Maps are a data type similar to the dict data structure in intent, and has been given a module with a similar interface and semantics. The following operations are supported:

  • maps:new(): returns a new, empty map.
  • maps:put(Key, Val, Map): adds an entry to a map.
  • maps:update(Key, Val, Map): returns a map with Key‘s entry modified or raises the error badarg if it’s not there.
  • maps:get(Key, Map) and maps:find(Key, Map): find items in a map. The former returns the value if present or raises the bad_key error otherwise, whereas the latter returns {ok, Val} or error.
  • maps:remove(Key, Map): returns a map with the given element deleted. Similarly, maps:without([Keys], Map) will delete multiple elements from the returned map. If a key to be deleted isn’t present, the functions behave as if they had deleted it — they return a map without it.

It also contains the good old functional standards such as fold/3 and map/2, which work similarly to the functions in the lists module. Then there is the set of utility functions such as size/1is_map/1 (also a guard!), from_list/1 and to_list/1keys/1 and values/1, and set functions like is_key/2 and merge/2 to test for membership and fuse maps together.

That’s not all that exciting, and users want more!

The Syntax

Despite the promised gains in speed, the most anticipated aspect of maps is the native syntax they have. Here are the different operations compared to their equivalent module call:

Maps Module Maps Syntax
maps:new/1 #{}
maps:put/3 Map#{Key => Val}
maps:update/3 Map#{Key := Val}
maps:get/2 Map#{Key}
maps:find/2 #{Key := Val} = Map

Bear in mind that not all of this syntax has been implemented yet. Fortunately for map users, the pattern matching options go wider than this. It is possible to match more than one item out of a map at once:

1> Pets = #{"dog" => "winston""fish" => "mrs.blub"}.
#{"dog" => "winston","fish" => "mrs.blub"}
2> #{"fish" := CatName, "dog" := DogName} = Pets.
#{"dog" => "winston","fish" => "mrs.blub"}

Here it’s possible to grab the contents of any number of items at a time, regardless of order of keys. You’ll note that elements are set with => and matched with :=. The := operator can also be used to update an existing key in a map:

3> Pets#{"dog" := "chester"}.
#{"dog" => "chester","fish" => "mrs.blub"}
4> Pets#{dog := "chester"}.
** exception error: bad argument
in function  maps:update/3
called as maps:update(dog,"chester",#{"dog" => "winston","fish" => "mrs.blub"})
in call from erl_eval:'-expr/5-fun-0-'/2 (erl_eval.erl, line 257)
in call from lists:foldl/3 (lists.erl, line 1248)

There’s more matching in the specification, although it’s not available in 17.0 yet:

5> #{"favorite" := Animal, Animal := Name} = Pets#{"favorite" := "dog"}.
#{"dog" => "winston","favorite" => "dog","fish" => "mrs.blub"}
6> Name.
"winston"

Within the same pattern, a known key’s value can be used to define a variable (Animal) usable as another key, and then use that other key to match the desired value (Name). The limitation with this kind of pattern is that there cannot be cycles. For example, matching #{X := Y, Y := X} = Map cannot be done due to needing to know Y to match X, and needing to know X to bind Y. You also cannot do matching of a key by value (#{X := val} = Map) because there could be multiple keys with the same value.

Note: The syntax for accessing a single value (Map#{Key}) is documented as such in the EEP, but is subject to change in the future once implemented, and might be dropped entirely in favor of different solutions.

There’s something interesting but unrelated that was added with maps. If you remember in Starting Out For Real, we introduced list comprehensions:

7> Weather = [{torontorain}, {montrealstorms}, {londonfog},  
7>            {parissun}, {bostonfog}, {vancouversnow}].
[{toronto,rain},
{montreal,storms},
{london,fog},
{paris,sun},
{boston,fog},
{vancouver,snow}]
8> FoggyPlaces = [X || {X, fog<- Weather].
[london,boston]

The same kind of stuff can be done with map comprehensions, once they’re made available:

9> Weather = #{toronto => rainmontreal => stormslondon => fog,
9>             paris => sunboston => fogvancouver => snow}.
#{boston => fog,
london => fog,
montreal => storms,
paris => sun,
toronto => rain,
vancouver => snow}
10> FoggyPlaces = [X || X := fog <- Weather].
[london,boston]

Here, X := fog <- Weather represents a map generator, of the form Key := Val <- Map. This can be composed and substituted the same way list generators and binary generators can. Map comprehensions can also generate new maps:

11> #{X => foggy || <- [london,boston]}.
#{boston => foggy, london => foggy}

Or to implement the map operation from the maps module itself:

map(F, Map) ->
#{K => F(V) || K := <- Map}.

And that’s most of it! It’s extremely refreshing to see this new data type joining the Erlang zoo, and hopefully users will appreciate it.

A bee looking over a map

The Gritty Details

The details aren’t that gritty, just a little bit. The inclusion of maps in the language will impact a few things. EEP-43 goes into detail to define potential changes, many somewhat still in the air, for parts such as the distributed Erlang protocol, operator precedence, backwards compatibility, and suggestions to Dialyzer extensions (it is yet to see if the support will go as far as the EEP recommends).

Many of the changes have been proposed such that the user doesn’t need to think very hard about them. One that is unavoidable, however, is sorting. Previously in the book, the sorting order was defined as:

number < atom < reference < fun < port < pid < tuple < list < bit string

Maps now fit in here:

number < atom < reference < fun < port < pid < tuple < map < list < bit string

Bigger than tuples and smaller than lists. Interestingly enough, maps can be compared to each other based on their keys and values:

2> lists:sort([#{ 1 => 23 => 4}, #{2 => 1}, #{2 => 01 => 4}]).
[#{2 => 1},#{1 => 4,2 => 0},#{1 => 2,3 => 4}]

The sorting is done similarly to lists and tuples: first by size, then by the elements contained. In the case of maps, these elements are in the sorted order of keys, and breaking ties with the values themselves.

Don’t Drink Too Much Kool-Aid:
You may notice that while we can’t update the key 1.0 of a map with the key 1, it is possible to have them both compare as equal that way! This is one of the long-standing warts of Erlang. While it’s very convenient to have all numbers compare equal from time to time, for example, when sorting lists so that 1.0 isn’t greater than 9121, this creates confusing expectations when it comes to pattern matching.

For example, while we can expect 1 to be equal to 1.0 (although not strictly equal, as with =:=), we can’t expect to pattern match by doing 1 = 1.0.

In the case of maps, this means that Map1 == Map2 isn’t a synonym of Map1 = Map2. Because Erlang maps respect Erlang sorting order, a map such as #{1.0 => true} is going to compare equal to #{1 => true}, but you won’t be able to match them one against the other.

Be careful, because although the content of this section is written as based on EEP-43, the actual implementation might be lagging behind!

Stubby Legs for Early Releases

A Corgi with a cape, flying

The maps implementation that came with Erlang 17.0 is complete, but only within the confines of the maps module. The major differences come from the syntax. Only a minimal subset of it is available:

  • literal map declarations (#{}#{key1 => val1, "key2" => val2})
  • matching on known keys (#{a => X} = SomeMap)
  • Updating and adding elements with a known key in an existing map (Map#{a := update, b => new})

The rest, including accessing single values (Map#{key}) and using variables as keys, whether in matches or declarations, is still not there. Same fate for map comprehensions and Dialyzer support. In fact, the syntax may change and differ from the EEP in the end.

Maps are also still slower than most Erlang developers and the OTP team want them to be. Still, progress is ongoing and it should not take long — especially compared to how long it’s been before maps were added to the language in the first place — before they get much better.

This early release is still good enough to get familiar with maps, and more importantly, get to know when and where to use them.

Mexican Standoff

Everyone had their opinion on wanting native dictionaries or better records replacement as a priority. When maps were announced, many Erlang developers kind of just assumed that maps, when they came to the language, would solve the problem they wanted solved.

As such, there is some confusion on how maps should be used in Erlang.

Maps Vs. Records Vs. Dicts

To get it out of the way directly, maps are a replacement for dicts, not records. This can be confusing. At the beginning of this chapter, I identified common complaints, and it turns out that many of the complaints about records would be fixed by maps:

Issue Solved By Maps Issue in Records Issue in Dicts
Record Name is cumbersome
No good native k/v store
Faster k/v store future
Converting easier with native type
More powerful pattern matching
Upgrades with Records maybe
Usable across modules without includes

squid aiming two guns mexican-standoff-like, while eating a taco

The score is pretty close. The point of maps being faster isn’t necessarily the case yet, but optimizations should bring them to a better level. The OTP team is respecting the old slogan: first make it work, then make it beautiful, and only if you need to, make it fast. They’re getting semantics and correctness out of the way first.

For upgrades with records being marked as ‘maybe’, this has to do with the code_change functionality. A lot of users are annoyed by having to convert records openly when they change versions similar to what we did with pq_player.erland its upgrade code. Maps would instead allow us to just add fields as required to the entry and keep going. The counter-argument to that is that a botched up upgrade would crash early with records (a good thing!) whereas a botched up upgrade with maps may just linger on with the equivalent of corrupted data, and not show up until it’s too late.

So why is it we should use maps as dicts and not as records? For this, a second table is necessary. This one is about semantics, and which data structure or data type a feature applies to:

Operations Records Maps Dict
Immutable
Keys of any type
Usable with maps/folds
Content opaque to other modules
Has a module to use it
Supports pattern matching
All keys known at compile-time
Can merge with other instance
Testing for presence of a key
Extract value by key
Per-key Dialyzer type-checking *
Conversion from/to lists
Per-element default values
Standalone data type at runtime
Fast direct-index access

* The EEP recommends making this possible for keys known at compile-time, but has no ETA on when or if this will happen.

This chart makes it rather obvious that despite the similarity in syntax between maps and records, dicts and maps are much closer together semantically than maps are to records. Therefore, using maps to replace records would be similar to trying to replace ‘structs’ in a language like C with a hash map. It’s not impossible, but that doesn’t mean it’s a good idea given they often have different purposes.

Keys being known at compile time brings advantages with fast access to specific values (faster than what is possible dynamically), additional safety (crash early rather than corrupting state), and easier type checking. These make records absolutely appropriate for a process’ internal state, despite the occasional burden of writing a more verbose code_change function.

On the other hand, where Erlang users would use records to represent complex nested key/value data structures (oddly similar to objects in object-oriented languages) that would frequently cross module boundaries, maps will help a lot. Records were the wrong tool for that job.

To put it briefly, places where records really felt out of place and cumbersome could be replaced by maps, but most record uses should not be replaced that way.

Don’t Drink too Much Kool-Aid:
It is often tempting to bring complex nested data structures to the party and use them as one big transparent object to pass in and out of functions or processes, and to then just use pattern matching to extract what you need.

Remember, however, that there is a cost to doing these things in message passing: Erlang data is copied across processes and working that way might be expensive.

Similarly, passing big transparent objects in and out of functions should be done with care. Building a sensible interface (or API) is already difficult; if you marry yourself to your internal data representation, enormous amounts of flexibility for the implementation can be lost.

It is better to think about your interfaces and message-based protocols with great care, and to limit the amount of information and responsibility that gets shared around. Maps are a replacement for dicts, not for proper design.

There are also performance impacts to be expected. Richard O’Keefe mentions it in his proposal:

You can’t make dict good for the record-like uses that frames are meant for without making it bad for its existing uses.

And the EEP from the OTP team also mentions something similar:

When comparing maps with records the drawbacks are easily remedied by maps, however the positive effects [sic] is not as easy to replicate in a built-in data-type where values are determined at runtime instead of at compile time.

  • Being faster than direct-indexing array, where indices and possibly the resulting value are determined at compile time, is hard. In fact it is impossible.
  • Memory model for maps where the efficiency is near that of records could be achieved by essentially using two tuples, one for keys and one for values as demonstrated in frames. This would impact performance of updates on maps with a large number of entries and thus constrain the capability of a dictionary approach.

For the core of your process loop, when you know all keys that should exist, a record would be a smart choice, performance-wise.

Maps Vs. Proplists

One thing that maps may beat are proplists. A proplist is a rather simple data structure that is extremely well-suited for options passed to a module.

The inet:setopts/2 call can take a list of options of the form [{active, true}, binary], and the file handling functions can take argument lists such as [read, write, append, {encoding, utf8}], for example. Both of these option lists can be read using the proplists module, and terms such as write will be expanded as if they were written as {write, true}.

a scale measuring proplists vs. maps

Maps, with their single-key access (whenever implemented) will represent a similar way to define properties. For example, [{active, true}] can be expressed with maps as #{active => true}. This is equally cumbersome, but it will make reading the option much simpler, given you won’t need to make a module call (thanks to Opts#{active}).

It’s somewhat to be expected that option lists that are mostly pairs will be replaced by maps. On the other hand, literal options such as readwrite, orappend will remain much nicer with proplists, from the user’s perspective.

Given that right now most functions that require options use proplists, keeping things going that way forward may be interesting for consistency’s sake. On the other hand, when the options are mostly pairs, using the proplists module can get cumbersome fast. Ultimately, the author of a library will have to make a decision between what could be internal clarity of implementation or general ecosystem consistency. Alternatively, the author could decide to support both ways of doing things at once to provide a graceful path towards deprecation of proplists, if that’s how they feel about it.

On the other hand, functions that used to pass proplists as a return value from functions should probably switch to maps. There is little legacy there, and usability should be much better for users that way.

Note: maps can use a clever trick to set many default values at once easily. Whereas proplists would require the use of proplists:get_value(Key, List, Default), maps can use their merge/2 function.

According to the specification, merge/2 will fuse two maps together, and if two keys are the same, the second map’s value will prevail. This allows to call maps:merge(Default, YourMap) and get the desired values. For example, maps:merge(#{key => default, other => ok}, #{other => 42}) will result in the map #{key => default, other => 42}.

This makes for an extremely convenient way to attribute default values manually, and then you can just use the resulting map without worrying about missing keys.

<<Previous     Index     Next>>

Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s