The Introduction Section described some challenges in designing associative containers. This section describes the STL's solution and motivation for an alternative solution. It is organized as follows.
The STL (or its extensions) currently offers associative containers based on underlying red-black trees or collision-chaining hash tables. For association, containers based on trees are parameterized by a comparison functor, and containers based on hash tables are parameterized by a hash functor and an equivalence functor.
For each underlying data-structure, the STL offers four containers with different mapping semantics. A map-type uniquely maps each key to some datum, a set-type uniquely stores keys, a multimap-type non-uniquely maps each key to some datum, and a multiset-type non-uniquely stores keys.
Containers provide various iterator-based methods. E.g., all containers have constructors that take a pair of iterators and transactionally construct an object containing all elements in the iterators' range. Additionally, it is possible to (non-transactionally) insert a range given by iterators, or erase such a range. Other methods are implicitly range-based, e.g., it is possible to test the equivalence of two associative container objects via operator==.
In order to function efficiently in various settings, associative containers require a wide variety of policies.
For example, a hash policy instructs how to transform a key object into some non-negative integral type; e.g., a hash functor might transform "hello" into 1123002298. A hash table, though, requires transforming each key object into some non-negative integral type in some specific domain; e.g., a hash table with 128 entries might transform "hello" into position 63. The policy by which the hash value is transformed into a position within the table can dramatically affect performance.
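The two-step mapping described above can be sketched as follows. This is only an illustration: it assumes std::hash as the hash functor, and the names mod_range_policy, mask_range_policy, and table_position are hypothetical, not part of the STL or pb_assoc.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical policy: maps a hash value into [0, size) by modulo.
struct mod_range_policy {
    std::size_t operator()(std::size_t hash, std::size_t size) const {
        return hash % size;
    }
};

// Hypothetical policy: maps by bit mask; valid only when size is a power of two.
struct mask_range_policy {
    std::size_t operator()(std::size_t hash, std::size_t size) const {
        return hash & (size - 1);
    }
};

// Step 1: the hash functor turns the key into an unsigned integer.
// Step 2: the range policy turns that integer into a table position.
template <class Key,
          class Hash_Fn = std::hash<Key>,
          class Range_Policy = mod_range_policy>
std::size_t table_position(const Key& key, std::size_t table_size) {
    return Range_Policy()(Hash_Fn()(key), table_size);
}

// Usage: whatever the hash value is, the position lies inside the table.
inline void example_positions() {
    const std::size_t pos = table_position(std::string("hello"), 128);
    assert(pos < 128);
}
```

Swapping Range_Policy changes only the second step, which is the part the text identifies as performance-sensitive; the mask variant avoids a division but restricts the table size to powers of two.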
Additionally, most hash-table algorithms encounter collisions. To mitigate the cost of these collisions, it sometimes is beneficial to store the hash value along with each element [clrs2001, austern01htprop]. While this improves performance for complex keys, it hampers performance for simple keys, and is best left as a policy.
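As a rough sketch of how storing the hash value could be exposed as a policy, the chain entry below is specialized on a compile-time flag. The entry template and its matches method are hypothetical names chosen for illustration, and std::hash is assumed as the hash functor.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical chain entry; Store_Hash selects whether the hash is cached.
template <class Key, bool Store_Hash>
struct entry;

// Simple keys: no cached hash, every probe compares keys directly.
template <class Key>
struct entry<Key, false> {
    Key key;
    explicit entry(const Key& k) : key(k) {}
    bool matches(const Key& other, std::size_t) const { return key == other; }
};

// Complex keys: the cached hash rejects most mismatches with one integer
// comparison, at the cost of one extra std::size_t per element.
template <class Key>
struct entry<Key, true> {
    Key key;
    std::size_t hash;
    explicit entry(const Key& k) : key(k), hash(std::hash<Key>()(k)) {}
    bool matches(const Key& other, std::size_t other_hash) const {
        return hash == other_hash && key == other;
    }
};

// Usage: the caller computes the probe's hash once and reuses it per entry.
inline void example_store_hash() {
    entry<std::string, true> e(std::string("hello"));
    assert(e.matches("hello", std::hash<std::string>()("hello")));
}
```

For a key such as int, the extra field enlarges each entry without saving any work, which is why the choice is better left to the user.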
Tree-based containers allow reasonable access while maintaining order between elements. In some cases, however, tree-based containers can be used for additional purposes. E.g., consider Figure Sets of line intervals -A, which shows an example of a tree-based set storing half-open geometric line intervals. An std::set with this structure can efficiently answer whether [20, 101) is in the set, but it cannot efficiently answer whether any interval in the set overlaps [20, 101), nor can it efficiently enumerate all intervals overlapping [20, 101). A well-known augmentation to balanced trees can support efficient answers to such questions [clrs2001]. Namely, an invariant should be maintained whereby each node also contains the maximal endpoint of any interval within its subtree, as in Figure Sets of line intervals -B. In order to maintain this invariant, though, an invariant-restoring policy is required.
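The augmentation can be sketched on a plain binary search tree. The code below is a simplified illustration, not a balanced tree and not pb_assoc's interface: the names node, restore_invariant, insert_interval, and overlaps_any are hypothetical, and a real implementation would also have to restore the invariant after rotations and erasures.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>

// Half-open interval [first, second).
typedef std::pair<int, int> interval;

struct node {
    interval iv;
    int max_high;  // maximal endpoint of any interval within this subtree
    std::unique_ptr<node> left, right;
    explicit node(const interval& i) : iv(i), max_high(i.second) {}
};

// Invariant-restoring step: recompute max_high from the node and its children.
inline void restore_invariant(node& n) {
    n.max_high = n.iv.second;
    if (n.left)  n.max_high = std::max(n.max_high, n.left->max_high);
    if (n.right) n.max_high = std::max(n.max_high, n.right->max_high);
}

// Unbalanced insertion ordered by std::pair's operator<; duplicates are ignored.
inline void insert_interval(std::unique_ptr<node>& root, const interval& i) {
    if (!root) { root.reset(new node(i)); return; }
    if (i < root->iv) insert_interval(root->left, i);
    else if (root->iv < i) insert_interval(root->right, i);
    restore_invariant(*root);
}

inline bool overlaps(const interval& a, const interval& b) {
    return a.first < b.second && b.first < a.second;
}

// True if any stored interval overlaps q. The left subtree is searched only
// if its maximal endpoint extends past q's start; otherwise it cannot overlap.
inline bool overlaps_any(const node* n, const interval& q) {
    while (n) {
        if (overlaps(n->iv, q)) return true;
        if (n->left && n->left->max_high > q.first) n = n->left.get();
        else n = n->right.get();
    }
    return false;
}

// Usage: the query visits a single root-to-leaf path.
inline void example_intervals() {
    std::unique_ptr<node> root;
    insert_interval(root, interval(20, 30));
    insert_interval(root, interval(0, 10));
    assert(overlaps_any(root.get(), interval(25, 101)));
}
```

The query follows one path from the root, so on a balanced tree it would take logarithmic time; the cost is that every structural change must call the invariant-restoring step on the affected nodes.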
Consider a generic function manipulating an associative container, e.g.,
template<class Cntnr> int some_op_sequence(Cntnr& r_cnt) { ... }
The underlying data structure affects what the function can do with the container object.
For example, if Cntnr is std::map, then the function can use std::for_each(r_cnt.find(foo), r_cnt.find(bar), foobar) in order to apply foobar to all elements between foo and bar. If Cntnr is a hash-based container, then this call's results are undefined.
Also, if Cntnr is tree-based, the type and object of the comparison functor can be accessed. If Cntnr is hash-based, these queries are nonsensical.
These types of problems are exacerbated when considering the wide variety of useful underlying data-structures. Figure Different underlying data structures shows different underlying data-structures (the ones currently supported in pb_assoc). A shows a collision-chaining hash-table; B shows a probing hash-table; C shows a red-black tree; D shows a splay tree; E shows a tree based on an ordered vector (the tree is implicit in the order of the elements); F shows a list-based container with update policies.
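The tree-based case above can be made concrete with std::map. In this sketch, collect_keys stands in for foobar, and keys_between and ordered_before are hypothetical helper names; the range call is meaningful only when both keys are present and the first does not follow the second in the container's order.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Records each visited key; plays the role of "foobar" in the text.
struct collect_keys {
    std::vector<int>* out;
    void operator()(const std::pair<const int, std::string>& v) const {
        out->push_back(v.first);
    }
};

// Meaningful only for containers whose iteration follows key order.
// Precondition: lo and hi are both present and lo does not follow hi.
template <class Cntnr>
std::vector<int> keys_between(Cntnr& r_cnt, int lo, int hi) {
    std::vector<int> keys;
    collect_keys f = { &keys };
    std::for_each(r_cnt.find(lo), r_cnt.find(hi), f);
    return keys;
}

// Compiles only for containers that expose a comparison functor.
template <class Cntnr>
bool ordered_before(const Cntnr& r_cnt, int a, int b) {
    return r_cnt.key_comp()(a, b);
}

// Usage with a tree-based container.
inline void example_range() {
    std::map<int, std::string> m;
    m[1] = "a"; m[3] = "c"; m[5] = "e";
    assert(keys_between(m, 1, 5).size() == 2);
}
```

With std::unordered_map as Cntnr, ordered_before would fail to compile because there is no key_comp, while keys_between would compile but walk the elements in an unspecified order and might never reach the second iterator.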
These underlying data structures display different behavior. For one, they can be queried for different policies. Furthermore:
A unified tag and traits system (as used for the STL's iterators, for example) can ease generic manipulation of associative containers based on different underlying data-structures.
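A tag and traits system of this kind might look like the sketch below, modeled on std::iterator_traits and iterator category tags. All names here (tree_tag, hash_tag, container_traits, describe) are hypothetical and chosen for illustration; they are not claimed to match pb_assoc's actual interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical category tags, one per family of underlying data structures.
struct tree_tag {};
struct hash_tag {};

// Hypothetical traits class, specialized per container type.
template <class Cntnr>
struct container_traits;

template <class K, class V, class C, class A>
struct container_traits<std::map<K, V, C, A> > {
    typedef tree_tag container_category;
    static const bool order_preserving = true;
};

template <class K, class V, class H, class E, class A>
struct container_traits<std::unordered_map<K, V, H, E, A> > {
    typedef hash_tag container_category;
    static const bool order_preserving = false;
};

// Overloads selected at compile time by the category tag.
template <class Cntnr>
const char* describe_imp(const Cntnr&, tree_tag) { return "tree-based"; }

template <class Cntnr>
const char* describe_imp(const Cntnr&, hash_tag) { return "hash-based"; }

// Generic entry point: dispatches on the container's category.
template <class Cntnr>
const char* describe(const Cntnr& r_cnt) {
    typedef typename container_traits<Cntnr>::container_category category;
    return describe_imp(r_cnt, category());
}

// Usage: the same call resolves to different overloads.
inline void example_dispatch() {
    std::map<int, std::string> m;
    assert(std::string(describe(m)) == "tree-based");
}
```

A generic function such as some_op_sequence could dispatch the same way, using the range-based traversal only in the overload selected for order-preserving containers.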
In some cases, map and set semantics are inappropriate. E.g., consider an application monitoring user activity. Such an application might be designed to track a user, the machine(s) to which the user is logged in, application(s) the user is running on the machine, and the start time of the application. In this case, since a user might run more than a single application, there can be no unique mapping from a user to a specific datum.
The STL's non-unique mapping containers (e.g., std::multimap and std::multiset) can be used in this case. These types of containers can store two or more equivalent, non-identical keys [kleft00sets]. Figure Non-unique mapping containers in the STL's design shows possible structures of STL tree-based and hash-based multisets, respectively; in this figure, equivalent-key nodes share the same shading.
This design has several advantages. Foremost, it allows maps and multimaps, and sets and multisets, to share the same value_type, easing generic manipulation of containers with different mapping semantics.
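The user-activity example can be sketched with std::multimap. The type name user_apps and the helper count_apps are hypothetical, and the mapped datum is reduced to an application name for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>
#include <string>
#include <type_traits>
#include <utility>

// Maps a user name to the name of an application that user is running.
typedef std::multimap<std::string, std::string> user_apps;

// Maps and multimaps with the same key and datum share one value_type.
static_assert(
    std::is_same<user_apps::value_type,
                 std::map<std::string, std::string>::value_type>::value,
    "map and multimap share value_type");

// Counts the entries whose key is equivalent to user.
inline std::size_t count_apps(const user_apps& m, const std::string& user) {
    std::pair<user_apps::const_iterator, user_apps::const_iterator> r =
        m.equal_range(user);
    return static_cast<std::size_t>(std::distance(r.first, r.second));
}

// Usage: one user, two equivalent keys.
inline void example_multimap() {
    user_apps m;
    m.insert(user_apps::value_type("alice", "editor"));
    m.insert(user_apps::value_type("alice", "compiler"));
    assert(count_apps(m, "alice") == 2);
}
```

Each equivalent key is stored as a separate element, so the key "alice" is held twice; counting a user's entries means walking the run of equivalent keys returned by equal_range.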
Conversely, this design has possible scalability drawbacks, due to an implicit "embedding" of linked lists. Figure Embedded lists in STL multimaps -A shows a tree with shaded nodes sharing equivalent keys; Figure Embedded lists in STL multimaps -B explicitly shows the linked lists implicit in Figure Non-unique mapping containers in the STL's design. The drawbacks are the following.
[meyers02both] points out that a class's methods should comprise only operations which depend on the class's internal structure; other operations are best designed as external functions. Possibly, therefore, the STL's associative containers lack some useful methods, and provide some redundant methods.