Friday, May 23, 2008

Java design: Map vs. Set.get(key)

Java Set is a kind of collection, where all the elements are different. The equivalence is determined in the standard way, by hash/equals methods. The set has options to iterate its elements and check for element presence in it but there is no way to retrieve an element from the set by supplying the equivalent key. If you need such a option, you are suggested to use the Map interface. They tell that if you need the get-by-key in a Set, your design is wrong. Yet, it is wrong to think so. The Map structure is abused. I'll try to explain why.

The keys, also known as identifiers, are used for referencing. Within one program, objects are accessed by pointers (Object b = a.field.getSomething()). The pointers are natural and fast but invalid outside the program address space. Thus, files and DBs are used for communication, where externally understandable references are used. Normally, the linking is done by unique object names. Even human can interact the programs tending to give mnemonic names.

Consider a circuit editor. An engineer commands the program: 'join pin "a" of unit "u1" with pin "b" of unit "u2" by net "x"'. Here, objects of class Pin, Unit and Net are referred by name. The circuits are transferred between apps by netlist files:
(block "Circuit 1" (pin "in") (pin "out") (instance "AND" "u1") (instance "AND" "u2") (net "x" joins (u1.a, in)).

In this file, a block contains two ports, two instances of AND sub-blocks, and one net connecting two ports. The objects in the presented hierarchy inherently have a unique name within their container. There is only one instance "u1" in the block. It is natural, therefore, to establish a set of unique elements for every container. Using a map, on the other hand, results in bad design because of redundancy.

Redundancy is good in two cases: boosting performance (caching) or error detection. However, it is attribute of bad design when it is entailed for no advantage. That is because it increases complexity for no reason, consumes (mainly, storage) resources and creates potential for inconsistencies in the program.

The very possibility of inconsistency (also know as incoherency) is a sign of redundancy. When you use a map to retrieve the named objects by name, you have object name twice: in the key and in the value of the map:

class Net {
String name; // net name, unique within parent block
Block parent;
...
}
class Block {
String name; // block name, unique within parent block
Map<String, Net> connections;
void addNet(Net net) {
  connections.add(net.name, net) // name is duplicated
  net.parent = this;
}
...

All the objects in this example model have name, which is unique within the parent container. The example illustrates how a string key translates to a net object, which has a copy of the key. The name presents twice in the data model, consuming the twice memory needed. The inconsistency is produced when the key refers an object, which has a different name: ("a" -> net("b")). We must avoid redundancy (memory waste and potential for inconsistencies). The java.util.Map is a wrong structure to implement the lookup-demanding relationship. A better option would be
class Pin {
String name; // pin name, unique within parent block
Set<Net> connections;
void join(Net net) {
  connections.add(net) // no redundancy is created
}
}


Here is no room for error. In addition, the code is simpler and no data memory is wasted. The only problem is to define the equivalence so that container would match the elements by name as desired.

There is a definite analogy between a set of elements in java program and table of records in DB. The equivalence in the datatable is determined by key fields. There is no doubt that the primary key is not reflected to the rest of the record. The record is solid: it is constituted of all the fields including the key ones. Meantime, the keys in the map are separate objects from the values they map on. The redundancy can be illustrated in this way. Given a table
  field1(primary key), field2, field3

we are deprived of the facility to select an object by key field. Instead, we are invited to create an extra mapping
  field1(primary key) => field1(primary key), field2, field3.

Please tell me why is this map "a proper" design? I see only one reason: there is a group of Java adepts-fundamentalists, who define good design as "the one provisioned by java collections". The java.util.Set is exposed as mathematical abstraction, and contains(equivalent) satisfies the intersection-like math operations.

Yet, I'm sure that the set notion is also very useful outside of the math field. In this post, I have explained the need for Set and get-by-key in general-purpose applications. The omission is a great loss for programming, for Java. The Java Collections design flow must be rectified.

In the next post, I'll propose the Set with get(key) method along with the element equivalence management. It can be implemented in number of approaches.

Duplicated in http://valjok.blogspot.com/2012/02/why-java-set-does-not-have-get-method.html

No comments: