Wednesday, February 15, 2012

Why doesn't Java's Set have a get() method?

I raised on StackOverflow the same simple question that I had asked many years ago, and that was one of the reasons I was banned there. Here are my considerations.

I asked the same question on a Java forum years ago. They told me that the Set interface is already defined and cannot be changed, because changing it would break the existing implementations of the Set interface. Then they started to claim bullshit, like you see here: "Set does not need the get method", and started to drill into me that a Map must always be used to get elements from a set.
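For what it's worth, the "it would break existing implementations" argument stopped being absolute with Java 8, which added default methods: an interface can gain a method that has a body, and classes that already implement the interface generally keep compiling. A minimal sketch, assuming a hypothetical sub-interface GettableSet (not part of java.util) whose default get() simply scans the set:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical sub-interface, not part of java.util: get() is a default
// method, so an existing Set implementation needs no new code to support it.
interface GettableSet<E> extends Set<E> {
    // Returns the stored element equal to probe, or null if there is none.
    // This generic fallback is a linear scan over the set.
    default E get(Object probe) {
        for (E e : this) {
            if (Objects.equals(e, probe)) {
                return e;
            }
        }
        return null;
    }
}

// An existing implementation (HashSet) picks up get() just by declaring the interface.
class GettableHashSet<E> extends HashSet<E> implements GettableSet<E> {}

public class Main {
    public static void main(String[] args) {
        GettableSet<String> set = new GettableHashSet<>();
        set.add("apple");
        System.out.println(set.get("apple")); // prints apple
        System.out.println(set.get("pear"));  // prints null
    }
}
```

The scan is only a fallback; a hash-based implementation could override get() to locate the element in expected constant time.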
If you use the set only for mathematical operations, like intersection or union, then maybe contains() is sufficient. However, Set is defined in Collections to store data. I explained the need for get() in Set using the relational data model.
In what follows, an SQL table is like a class. The columns define attributes (known as fields in Java), and records represent instances of the class, so an object is a vector of fields. Some of the fields make up the primary key; they define the uniqueness of the object. This is what you write in Java so that contains() works:
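class Element {
    Object keyField1, keyField2; // key fields; the non-key data fields are omitted
    @Override public int hashCode() { return java.util.Objects.hash(keyField1, keyField2); }
    @Override public boolean equals(Object o) { // only key fields count, like a primary key
        if (!(o instanceof Element)) return false;
        Element e = (Element) o;
        return java.util.Objects.equals(keyField1, e.keyField1) && java.util.Objects.equals(keyField2, e.keyField2); }
}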
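To see the gap concretely, here is a small sketch with a hypothetical Person class whose equals() and hashCode() use only its key field, as with the Element class above. A probe holding just the key is enough for contains(), but the Set interface offers no method that returns the stored object, so reading its other fields means iterating over the whole set:

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical record-like class: id is the key field, name is ordinary data.
// equals/hashCode look only at the key, like a primary key in a table.
class Person {
    final int id;
    final String name;

    Person(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Person && ((Person) o).id == id;
    }
}

public class Main {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        people.add(new Person(42, "Alice"));

        // A probe carrying only the key is enough for contains() ...
        Person probe = new Person(42, null);
        System.out.println(people.contains(probe)); // prints true

        // ... but Set has no method that hands back the stored record,
        // so recovering its data takes a full scan.
        String name = null;
        for (Person p : people) {
            if (p.equals(probe)) {
                name = p.name;
            }
        }
        System.out.println(name); // prints Alice
    }
}
```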
I'm not familiar with DB internals. But in SQL you specify the key fields only once, when you define the table: you just mark them as PRIMARY KEY. You do not specify the keys a second time when you add a record to the table, and you do not separate keys from data as you do with a Map. SQL tables are sets, not maps. Yet, in addition to maintaining uniqueness and checking containment, they let you get a record by its key (SELECT ... WHERE key = ...).
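Within the standard library, a sorted set can already behave like such a table lookup. A sketch, assuming a hypothetical Row class and get() helper: NavigableSet.ceiling() returns the least element greater than or equal to the probe, and if that element compares equal to the probe, it is the stored record. The lookup takes O(log n), with no separate key-to-object map:

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical row type: key plays the primary key, data is everything else.
class Row {
    final int key;
    final String data;

    Row(int key, String data) {
        this.key = key;
        this.data = data;
    }
}

public class Main {
    // Hypothetical helper emulating a Set.get() with the existing TreeSet API.
    // ceiling(probe) returns the least element >= probe (or null), so it is the
    // stored match exactly when the comparator reports it equal to the probe.
    static <E> E get(TreeSet<E> set, Comparator<? super E> cmp, E probe) {
        E candidate = set.ceiling(probe);
        return (candidate != null && cmp.compare(candidate, probe) == 0) ? candidate : null;
    }

    public static void main(String[] args) {
        Comparator<Row> byKey = Comparator.comparingInt(r -> r.key);
        TreeSet<Row> table = new TreeSet<>(byKey); // uniqueness is decided by the key alone
        table.add(new Row(42, "Alice"));
        table.add(new Row(7, "Bob"));

        // The probe carries only the key, like WHERE key = 42.
        Row found = get(table, byKey, new Row(42, null));
        System.out.println(found.data); // prints Alice
        System.out.println(get(table, byKey, new Row(99, null))); // prints null
    }
}
```

This relies on the comparator looking only at the key, which is exactly how a TreeSet built with that comparator defines uniqueness anyway.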
They say that a Set with get() would be redundant. In fact, it is their favorite Map, which they insist you use in place of a Set, that introduces the redundancy. The call put(obj.getKey(), obj) stores the key twice: once as the map key and once as part of the object. This is the redundancy. The duplication bloats the code and wastes memory at runtime. I do not know about DB internals, but database normalization says that such duplication is a bad idea. Redundancy means that the key in the map may end up not matching the key of the referenced object, and such contradictions are the hallmark of redundancy. Edgar F. Codd proposed DB normalization precisely to get rid of redundancies and the inconsistencies they cause.
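The inconsistency is easy to reproduce. A minimal sketch, assuming a hypothetical mutable Person class whose id field is its key: once the object's key changes, the copy held by the map still has the old value, and the two copies contradict each other:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable record whose id field is its key.
class Person {
    String id;
    String name;

    Person(String id, String name) {
        this.id = id;
        this.name = name;
    }
}

public class Main {
    public static void main(String[] args) {
        Map<String, Person> byId = new HashMap<>();
        Person p = new Person("p1", "Alice");
        byId.put(p.id, p); // the key is stored twice: as the map key and inside p

        p.id = "p2"; // update the record's key; the map key is not updated

        // The two copies of the key now disagree.
        System.out.println(byId.containsKey("p1")); // prints true
        System.out.println(byId.get("p1").id);      // prints p2
        System.out.println(byId.containsKey("p2")); // prints false
    }
}
```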
So there are three reasons why using a map to implement get() for a set is bad:
  1. redundancy in runtime storage
  2. code bloat
  3. violation of data storage normalization
I was banned from the Sun forum for pointing this out. This world is dominated by bigots. They do not want to see concepts or how things can be improved. They see only the actual world and cannot imagine that the design of Collections could be different or could be missing something. It is dangerous to remind such people of rational things. They teach you their blindness and punish you if you do not obey.


Finally, I have found where normalization theory is explicit on this topic: normalization will never generate two tables with a one-to-one relationship between them. There is no theoretical reason to split a single entity like this, with some fields in a single record of one table and the others in a single record of another table.
