In a previous article I covered Array and ArrayList. Here, I’m going through other data structures in Java and some of their specificities, plus available operations to work with them. This is a high level summary and not meant to replace the official documentation. This article is applicable for Java 11 and further.
This article is part of a series on advanced Java concepts, also corresponding to topics to study for upgrading from level I certification to level II certification (OCP level). The new Oracle Java certification framework (as from Java 11) only offers one certification covering both levels, so you’ll want to go through the level I (“OCA”) topics as well.
This article was mentioned in JetBrains’ Java Annotated Monthly (January 2022)
Simplified structure of the Java data structure framework
These classes are part of the java.util package. Here’s the hierarchy of the most commonly used classes.
Iterable (I) allows any Collection (not to confuse with Collections!) to be used in a forEach loop
– – ->Collection (I) adds generic collection capabilities (add, remove…)
– – – – – – ->List (I) is a collection where elements are indexed by an integer
– – – – – – – – – – ->ArrayList
– – – – – – ->Set (I) is a collection with unique elements
– – – – – – – – – – ->SortedSet(I) maintains ordering
– – – – – – – – – – – – – – – – ->TreeSet
– – – – – – ->Deque (I) double-ended queue, FIFO & LIFO
– – – – – – – – – – ->ArrayDeque
Map is a different kind of data structure, which doesn’t implement/extend Iterable and Collection. Therefore, it doesn’t have their capabilities per se.
Map (I)
– – -> HashMap
Data structures specifics
Apart from the explicitly immutable ones, collections are mutable and can vary in size. They have a default capacity (which could be considered as the “expected quantity”), but the capacity will grow bigger if there are more elements than its value. Capacity is not the size.
ArrayList
Constructors:
- no args (default capacity = 10)
- with capacity argument
- with Collection argument (to populate the list)
Fixed size lists:
- Arrays.asList(array[ ]) creates a list from an array, backed by an array (which size is fixed!)
- List.of(Collection or …varargs)
- As usual, the list being immutable doesn’t mean the referenced objects are as well!
HashSet
Constructors:
- no args (default capacity = 16)
- with capacity argument
- with capacity and load factor arguments (default factor = 0.75)
- with Collection argument
Uniqueness is checked based on equality of the elements.
The load factor allows to separate the elements in buckets based on alike hash codes. The load factor defines how buckets are created. For example, the default 0.75 means that when a bucket gets to 75% capacity, a new bucket is created for that group.
ArrayDeque
Constructors:
- no args (default capacity = 16)
- with capacity argument
- with Collection argument
offerFirst(e) adds the elements to head
pollFirst() gets the first element from head and removes it
peekFirst() grabs the first element without removing it
same methods exist with Last instead of First, and tail being used instead of head.
you cannot offer the value null.
HashMap
<key, value> based structure.
Constuctors:
- no args (default capacity = 16)
- with capacity argument
- with capacity and load factor arguments (default = 0.75)
- with a Map argument (not a Collection!)
Read-only Maps:
- Map.of(key, value, key, value, …) up to ten map entries (key + value)
- Map.ofEntries(Map.entry<key, value>…entries) (varargs)
put(K,V): value
remove(K): value
get(K): value
containsKey(K)/containsValue(V): boolean
Iterate over a map: get the set of keys (.keySet()) or values (.values()), or the entry set (.entrySet())
Convert a Collection to array
Type[ ] array = myCollection.toArray(refArray): Type[ ] -> the array passed as parameter is used if its size fits, otherwise a new array is created.
Filter a Collection
myCollection.removeIf(Predicate) removes all elements that match the predicate.
Collections class
Collections class is not to be confused with the Collection interface. Collections is also part of the java.util package. It contains utility methods to use with collections and maps.
Collections.sort(myCollection) uses Comparable (more precisely, the implementation of Comparable of the elements inside the collection), or can take a Comparator as second parameter.
Collections.reverse(myCollection)
Collections.shuffle(myCollection)
Collections.fill(myCollection, element) fills the collection with the element passed as second parameter; throws UnsupportedOperationException is the collection doesn’t support the .set() method, for example, if used with… a Set (yeah, they didn’t make it easy on this one). Set classes have a .add() method but not a .set() method. It’s also only logical that you can’t use it with sets, since they’re collections of unique elements.
Collections and concurrent access
Mutable collections can be accessed by several threads modifying them. Any object in the heap that’s not immutable is not thread-safe. Other threads may observe incomplete modification state. Making a collection thread-safe does not guarantee the thread safety of the elements it contains. To be totally thread-safe, both the collection and the contained elements must be thread-safe.
There are 3 approaches to prevent concurrent access:
- Make the collection read-only, immutable: it’s fast, but you must work with read-only collections…
- Synchronized, which is slow and unscalable.
- Copy-on-write, which is fast but consumes a lot of memory.
Here are some example declarations for each approach:
Set<…> readOnlySet = Collections.unmodifiableSet(set);
Map<…,…> syncMap = Collections.synchronizedMap(map);
List<…> copyOnWriteList = new CopyOnWriteArrayList<>(list);
Because they’re immutable, they must be initialized with their content.
The synchronized approach blocks access on write to the object when it’s being modified by another process, which can create a bottleneck and performance problems if a lot of threads work in parallel.
The copy-on-write approach creates a replica of the collection for each thread (local to each) and then merges all the copies.
Legacy classes
Some legacy classes still exist but are rarely used (and they’re not part of the exam topics for Java certification), like Vector (kind of ArrayList) and HashTable (kind of HashMap). Their behaviors were really similar to the more recent classes, except they were synchronized, thus let no choice for the implementation of thread-safe strategies.
Nice post, and really good for your intended audience. A few small points:
First, regarding your claim that “Uniqueness is checked based on the hash code of the elements.” This is incorrect. Uniqueness is determined by the “equals()” method of the elements. If there are two objects, o1 and o2, such that o1.hashCode() == o2.hashCode(), but o1.equals(o2) == false, the set can contain both o1 and o2. Put another way, if you add o1 to a set (set.add(o1)) then a call to set.contains(o2) will return false.
Second, in the section on “Collections and concurrent access” you state that “Any object in the heap that’s not immutable is not thread-safe”. That is incorrect, there are data structures which allow thread-safe concurrent (non-blocking) access. The JDK offers a ConcurrentHashMap, ConcurrentLinkedQueue, and others. These allow multiple threads to access them concurrently, and still remain correct – their perform much better than merely synchronizing on (e.g.) a HashMap.
Hi Dan,
Thanks a lot for your comment. I’ve changed the first statement.
The second one is valid I think (but please tell me if I’m wrong), it comes from the Oracle course and how I understand it, is that the collection might be thread-safe but it doesn’t make its elements thread safe: to achieve this, elements must be immutable. In other words, the collection is not really immutable unless all contained elements are immutable too.
I think my comment wasn’t clear.
“Making a collection thread-safe does not guarantee the thread safety of the elements it contains. ” – This statement is true, and it is something beginners don’t often understand, so it is a good point to reiterate.
When you said there are “3 approaches to prevent concurrent access” I assumed you meant, to prevent (unsafe) concurrent access the *collection object* (not its contents). If that is what you meant, there is a 4th approach, and that is to use an implementation specifically designed to allow (safe) concurrent access, like a ConcurrentHashMap. The statement above still applies, of course, e.g. the items in that map are not automatically thread safe.
It also might make sense to clarify the statement “Any object in the heap that’s not immutable is not thread-safe”, as it is confusing. Many mutable objects are thread safe, as long as they are only accessed via their API, and if their internals are properly encapsulated (and there is no “hacking” their insides via reflection) – an object where you’ve “synchronized” all the methods and have no other way to access the fields in the object, for example. (And while it is out of scope here – a final field can also be changed via reflection, and so nothing is really immutable.)
I think the most important point about thread safety for a beginner to understand is that it requires *all* the threads to “play by the rules”. If one thread has the lock on an object (inside a synchronized method), that prevents other threads from grabbing the same lock – but it doesn’t prevent a thread from changing a public member of that object.
I hope this is clear.