Sunday, June 15, 2014

Disjoint-set data structure

Disjoint-set data structure is the one of the best way to have separated sets of unique items.
Union-find is a simple algorithm which allows us to identify whether two (or more) points are in the same set. For example we have the following disjoint-sets:

It's not a hard task to examine if points 6 and 10 are in the same disjoin-set, but for a sufficient number of points we need to have correct algorithm (correct means fast, reliable and valid). The best practice is using union-find algorithm.
To implement the algorithm we need:
1. Create disjoint-data set adding items one-by-one, it could be performed on initializating of our data.
2. Create a special 'root' nodes array
3. Get a result using 'root' nodes

Lets use this class to store our items:

static class UnionItem {  

  private int itemValue;

  private int neighborItemValue;

  public UnionItem(int itemValue, int neighborItemValue) {
   this.itemValue = itemValue;
   this.neighborItemValue = neighborItemValue;
  public int getItemValue() {
   return itemValue;

  public int getNeighborItemValue() {
   return neighborItemValue;


For our example we have the following array (ROOT_ID can be replaced by repeating the actual item value)

private final static int ROOT_ID = -1;

private static UnionItem[] items = new UnionItem[] {
   new UnionItem(1, ROOT_ID),   
   new UnionItem(5, ROOT_ID),
   new UnionItem(8, ROOT_ID),
   new UnionItem(2, 1),
   new UnionItem(11, 2),
   new UnionItem(4, 11),
   new UnionItem(7, 5),
   new UnionItem(6, 7),
   new UnionItem(9, 8),
   new UnionItem(3, 9),
   new UnionItem(10, 9)

For a 'root' nodes we need to have an array

private static int[] unions = new int[items.length + 1];

Array may be filled out in a cycle:

for (int iter = 0; iter < items.length; ++iter) {
 int currentValue = items[iter].getItemValue();
 if (items[iter].getNeighborItemValue() != ROOT_ID) {
     unions[currentValue] = getParentValue(items[iter]
 } else {
      unions[currentValue] = currentValue;

public static int getParentValue(int val) {
 if (val != unions[val]) {
  val = getParentValue(unions[val]);
 return val;
'root' nodes array will look like: 1 1 8 1 5 5 5 8 8 8 1. We can say for sure that element 3 is included in the same disjoint-set as elements 8,9,10. Why? Because of our algorithm that uses indicies as element's values and values available by those indicies as identificators of the set.

Then we just need the method to examine two value in one union

public static boolean isInUnion(int val1, int val2) {
 return unions[val1] == unions[val2];

No comments:

Post a Comment