Friday, November 6, 2015

A few words about java 8 performance, sorting complexity and leetcode task


There is an easy leetcode task:

Given two strings s and t, write a function to determine if t is an anagram of s.
For example,
s = "anagram", t = "nagaram", return true.
s = "rat", t = "car", return false.
And the function signature is 
public boolean isAnagram(String s, String t) {
}
At the first glance it could be solved with two algorithms.
The first algorithm:
1. convert input strings to char arrays
2. sort char arrays
3. compare arrays symbol-by-symbol, if they are not identical return false, otherwise we have valid anagram and must return true
Coplexity of the first algorithm equals to complexety of sorting char array which "offers O(n log(n)) performance" 

The second algorithm:
1. Iterate through all symbols of the 's' string and put each symbol to the map where key is the symbol itself and the value is the counter (if we met a symbol we need to increment the counter)
2. Iterate through all symbols of the 't' string and decrement the counter for each symbol in the map 
3. Iterate through all map values and check if counter equals '0' for every symbol (map key), it means that we got valid anagram
Complexity of the second algorithm equals to complexity of iterating through the arrays and symbol-counter map which is no more than O(n).
Looks like the second solution should work faster but leetcode checker engine doesn't think so...
 Here is the code of the first algorithm:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
public boolean isAnagram(String s, String t) {
        char[] sArr = s.toCharArray();
        char[] tArr = t.toCharArray();
        Arrays.sort(sArr);
        Arrays.sort(tArr);
        if (sArr.length != tArr.length) {
            return false;
        }
        for (int iter = 0; iter < sArr.length; ++iter) {
            if (sArr[iter] != tArr[iter]) {
                return false;
            }
        }
        return true;
}
And that's what we got after task submit
It took 9ms to run the solution on test environment.
Let's check another solution, source code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
public boolean isAnagram(String s, String t) {
        Map<String, Integer> sMap = new HashMap<String, Integer>();
        for(char ch : s.toCharArray()) {
            String chStr = String.valueOf(ch);
            if (sMap.containsKey(chStr)) {
                Integer counter = sMap.get(chStr);
                counter += 1;
                sMap.put(chStr, counter);
            } else {
                sMap.put(chStr, new Integer(1));
            }
        }
        
        for(char ch : t.toCharArray()) {
            String chStr = String.valueOf(ch);
            if (sMap.containsKey(chStr)) {
                Integer counter = sMap.get(chStr);
                counter -= 1;
                sMap.put(chStr, counter);
            } else {
                return false;
            }
            
        }
        for (Integer counter : sMap.values()) {
            if (counter != 0) {
                return false;
            }
        }
        return true;
}
Runtime results:

45 ms!!! 5 times worse than the first algorithm! I can't believe that looping and read-write operations on maps more expensive than array sorting.
Just out of curiosity I applied Java 8 approach to find the map value which is not equal to 0. I replaced 25-29 lines of code with the following

        // for (Integer counter : sMap.values()) {
        //     if (counter != 0) {
        //         return false;
        //     }
        // }
        // return true;
        boolean isUniqSymbolDetected = sMap.values().stream().anyMatch(val -> val > 0);
        return !isUniqSymbolDetected;
Ta-dah!!
137 ms! It is almost 3 times worse than the worst runtime for 'classic' map iteration solution.



No comments:

Post a Comment