
HashMap micro-optimisations

Discussion in 'Archives (Read-Only)' started by raphfrk, Jan 28, 2013.

1. raphfrk

Java hash maps use power of 2 length tables. This makes lookup much faster since you can just use bit masking to convert the key into a hash table index.

However, masking only looks at the low bits of the key. If your keys share the same LSBs and differ only in the MSBs, they all land in the same bin, so masking on its own is not a very good hash function.

To solve this, they added an internal hash method to their HashMap classes. This kind of reduces the benefit of going to power of 2 in the first place.
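As a rough illustration of the two points above, here is a sketch of masking with and without a bit-spreading step. The class and method names are made up for this example, the table length of 16 is an arbitrary choice, and the spreading step simply reuses the shifts from the JAVA_HASH branch of the test further down in this post.

```java
public class MaskIndexDemo {

    // Index by masking: only valid when length is a power of 2.
    static int maskIndex(int hash, int length) {
        return hash & (length - 1);
    }

    // Bit-spreading step: folds the high bits down into the low bits.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    public static void main(String[] args) {
        int length = 16;
        int[] keys = {0x10000, 0x20000, 0x30000}; // differ only in the high bits
        for (int key : keys) {
            System.out.println(Integer.toHexString(key)
                    + ": mask only -> " + maskIndex(key, length)
                    + ", spread then mask -> " + maskIndex(spread(key), length));
        }
    }
}
```

With a table of length 16, all three keys mask to index 0, while spreading first sends them to indices 1, 2 and 3.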

The other type of hash function is based on prime numbers. If the internal arrays have a prime number length, you can figure out the index by finding the remainder mod the prime number.

This has the nice feature that it hashes numbers very effectively. Two numbers which are next to each other will be placed in different bins. In fact, unless the difference between the numbers is a multiple of the prime number, they will be placed in different bins.
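A minimal sketch of prime-length indexing, to make the claim above concrete. The class name, the method name and the choice of 31 as the table length are assumptions for illustration only. Math.floorMod is used instead of % so that negative keys would still give a non-negative index.

```java
public class PrimeIndexDemo {

    // Index by remainder: works for any table length, here intended to be prime.
    // Math.floorMod keeps the result non-negative even for negative keys.
    static int primeIndex(int key, int prime) {
        return Math.floorMod(key, prime);
    }

    public static void main(String[] args) {
        int prime = 31;
        System.out.println(primeIndex(100, prime));      // 7
        System.out.println(primeIndex(101, prime));      // 8, neighbouring key, different bin
        System.out.println(primeIndex(100 + 31, prime)); // 7, differs by the prime, same bin
    }
}
```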

There is a set of prime numbers called Mersenne primes, which have the form 2^n - 1. For example, 255 would be a Mersenne prime if it were prime, since it is 2^8 - 1 (it isn't, since 255 = 3 * 5 * 17).

It turns out that finding the remainder after dividing by these numbers can be done quickly.

<maths>
So, assume the divisor is p = 2^n - 1, and B = p + 1 = 2^n (so B is a power of 2).

You can express any non-negative number as

X*B + Y, where 0 <= Y < B

So, with B = 256, 257 = 1 * 256 + 1

We want the remainder of that mod p, so

(X * B + Y) mod p

With mods, you can reduce each individual term first, so B can be replaced by B mod p

(X * (B mod p) + Y) mod p

But, B mod p = (p + 1) mod p = 1

(X * 1 + Y) mod p

(X + Y) mod p

So, you just have to add X and Y together and get the remainder of that. X is the number shifted right by n bits and Y is the low n bits.
</maths>

In the case of dividing by 255, you need to shift by 8.

So, to get the remainder of i, you just need to do the following

Code (text):

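int p = 255;              // 2^8 - 1
int rem = i;              // i must be non-negative
while (rem > p) {
    int X = rem >> 8;     // quotient when dividing by B = 256
    int Y = rem & p;      // remainder when dividing by B = 256
    rem = X + Y;          // same value mod p, but smaller
}
if (rem == p) {
    rem = 0;              // p folds to itself, so map it to 0 here
}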

This assumes positive numbers. It has to be a loop, since one pass might not bring the value into range. Each pass shrinks the value by a factor of around 256, so once you pass 65536, you need at least 2 passes.
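Here is a self-contained sketch of the same fold-and-add idea, generalised to any divisor of the form 2^n - 1. The class and method names are hypothetical. It loops only while the value is greater than p and then maps a final value of p to 0, because p itself folds to 0 + p and would otherwise never shrink.

```java
public class MersenneMod {

    // Remainder of a non-negative x modulo p = 2^n - 1 (2 <= n <= 30),
    // using only shifts, masks and adds.
    static int mod(int x, int n) {
        int p = (1 << n) - 1;
        int rem = x;
        while (rem > p) {
            // rem = X * 2^n + Y, and X + Y is congruent to rem mod p
            rem = (rem >>> n) + (rem & p);
        }
        return rem == p ? 0 : rem;
    }

    public static void main(String[] args) {
        // Compare against the % operator for a range of inputs.
        int mismatches = 0;
        for (int x = 0; x < 1_000_000; x++) {
            if (mod(x, 8) != x % 255) {
                mismatches++;
            }
        }
        System.out.println("Mismatches vs %: " + mismatches);
        System.out.println("257 mod 255 = " + mod(257, 8));
    }
}
```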

As an example,
257 is 2 mod 255

257 >> 8 = 1
257 & 255 = 1

1 + 1 = 2

Anyway, the conclusion is that if you have hash maps that have lengths of Mersenne primes, then you should be able to get fast calculation and prime lengths.

Ofc, you still effectively need a pre-hash method call, so maybe it isn't really a speed boost, relative to what Java actually uses.

I did a quick check with the following. The JAVA_HASH branch uses the method in the HashMap class that does the re-hashing.

Code (text):

public class DivTst {

    // Divisor 2^16 - 1. Not actually prime (3 * 5 * 17 * 257), but fine for timing.
    private static int dem = 65535;
    private static int LENGTH = 100000000;

    // Select which of the three calculations to time.
    private static boolean FAST_MOD = false;
    private static boolean JAVA_HASH = true;

    public static void main(String[] args) {

        // Warm-up runs so the JIT has compiled the loop before timing.
        divTst(LENGTH);
        divTst(LENGTH);
        divTst(LENGTH);

        long start = System.currentTimeMillis();

        int t = divTst(LENGTH);

        long end = System.currentTimeMillis();

        // Print the sum so the loop can't be optimised away.
        System.out.println("Sum: " + t);
        System.out.println("Time: " + (end - start));

    }

    private static int divTst(int n) {

        int t = 0;

        for (int i = 0; i < n; i++) {
            if (JAVA_HASH) {
                // Re-hash step from HashMap
                int h = i;
                h ^= (h >>> 20) ^ (h >>> 12);
                h = h ^ (h >>> 7) ^ (h >>> 4);
                t += h;
            } else if (!FAST_MOD) {
                // Plain % operator
                t += i % dem;
            } else {
                // Fold the high 16 bits onto the low 16 bits, then subtract once if needed
                int r = ((i & 0xFFFF) + (i >>> 16));
                r = r >= dem ? (r - dem) : r;
                t += r;
            }
        }

        return t;
    }

}

The fast calculation took 82 ms, vs 307 ms for the % operator. The Java hash took 160 ms.
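The FAST_MOD branch in the test folds only once and subtracts at most once, rather than looping. A quick sketch of why that is enough here: for a non-negative int, i >>> 16 is at most 32767, so the sum is at most 98302, which is less than 2 * 65535. The class and method names below are made up, and negative inputs are not covered, since % returns a negative remainder for them.

```java
public class SingleFoldCheck {

    static final int DEM = 65535; // 2^16 - 1, same divisor as in the test

    // One fold plus one conditional subtract, as in the FAST_MOD branch.
    static int foldOnce(int i) {
        int r = (i & 0xFFFF) + (i >>> 16);
        return r >= DEM ? r - DEM : r;
    }

    public static void main(String[] args) {
        // Sample the non-negative int range with a stride and compare against %.
        long mismatches = 0;
        for (long x = 0; x <= Integer.MAX_VALUE; x += 9973) {
            int i = (int) x;
            if (foldOnce(i) != i % DEM) {
                mismatches++;
            }
        }
        System.out.println("Mismatches vs %: " + mismatches);
    }
}
```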
2. Afforess
