Custom String Hashing – Part I

In this post, the first of two, we will discuss a workaround for providing custom String hashing. While this is not the most elegant solution, it does the job.

Background

HashMaps and HashSets are commonly used data structures in Java. One would expect the ability to override the hash() function for these classes (well, it’s really just HashMap‘s hash function as HashSet is actually a HashMap with an empty Object value – don’t get me started), but you actually cannot. The hash function is static final (see code below from Java 1.8u51, lines 336-339 of java.util.HashMap), meaning there are three reasons it cannot be overridden: (1) it is not public, (2) it is static, and (3) it is final.

java.util.HashMap from Java 1.8u51

static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

Well, since we cannot override the function, maybe we can override some internal component. Alright, so it makes use of hashCode() for the passed Object. Can we override this function? It turns out that Java’s internally defined object types (e.g., String, Integer, Double, etc.) are final; thus, we cannot override those functions. But all is not lost. The generic Object class (which all user-defined classes are apart of), are not final. Herein lies our workaround.

Workaround for Custom String Hashing

The workaround is quite simple really. Create a new class (e.g., StringKey in this example) that stores a String, custom hash function reference (to be discussed later), and overrides its Object.hashCode() and Object.equals(Object) methods. The print statement on line 19 is for testing and should be removed for actual implementation.

import functions.HashFunction;
import java.util.Objects;
/**
 * Custom string class. Stores a {@link String} and {@link HashFunction}, which
 * is used for hashing.
 * @author Ray Hylock
 */
public class StringKey {
    private String key;
    private HashFunction function;
     
    public StringKey(String key, HashFunction function){
        this.key = key;
        this.function = function;
    }
 
    @Override
    public int hashCode(){
        System.out.println("Custom hashing: " + key);
        return function.hash(key);
    }
 
    @Override
    public boolean equals(Object obj) {
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        final StringKey other = (StringKey) obj;
        return Objects.equals(this.key, other.key);
    }
}

Then, we extend HashMap and HashSet to allow the use of custom hash functions.

import functions.*;
import functions.HashFunction;
import functions.HashFunction.FUNCTIONS;
import java.util.HashMap;
/**
 * Hash map class allowing custom hash functions.
 * @author Ray Hylock
 * @param <K> the type of keys maintained by this map
 * @param <V> the type of mapped values
 * 
 */
public class HashMapCustom<K, V> extends HashMap<K, V> {
    private static final FUNCTIONS DEFAULT = FUNCTIONS.JAVA1_8;
    private HashFunction function;
     
    /**
     * Use the default function {@link #DEFAULT}.
     */
    public HashMapCustom(){
        this(DEFAULT);
    }
     
    /**
     * Use the custom {@link HashFunction}.
     * @param function custom {@link HashFunction}
     */
    public HashMapCustom(HashFunction function){
        this.function = function;
    }
     
    /**
     * Use one of the available hashing functions.
     * @param function available {@link FUNCTIONS}
     */
    public HashMapCustom(FUNCTIONS function){
        super();
        switch(function){
            case AP: this.function = new AP(); break;
            case BKDR: this.function = new BKDR(); break;
            case DEK: this.function = new DEK(); break;
            case DJB: this.function = new DJB(); break;
            case DJBX: this.function = new DJBX(); break;
            case ELF: this.function = new ELF(); break;
            case FNV_1A: this.function = new FNV_1a(); break;
            case FNV_1: this.function = new FNV_1(); break;
            case IDENTITY_HASH_CODE: this.function = new IdentityHashCode(); break;
            case JAVA1_8: this.function = new Java1_8(); break;
            case JENKINS: this.function = new Jenkins(); break;
            case JENKINS_OAT: this.function = new JenkinsOAT(); break;
            case JS: this.function = new JS(); break;
            case PJW: this.function = new PJW(); break;
            case RS: this.function = new RS(); break;
            case SDBM: this.function = new SDBM(); break;
            case STRING_HASH_CODE: this.function = new StringHashCode(); break;
            case SUPER_FAST_HASH: this.function = new SuperFastHash(); break;
        }
    }
     
    /**
     * Return the {@link HashFunction}.
     * @return the {@link HashFunction}
     */
    public HashFunction getFunction(){
        return function;
    }
}
import functions.*;
import functions.HashFunction;
import functions.HashFunction.FUNCTIONS;
import java.util.HashSet;
/**
 * Hash set class allowing custom hash functions.
 * @author Ray Hylock
 * @param <E> the type of elements maintained by this set
 * 
 */
public class HashSetCustom<E> extends HashSet<E> {
    private static final FUNCTIONS DEFAULT = FUNCTIONS.JAVA1_8;
    private HashFunction function;
     
    /**
     * Use the default function {@link #DEFAULT}.
     */
    public HashSetCustom(){
        this(DEFAULT);
    }
     
    /**
     * Use the custom {@link HashFunction}.
     * @param function custom {@link HashFunction}
     */
    public HashSetCustom(HashFunction function){
        this.function = function;
    }
     
    /**
     * Use one of the available hashing functions.
     * @param function available {@link FUNCTIONS}
     */
    public HashSetCustom(FUNCTIONS function){
        super();
        switch(function){
            case AP: this.function = new AP(); break;
            case BKDR: this.function = new BKDR(); break;
            case DEK: this.function = new DEK(); break;
            case DJB: this.function = new DJB(); break;
            case DJBX: this.function = new DJBX(); break;
            case ELF: this.function = new ELF(); break;
            case FNV_1A: this.function = new FNV_1a(); break;
            case FNV_1: this.function = new FNV_1(); break;
            case IDENTITY_HASH_CODE: this.function = new IdentityHashCode(); break;
            case JAVA1_8: this.function = new Java1_8(); break;
            case JENKINS: this.function = new Jenkins(); break;
            case JENKINS_OAT: this.function = new JenkinsOAT(); break;
            case JS: this.function = new JS(); break;
            case PJW: this.function = new PJW(); break;
            case RS: this.function = new RS(); break;
            case SDBM: this.function = new SDBM(); break;
            case STRING_HASH_CODE: this.function = new StringHashCode(); break;
            case SUPER_FAST_HASH: this.function = new SuperFastHash(); break;
        }
    }
     
    /**
     * Return the {@link HashFunction}.
     * @return the {@link HashFunction}
     */
    public HashFunction getFunction(){
        return function;
    }
}
Hash Functions

The difficult (well, more tedious) part of this process was creating all the hash functions. Fortunately, for another project of mine (the UPC collections), I created quite a few for 64-bit indices. Since we are dealing with Java collections here, those that could, were refactored to 32-bit. Included here are 18 separate functions (click download in upper right-hand corner of the Google Drive window):

  • AP: Arash Partow hashing function
  • BKDR: Brian Kernighan and Dennis Ritchie hashing function. From “The C Programming Language”
  • DEK: Donald E. Knuth hashing function from “The Art of Computer Programming” volume 3
  • DJB: Daniel J. Bernstein hashing function
  • DJBX: Daniel J. Bernstein hashing function – XOR variant
  • ELF: UNIX ELF hashing function
  • FNV_1: Fowler-Noll-Vo (FNV-1) 32-bit hashing function
  • FNV_1a: Fowler-Noll-Vo (FNV-1a) 32-bit hashing function variant
  • IdentityHashCode: Java’s identity hash code
  • JS: Justin Sobel hashing function
  • Java1_8: Java 1.8 hash function (default for HashMapCustom and HashSetCustom)
  • Jenkins: Jenkins hashing function
  • JenkinsOAT: Jenkins one-at-a-time hash function
  • PJW: Peter J. Weinberger 32-bit hashing function
  • RS: Robert Sedgwick hashing function from “Algorithms in C”
  • SDBM: a reimplementation of NDBM (used by databases) hashing function
  • StringHashCode: Java’s default String.hashCode()
  • SuperFastHash: Super fast hash (which really isn’t from experience)
Custom Hash Function

If you have a hash function to implement, then you will follow these steps. First, create a class that extends functions.HashFunction (included in the download). Then, override public int hash(String key) and add your function. Here is an example using Java’s String.hashCode() as the sample code (the print statement on line 5 is for testing):

import functions.HashFunction;
public class CustomHashFunction extends HashFunction {
    @Override
    public int hash(String key) {
        System.out.println(" Using my custom hash function");
        int h = 2147483647; // largest int prime number
        for (int i = 0; i < key.length(); i++) 
            h = 31 * h + key.charAt(i);
        return h;
    }
}
Testing

Alright, so let’s put everything together into a test class. Below, we test HashSetCustom and HashMapCustom using the built in FNV_1A hashing algorithm. Then, the custom hash function from the previous section is used in a map.

import functions.HashFunction;
import functions.HashFunction.FUNCTIONS;
import static java.lang.System.out;
import java.util.HashMap;
import java.util.HashSet;
/**
 * Test class for {@link HashMapCustom} and {@link HashSetCustom}.
 * @author Ray Hylock
 */
public class HashTest {
    public static void main(String[] args) {
        hashSet();
        hashMap();
        customHashMap();
    }
 
    /**
     * HashSetCustom example.
     */
    private static void hashSet() {
        out.println("----- Hash set -----");
        // instantiate
        HashSet<StringKey> hs = new HashSetCustom<StringKey>(FUNCTIONS.FNV_1A);
        HashFunction f = ((HashSetCustom) hs).getFunction();
 
        // add
        String key = "hello world";
        out.println("Adding: " + key);
        hs.add(new StringKey(key, f));
 
        // containts
        out.println("\nChecking " + key);
        out.println("Contains: " + hs.contains(new StringKey(key, f)));
    }
 
    /**
     * HashMapCustom example.
     */
    private static void hashMap() {
        out.println("\n----- Hash map -----");
        // instantiate
        HashMap<StringKey, Integer> hm
                = new HashMapCustom<StringKey, Integer>(FUNCTIONS.FNV_1A);
        HashFunction f = ((HashMapCustom) hm).getFunction();
 
        // put
        String key = "hello world";
        int value = 17;
        out.println("Puttig: " + key + ", " + value);
        hm.put(new StringKey(key, f), value);
 
        // containts key
        out.println("\nChecking key " + key);
        out.println("Contains key: " + hm.containsKey(new StringKey(key, f)));
 
        // containts value
        out.println("\nChecking value " + value);
        out.println("Contains value: " + hm.containsValue(value));
    }
     
    /**
     * Customized hash for HashMapCustom example.
     */
    private static void customHashMap(){
        HashFunction hf = new CustomHashFunction();
         
        out.println("\n----- Custom hash map -----");
        // instantiate
        HashMap<StringKey, Integer> hm
                = new HashMapCustom<StringKey, Integer>(hf);
        HashFunction f = ((HashMapCustom) hm).getFunction();
 
        // put
        String key = "hello world";
        int value = 17;
        out.println("Puttig: " + key + ", " + value);
        hm.put(new StringKey(key, f), value);
 
        // containts key
        out.println("\nChecking key " + key);
        out.println("Contains key: " + hm.containsKey(new StringKey(key, f)));
 
        // containts value
        out.println("\nChecking value " + value);
        out.println("Contains value: " + hm.containsValue(value));
    }
}

The output for this test class can be found below. As you can see, each time we add or put a value for the first two tests, we see the line “Custom hashing” indicating the use of the overridden hashCode() function in StringKey. In the final segment labeled “Custom hash map,” we see an additional line “Using my custom hash function” showing our custom function is in use at the override layer.

----- Hash set -----
Adding: hello world
Custom hashing: hello world
 
Checking hello world
Custom hashing: hello world
Contains: true
 
----- Hash map -----
Puttig: hello world, 17
Custom hashing: hello world
 
Checking key hello world
Custom hashing: hello world
Contains key: true
 
Checking value 17
Contains value: true
 
----- Custom hash map -----
Puttig: hello world, 17
Custom hashing: hello world
 Using my custom hash function
 
Checking key hello world
Custom hashing: hello world
 Using my custom hash function
Contains key: true
 
Checking value 17
Contains value: true