Hash function for a string

First, it usually does not matter that much in practice. Most hash functions are “good enough”.

But if you really care, you should know that it is a research subject by itself. There are thousand of papers about that. You can still get a PhD today by studying & designing hashing algorithms.

Your second hash function might be slightly better, because it probably should separate the string "ab" from the string "ba". On the other hand, it is probably less quick than the first hash function. It may, or may not, be relevant for your application.

I’ll guess that hash functions used for genome strings are quite different than those used to hash family names in telephone databases. Perhaps even some string hash functions are better suited for German, than for English or French words.

Many software libraries give you good enough hash functions, e.g. Qt has qhash, and C++11 has std::hash in <functional>, Glib has several hash functions in C, and POCO has some hash function.

I quite often have hashing functions involving primes (see Bézout’s identity) and xor, like e.g.

#define A 54059 /* a prime */
#define B 76963 /* another prime */
#define C 86969 /* yet another prime */
#define FIRSTH 37 /* also prime */
unsigned hash_str(const char* s)
{
   unsigned h = FIRSTH;
   while (*s) {
     h = (h * A) ^ (s[0] * B);
     s++;
   }
   return h; // or return h % C;
}

But I don’t claim to be an hash expert. Of course, the values of A, B, C, FIRSTH should preferably be primes, but you could have chosen other prime numbers.

Look at some MD5 implementation to get a feeling of what hash functions can be.

Most good books on algorithmics have at least a whole chapter dedicated to hashing. Start with wikipages on hash function & hash table.

Related Posts:

Leave a Comment Cancel reply