Learn from these links:
http://stackoverflow.com/questions/730620/how-does-a-hash-table-work
https://en.wikipedia.org/wiki/Hash_table
http://www.cs.cornell.edu/courses/cs3110/2011sp/lectures/lec20-amortized/amortized.htm
What is a hash algorithm (hash function)?
A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash). The hash value is representative of the original string of characters, but is normally smaller than the original.
Hashing is used for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string. Hashing is also used in cryptography.
This term is also known as a hashing algorithm or message digest function.
Hashing can also be used in creating and verifying digital signatures. The sender applies the hash function to the message and signs the resulting hash value, then sends both the message and the signature to the receiver. The receiver applies the same hash function to the received message and compares the result with the hash value recovered from the signature. If the hash values are the same, it is very likely that the message was not altered in transit.
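The hash-and-compare step described above can be sketched with Python's standard `hashlib` module. This is only a minimal illustration of the comparison, assuming SHA-256 as the hash function and a made-up message; it does not perform any actual signing.

```python
import hashlib


def digest(message: bytes) -> str:
    """Return the SHA-256 hash of the message as a hex string."""
    return hashlib.sha256(message).hexdigest()


# Sender side: hash the (hypothetical) message and send the hash along with it.
message = b"transfer 100 to alice"
sent_hash = digest(message)

# Receiver side: recompute the hash of what arrived and compare.
received_ok = digest(message) == sent_hash
received_tampered = digest(b"transfer 900 to alice") == sent_hash

print(received_ok)        # True: the message arrived unchanged
print(received_tampered)  # False: the content differs, so the hashes differ
```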
One example of a hash function is called folding. This takes an original value, divides it into several parts, then adds the parts and uses the last four remaining digits as the hashed value or key.
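As a rough sketch of folding, the snippet below splits a number into four-digit parts, adds them, and keeps the last four digits of the sum. The function name `folding_hash` and the choice of four-digit parts are assumptions made for illustration, and the key is assumed to be a non-negative integer.

```python
def folding_hash(key: int, part_len: int = 4) -> int:
    """Fold a non-negative integer key into a hash of at most part_len digits."""
    digits = str(key)
    # Divide the original value into parts of part_len digits each.
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    # Add the parts, then keep only the last part_len digits of the sum.
    return sum(parts) % 10 ** part_len


print(folding_hash(123456789))  # 1234 + 5678 + 9 = 6921
print(folding_hash(99999999))   # 9999 + 9999 = 19998, last four digits: 9998
```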
Another example is called digit rearrangement. This takes the digits in certain positions of the original value, such as the third and sixth numbers, and reverses their order. It then uses the number left over as the hashed value.
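Because many different original values can produce the same hash value, the original number generally cannot be recovered from the hashed value alone. For cryptographic hash functions, finding an input that produces a given hash value is computationally infeasible even when the algorithm is known.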
How is a hash table stored in memory?
A hash table is stored in memory as an array of buckets. Each bucket has a unique index. Each entry (key-value pair) is stored in the bucket whose index is calculated from the hash value of its key, typically the hash value modulo the number of buckets. This is shown in the picture below.
![630px-Hash_table_3_1_1_0_1_0_0_SP](https://churong.wordpress.com/wp-content/uploads/2016/04/630px-hash_table_3_1_1_0_1_0_0_sp.png?w=676)
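The mapping from a key to a bucket can be sketched as follows. The hash function here (`simple_hash`, a sum of character codes) and the sample entries are assumptions chosen to keep the example deterministic; real hash tables use better hash functions.

```python
def simple_hash(key: str) -> int:
    """Toy hash function: add up the character codes of the key."""
    return sum(ord(ch) for ch in key)


def bucket_index(key: str, num_buckets: int) -> int:
    """Map a key to a bucket: hash value modulo the number of buckets."""
    return simple_hash(key) % num_buckets


# The table itself is just an array of buckets (each bucket is a list here).
num_buckets = 8
buckets = [[] for _ in range(num_buckets)]

# Hypothetical entries, stored in the bucket chosen by the key's hash.
entries = [("apple", 3), ("banana", 7), ("cherry", 5)]
for key, value in entries:
    buckets[bucket_index(key, num_buckets)].append((key, value))

for i, bucket in enumerate(buckets):
    print(i, bucket)
```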
Why do collisions happen in a hash table?
Because the range of hash values is normally smaller than the set of possible keys, two different keys can produce the same hash value, or different hash values that map to the same bucket. When that happens, we have a collision.
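A small sketch makes this concrete. It assumes a toy hash function (`simple_hash`, a sum of character codes) that is not from the original text: two different strings with the same letters get the same hash value, and five keys placed into four buckets must share at least one bucket.

```python
def simple_hash(key: str) -> int:
    """Toy hash function: add up the character codes of the key."""
    return sum(ord(ch) for ch in key)


# Different keys, same hash value: the letters are the same, so the sums match.
same_hash = simple_hash("listen") == simple_hash("silent")
print(same_hash)  # True

# More keys than buckets: at least two keys must land in the same bucket.
num_buckets = 4
keys = ["a", "b", "c", "d", "e"]
indices = [simple_hash(k) % num_buckets for k in keys]
has_collision = len(set(indices)) < len(keys)
print(indices)        # [1, 2, 3, 0, 1]
print(has_collision)  # True: "a" and "e" share bucket 1
```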
How to deal with collisions?
When a collision happens in a bucket, we can chain the entries as a linked list, as shown in the picture above. Many other data structures can also be used to chain the entries in a bucket. For example, we can use a balanced binary tree. In this case, the worst-case time to search within a bucket improves from O(n) to O(log n), at the cost of extra memory per entry and a more complex implementation.
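Chaining with a linked list can be sketched as below. The class names `Node` and `ChainedHashTable` are made up for this example, and Python's built-in `hash` stands in for the hash function; this is one simple way to do it, not the only one.

```python
class Node:
    """One entry in a bucket's linked list."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, num_buckets=8):
        # Each bucket holds the head of a linked list (None when empty).
        self.buckets = [None] * num_buckets

    def _index(self, key):
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:
            if node.key == key:       # key already present: overwrite
                node.value = value
                return
            node = node.next
        # Key not found: add a new node at the head of the chain.
        self.buckets[i] = Node(key, value, self.buckets[i])

    def get(self, key, default=None):
        node = self.buckets[self._index(key)]
        while node is not None:
            if node.key == key:
                return node.value
            node = node.next
        return default


# With only 2 buckets, the keys 1, 3 and 5 all collide in bucket 1.
table = ChainedHashTable(num_buckets=2)
for k in (1, 3, 5):
    table.put(k, k * 10)
print(table.get(3))  # 30
print(table.get(4))  # None
```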
The number of buckets in a hash table is fixed when the array is created, and it determines how many entries the table can hold before the chains become long. If the array becomes too full, we resize it. There are many ways to resize. One is to double the size of the array, recompute the bucket index for each existing key (because the size of the array has changed), copy the entries to the new array, and delete the old one.
This operation is not constant time, but rather linear in the number of elements at the time the table is grown.
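The doubling strategy can be sketched as follows. The class name `ResizingHashTable`, the 0.75 load threshold, and the `copies` counter are assumptions added for illustration; the counter only exists to show how much copying the resizes cause in total.

```python
class ResizingHashTable:
    def __init__(self, num_buckets=4, max_load=0.75):
        self.buckets = [[] for _ in range(num_buckets)]
        self.size = 0
        self.max_load = max_load  # assumed threshold for "too full"
        self.copies = 0           # entries moved during all resizes so far

    def _resize(self):
        # Double the array, recompute each key's index, copy entries over.
        new_buckets = [[] for _ in range(2 * len(self.buckets))]
        for chain in self.buckets:
            for key, value in chain:
                new_buckets[hash(key) % len(new_buckets)].append((key, value))
                self.copies += 1
        self.buckets = new_buckets  # the old array is dropped

    def put(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)
                return
        chain.append((key, value))
        self.size += 1
        if self.size > self.max_load * len(self.buckets):
            self._resize()

    def get(self, key, default=None):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        return default


n = 1000
table = ResizingHashTable()
for i in range(n):
    table.put(i, str(i))

print(len(table.buckets))  # number of buckets after all the doublings
print(table.copies)        # total entries copied across all resizes
```

A single resize is linear in the number of entries at that moment, but because the array doubles each time, the total number of copies over all `n` insertions stays proportional to `n`.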
What are the time and space complexity of a hash table?
For the linked list chaining method:
Time complexity:
average is O(1).
Because in most cases collisions do not happen frequently, we can compute the hash code of a key and reach its value in constant time. We say that the insertion operation has O(1) amortized run time because the time required to insert an element is O(1) on average, even though some insertions trigger a lengthy rehashing of all the elements of the hash table.
worst case is O(n).
Because, in the worst case, all entries are stored in a single bucket, which means every new entry causes a collision. In this case, accessing or searching for an entry in the hash table is the same as searching a linked list.
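The gap between the average and the worst case can be seen by counting key comparisons during a lookup. The helper `lookup_steps` and both hash functions below are hypothetical, written only for this comparison: one spreads the keys across buckets, the other sends every key to bucket 0.

```python
def lookup_steps(keys, target, num_buckets, hash_fn):
    """Build the chains, then count key comparisons needed to find target."""
    buckets = [[] for _ in range(num_buckets)]
    for k in keys:
        buckets[hash_fn(k) % num_buckets].append(k)

    steps = 0
    for k in buckets[hash_fn(target) % num_buckets]:
        steps += 1
        if k == target:
            break
    return steps


keys = list(range(1000))

# Good case: every key gets its own bucket, so one comparison is enough.
good = lookup_steps(keys, 999, 1024, lambda k: k)

# Worst case: every key collides in bucket 0, so the lookup walks the whole chain.
bad = lookup_steps(keys, 999, 1024, lambda k: 0)

print(good)  # 1
print(bad)   # 1000
```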
Space complexity:
O(n).
Because the space used grows linearly as the number of entries increases.