C# FAQ: Why are hashtable lookups so slow with struct keys
Microsoft .NET Framework, ASP.NET, Visual C# (CSharp, C Sharp, C-Sharp) Developer Training, Visual Studio
| CSharp-Online.NET:FAQs |
| edit |
Why are hashtable lookups so slow with struct keys?
When struct objects are used as hashtable keys, the lookup operation for the hashtable performs horribly. The reason for the poor performance lies in the GetHashCode method which performs the lookup internally. When a struct contains only simple value types—int, short, etc.—, the GetHashCode algorithm which computes the hashes causes most of them to be stored in the same bucket. For example, assume a hashtable creates five buckets.
- bucket-1 - value-a, value-b, value-c, , value-n
- bucket-2 - (empty)
- bucket-3 - (empty)
- bucket-4 - (empty)
- bucket-5 - (empty)
Probably, all of the keys will be put into a single bucket. Therefore, when a lookup is performed, the .NET runtime will be forced to traverse the entire contents of that bucket to find the value. So, instead of the lookup operation being O(1), the average lookup case becomes O(n).
To alleviate this problem, override the GetHashCode method; so, the runtime can access the keys efficiently.
One way to address the problem is to create a key string by merging all the value types from a struct. Each type is seperated by a special delimiter character. Since the struct is a lookup criteria, it is certain that all struct values will be different; therefore, the generated string is guaranteed to be unique. And, since the string is derived from System.Object, the generated key string has a GetHashCode method which can be overridden.
Here is an example of overriding the GetHashCode method:
struct { int value-a; short value-b; public struct(int value-a, short value-b) { this.value-a = value-a; this.value-b = value-b; } public override int GetHashCode() { string hash = value-a.ToString() + ":" + value-b.ToString(); return hash.GetHashCode(); } }
Now, the generated hashcode is certain to be unique causing hash keys to be distributed across all buckets; so, lookup performance will be greatly improved.