A faster way of combining and sorting large lists.

A few months ago I stumbled upon an issue whilst adding lists of names for staff members to Game Dev Manager.

The issue was that the game started to take a very long time initialising.

After a bit of code profiling we located two lines of code that were killing performance. List<string>.Contains() and List<string>.Sort()

The Contains() method was called several thousand times which was taking somewhere in the region of about 5-10 seconds per list.

And since we have three lists, one for male, female and surnames, the time taken to initialise the game became unbearable.

We were loading the names from mods into a List<string> object, then use the List.Contains() method to check if the name being added to the list was already present.

After the lists were combined we then used the List.Sort() method, again called three times, once for each list

Big O ComplexityAfter a bit of thinking, I decided that the best solution for the problem would be to use a HashSet<string> in conjunction with a List. My reasoning was this:

HashSet is a collection that contains no duplicate elements, and whose elements are in no particular order and is designed to perform lookups in O(1) time.

O(1) means that, no matter how much data, it will execute in constant time. The amount of time to execute one operation (illustrated).

Whereas List<string>.Contains() is an O(n) operation. This means that for large enough input sizes the running time increases linearly with the size of the input.

By using the HashSet to check if a name had already been added, I could add it to the main list if it was not found, decreasing times dramatically.

However, I could still not avoid calling List<string>.Sort() for each list, however by changing to HashSets reduced initialisation time to just a few milliseconds instead of 30 seconds. A massive reduction, I’m sure you’ll agree.

 

You may also like...

Leave a Reply