C#: Get the Most Common (Mode) Element from a Collection


I was surprised today when I couldn’t find a built in Mode extension method that retrieved the most common or most frequently occurring element in a collection so I set out to write my own. My first approach was to use LINQ to Objects and just group the collection by the individual items, order each group descending by their count, and then return the first element as follows.

public static T Mode<T>(this IEnumerable<T> list)
{
    // Null testing
    if (list == null || list.Count() == 0)
        return default(T);

    return (from item in list
            group item by item into g
            orderby g.Count() descending
            select g.Key).First();
}

The above method works great but it left a bad taste in my mouth as ordering the grouped items using LINQ took O(n log n) time and I knew there had to be a way to solve this problem in O(n) time. So, I dropped the LINQ approach and went back to the good ol’ loop. The following method simply traverses the collection and adds an entry for each unique element in the collection into a Dictionary. The value of each Dictionary entry is incremented every time an element is encountered. Then we traverse the Dictionary to find the element with highest value.

/// <summary>
/// Gets the element that occurs most frequently in the collection.
/// </summary>
/// <param name="list"></param>
/// <returns>Returns the element that occurs most frequently in the collection.
/// If all elements occur an equal number of times, a random element in
/// the collection will be returned.</returns>
public static T Mode<T>(this IEnumerable<T> list)
{
    // Initialize the return value
    T mode = default(T);

    // Test for a null reference and an empty list
    if (list != null && list.Count() > 0)
    {
        // Store the number of occurences for each element
        Dictionary<T, int> counts = new Dictionary<T, int>();

        // Add one to the count for the occurence of a character
        foreach (T element in list)
        {
            if (counts.ContainsKey(element))
                counts[element]++;
            else
                counts.Add(element, 1);
        }

        // Loop through the counts of each element and find the 
        // element that occurred most often
        int max = 0;

        foreach (KeyValuePair<T, int> count in counts)
        {
            if (count.Value > max)
            {
                // Update the mode
                mode = count.Key;
                max = count.Value;
            }
        }
    }

    return mode;
}

Usage

List<int> ints = new List<int>() { 1, 2, 6, 3, 6, 7, 3, 6, 8, 4, 2, 1, 7, 6 };
int mode = ints.Mode();

Console.WriteLine(mode);

// Output
// ------
// 6

After some performance testing on large collections, the second method proved to take about half the time compared to that of the LINQ method. Also note that both methods are created using generic types so this extension method can be used on any collection of any type.

C#: Optimized Extension Method – Get a Random Element from a Collection


In my previous post I created an extension method to select a random element from a generic collection and it works great but I wanted to see if there was any way to optimize this method for better performance. Since my extension method uses the LINQ extension method ElementAt(), I first looked at the .NET code for System.Linq.Enumerable.ElementAt() and found the following:

public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index)
{
    TSource current;
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        return list[index];
    }
    if (index < 0)
    {
        throw Error.ArgumentOutOfRange("index");
    }
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
    Label_0036:
        if (!enumerator.MoveNext())
        {
            throw Error.ArgumentOutOfRange("index");
        }
        if (index == 0)
        {
            current = enumerator.Current;
        }
        else
        {
            index--;
            goto Label_0036;
        }
    }
    return current;
}

It first tries to cast the source collection to an IList and if that is successful, it indexes into the collection to the specified index. If that fails, it simply travels down the source collection using the Enumerator.MoveNext() method until it reaches the desired index. After seeing this my first thought was to see which of the two methods, casting to an IList and indexing or traveling down the collection one element at a time, was faster. I figured since the .NET Framework first attempts to cast the collection to an IList, that would be the fastest method.

Here are the two methods I tested, the first utilizing my original method of just calling the ElementAt() method and the second using an Enumerator on the collection and traversing the collection until the desired element is reached.

public static T GetRandomElement<T>(this IEnumerable<T> list)
{           
    if (list == null)
        throw new ArgumentNullException("list");

    // If there are no elements in the collection, return the default value of T
    if (list.Count() == 0)
        return default(T);
   
    return list.ElementAt(random.Next(list.Count()));
}

public static T GetRandomElement1<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // Get the number of elements in the collection
    int count = list.Count();

    // If there are no elements in the collection, return the default value of T
    if (count == 0)
        return default(T);

    // Get a random index
    int index = random.Next(list.Count());

    using (IEnumerator<T> enumerator = list.GetEnumerator())
    {
        // Move down the collection one element at a time.
        // When index is -1 we are at the random element location
        while (index >= 0 && enumerator.MoveNext())
            index--;

        // Return the current element
        return enumerator.Current;
    }          
}     

To perform the test I used the following code which generates arrays of strings with 1 to 500 elements each and then uses each of the two GetRandomElement() methods to select random elements from the array. Before I call each method I generate a new array of strings to prevent false performance enhancement due to caching. I performed the test twice; the first time selecting 2,000 random elements and the second time selecting 20,000 random elements. (Note this test uses the RandomText class in my previous post to generate the random strings.)

public static void Test1()
{
    RandomText random = new RandomText();

    StringBuilder results = new StringBuilder();
    results.AppendLine("List Length,Method One, Method Two, Difference");

    Stopwatch sw = new Stopwatch();

    // Test on lists of length 1 to 500
    for (int listLength = 1; listLength <= 500; listLength++)
    {             
        Console.WriteLine("Current list length: " + listLength);
        long method1 = 0;
        long method2 = 0;

        // Get an array of random strings with a length of the current listLength                
        string[] strings = random.Sentance(listLength).Split(' ');

        sw.Reset();
        sw.Start();

        // Get a random element 20,000 times from the array using method 1                
        for (int i = 0; i < 20000; i++)
        {
            string s = strings.GetRandomElement<string>();
        }

        sw.Stop();

        // Record the time required for method 1
        method1 = sw.ElapsedMilliseconds;

        // Get a different array of random strings to ensure we don't
        // get performance enhancement due to caching
        strings = random.Sentance(listLength).Split(' ');

        sw.Reset();
        sw.Start();

        // Get a random element 20,000 times from the array using method 2
        for (int i = 0; i < 20000; i++)
        {
            string s = strings.GetRandomElement1<string>();
        }

        sw.Stop();

        // Record the time required for method 2
        method2 = sw.ElapsedMilliseconds;

        // Store the results
        results.AppendLine(listLength + "," + method1 + "," + method2 + "," + (method1 - method2));
    }

    StreamWriter writer = new StreamWriter(@"C:\Temp\RandomStats.csv");
    writer.Write(results.ToString());
    writer.Close();

    Console.WriteLine();
    Console.WriteLine("Press enter to close...");

    Console.ReadLine();
}

Here are the summarized results:

It turns out that in both tests, when the collection of strings has between 1 and 100 elements, the second method using the Enumerator actually performs better! I repeated the test multiple times and it produced the same results. I even swapped which method got called first just to ensure a fair test and as well similar results were produced.

After my first test I decided to create a new GetRandomElement() method that took advantage of this new information. The new method first checks the length of the collection and if it contains 100 or less elements, it gets a random element using the Enumerator.MoveNext() method, otherwise it just calls the Enumerable.ElementAt() method. Here is the code:

public static T GetRandomElement<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // Get the number of elements in the collection
    int count = list.Count();

    // If there are no elements in the collection, return the default value of T
    if (count == 0)
        return default(T);

    // Get a random index
    int index = random.Next(list.Count());

    // When the collection has 100 elements or less, get the random element
    // by traversing the collection one element at a time.
    if (count <= 100)
    {                
        using (IEnumerator<T> enumerator = list.GetEnumerator())
        {
            // Move down the collection one element at a time.
            // When index is -1 we are at the random element location
            while (index >= 0 && enumerator.MoveNext())
                index--;

            // Return the current element
            return enumerator.Current;
        }
    }
    
    // Get an element using LINQ which casts the collection
    // to an IList and indexes into it.
    return list.ElementAt(index);            
}       

I then wanted to check the performance of my original method against that of the new one. I’ll spare you the code as it is quite similar to the code above but instead of simply incrementing the collection size on each iteration from 1 to 500, on each iteration I created a collection of a random size between 1 and 200 elements. My reasoning for choosing the range of 1 to 200 elements is that my new method makes its decision on which way it generates the random element by checking the collection size and the breaking point is at 100 elements. Thus, generating a collection of random size between 1 and 200 allows for an equal opportunity for each method of generating the random element. Again I performed this test selecting both 2,000 and 20,000 random elements from each collection using my original method and then my new method. Here are the results:

In both cases the time required to generate a random element was decreased using my new optimized method. When selecting 2,000 random elements, the new method completed on average 0.338 ms quicker and completed 169 ms faster over all. When selecting 20,000 random elements, the new method completed on average 2.312 ms quicker and completed 1156 ms faster over all.

Seeing the improvement in performance as the number of random elements selected increases, I decided to perform the same test but selecting 200,000 elements on each iteration. Here are the results:

The new method completed on average 32.346 ms quicker and completed 16.172 seconds faster over all.

True the majority of the time we are only talking about an improvement of a matter of milliseconds here but nonetheless those add up if you are performing a large number of operations.

For those that are interested, the raw test result data can be found on Google Docs here.

C#: Extension Method – Get a Random Element from a Collection


In my previous post I created a class to generate random text. Throughout the code I had to get a random element from various arrays of characters and integers. The class was riddled with the repeated code that generated a random number, from 0 to the length of the array minus one, which was then used to index into the array and select an element. Something like this:

int numberOfWords = possibleSentanceLengths[RandomNumber(0, possibleSentanceLengths.Length)];

I got to thinking and figured this was the perfect situation to use an extension method so that I could just call a GetRandomElement on any array of elements. Here is the method I came up with:

public static class Extensions
{
    private static Random random = new Random();
    
    public static T GetRandomElement<T>(this IEnumerable<T> list)
    {
        // If there are no elements in the collection, return the default value of T
        if (list.Count() == 0)
            return default(T);

        return list.ElementAt(random.Next(list.Count()));
    }
}

Note that the instance of the Random class is created outside the method GetRandomElement. If we were to create a new Random object each time the method was called and we were to call the method in a loop, the element returned would be the same every time as the Random class is seeded using the current time. Having the class created outside the scope of the method ensures a true psuedorandom number on each call to the Next method.

This nice thing about this extension method is that it is generic so it can be called on any collection that implements the IEnumerable interface including arrays, Lists, HashSets, Dictionaries, you name it. Here is the original code using our new extension method.

int numberOfWords = possibleSentanceLengths.GetRandomElement<int>();

I was bored with using the .NET provided Random class and tried to think of other ways that a random number could be generated. My first thought was to use a Guid as those are completely unique (up to a point as there are only so many possible 32 character Guids that can be generated). Creating a new Guid, generating its hash code and then using the mod operator allows us to index into the collection at a random location.

public static class Extensions
{  
    public static T GetRandomElement<T>(this IEnumerable<T> list)
    {
        // If there are no elements in the collection, return the default value of T
        if (list.Count() == 0)
            return default(T);

        // Guids as well as the hash code for a guid will be unique and thus random        
        int hashCode = Math.Abs(Guid.NewGuid().GetHashCode());
        return list.ElementAt(hashCode % list.Count());
    }
}

Any other ideas on how to generate a random number other than the two mentioned?

C#: Most Recently Used List


I wanted to add a button to one of the applications that I wrote that listed the last ten reports that a user has viewed so they could quickly and easily access recently viewed reports. I found a lot of information about how to create a Most Recently Used (MRU) list of files that have been opened but I couldn’t find anything built into the .NET Framework to create a custom list of objects where the most recently used item was at the beginning of the list.

I’m sure many of you are saying to yourself, “Why didn’t he just use a stack?”. The reason I didn’t use the already built Stack class is because I needed to implement two other features:

  • Restrict the size of the set
  • Ensure no duplicate items exist in the set

The Stack class found in the System.Collections namespace doesn’t enforce these two properties inherently and thus the need for a custom collection.

Note: I could have optimized this collection a bit more by adding each item to the end of the list and keeping a pointer to the location of the true first item but given the fact that I only will be storing a few items in the set, I didn’t take the time to implement it that way. I leave that optimization to you.

/// <summary>
/// Stores a set of items with the most recently added at item the front and the 
/// least recently added item at the end
/// </summary>
public class RecentSet<T> : IEnumerable<T>
{
    private List<T> _list;
    private int _size = -1;

    /// <summary>
    /// Creates a new RecentSet object.
    /// </summary>
    public RecentSet()
    {
        _list = new List<T>();            
    }
     /// <summary>
    /// Creates a new RecentSet object with a fixed size. The return set may be smaller than
    /// the specified size but it will never be larger
    /// </summary>
    /// <param name="size">The maximum size of the set</param>
    public RecentSet(int size)
    {
        _list = new List<T>();
        _size = size;
    }
 
    /// <summary>
    /// Creates a new RecentSet object initializing it with the indicated items. Note: 
    /// the initialized RecentSet will be in the order of parameter items.  If items are {1, 2, 3, 4},
    /// iterating through RecentSet will result in a list of {1, 2, 3, 4} not {4, 3, 2, 1}        
    /// </summary>
    public RecentSet(IEnumerable<T> items)
    {
        _list = items.ToList();
    }

    /// <summary>
    /// Creates a new RecentSet object with a fixed size initializing it with the indicated items. Note: 
    /// the initialized RecentSet will be in the order of parameter items.  If items are {1, 2, 3, 4},
    /// iterating through RecentSet will result in a list of {1, 2, 3, 4} not {4, 3, 2, 1}        
    /// </summary>
    public RecentSet(int size, IEnumerable<T> items)
    {
        _list = items.ToList();
        _size = size;

        TrimList();
    }

    /// <summary>
    /// Adds an item to the RecentSet
    /// </summary>
    public void Add(T item)
    {
        // If the item is already in the set, remove it
        int i = _list.IndexOf(item);
        if (i > -1)
            _list.RemoveAt(i);

        // Add the item to the front of the list.
        _list.Insert(0, item);

        TrimList();
    }

    public int Count
    {
        get { return _list.Count; }
    }

    private void TrimList()
    {
        // If there is a set size, make sure the set only contains that many elements
        if (_size != -1)
            while (_list.Count > _size)
                _list.RemoveAt(_list.Count - 1);
    }

    /// <summary>
    /// Returns the set in the form of a List
    /// </summary>
    public List<T> ToList()
    {
        return _list;
    }

    #region IEnumerable<T> Members

     public IEnumerator<T> GetEnumerator()
    {           
        return _list.GetEnumerator();
    }

    #endregion

    #region IEnumerable Members

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return _list.GetEnumerator();
    }

    #endregion
}
Follow

Get every new post delivered to your Inbox.

Join 66 other followers