C#: Optimized Extension Method – Get a Random Element from a Collection


In my previous post I created an extension method to select a random element from a generic collection and it works great but I wanted to see if there was any way to optimize this method for better performance. Since my extension method uses the LINQ extension method ElementAt(), I first looked at the .NET code for System.Linq.Enumerable.ElementAt() and found the following:

public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index)
{
    TSource current;
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        return list[index];
    }
    if (index < 0)
    {
        throw Error.ArgumentOutOfRange("index");
    }
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
    Label_0036:
        if (!enumerator.MoveNext())
        {
            throw Error.ArgumentOutOfRange("index");
        }
        if (index == 0)
        {
            current = enumerator.Current;
        }
        else
        {
            index--;
            goto Label_0036;
        }
    }
    return current;
}

It first tries to cast the source collection to an IList and if that is successful, it indexes into the collection to the specified index. If that fails, it simply travels down the source collection using the Enumerator.MoveNext() method until it reaches the desired index. After seeing this my first thought was to see which of the two methods, casting to an IList and indexing or traveling down the collection one element at a time, was faster. I figured since the .NET Framework first attempts to cast the collection to an IList, that would be the fastest method.

Here are the two methods I tested, the first utilizing my original method of just calling the ElementAt() method and the second using an Enumerator on the collection and traversing the collection until the desired element is reached.

public static T GetRandomElement<T>(this IEnumerable<T> list)
{           
    if (list == null)
        throw new ArgumentNullException("list");

    // If there are no elements in the collection, return the default value of T
    if (list.Count() == 0)
        return default(T);
   
    return list.ElementAt(random.Next(list.Count()));
}

public static T GetRandomElement1<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // Get the number of elements in the collection
    int count = list.Count();

    // If there are no elements in the collection, return the default value of T
    if (count == 0)
        return default(T);

    // Get a random index
    int index = random.Next(list.Count());

    using (IEnumerator<T> enumerator = list.GetEnumerator())
    {
        // Move down the collection one element at a time.
        // When index is -1 we are at the random element location
        while (index >= 0 && enumerator.MoveNext())
            index--;

        // Return the current element
        return enumerator.Current;
    }          
}     

To perform the test I used the following code which generates arrays of strings with 1 to 500 elements each and then uses each of the two GetRandomElement() methods to select random elements from the array. Before I call each method I generate a new array of strings to prevent false performance enhancement due to caching. I performed the test twice; the first time selecting 2,000 random elements and the second time selecting 20,000 random elements. (Note this test uses the RandomText class in my previous post to generate the random strings.)

public static void Test1()
{
    RandomText random = new RandomText();

    StringBuilder results = new StringBuilder();
    results.AppendLine("List Length,Method One, Method Two, Difference");

    Stopwatch sw = new Stopwatch();

    // Test on lists of length 1 to 500
    for (int listLength = 1; listLength <= 500; listLength++)
    {             
        Console.WriteLine("Current list length: " + listLength);
        long method1 = 0;
        long method2 = 0;

        // Get an array of random strings with a length of the current listLength                
        string[] strings = random.Sentance(listLength).Split(' ');

        sw.Reset();
        sw.Start();

        // Get a random element 20,000 times from the array using method 1                
        for (int i = 0; i < 20000; i++)
        {
            string s = strings.GetRandomElement<string>();
        }

        sw.Stop();

        // Record the time required for method 1
        method1 = sw.ElapsedMilliseconds;

        // Get a different array of random strings to ensure we don't
        // get performance enhancement due to caching
        strings = random.Sentance(listLength).Split(' ');

        sw.Reset();
        sw.Start();

        // Get a random element 20,000 times from the array using method 2
        for (int i = 0; i < 20000; i++)
        {
            string s = strings.GetRandomElement1<string>();
        }

        sw.Stop();

        // Record the time required for method 2
        method2 = sw.ElapsedMilliseconds;

        // Store the results
        results.AppendLine(listLength + "," + method1 + "," + method2 + "," + (method1 - method2));
    }

    StreamWriter writer = new StreamWriter(@"C:\Temp\RandomStats.csv");
    writer.Write(results.ToString());
    writer.Close();

    Console.WriteLine();
    Console.WriteLine("Press enter to close...");

    Console.ReadLine();
}

Here are the summarized results:

It turns out that in both tests, when the collection of strings has between 1 and 100 elements, the second method using the Enumerator actually performs better! I repeated the test multiple times and it produced the same results. I even swapped which method got called first just to ensure a fair test and as well similar results were produced.

After my first test I decided to create a new GetRandomElement() method that took advantage of this new information. The new method first checks the length of the collection and if it contains 100 or less elements, it gets a random element using the Enumerator.MoveNext() method, otherwise it just calls the Enumerable.ElementAt() method. Here is the code:

public static T GetRandomElement<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // Get the number of elements in the collection
    int count = list.Count();

    // If there are no elements in the collection, return the default value of T
    if (count == 0)
        return default(T);

    // Get a random index
    int index = random.Next(list.Count());

    // When the collection has 100 elements or less, get the random element
    // by traversing the collection one element at a time.
    if (count <= 100)
    {                
        using (IEnumerator<T> enumerator = list.GetEnumerator())
        {
            // Move down the collection one element at a time.
            // When index is -1 we are at the random element location
            while (index >= 0 && enumerator.MoveNext())
                index--;

            // Return the current element
            return enumerator.Current;
        }
    }
    
    // Get an element using LINQ which casts the collection
    // to an IList and indexes into it.
    return list.ElementAt(index);            
}       

I then wanted to check the performance of my original method against that of the new one. I’ll spare you the code as it is quite similar to the code above but instead of simply incrementing the collection size on each iteration from 1 to 500, on each iteration I created a collection of a random size between 1 and 200 elements. My reasoning for choosing the range of 1 to 200 elements is that my new method makes its decision on which way it generates the random element by checking the collection size and the breaking point is at 100 elements. Thus, generating a collection of random size between 1 and 200 allows for an equal opportunity for each method of generating the random element. Again I performed this test selecting both 2,000 and 20,000 random elements from each collection using my original method and then my new method. Here are the results:

In both cases the time required to generate a random element was decreased using my new optimized method. When selecting 2,000 random elements, the new method completed on average 0.338 ms quicker and completed 169 ms faster over all. When selecting 20,000 random elements, the new method completed on average 2.312 ms quicker and completed 1156 ms faster over all.

Seeing the improvement in performance as the number of random elements selected increases, I decided to perform the same test but selecting 200,000 elements on each iteration. Here are the results:

The new method completed on average 32.346 ms quicker and completed 16.172 seconds faster over all.

True the majority of the time we are only talking about an improvement of a matter of milliseconds here but nonetheless those add up if you are performing a large number of operations.

For those that are interested, the raw test result data can be found on Google Docs here.

C#: Extension Method – Get a Random Element from a Collection


In my previous post I created a class to generate random text. Throughout the code I had to get a random element from various arrays of characters and integers. The class was riddled with the repeated code that generated a random number, from 0 to the length of the array minus one, which was then used to index into the array and select an element. Something like this:

int numberOfWords = possibleSentanceLengths[RandomNumber(0, possibleSentanceLengths.Length)];

I got to thinking and figured this was the perfect situation to use an extension method so that I could just call a GetRandomElement on any array of elements. Here is the method I came up with:

public static class Extensions
{
    private static Random random = new Random();
    
    public static T GetRandomElement<T>(this IEnumerable<T> list)
    {
        // If there are no elements in the collection, return the default value of T
        if (list.Count() == 0)
            return default(T);

        return list.ElementAt(random.Next(list.Count()));
    }
}

Note that the instance of the Random class is created outside the method GetRandomElement. If we were to create a new Random object each time the method was called and we were to call the method in a loop, the element returned would be the same every time as the Random class is seeded using the current time. Having the class created outside the scope of the method ensures a true psuedorandom number on each call to the Next method.

This nice thing about this extension method is that it is generic so it can be called on any collection that implements the IEnumerable interface including arrays, Lists, HashSets, Dictionaries, you name it. Here is the original code using our new extension method.

int numberOfWords = possibleSentanceLengths.GetRandomElement<int>();

I was bored with using the .NET provided Random class and tried to think of other ways that a random number could be generated. My first thought was to use a Guid as those are completely unique (up to a point as there are only so many possible 32 character Guids that can be generated). Creating a new Guid, generating its hash code and then using the mod operator allows us to index into the collection at a random location.

public static class Extensions
{  
    public static T GetRandomElement<T>(this IEnumerable<T> list)
    {
        // If there are no elements in the collection, return the default value of T
        if (list.Count() == 0)
            return default(T);

        // Guids as well as the hash code for a guid will be unique and thus random        
        int hashCode = Math.Abs(Guid.NewGuid().GetHashCode());
        return list.ElementAt(hashCode % list.Count());
    }
}

Any other ideas on how to generate a random number other than the two mentioned?

C#: Random Text Generator


Whenever I need to generate random text I generally just Google ‘Random Text’ and pull up the first or second site in the list. This works great when you can copy and paste the text from your browser, but I ran across a situation where I needed to generate some random text from within one of my applications. So, I whipped up a small method that generated a random string of a given length and continued on my way. But, soon thereafter the inner programmer got the better of me and I just had to turn the simple method into an entire class that aided in random text generation.

The RandomText class that I created allows you to generate random strings (words), sentences, and paragraphs. Each method is overloaded with parameters that allow you to specify things like the possible word lengths that make up each sentence and the possible sentence lengths that make up each paragraph. Additionally there are methods that take no parameters and will generate a random string (word), sentence, or paragraph using default possible lengths. There are also options for adding punctuation to the sentences include middle of the sentence punctuation like commas and end of sentence punctuation like periods.

I have included the public properties and methods along with their documentation below. The actual source code can be found on codeplex here.

public class RandomText
{  
        /// <summary>
        /// Indicate whether or not to include random middle of the sentance punctuation
        /// marks in generated sentances
        /// </summary>
        public bool AddMiddleOfSentancePunctuationMarks = false;

        /// <summary>
        /// Indicates whether or not to add an end of sentance punctuation mark
        /// </summary>
        public bool AddEndOfSentancePunctuation = true;
      
        public RandomText()                

        /// <summary>
        /// Generates a random string of a specfic length.
        /// </summary>        
        /// <returns>Returns a randomly generated string (lower case) of a specific length.</returns>
        public string String()  

        /// <summary>
        /// Generates a random string of a specfic length.
        /// </summary>
        /// <param name="length">The length of the random string to generate.</param>        
        /// <returns>Returns a randomly generated string (lower case) of a specific length.</returns>
        public string String(int length)

        /// <summary>
        /// Generates a random string of a specfic length.
        /// </summary>
        /// <param name="length">The length of the random string to generate.</param>
        /// <param name="randomCharacterCase">If true, each character in the string will have
        /// an equal chance of being either upper case or lower case.  If false, the generated
        /// string will be all lower case.
        /// </param>
        /// <returns>Returns a randomly generated string of a specific length.</returns>
        public string String(int length, bool randomCharacterCase)

        /// <summary>
        /// Returns a random number within a specified range.
        /// </summary>
        /// <param name="minValue">The inclusive lower bound of the random number returned.</param>        
        /// <param name="maxValue">The exclusive upper bound of the random number returned. maxValue must be
        ///  greater than or equal to minValue.</param>               
        /// <returns>A 32-bit signed integer greater than or equal to minValue and less than maxValue;
        ///  that is, the range of return values includes minValue but not maxValue. If
        ///  minValue equals maxValue, minValue is returned.</returns>
        private int Number(int minValue, int maxValue)
		
        /// <summary>
        /// Generates a random sentance.
        /// </summary>        
        /// <returns>Returns a random sentance of random length and words from the default sentance and word lengths.</returns>
        public string Sentance()   

        /// <summary>
        /// Generates a random sentance of a given number of words .
        /// </summary>
        /// <param name="numberOfWords">The number of words in the sentance</param>
        /// /// <returns>Returns a random sentance of the specified length.</returns>
        public string Sentance(int numberOfWords)     

        /// <summary>
        /// Generates a random sentance of a given number of words and possible word lengths.
        /// </summary>
        /// <param name="numberOfWords">The number of words in the sentance</param>
        /// <param name="possibleWordLengths">An array of integers representing the possible number of characters in each word</param>
        /// <returns>Returns a string containing a specified number of random words composed of random characters</returns>
        public string Sentance(int numberOfWords, int[] possibleWordLengths)       

        /// <summary>
        /// Generates a random paragraph.
        /// </summary>
        public string Paragraph()        

        /// <summary>
        /// Generates a random paragraph of a given number of sentances.
        /// </summary>
        /// <param name="numberOfSentances">The number of sentances in the paragraph.</param>
        public string Paragraph(int numberOfSentances)     

        /// <summary>
        /// Generates a random paragraph of a given number of sentances.
        /// </summary>
        /// <param name="numberOfSentances">The number of sentances in the paragraph.</param>
        /// <param name="possibleSentanceLengths">An array of integers representing the possible number of words in each sentance.</param>
        public string Paragraph(int numberOfSentances, int[] possibleSentanceLengths)       

        /// <summary>
        /// Generates a random paragraph of a given number of sentances.
        /// </summary>
        /// <param name="numberOfSentances">The number of sentances in the paragraph.</param>
        /// <param name="possibleSentanceLengths">An array of integers representing the possible number of words in each sentance.</param>
        /// <param name="possibleWordLengths">An array of integers representing the possible number of characters in each word</param>
        /// <returns>Returns a string containing a specified number of random sentances composed of random words and characters</returns>
        public string Paragraph(int numberOfSentances, int[] possibleSentanceLengths, int[] possibleWordLengths)        
    }

Reverse DNS Lookup Using cmd.exe


Here is an awesome script that utilizes cmd.exe for reverse DNS lookup. It was posted on the Security Reliks blog by Skyler Onken.

WEAPONIZING CMD.EXE – DNS REVERSE LOOKUP

Enjoy!

C#: List.Contains Method – Case Insensitive


I had a list of strings and needed to check whether or not a specific string was contained within the list ignoring the character casing.  Pre-LINQ days you would have had to loop through the entries calling the .ToLower or ToUpper method on each of the elements or using the string.Compare method.  But thanks to LINQ, one simple method will take care of this for us:

List<string> list = new List<string>() { "a", "b", "c", "d", "e" };
string value = "A";

if (list.Contains(value, StringComparer.OrdinalIgnoreCase))
    Console.WriteLine(value + " is in the list!");
else
    Console.WriteLine(value + " is not in the list!");