LINQ to SQL: XElement Not Being Updated in the Database


If you are using LINQ to SQL to manage the interaction between your database and your application, you will notice that any columns defined with the SQL xml data type are represented by the XElement class in the generated classes. This makes working with XML data stored in a database very easy and convenient. But today I ran into a problem while trying to update the XML stored in the database. My code followed the logic shown below.

// Get a report from the database
Report report = (from r in db.Reports
                 where r.Id == id
                 select r).First();

// The Report.Definition property is a representation of an XML column
// Get the first 'Group' element and add a new SubGroup child
XElement element = report.Definition.Elements("Group").First();
element.Add(new XElement("SubGroup", value));

// Commit the changes to the database
db.SubmitChanges();

The problem was that the changes made to the XML in the Definition property were committed to the DataContext but never written to the database.

After some research, it appears that when SubmitChanges is called on the DataContext, it checks the entities it has retrieved to see if any changes were made. To do this, the DataContext caches the original values it retrieved from the database. The problem with the code above is that when you modify the report.Definition property in place, you are modifying the very object the DataContext cached. So when SubmitChanges compares the current value against the cached value, the two are the same reference and nothing appears to have changed. To solve this, force a reference inequality as shown below:

// Get a report from the database
Report report = (from r in db.Reports
                 where r.Id == id
                 select r).First();

// The Report.Definition property is a representation of an XML column
// Get the first 'Group' element and add a new SubGroup child
XElement element = report.Definition.Elements("Group").First();
element.Add(new XElement("SubGroup", value));

// Force a reference inequality
report.Definition = new XElement(report.Definition);

// Commit the changes to the database
db.SubmitChanges();

Now when the DataContext checks for changes, the original cached XElement and the new XElement assigned to Report.Definition are different references, so the column is flagged as modified. Unfortunately, the generated LINQ to SQL classes do not subscribe to the XElement.Changed event, so in-place edits are never flagged and this workaround is necessary.
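
If you want to verify that the entity is actually flagged before submitting, the standard DataContext.GetChangeSet() method provides a quick diagnostic (using the same db and report objects as above):

ChangeSet changeSet = db.GetChangeSet();

// With the re-assignment in place, the report appears in the Updates list;
// without it, the list is empty
Console.WriteLine(changeSet.Updates.Count);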


LINQ to XML: Convert a List into a Set of XElements


I ran into a situation today where I had a List of integer Ids that I needed to turn into an XML string. I thought about using the XmlWriter class but felt it was overkill for the very simple XML string I was trying to generate. My next thought was to just iterate through the Ids and build the XML manually, something like the following:

List<int> ids = new List<int>() { 1, 2, 3, 4, 5 };

StringBuilder xml = new StringBuilder();
xml.Append("<Ids>");

foreach(int id in ids)
     xml.Append("<Id>" + id + "</Id>");

xml.Append("</Ids>");

Console.Write(xml.ToString());

// Result
// ------
// <Ids><Id>1</Id><Id>2</Id><Id>3</Id><Id>4</Id><Id>5</Id></Ids>

While the above solution worked, I wondered if there was something in the LINQ to XML classes that could perform this operation in a more compact way. After some research, it turns out there is a very elegant way of doing this. The single statement below creates a new XElement named Ids whose content is a collection of XElement objects, one for each Id in the list.

List<int> ids = new List<int>() { 1, 2, 3, 4, 5 };

XElement xmlIds = new XElement("Ids", ids.Select(i => new XElement("Id", i)));

Console.Write(xmlIds.ToString());

// Result
// ------
// <Ids>
//   <Id>1</Id>
//   <Id>2</Id>
//   <Id>3</Id>
//   <Id>4</Id>
//   <Id>5</Id>
// </Ids>

This technique is very useful if your LINQ to SQL DataContext contains a stored procedure that accepts an XML parameter. When calling that stored procedure from the DataContext, you are required to pass an XElement, not just an XML string.
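
For example, here is a hedged sketch; SaveIds is a hypothetical stored procedure exposed on the DataContext, whose xml parameter LINQ to SQL maps to System.Xml.Linq.XElement:

XElement xmlIds = new XElement("Ids", ids.Select(i => new XElement("Id", i)));

// Hypothetical method generated for a stored procedure with an xml parameter
db.SaveIds(xmlIds);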


C#: Get the Most Common (Mode) Element from a Collection


I was surprised today when I couldn't find a built-in Mode extension method that retrieves the most common, or most frequently occurring, element in a collection, so I set out to write my own. My first approach was to use LINQ to Objects: group the collection by the individual items, order the groups descending by their counts, and then return the key of the first group, as follows.

public static T Mode<T>(this IEnumerable<T> list)
{
    // Return the default value for a null or empty list
    if (list == null || list.Count() == 0)
        return default(T);

    return (from item in list
            group item by item into g
            orderby g.Count() descending
            select g.Key).First();
}

The above method works, but it left a bad taste in my mouth: ordering the grouped items takes O(n log n) time, and I knew the problem could be solved in O(n) time. So I dropped the LINQ approach and went back to the good ol' loop. The following method traverses the collection and adds an entry to a Dictionary for each unique element, incrementing the entry's value every time that element is encountered. Then we traverse the Dictionary to find the element with the highest count.

/// <summary>
/// Gets the element that occurs most frequently in the collection.
/// </summary>
/// <param name="list">The collection to search.</param>
/// <returns>Returns the element that occurs most frequently in the collection.
/// If multiple elements tie for the most occurrences, which one is returned
/// is not guaranteed.</returns>
public static T Mode<T>(this IEnumerable<T> list)
{
    // Initialize the return value
    T mode = default(T);

    // Test for a null reference and an empty list
    if (list != null && list.Count() > 0)
    {
        // Store the number of occurrences for each element
        Dictionary<T, int> counts = new Dictionary<T, int>();

        // Add one to the count for each occurrence of an element
        foreach (T element in list)
        {
            if (counts.ContainsKey(element))
                counts[element]++;
            else
                counts.Add(element, 1);
        }

        // Loop through the counts of each element and find the 
        // element that occurred most often
        int max = 0;

        foreach (KeyValuePair<T, int> count in counts)
        {
            if (count.Value > max)
            {
                // Update the mode
                mode = count.Key;
                max = count.Value;
            }
        }
    }

    return mode;
}

Usage

List<int> ints = new List<int>() { 1, 2, 6, 3, 6, 7, 3, 6, 8, 4, 2, 1, 7, 6 };
int mode = ints.Mode();

Console.WriteLine(mode);

// Output
// ------
// 6

After some performance testing on large collections, the second method proved to take about half the time of the LINQ method. Also note that both methods are generic, so this extension method can be used on a collection of any type.
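
For example, the same extension method works unchanged on a collection of strings:

List<string> words = new List<string>() { "alpha", "beta", "alpha", "gamma", "alpha" };

Console.WriteLine(words.Mode());

// Output
// ------
// alpha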

C#: String Concatenation using StringBuilder and LINQ Aggregate Function


I always forget how to do this, so I thought I would post it, mainly for my own reference.

List<string> nameList = new List<string>() { "Roger", "Clark", "Wonda", "Anita" };

// A useless Select call in this instance, just to remind you
// that you could use LINQ to select a set of strings
// from a list of objects
StringBuilder names = nameList.Select(n => n)
                              .Aggregate(new StringBuilder(), (current, next) => current.Append(next).Append(", "));

// Remove the trailing comma and space
if (names.Length > 1)
    names.Remove(names.Length - 2, 2);

Console.WriteLine(names.ToString());

// Output
// ------
// Roger, Clark, Wonda, Anita
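
For comparison, string.Join produces the same result without the trailing-separator cleanup (note: on .NET 3.5 and earlier, Join only accepts an array, hence the ToArray() call):

string joined = string.Join(", ", nameList.ToArray());

Console.WriteLine(joined);

// Output
// ------
// Roger, Clark, Wonda, Anita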

LINQ: Selecting Multiple Properties from a List of Objects


In a previous post I discussed how to use LINQ to flatten a list of lists. Yesterday I ran across a slightly different situation that used the same solution.

I had a list of objects that each contained three separate integer values, and I wanted to extract the three values from each object and put them all into a single list. All you need to do is project each object into a new List containing its three values, then use the SelectMany method to flatten the resulting list of lists.

private class Foo
{
    public int Item1;
    public int Item2;
    public int Item3;
}

static void Main(string[] args)
{
    List<Foo> foos = new List<Foo> 
                           { 
                               new Foo() { Item1 = 1, Item2 = 2, Item3 = 3 },
                               new Foo() { Item1 = 4, Item2 = 5, Item3 = 6 },
                               new Foo() { Item1 = 7, Item2 = 8, Item3 = 9 }
                           };

    // Create a list of lists where each inner list holds the three values
    // stored in Item1, Item2, and Item3.  Then use SelectMany to flatten
    // the list of lists; Distinct() removes any duplicate values.
    var items = foos.Select(f => new List<int>() { f.Item1, f.Item2, f.Item3 })
                    .SelectMany(item => item)
                    .Distinct();

    foreach (int item in items)
        Console.WriteLine(item.ToString());
    
    Console.ReadLine();
}

// Output
// ------
// 1
// 2
// 3
// 4
// 5
// 6
// 7
// 8
// 9
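
As a side note, the intermediate Select isn't strictly required; SelectMany can project and flatten in a single call:

var items = foos.SelectMany(f => new List<int>() { f.Item1, f.Item2, f.Item3 })
                .Distinct();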

C#: Optimized Extension Method – Get a Random Element from a Collection


In my previous post I created an extension method to select a random element from a generic collection. It works great, but I wanted to see if there was any way to optimize it for better performance. Since my extension method uses the LINQ extension method ElementAt(), I first looked at the .NET Framework code for System.Linq.Enumerable.ElementAt() and found the following (the goto label is an artifact of the decompiler):

public static TSource ElementAt<TSource>(this IEnumerable<TSource> source, int index)
{
    TSource current;
    if (source == null)
    {
        throw Error.ArgumentNull("source");
    }
    IList<TSource> list = source as IList<TSource>;
    if (list != null)
    {
        return list[index];
    }
    if (index < 0)
    {
        throw Error.ArgumentOutOfRange("index");
    }
    using (IEnumerator<TSource> enumerator = source.GetEnumerator())
    {
    Label_0036:
        if (!enumerator.MoveNext())
        {
            throw Error.ArgumentOutOfRange("index");
        }
        if (index == 0)
        {
            current = enumerator.Current;
        }
        else
        {
            index--;
            goto Label_0036;
        }
    }
    return current;
}

It first tries to cast the source collection to an IList<TSource> and, if that succeeds, indexes directly to the specified position. Otherwise, it walks the source collection using Enumerator.MoveNext() until it reaches the desired index. After seeing this, my first thought was to test which of the two approaches, casting to an IList and indexing or walking the collection one element at a time, was faster. Since the .NET Framework first attempts to cast the collection to an IList, I figured that would be the faster method.

Here are the two methods I tested: the first uses my original approach of simply calling ElementAt(), and the second obtains an Enumerator on the collection and traverses it until the desired element is reached.

// Note: 'random' is assumed to be a static Random field declared once on
// the extension class (carried over from the previous post)
public static T GetRandomElement<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // If there are no elements in the collection, return the default value of T
    if (list.Count() == 0)
        return default(T);
   
    return list.ElementAt(random.Next(list.Count()));
}

public static T GetRandomElement1<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // Get the number of elements in the collection
    int count = list.Count();

    // If there are no elements in the collection, return the default value of T
    if (count == 0)
        return default(T);

    // Get a random index (reuse the count computed above)
    int index = random.Next(count);

    using (IEnumerator<T> enumerator = list.GetEnumerator())
    {
        // Move down the collection one element at a time.
        // When index is -1 we are at the random element location
        while (index >= 0 && enumerator.MoveNext())
            index--;

        // Return the current element
        return enumerator.Current;
    }          
}     

To perform the test I used the following code, which generates arrays of strings with 1 to 500 elements and then uses each of the two GetRandomElement() methods to select random elements from each array. Before calling each method I generate a new array of strings to prevent false performance gains due to caching. I performed the test twice, first selecting 2,000 random elements and then selecting 20,000. (Note: this test uses the RandomText class from my previous post to generate the random strings.)

public static void Test1()
{
    RandomText random = new RandomText();

    StringBuilder results = new StringBuilder();
    results.AppendLine("List Length,Method One, Method Two, Difference");

    Stopwatch sw = new Stopwatch();

    // Test on lists of length 1 to 500
    for (int listLength = 1; listLength <= 500; listLength++)
    {             
        Console.WriteLine("Current list length: " + listLength);
        long method1 = 0;
        long method2 = 0;

        // Get an array of random strings with a length of the current listLength                
        string[] strings = random.Sentance(listLength).Split(' ');

        sw.Reset();
        sw.Start();

        // Get a random element 20,000 times from the array using method 1                
        for (int i = 0; i < 20000; i++)
        {
            string s = strings.GetRandomElement<string>();
        }

        sw.Stop();

        // Record the time required for method 1
        method1 = sw.ElapsedMilliseconds;

        // Get a different array of random strings to ensure we don't
        // get performance enhancement due to caching
        strings = random.Sentance(listLength).Split(' ');

        sw.Reset();
        sw.Start();

        // Get a random element 20,000 times from the array using method 2
        for (int i = 0; i < 20000; i++)
        {
            string s = strings.GetRandomElement1<string>();
        }

        sw.Stop();

        // Record the time required for method 2
        method2 = sw.ElapsedMilliseconds;

        // Store the results
        results.AppendLine(listLength + "," + method1 + "," + method2 + "," + (method1 - method2));
    }

    using (StreamWriter writer = new StreamWriter(@"C:\Temp\RandomStats.csv"))
    {
        writer.Write(results.ToString());
    }

    Console.WriteLine();
    Console.WriteLine("Press enter to close...");

    Console.ReadLine();
}

Here are the summarized results:

It turns out that in both tests, when the collection has between 1 and 100 elements, the second method using the Enumerator actually performs better! I repeated the test multiple times and it produced the same results. I even swapped which method was called first to ensure a fair test, and the results were similar.

After my first test I decided to create a new GetRandomElement() method that takes advantage of this information. The new method first checks the length of the collection: if it contains 100 or fewer elements, it gets a random element using Enumerator.MoveNext(); otherwise it calls Enumerable.ElementAt(). Here is the code:

public static T GetRandomElement<T>(this IEnumerable<T> list)
{
    if (list == null)
        throw new ArgumentNullException("list");

    // Get the number of elements in the collection
    int count = list.Count();

    // If there are no elements in the collection, return the default value of T
    if (count == 0)
        return default(T);

    // Get a random index (reuse the count computed above)
    int index = random.Next(count);

    // When the collection has 100 elements or less, get the random element
    // by traversing the collection one element at a time.
    if (count <= 100)
    {                
        using (IEnumerator<T> enumerator = list.GetEnumerator())
        {
            // Move down the collection one element at a time.
            // When index is -1 we are at the random element location
            while (index >= 0 && enumerator.MoveNext())
                index--;

            // Return the current element
            return enumerator.Current;
        }
    }
    
    // Otherwise let LINQ's ElementAt() handle it, which casts the collection
    // to an IList and indexes into it when possible.
    return list.ElementAt(index);
}       
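
Usage is the same as before (again assuming the static random field is initialized once and the extension class is in scope):

string[] colors = { "red", "green", "blue", "yellow" };

// Prints one of the four colors at random
Console.WriteLine(colors.GetRandomElement<string>());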

I then wanted to check the performance of my original method against the new one. I'll spare you the full code as it is quite similar to the test above, but instead of incrementing the collection size from 1 to 500, on each iteration I created a collection of a random size between 1 and 200 elements. My reasoning for choosing the range of 1 to 200 is that the new method decides how to retrieve the random element by checking the collection size, and the break point is at 100 elements; a random size between 1 and 200 therefore gives each retrieval strategy an equal opportunity to be exercised. Again I performed the test selecting both 2,000 and 20,000 random elements from each collection, using my original method and then the new one. A minimal sketch of the loop change is shown below.
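
(This is a sketch of the modification only, under the assumption that the rest of the timing code is unchanged; 'sizer' is a second Random used just to pick the collection size.)

// Instead of: for (int listLength = 1; listLength <= 500; listLength++)
Random sizer = new Random();

for (int iteration = 0; iteration < 500; iteration++)
{
    int listLength = sizer.Next(1, 201);  // random size between 1 and 200

    // ... same timing code as in the test above ...
}

Here are the results: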

In both cases the time required to select the random elements decreased with the new optimized method. When selecting 2,000 random elements, the new method completed on average 0.338 ms quicker per iteration and 169 ms faster overall. When selecting 20,000 random elements, it completed on average 2.312 ms quicker per iteration and 1,156 ms faster overall.

Seeing the improvement in performance as the number of random elements selected increases, I decided to perform the same test but selecting 200,000 elements on each iteration. Here are the results:

The new method completed on average 32.346 ms quicker per iteration and 16.172 seconds faster overall.

True, in the majority of cases we are only talking about an improvement of a few milliseconds, but those milliseconds add up if you are performing a large number of operations.

For those that are interested, the raw test result data can be found on Google Docs here.

LINQ: Flatten a List of Lists


Today I needed to flatten a list of lists and really didn't want to write out a nested loop to iterate through each element of each list. After some searching I found that LINQ provides a very elegant solution to this problem. LINQ is a powerful set of extensions added to the .NET Framework that allow you to query and manipulate sets of data. The SelectMany extension method solves this problem very easily, as shown below.

List<List<string>> listOfLists = new List<List<string>>();
listOfLists.Add(new List<string>() { "a", "b", "c" });
listOfLists.Add(new List<string>() { "d", "e", "f" });
listOfLists.Add(new List<string>() { "g", "h", "i" });

var flattenedList = listOfLists.SelectMany(x => x);

foreach (string s in flattenedList)
     Console.Write(s + " ");

// Output
// -----
// a b c d e f g h i

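For contrast, here is the nested loop that SelectMany replaces:

List<string> flattened = new List<string>();

foreach (List<string> innerList in listOfLists)
    foreach (string s in innerList)
        flattened.Add(s);
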
Justin Etheridge has a great blog post with awesome diagrams of how the SelectMany extension method works. Check it out for more information.
