Failed the Turing Test: Microsoft 70-487: Manipulate XML data structures

Exam Objectives

Read, filter, create, modify XML data structures; Manipulate XML data by using XMLReader, XMLWriter, XMLDocument, XPath, LINQ to XML; transform XML by using XSLT transformations

Quick Overview of Training Materials

Exam Ref 70-487 - Chapter 1.6

[MSDN] XML Documents and Data
[MSDN] XML
[MSDN] Understanding XML
[MSDN] XPath Reference
[MSDN] What is XSLT?
[MSDN] XSLT Reference
[Hanselminutes] XML Tools and Technologies
[Hanselminutes] LINQ to XML
A Really, Really, Really Good Introduction to XML
[PluralSight] XML Fundamentals
[PluralSight] XML Processing in .NET Applications
[W3C] XML 1.1 Recommendation (Spec)
[W3C] Namespaces in XML 1.0 3rd Ed. (Spec)
[W3C] XSLT Recommendation (Spec)
[StackOverflow] JSON and XML comparison
[YouTube] XML and XSLT Transformation Explained
[YouTube] Using XSLT to Transform Your XML

My code samples on GitHub

What is XML?

XML can be defined in a number of ways. The spec defines it as such:

The Extensible Markup Language (XML) is a subset of SGML... Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

The MSDN page for XML describes it like this:

Extensible Markup Language (XML) is the universal format for data on the Web. XML allows developers to easily describe and deliver rich, structured data from any application in a standard, consistent way. XML does not replace HTML; rather, it is a complementary format.

And Wikipedia describes XML like this:

Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable... The design goals of XML emphasize simplicity, generality, and usability across the Internet... Although the design of XML focuses on documents, the language is widely used for the representation of arbitrary data structures such as those used in web services.

These three definitions emphasize very different aspects of XML, but all are correct. XML is a subset of Standard Generalized Markup Language (HTML, by contrast, was originally defined as an application of SGML), and it was designed to be easier for machines to parse than HTML ("well-formedness" is easier to determine), while still being human readable. While initially aimed at "documents" (modern Word documents are, in fact, zipped packages of XML files), the flexibility of the model was well suited to representing structured data (as in a relational database).

So in a way, "XML" actually exists, as a concept, at several different levels. There is the XML spec itself, with the rules and syntax for defining document formats and data models (e.g. RSS, Atom, SOAP, SVG...). There is the logical "document" that a particular instantiation of XML represents, what Dan Sullivan refers to as the "Logical Structure" in his PluralSight course on XML. Finally, there is the actual .xml file, which is really just a text file. XML documents are encoded as plain text, usually using UTF-8 or -16, but other encodings are possible, and XML has ways of making it easy for XML parsers to figure out which encoding to use (through the XML declaration).
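To make the encoding point concrete, here is a minimal sketch (the class and method names are mine, purely for illustration). It encodes a tiny document as ISO-8859-1 bytes and hands the raw bytes to the parser, which has only the XML declaration to go on when deciding how to decode them:

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml.Linq;

public static class EncodingDemo
{
    // Builds a small document, encodes it as ISO-8859-1 bytes, and lets the
    // parser work out the encoding from the XML declaration alone.
    public static string RoundTrip()
    {
        string xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><name>Jos\u00e9</name>";
        byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(xml);

        using (var stream = new MemoryStream(bytes))
        {
            XDocument doc = XDocument.Load(stream);
            // Declaration.Encoding echoes what the declaration said;
            // Root.Value is the decoded text content.
            return doc.Declaration.Encoding + ":" + doc.Root.Value;
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTrip());
    }
}
```

If the declaration were missing, the parser would fall back to its default detection (UTF-8 unless a byte order mark says otherwise), and the single 0xE9 byte for the accented character would not decode cleanly.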

Because XML can describe a document, or data, I think it's useful to mention a rule of thumb from Dan Sullivan to distinguish the two: If what is left after removing all the tags makes sense, then it is probably a document. Imagine a Word document with all the content stripped into a plain text file. Data, on the other hand, tends to have all the metadata encoded into the tags, attributes, and structure of the document, so removing these tags leaves you with a jumble of numbers and strings.

XML Basics

Many of the high level components of XML will seem familiar to anyone who knows HTML. Tags and attributes are used with both, though XML tags and attributes have slightly different rules:

XML tags are always closed, which is done two ways: <tag></tag> OR <tag/>
XML elements may not overlap (must be strictly nested). So <t1><t2></t1></t2> is not allowed.
attribute values are always enclosed in quotes (single or double, doesn't matter)
XML is case sensitive.
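These rules are easy to poke at from code. The following is just a sketch (IsWellFormed is a helper name I made up): it leans on the fact that XDocument.Parse throws an XmlException when the input is not well formed.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class WellFormedDemo
{
    // Returns true when the parser accepts the text as well-formed XML.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsWellFormed("<t1><t2></t2></t1>")); // strictly nested
        Console.WriteLine(IsWellFormed("<t1><t2></t1></t2>")); // overlapping elements
        Console.WriteLine(IsWellFormed("<tag>"));              // never closed
        Console.WriteLine(IsWellFormed("<tag attr=value/>"));  // unquoted attribute value
        Console.WriteLine(IsWellFormed("<Tag></tag>"));        // case mismatch
    }
}
```

Only the first input parses; each of the others breaks one of the rules listed above.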

In addition to the usual angle bracket tags, there are some special elements that use other delineators:

comments <!-- comment -->
processing instruction <? processing ?>
unparsed character data <![CDATA[ data.data.data ]]>

A well-formed XML document has exactly one root element, though it may also include comments and processing instructions alongside the root element. The XML declaration, when present, must be the very first thing in the file; it specifies the XML version and optionally the text encoding used in the document.

Another notable difference between HTML and XML, and what gives XML its name, is that while HTML has a fixed library of tags, XML tags can be anything (it is eXtensible...). While free-form XML is common, it is possible to define a schema for an XML document, which can then be used by an XML parser to determine if the XML document is valid according to its schema. The Wiki article mentions several standards for schema definitions, but the most common are Document Type Definition (DTD) and XML Schema Definition (XSD primer).

Because XML does not have a fixed vocabulary, there is a potential problem with name collisions. That is, Company A and Company B may both define an element with the same name. While this is fine as long as their two definitions never exist in the same document, what if the need arises? This is where namespacing saves the day. Much like namespaces in programming languages, XML namespaces provide a way to qualify XML element names.
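As a quick illustration of the collision problem, here is a sketch using LINQ to XML. The two "company" namespace URIs and the CountTitles helper are made up for the example. Both companies define a "title" element, and the namespace is what keeps them apart:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Counts <title> elements that belong to the given namespace URI.
    public static int CountTitles(string namespaceUri)
    {
        // Hypothetical namespaces: company A uses "title" for an album title,
        // company B uses it for an honorific.
        var doc = XDocument.Parse(
            "<catalog xmlns:a='http://example.com/company-a'" +
            "         xmlns:b='http://example.com/company-b'>" +
            "<a:title>Nellyville</a:title>" +
            "<b:title>Sir</b:title>" +
            "<b:title>Dame</b:title>" +
            "</catalog>");

        XNamespace ns = namespaceUri;   // implicit conversion from string
        return doc.Descendants(ns + "title").Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountTitles("http://example.com/company-a"));
        Console.WriteLine(CountTitles("http://example.com/company-b"));
        Console.WriteLine(CountTitles(""));   // unqualified "title": no matches
    }
}
```

Note that the match is on the namespace URI, not the prefix; the prefix is only shorthand inside the document.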

The following snippet demonstrates a few of these properties:

<?xml version="1.0" encoding="iso-8859-1" ?>  <!-- XML declaration           -->
<?xml-stylesheet href="orders.xsl"?>          <!-- XML stylesheet processing -->
<!--  This is a comment -->
 
<order id="ord123456">     <!--  "id" is an attribute of order.  -->
  <customer id='cust0921'> <!--  attributes can be surrounded with ' or ". -->
    <title value="Sir" />  <!--  self-closing (empty) tag.  -->
    <first-name>Dare</first-name>
    <First-Name>DARE</First-Name>     <!--  XML is case sensitive   -->
    <last-name>Obasanjo</last-name>
    <address>                            <!--  "address" opening tag      -->
      <street>One Microsoft Way</street>   <!--   *********************   -->
      <city>Redmond</city>                 <!--   children of "address"   -->
      <state>WA</state>                    <!--   *********************   -->
      <zip>98052</zip>
    </address>                           <!--  "address" closing tag      -->
    <special>
      <![CDATA[<><>@#$%^&"""""'']]>      <!--  CDATA is "character data" and can --> 
    </special>                           <!--  contain XML special characters    -->
  </customer>
 
  <!--  xmlns is used to declare a namespace prefix -->
  <items xmlns:cd="http://example.com/2007/Compact-Disc">
    <cd:compact-disc>
      <cd:price>16.95</cd:price>        <!--  "cd" is the prefix, and it is prepended -->
      <cd:artist>Nelly</cd:artist>      <!--  to the element tags (start and end)     -->
      <cd:title>Nellyville</cd:title>   <!--  separated by a colon                    -->
    </cd:compact-disc>
    <cd:compact-disc>
      <cd:price>17.55</cd:price>
      <cd:artist>Baby D</cd:artist>
      <cd:title>Lil Chopper Toy</cd:title>
    </cd:compact-disc>
  </items>
</order>

Manipulating XML as a Stream

There are two fundamentally different ways to deal with XML data. The first is to consider the XML as just a stream of data. This "low level" view of XML is supported in .NET by two classes (and their subclasses), XmlWriter and XmlReader. These classes provide a fast, memory efficient way to process XML data. The tradeoff with these classes is that they are forward only. However, for dealing with very large XML files, these are going to perform very well where the in-memory (DOM) based APIs will struggle (or choke altogether). I grabbed a large XML data file from the University of Washington's XML Data Repository containing protein sequence data. This file takes up about 700MB on the hard drive, and contains about 21 million elements. Chrome refused to open it (memory usage hit nearly 4GB before it finally died).

Notepad made the same complaint. So it was no surprise when the XmlDocument class (which I'll discuss next) also could not load it all into memory:

        public static void AttemptToLoadHugeXmlIntoMemory()
        {
            Console.WriteLine("Loading document into memory... (this won't end well)");
            var dom = new XmlDocument();
            dom.Load(@"C:\XML\psd7003.xml");
        }

XmlReader, on the other hand, performed nearly 80M Read() operations on this file in just over 8 seconds, and never used more than 18MB of memory:

        public static void XMLReader_Example()
        {
            var settings = new XmlReaderSettings();
            // DTD processing is prohibited by default; this file has a DOCTYPE
            settings.DtdProcessing = DtdProcessing.Parse;

            // dispose the reader (and its file handle) when we are done
            using (var xr = XmlReader.Create(@"C:\XML\psd7003.xml", settings))
            {
                var timer = new Stopwatch();
                Console.WriteLine("Processing started...");
                timer.Start();

                // each Read() advances the reader by exactly one node
                int count = 0;
                while (!xr.EOF)
                {
                    xr.Read();
                    count++;
                }
                timer.Stop();

                Console.WriteLine("Called Read() " + count + " times, took "
                    + timer.ElapsedMilliseconds + "ms");
            }
        }

The XmlReader can be configured using the XmlReaderSettings class. Because the protein database XML file has a DTD, it was necessary to configure the reader to parse the DTD (this is prohibited by default). Ironically, when I switched on validation, using the DTD, the same file failed validation, go figure... or did it? When I dug a little deeper, I found that the problem wasn't that the XML file was invalid. It just couldn't find the external DTD file. While the UW repo provided a copy of the DTD, the DOCTYPE declaration was pointing to a Georgetown University URL that was no longer valid (http://pir.georgetown.edu/pirwww/xml/002/psdml.dtd). StackOverflow to the rescue; I fired up Fiddler, had it intercept the call and respond with the downloaded DTD file, and all was well.

A smaller file makes it a little easier to wrap your head around what is happening. I ran the sample XML from above (demonstrating the various components) through a console app that just prints the NodeType, LocalName, and Value. Each bit of output represents a Read() operation.

Compare this with the output when using the MoveToContent() method after each Read():

The declaration, processing instructions, comments, and whitespace have all been skipped over (of course this means I had to figure out all the indentation manually). But it also illustrates quite vividly that using the XmlReader, you can really get a fine-grained look into the structure of the XML file you are working with. It is also possible to give the XmlReader a Stream or a TextReader, so one is not limited to reading XML files off the hard drive.
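For a self-contained taste of this, here is a rough sketch that reads from an in-memory string instead of a file. It is not the console app described above: ElementNames is a name I made up, and it filters on NodeType rather than calling MoveToContent(), using the reader's Depth property to rebuild the indentation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class ReaderDemo
{
    // Walks the XML one node at a time and records each element name,
    // indented by two spaces per level of nesting.
    public static List<string> ElementNames(string xml)
    {
        var names = new List<string>();
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Comments, text, end tags, etc. also come through Read();
                // keep only the element start tags.
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(new string(' ', reader.Depth * 2) + reader.LocalName);
                }
            }
        }
        return names;
    }

    public static void Main()
    {
        string xml =
            "<order id='ord123456'><!-- a comment -->" +
            "<customer><first-name>Dare</first-name></customer>" +
            "</order>";

        foreach (string line in ElementNames(xml))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the reader is forward only, anything you want to keep (like the names here) has to be captured as it streams past.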

XmlWriter offers output functionality analogous to that of the XmlReader. We can provide a file name, stream, string builder, or a TextWriter to the static Create() method to produce a writer, and we can optionally provide an instance of the XmlWriterSettings class to configure things such as indentation, auto-closing tags, and conformance level. The following code will (roughly) recreate the example XML from above and output it as a string to the console:

public static void XmlWriterExample()
{
    var sb = new StringBuilder();
    var settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.WriteEndDocumentOnClose = true;
    settings.Encoding = Encoding.GetEncoding("iso-8859-1");
    var xw = XmlWriter.Create(sb, settings);

    xw.WriteStartDocument();
    xw.WriteComment(" XML declaration           ");
    xw.WriteProcessingInstruction("xml-stylesheet", "href='orders.xsl'");
    xw.WriteComment(" XML stylesheet processing ");
    xw.WriteStartElement("order");
    xw.WriteAttributeString("id", "ord123456");
    xw.WriteComment("\"id\" is an attribute of order.  ");
    {  //braces are just for organization
        xw.WriteStartElement("customer");
        xw.WriteAttributeString("id", "cust0921");
        xw.WriteComment("  attributes can be surrounded with ' or \". ");
        {
            xw.WriteStartElement("title");
            xw.WriteAttributeString("value", "Sir");
            xw.WriteEndElement();
            xw.WriteComment("  self-closing (empty) tag.  ");
            WriteSimpleTag("first-name", "Dare", xw);
            WriteSimpleTag("First-Name", "DARE", xw);
            xw.WriteComment("  XML is case sensitive   ");
            WriteSimpleTag("last-name", "Obasanjo", xw);
            xw.WriteStartElement("address");
            {
                xw.WriteComment("\"address\" opening tag");
                WriteSimpleTag("street", "One Microsoft Way", xw);
                xw.WriteComment("   *********************   ");
                WriteSimpleTag("city", "Redmond", xw);
                xw.WriteComment("   children of \"address\"   ");
                WriteSimpleTag("state", "WA", xw);
                xw.WriteComment("   *********************   ");
                WriteSimpleTag("zip", "98052", xw);
            }
            xw.WriteEndElement(); //close address
            xw.WriteComment("\"address\" closing tag");
            xw.WriteStartElement("special");
            xw.WriteCData("<><>@#$%^&\"\"\"\"\"''");
            xw.WriteComment("  CDATA is \"character data\" and can ");
            xw.WriteEndElement();
            xw.WriteComment("  contain XML special characters    ");
        }
        xw.WriteEndElement(); //close customer

        xw.WriteComment("  xmlns is used to declare a namespace prefix ");
        xw.WriteStartElement("items");
        xw.WriteAttributeString("xmlns", "cd", null, "http://example.com/2007/Compact-Disc");
        {
            xw.WriteStartElement("cd", "compact-disc", null);
            {
                WriteSimpleTagNs("price", "cd", "16.95", xw);
                WriteSimpleTagNs("artist", "cd", "Nelly", xw);
                WriteSimpleTagNs("title", "cd", "Nellyville", xw);
            }
            xw.WriteEndElement();
            xw.WriteStartElement("cd", "compact-disc", null);
            {
                WriteSimpleTagNs("price", "cd", "17.55", xw);
                WriteSimpleTagNs("artist", "cd", "Baby D", xw);
                WriteSimpleTagNs("title", "cd", "Lil Chopper Toy", xw);
            }
            xw.WriteEndElement();
        }
        //the last couple closing tags will be added automatically
    }

    xw.Flush();
    xw.Close();

    Console.Write(sb.ToString() + "\n\n");
}

private static void WriteSimpleTag(string name, string value, XmlWriter xw)
{
    xw.WriteStartElement(name);
    xw.WriteString(value);
    xw.WriteEndElement();
}

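private static void WriteSimpleTagNs(string name, string prefix, string value, XmlWriter xw)
{
    //second argument is the namespace prefix (e.g. "cd"), not the namespace URI;
    //passing null for the URI lets the writer resolve the prefix already in scope
    xw.WriteStartElement(prefix, name, null);
    xw.WriteString(value);
    xw.WriteEndElement();
}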

Because the "Indent" property on the XmlWriterSettings is "true", the output is nicely indented automatically. The comments are a little wonky compared to the source XML, but having the comments off to the side isn't exactly idiomatic XML anyway. The encoding is "utf-16" because it's using a StringBuilder; when I swap in a FileStream, the encoding is "iso-8859-1" in the resultant file, as configured on the settings instance.

While the writer is very performant, the granularity of the control we have over the output is a double-edged sword. The code above is very verbose (it would have been even worse if I hadn't thrown together a couple of simple helper methods). For smaller data, the abstractions given by the DOM based APIs can make things much easier for us.

Manipulating XML as a Document

The first of the in-memory XML models is the XmlDocument API. Unlike the XmlReader and XmlWriter, the XmlDocument represents the ... well... document as a whole using a collection of classes. The root class is the XmlDocument, which forms the basis of the model, and includes functionality to load XML from another source (parsing it into an in-memory representation in the process), create new XML components, write changes to an XmlWriter, or save changes to a stream (such as a FileStream or an in-memory string).

The components of an XML document each have their own class, such as XmlDeclaration, XmlElement, XmlAttribute, XmlComment, and more. These elements can be composed dynamically; an XmlElement can add and remove children (which can be other elements, or CData nodes, or comments), add or remove attributes, etc.

There are several ways to navigate the structure of an XmlDocument. A node has properties to access its parent, children, and sibling nodes. Node collections are generally returned as an XmlNodeList, which implements IEnumerable and thus can be looped over in a foreach statement. Calling Cast<XmlNode>() on such a list allows you to use LINQ extension methods. Also, the SelectNodes() and SelectSingleNode() methods can take an XPath query and will return the nodes or first node, respectively, satisfying the query.
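A small sketch of those navigation options side by side (the Describe helper and the inline XML are just for illustration):

```csharp
using System;
using System.Linq;
using System.Xml;

public static class DomNavigationDemo
{
    public static string Describe()
    {
        var dom = new XmlDocument();
        dom.LoadXml(
            "<customer id='cust0921'>" +
            "<first-name>Dare</first-name>" +
            "<last-name>Obasanjo</last-name>" +
            "</customer>");

        // XPath lookup of a single node
        XmlNode first = dom.SelectSingleNode("/customer/first-name");

        // Relationship properties on the node itself
        string parent = first.ParentNode.Name;
        string sibling = first.NextSibling.Name;

        // XmlNodeList is non-generic, so Cast<XmlNode>() is needed for LINQ
        string children = string.Join(",",
            dom.DocumentElement.ChildNodes.Cast<XmlNode>().Select(n => n.InnerText));

        return parent + "|" + sibling + "|" + children;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());   // customer|last-name|Dare,Obasanjo
    }
}
```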

XPath is pretty much an entire language unto itself and I probably could write a whole blog post just on XPath, so I'll try to touch on the highlights. My first impression was that XPath "feels" a lot like CSS query selection. The Wikipedia entry gives a pretty decent high level overview of XPath. The location path is an XPath expression consisting of an axis (optional), a node test, and predicate (optional).

The axis determines how the nodes selected by the location path are related to the context node (basically, the starting node). An axis is always followed by a double colon (::) in the path. The following axes are described in the MSDN entry:

ancestor - parent node, grandparent node, etc.
ancestor-or-self - context node and all its ancestors.
attribute - attributes of the context node.
child - direct children of the context node (equivalent to the ChildNodes property)
descendant - direct children plus all their children, grandchildren, etc.
descendant-or-self - context node and all its descendants.
following - all nodes after the context node, excluding descendants.
following-sibling - all sibling nodes that appear after the context.
namespace - namespace nodes of the context node.
parent - parent node of the context, if it exists.
preceding - opposite of following: all nodes before the context node, excluding ancestors.
preceding-sibling - opposite of following-sibling.
self - just the context node.
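To see a few of these axes in action, here is a sketch that evaluates each one relative to a single context node (the last-name element). The Select helper and the trimmed-down XML are mine, not part of the exam material:

```csharp
using System;
using System.Linq;
using System.Xml;

public static class AxisDemo
{
    // Evaluates an XPath expression relative to the <last-name> element
    // and returns the names of the selected nodes.
    public static string[] Select(string xpath)
    {
        var dom = new XmlDocument();
        dom.LoadXml(
            "<order><customer>" +
            "<first-name>Dare</first-name>" +
            "<last-name>Obasanjo</last-name>" +
            "<zip>98052</zip>" +
            "</customer></order>");

        XmlNode context = dom.SelectSingleNode("//last-name");
        return context.SelectNodes(xpath).Cast<XmlNode>()
            .Select(n => n.Name).ToArray();
    }

    public static void Main()
    {
        string[] paths =
        {
            "self::*", "parent::*", "ancestor::*", "child::text()",
            "preceding-sibling::*", "following-sibling::*"
        };

        foreach (string path in paths)
        {
            Console.WriteLine(path + " -> " + string.Join(", ", Select(path)));
        }
    }
}
```

The same expression gives different results from a different context node, which is the whole point of an axis: it is always relative to where you are standing.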

The node test is the only required part of the path. This part of the path can specify node names, or several node types (which include parens like a function call...):

comment()
node()
processing-instruction()
text()

Gluing these node tests together are the various operators.

/ - child
// - recursive child (descendant)
. - current node
.. - parent node
* - wildcard
@ - attribute (so if I want the id of the current node, it's "./@id")
@* - attribute wildcard (select all attributes of current node)
: - namespace separator
() - grouping (establish precedence)
[] - used by predicate, also for array dereferencing
+, -, div, *, mod - arithmetic operators
and, not(), or, =, !=, etc. - boolean operators
| - union operator

A predicate is a filtering instruction, and can be thought of as analogous to a WHERE clause. These expressions are enclosed in square brackets. The final bit of XPath is functions:

Node-Set functions: count, id, last, local-name, name, namespace-uri, position
String functions: concat, contains, normalize-space, starts-with, string, string-length, substring, substring-after, substring-before, translate
Boolean functions: boolean, false, lang, not, true
Number functions: ceiling, floor, number, round, sum
Microsoft extensions functions...
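Here is a short sketch of predicates and functions working together. The helper names and the simplified, namespace-free XML are assumptions for the sake of a compact example. Expressions that return a node set go through SelectNodes, while expressions that return a number go through an XPathNavigator's Evaluate method:

```csharp
using System;
using System.Linq;
using System.Xml;

public static class PredicateDemo
{
    private static XmlDocument LoadSample()
    {
        var dom = new XmlDocument();
        dom.LoadXml(
            "<items>" +
            "<disc><price>16.95</price><artist>Nelly</artist></disc>" +
            "<disc><price>17.55</price><artist>Baby D</artist></disc>" +
            "</items>");
        return dom;
    }

    // For expressions that select nodes: returns the text of each match.
    public static string[] Texts(string xpath)
    {
        return LoadSample().SelectNodes(xpath).Cast<XmlNode>()
            .Select(n => n.InnerText).ToArray();
    }

    // For expressions that produce a number (count, sum, ...).
    public static double Number(string xpath)
    {
        return (double)LoadSample().CreateNavigator().Evaluate(xpath);
    }

    public static void Main()
    {
        // predicate with a comparison: the "WHERE price > 17" case
        Console.WriteLine(string.Join(",", Texts("//disc[price > 17]/artist")));
        // predicate with a string function
        Console.WriteLine(string.Join(",", Texts("//disc[starts-with(artist, 'Nel')]/artist")));
        // positional predicate using last()
        Console.WriteLine(string.Join(",", Texts("//disc[last()]/artist")));
        // node-set function returning a number
        Console.WriteLine(Number("count(//disc)"));
    }
}
```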

The following code snippet touches on some XPath, as well as tidbits of other parts of the API. For many more XPath examples, see the MSDN XPath Examples page. In this code, I load our running sample into an XmlDocument, select all the comment nodes and remove them, clone and modify an existing node, and create a new node from scratch. Then I save the modified XML document (in this case to a StringBuilder):

public static void SaveChangesToXmlDocumentObject()
{
    Console.WriteLine("Loading document into memory...\n\n");
    var dom = new XmlDocument();
    dom.Load(@"C:\XML\XMLAnatomy.xml");

    var declaration = dom.ChildNodes.Cast<XmlNode>()
        .Where(x => x.NodeType == XmlNodeType.XmlDeclaration).FirstOrDefault();

    var ids = dom.SelectNodes("descendant::*/@id");

    //add the "cd" namespace prefix
    var nsmgr = new XmlNamespaceManager(dom.NameTable);
    nsmgr.AddNamespace("cd", "http://example.com/2007/Compact-Disc");

    //remove all the comments (snapshot the list first, since the loop modifies the tree)
    var comments = dom.SelectNodes("descendant::comment()").Cast<XmlNode>().ToList();
    foreach (XmlNode comment in comments)
    {
        comment.ParentNode.RemoveChild(comment);
    }

    //grab the "items" and clone a cd
    var items = dom.SelectSingleNode("//items");
    var disc = dom.SelectSingleNode(
        "//cd:compact-disc[cd:artist='Nelly']", nsmgr)
        .Clone();

    //change the details
    disc.SelectSingleNode("cd:price", nsmgr)
        .InnerText = "21.50";
    disc.SelectSingleNode("cd:artist", nsmgr)
        .InnerText = "Alanis Morissette";
    disc.SelectSingleNode("cd:title", nsmgr)
        .InnerText = "Jagged Little Pill";

    //add back into list
    items.AppendChild(disc);

    //create a cd from scratch
    string ns = "http://example.com/2007/Compact-Disc";
    var newCd = dom.CreateElement("cd", "compact-disc", ns);
    var price = dom.CreateElement("cd", "price", ns);
    price.InnerText = "99.95";
    var artist = dom.CreateElement("cd", "artist", ns);
    artist.InnerText = "Elton John";
    var title = dom.CreateElement("cd", "title", ns);
    title.InnerText = "The Complete Works";
    newCd.AppendChild(price);
    newCd.AppendChild(artist);
    newCd.AppendChild(title);
    items.AppendChild(newCd);

    //commit changes
    var sb = new StringBuilder();
    dom.Save(new StringWriter(sb));

    Console.WriteLine(sb.ToString() + "\n\n");
}

Manipulating XML with LINQ

In sharp contrast to the XmlDocument based API, the LINQ to XML API, which includes such creatively named classes as XDocument, XDeclaration, XElement, XAttribute, and XComment, among others, makes LINQ a first class citizen, and is really optimized around its use. While it is still possible to use XPath using extension methods in the System.Xml.XPath namespace, there is a bit of inelegant hackery involved in getting equivalent results... think a (cast) and then a .Cast<>... yeesh!

Loading an XML document from a file URI is pretty much exactly the same as on XmlDocument, but that's basically where the similarities end. The XDocument class really abstracts away the fact that we are dealing with an XML object, so you won't see InnerXml and OuterXml properties, but you will see a Declaration property, a Descendants() method, and various other methods that resemble XPath axes. Namespace handling is much improved, and new element instances can be created with "new" instead of being spawned from the root document.
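A quick sketch of what those axis-like members look like in practice (the Describe helper and inline XML are made up for the example). Note that the new element is created with a plain constructor call, with no owner document involved:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class LinqAxesDemo
{
    public static string Describe()
    {
        var doc = XDocument.Parse(
            "<order><customer>" +
            "<first-name>Dare</first-name>" +
            "<last-name>Obasanjo</last-name>" +
            "</customer></order>");

        XElement last = doc.Descendants("last-name").First();

        // Ancestors() walks upward, nearest ancestor first
        string ancestors = string.Join(",",
            last.Ancestors().Select(a => a.Name.LocalName));

        // ElementsBeforeSelf() is the rough equivalent of preceding-sibling
        string before = string.Join(",",
            last.ElementsBeforeSelf().Select(e => e.Name.LocalName));

        // A new element is just "new"; it is attached when added
        last.AddAfterSelf(new XElement("zip", "98052"));
        string children = string.Join(",",
            last.Parent.Elements().Select(e => e.Name.LocalName));

        return ancestors + "|" + before + "|" + children;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```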

Creating new elements is also much more streamlined with the XElement constructor. While XmlElements were arguably easier to clone from existing elements, XElement nodes are much simpler to create from scratch. The following snippet makes all the same changes to the sample XML document, saving them to a text stream that is spat onto the console, but does so with 44 lines of code instead of 55 (a ~20% reduction). And it's arguably easier to read as well...

public static void XDocumentSample()
{
    Console.WriteLine("Loading document into memory...\n\n");
    var dom = XDocument.Load(@"C:\XML\XMLAnatomy.xml");

    var declaration = dom.Declaration;

    var ids = dom.Descendants().SelectMany(x => x.Attributes())
        .Where(x => x.Name.LocalName.Equals("id"));
    var xids = ((IEnumerable)dom.XPathEvaluate("descendant::*/@id"))
        .Cast<XAttribute>();

    //remove all the comments
    dom.DescendantNodes().Where(x => x.NodeType == XmlNodeType.Comment).Remove();     

    //grab the "items" and clone a cd
    var items = dom.Descendants("items").FirstOrDefault();
    XNamespace ns = items.GetNamespaceOfPrefix("cd");
    var disc = new XElement(dom.Descendants(ns + "artist")
        .Where(x => "Nelly".Equals(x.Value))
        .FirstOrDefault().Parent);

    //change the details
    disc.Descendants(ns + "price")
        .FirstOrDefault().Value = "21.50";
    disc.Descendants(ns + "artist")
        .FirstOrDefault().Value = "Alanis Morissette";
    disc.Descendants(ns + "title")
        .FirstOrDefault().Value = "Jagged Little Pill";

    //add back into list
    items.Add(disc);

    //create a cd from scratch

    XElement newDisc = new XElement(ns + "compact-disc",
        new XElement(ns + "price", "99.95"),
        new XElement(ns + "artist", "Elton John"),
        new XElement(ns + "title", "The Complete Works")
        );
    items.Add(newDisc);

    //commit changes
    var sb = new StringBuilder();
    dom.Save(new StringWriter(sb));

    Console.WriteLine(sb.ToString() + "\n\n");
}

Using XSLT for Transformations

XSLT (Extensible Stylesheet Language Transformations) is a declarative language for describing XML transformations. An XSLT processor takes as input one or more XML documents, and one or more XSLT "stylesheets", and creates an output document, which can be XML, HTML, or plain text (PDF is usually reached indirectly, by emitting XSL-FO and rendering that).

The YouTube videos put almost all of their emphasis on the xsl:for-each and xsl:value-of elements, and these are arguably the most useful for interpolating data into the template text, but XSLT offers a much richer set of capabilities than these simplistic tutorials seem to suggest. XSLT leverages the power of the XPath language for evaluating paths and expressions.

It is possible to bind parameters in the template by using the xsl:param element. These parameters can then be passed into the processor (I'll demonstrate in a bit). Similarly, the xsl:variable element allows you to define a variable that can be selected using the expression syntax $varname. Visual Studio (at least the 2017 version) was smart enough to signal errors when I tried to reference a parameter that was not available (I was futzing around with where to define it).
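As a minimal sketch of both elements, the following uses an inline stylesheet and inline input (both invented for the example, as is the Run helper) with a "source" parameter that defaults to 'Generic' and a "count" variable computed from the input. The output method is plain text to keep the result easy to read:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltParamDemo
{
    // Hypothetical stylesheet: one parameter with a default, one variable.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:param name='source' select=""'Generic'""/>
  <xsl:variable name='count' select='count(/root/listing)'/>
  <xsl:template match='/'>
    <xsl:value-of select='$source'/>
    <xsl:text>:</xsl:text>
    <xsl:value-of select='$count'/>
  </xsl:template>
</xsl:stylesheet>";

    private const string Input = "<root><listing/><listing/></root>";

    // Runs the transform; pass null to fall back to the parameter's default.
    public static string Run(string source)
    {
        var transform = new XslCompiledTransform();
        using (var xsl = XmlReader.Create(new StringReader(Stylesheet.Trim())))
        {
            transform.Load(xsl);
        }

        var args = new XsltArgumentList();
        if (source != null)
        {
            args.AddParam("source", "", source);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(Input)))
        {
            transform.Transform(input, args, output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(Run(null));      // default parameter value
        Console.WriteLine(Run("yahoo"));   // parameter supplied by the caller
    }
}
```

The difference between the two is who sets the value: a param can be overridden from outside the stylesheet, while a variable is fixed by the stylesheet itself.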

XSLT can also use conditionals. The xsl:choose element acts like a switch statement in conjunction with the xsl:when (case) and xsl:otherwise (default) elements. There is also an xsl:if element that acts like an if statement. Both xsl:if and xsl:when include a "test" attribute that is set to an XPath boolean expression.
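Here is a small sketch of the conditionals (the stylesheet, the input, and the Run helper are all made up for illustration). A seller rating of "new" is swapped for "unrated" with xsl:choose, and xsl:if adds a comma between items but not after the last one:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltChooseDemo
{
    // Hypothetical stylesheet demonstrating xsl:choose/when/otherwise and xsl:if.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:for-each select='/root/listing'>
      <xsl:choose>
        <xsl:when test=""normalize-space(seller_rating) = 'new'"">unrated</xsl:when>
        <xsl:otherwise><xsl:value-of select='normalize-space(seller_rating)'/></xsl:otherwise>
      </xsl:choose>
      <xsl:if test='position() != last()'>,</xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run(string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var xsl = XmlReader.Create(new StringReader(Stylesheet.Trim())))
        {
            transform.Load(xsl);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        {
            transform.Transform(input, new XsltArgumentList(), output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml =
            "<root>" +
            "<listing><seller_rating>new </seller_rating></listing>" +
            "<listing><seller_rating>128</seller_rating></listing>" +
            "</root>";
        Console.WriteLine(Run(xml));
    }
}
```

The normalize-space() call is there because the ratings in the auction data carry trailing whitespace, which would otherwise make the comparison with 'new' fail.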

XSLT allows for number formatting and twiddling white space (preserving or stripping it). Elements can be sorted with xsl:sort. It also includes the ability to compose multiple template files. Both Wikipedia and MSDN include lists of the available element types.
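Sorting and number formatting can be sketched like so (again, the stylesheet, the sample prices, and the Run helper are invented for the example). The xsl:sort element has to be the first child of xsl:for-each, and data-type='number' keeps "12.5" from being sorted as a string:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class XsltSortDemo
{
    // Hypothetical stylesheet: sort listings by price (highest first)
    // and print each price with two decimal places.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <xsl:template match='/'>
    <xsl:for-each select='/root/listing'>
      <xsl:sort select='price' data-type='number' order='descending'/>
      <xsl:value-of select=""format-number(price, '0.00')""/>
      <xsl:if test='position() != last()'>,</xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    public static string Run(string inputXml)
    {
        var transform = new XslCompiledTransform();
        using (var xsl = XmlReader.Create(new StringReader(Stylesheet.Trim())))
        {
            transform.Load(xsl);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(inputXml)))
        {
            transform.Transform(input, new XsltArgumentList(), output);
        }
        return output.ToString();
    }

    public static void Main()
    {
        string xml =
            "<root>" +
            "<listing><price>5</price></listing>" +
            "<listing><price>12.5</price></listing>" +
            "<listing><price>9.99</price></listing>" +
            "</root>";
        Console.WriteLine(Run(xml));
    }
}
```

Inside the sorted for-each, position() and last() refer to the sorted order, which is why the comma logic still works.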

I created a quickie demonstration using auction data from the XML Repository site (yahoo.xml and ebay.xml). Here is a snippet of the XML file, with the reference to our XSLT file. For a browser to apply the transformation on its own, the XML source file must have an xml-stylesheet processing instruction pointing to the .xslt file (the console app further down loads the stylesheet explicitly, so it does not depend on this):

<?xml version='1.0' ?>
<?xml-stylesheet href="Transform.xslt" type="text/xsl" ?>
<!DOCTYPE root SYSTEM "http://www.cs.washington.edu/.../auctions/yahoo.dtd">
<root>
  <listing>
   <seller_info>
       <seller_name>jenzen12 </seller_name>
       <seller_rating>new </seller_rating>
   </seller_info>

The XSLT file (in its entirety). Like every other example on the planet this takes the XML input and transforms it into an HTML table. This also takes in a parameter for the "Auction Platform" column, with the default value set to 'Generic':

<?xml version="1.0" encoding="utf-8"?>
 
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
  
    <xsl:output method="html" indent="yes"/>
    <xsl:param name="source" select="'Generic'" />
    <xsl:template name="html_table" match="/">
      <html>
        <head>
          <style>
            td {
                border: 1px solid #dddddd;
                text-align: left;
                padding: 8px;
            }
          </style>
        </head>
        <body>
          <table>
            <thead>
              <tr>
                <th>Auction Platform</th>
                <th>Seller Name</th>
                <th>High Bidder Name</th>
                <th>Highest Bid Amount</th>
              </tr>
            </thead>
            <tbody>
              <xsl:for-each select="//listing">
                <tr>
                  <td><xsl:value-of select="$source"/></td>
                  <td><xsl:value-of select="seller_info/seller_name"/></td>
                  <td><xsl:value-of select="auction_info/high_bidder/bidder_name"/></td>
                  <td><xsl:value-of select="bid_history/highest_bid_amount" /></td>
                </tr>
              </xsl:for-each>
            </tbody>
          </table>
        </body>
      </html>
    </xsl:template>
</xsl:stylesheet>

The following simple console app (based on this StackOverflow answer with some modifications) applies the transform with the "source" parameter set. This little app leverages classes from the System.Xml.Xsl namespace. The XSLT processor built into Visual Studio (accessed through the XML top level menu when in an XSLT or XML file) doesn't have the capability to pass in parameters. The resulting HTML files illustrate the difference, with the VS generated file reflecting the default value for the "source" parameter.

    class Program
    {
        public static void Main(string[] args)
        {
            XsltArgumentList argsList = new XsltArgumentList();
            argsList.AddParam("source", "", "yahoo");

            XslCompiledTransform transform = new XslCompiledTransform(true);
            transform.Load("Transform.xslt");
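            var settings = new XmlReaderSettings();
            // the source file has a DOCTYPE, and DTDs are prohibited by default
            settings.DtdProcessing = DtdProcessing.Parse;

            // dispose both the reader and the writer once the transform is done
            using (var reader = XmlReader.Create("yahoo.xml", settings))
            using (StreamWriter sw = new StreamWriter("output.html"))
            {
                transform.Transform(reader, argsList, sw);
            }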
        }
    }


Tuesday, June 6, 2017
