This is an extension to my previous post in hopes to explain things a little better of shredding XML in SQL Server.
What is XML?
eXtensible Markup Language (XML) is a markup language that was designed to make it easier to store and send data, all while being easy to read (ref). The act of shredding XML is another way term for extracting values from an XML.
Why Use XML in SQL Sever?
In SQL Server, there is an XML data type that I work with from time to time as it is very helpful in a variety of situations such as having data that is easy to read, have data that is easy to store, or it may be the case that it is the only way you can receive data. In my current situation, I needed use it because the application programming interface (API) I was consuming data from was passing XML.
How to Shred XML in SQL Server
To shred the values, you need to use XML data type methods. It follows the same “SELECT * FROM” format, but you are really executing methods. We’ll be going over the value() and nodes() methods which use XQuery expressions. These methods can be used based off of alias and variable names using dot-notation. You can read more about the others here.
Rather than trying to explain each part beforehand, I think it’s easier just to dive right into the code:
As you can see, I declared a variable called @XML that is an XML data type. The root element is called CustomDataSet – which has three sub-child elements; VersionNumber, DownloadDate, and an array of Persons. The Persons array can have any number of persons, but I decided to stick with two that has some information in them.
Let’s get the syntax down for the methods:
- The nodes syntax: nodes(XQuery expression) AS Table(Column)
- The value syntax: value(XQuery expression, SQL data type)
You can probably map the syntax back to the SQL and understand it, but if you don’t, I’ll explain it a bit farther. In a nutshell, you need to link datasets with other datasets and form a path to the value(s) you want. XQuery expressions are used to determine which element the method (nodes() or value()) should be looking at.
In the nodes() method, I am saying “start at the CustomDataSet node and alias it as customDataSet(cols)”. In the value() methods, I am saying “Based on customDataSet(cols) dataset, give me the values for the specified element (VersionNumber or DownloadDate) and it should be of the specified SQL data type (VARCHAR(30) or DATE)”. The bracket-number ([1]) you see after the specified element tells the method to grab the X ordinal value – in this case, the first value.
You can use the bracket-number to get whichever person you want from the array of persons, but it might be easier to create a couple of table/column node sets and get the values from that – just as in the second query. In the example above, since everything is connected, you’re going to have to pull from the customDataSet(cols) set to reach the Persons element, and then use that Persons set to get to the Person you want.
Another thing I wanted you to notice was how it handles missing fields in this example: There is a ‘RandomField’ in Barack Obama’s section. You can see that the return value is NULL for John Xiong since there isn’t that column in his. If you were to use an XML Schema Definition (XSD) that required for this field to come through, the XML will not parse and error – but that will be for a different post.
Hopefully that helps lay out some ground rules for shredding XML in SQL Server.