XML Workshop 12 - Parsing a delimited strin

Introduction
If you have read most of my articles in XML Workshop series, you might have figured that I am a strong supporter of XML. There had been a lot of debates on TO-XML or NOT-TO-XML for quite some time. But I guess it is very much subjective. The question should be answered based on a specific task at hand. So the real question is not TO-XML or NOT-TO-XML. But the question should be 'TO-XML or NOT-TO-XML for the implementation of a specific task'. When I have a specific task at hand, based on various factors I might decide to go for an XML approach or an approach without XML.

Some times, we take a specific approach even though we know there is a better way of doing the given task. But again, you might have your own reasons for that. So it does not mean that a good technology should always be used. There are so many factors that might influence the Project Manager or the Tech Lead from deciding an approach suitable for the specific task.

We started working with SQL Server 2005 right from the initial days. I was very much fascinated by the XML support. I knew how to use XQuery in TSQL. We had many stored procedures which accepted string parameters in XML format. Those stored procedures used OPENXML() to transform the XML string to a result set and read values from the result set. I know, I could have used an XML variable instead of a string in XML structure and used XQuery instead of OPENXML(). That would have been a better choice given the XML support in SQL Server 2005. But we still continued with the old approach for a few more months. The reason why we did not use XQuery initially was because not all developers were trained enough to work with XQuery at that time. We had been very busy and had been totally focussing on getting the modules completed and hence thought of moving ahead with the previous approach and refactoring the code at a later stage.

The reason why I brought you to this discussion is that, I believe it is incorrect to say that 'X is the best approach'. But the correct way is to say 'X is the best approach for Y task given A, B and C conditions'. 'A, B and C' may be the project dead line, size of the application, volume of changes needed, skill level of the programmers and the like.

The goal of my XML workshop is to expose XML capabilities of SQL Server 2005 from various angles and give you examples which you might be able to use in your applications right when you decide to go in the XML way.

Parsing a delimited string
The most common approach seen in the Internet code samples is to use PATINDEX in a WHILE loop and use SUBSTRING() to chop each piece of data. This data is inserted to a memory table or temp table and consumed. I did a search in Google and found the following articles which explains how to tokenize/parse a delimited string.

Tokenizing a String Using PARSENAME
Splitting a string of unlimited length
Quick T-SQL to parse a delimited string
T-SQL string parsing
This article presents a different approach which splits a delimited string using the XML approach. The following example adds XML tags to a comma separated string and retrieves a result set using XQuery. [code]

1 /*
2 Here is the string that we need to split
3 */
4 DECLARE @str VARCHAR(100)
5 SET @str = '0001,0002,0003,0004,0005'
6
7 /*
8 I am converting the string to an XML structure by
9 inserting XML tags.
10 */
11 DECLARE @x XML
12 SET @x = '' + REPLACE(@str, ',', '') + ''
13
14 /*
15 Now we can apply XQuery to return a result set
16 */
17 SELECT x.i.value('.', 'VARCHAR(4)') AS Item
18 FROM @x.nodes('//i') x(i)
19
20 /*
21 Item
22 ----
23 0001
24 0002
25 0003
26 0004
27 0005
28
29 (5 row(s) affected)
30 */
If you want to reuse this code, probably it is a good idea to create a function which does this. Here is an example. [code]

1 CREATE FUNCTION dbo.SplitString
2 (
3 @str VARCHAR(MAX),
4 @delimeter CHAR(1)
5 )
6 RETURNS @ret TABLE (Token VARCHAR(MAX))
7 AS
8 BEGIN
9
10 DECLARE @x XML
11 SET @x = '' + REPLACE(@str, @delimeter, '') + ''
12
13 INSERT INTO @ret
14 SELECT x.i.value('.', 'VARCHAR(MAX)') AS token
15 FROM @x.nodes('//t') x(i)
16
17 RETURN
18 END
It may not be a good approach to use VARCHAR(MAX). If you do not expect really large strings and tokens, it is better to use a fixed size that will suite your specific data requirements.

As a last step, let us test the function we just created. [code]

1 SELECT * FROM dbo.SplitString('0001|0002|0003|0004|0005', '|')
2 SELECT * FROM dbo.SplitString('0001,0002,0003,0004,0005', ',')
Conclusions
Huh! Yet another way of splitting a string.