Schema validation using Reg Ex

One of the most powerful tools that XML in SQL Server brings to the table is Regular expresions. In XML Schemas Regular Expressions are an easy method to validate your data without having to bolt on complex CLR functions to you database.

Consider a UK Postcode they are constructed in a known pattern.

MK13 3XZ

The first part of a post code is either 1 or 2 letters in the range of a to z or (to be pedantic, as with databases you have to) A to Z.  This is followed by a number in the range of 1  to 99 or for use of regex’s 2 numbers in the range of 1 to 9.  The two parts of thepostcode should be seperated with a space.

The second part of the postcode is constructed of a single number in the range 0 to 9 and then two letters a to z or A to Z.

Admittedly there are other ways to use sql server to confirm that the data is a valid post code my favourite being running a query against all valid post codes but for a first parse a regEx could be the way forward.

A RegEx for a postcode could be “[a-zA-Z]{1,2}[0-9]{2} [0-9][a-zA-Z]{2}”

There are lots of articles on the web for the construction of RegEx’s and this is a small blog article so i won’t go into the deconstruction of the regEx here.

The regular expresion can be put into the XML schema in the following fashion

CREATE XML SCHEMA COLLECTION Postcode AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Address">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Postcode" minOccurs="0">
        <xsd:simpleType>
         <xsd:restriction base="xsd:string">
          <xsd:pattern value="[a-zA-Z]{1,2}[0-9]{2} [0-9][a-zA-Z]{2}"/>
         </xsd:restriction>
       </xsd:simpleType>
     </xsd:element>
   </xsd:sequence>
  </xsd:complexType>
 </xsd:element>
</xsd:schema>'
GO
DECLARE @xml XML(DOCUMENT Postcode)
SET @XML = N'<Address><Postcode>NP32 4Df</Postcode></Address>'

The above XML can be loaded into the Schema’d variable. and be validated. however the xml fragment of

DECLARE @xml XML(DOCUMENT Postcode)
SET @XML = N'<Address><Postcode>3NP2 4Df</Postcode></Address>'

will automatically be rejected.

In summary you can can use the xml regex to automatically validate data with a known sting format before the data is processed (i.e. in the loading stage of SQL Server).  This can be useful if you are trying to load large volumes of data from xml or you want to keep all you validation conditions in one area.

Advertisements
This entry was posted in SQL, XML. Bookmark the permalink.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s