Matching Tags using Regular Expressions Balancing Groups

The regular expressions engine provided by the .NET Framework includes a new feature known as ‘balancing groups’.

This feature lets you track nesting with a named capturing group. Each match of (?<NAME>...) pushes a capture onto the group's stack, and each match of (?<-NAME>...) pops one off again (and fails if there is nothing to pop). You can then test whether you had an equal number of each by checking whether the group still has a value: an empty group means everything balanced. That test, written (?(NAME)(?!)), goes in the match pattern itself, so that only the balanced result is considered a match.
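To make the push and pop behaviour concrete, here is a minimal sketch that checks whether the parentheses in a string are balanced. The class name BalanceDemo and the method IsBalanced are made up for illustration; only the Regex class and the balancing-group syntax come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalanceDemo
{
    // Each '(' pushes onto OPEN, each ')' pops from OPEN (and fails if OPEN is empty).
    // (?(OPEN)(?!)) then rejects the match if anything is still left on the stack.
    private static readonly Regex Balanced = new Regex(
        @"^(?>[^()]+|(?<OPEN>\()|(?<-OPEN>\)))*(?(OPEN)(?!))$");

    // Hypothetical helper: true when every '(' has a matching ')'.
    public static bool IsBalanced(string input) => Balanced.IsMatch(input);

    public static void Main()
    {
        Console.WriteLine(IsBalanced("(a(b)c)"));  // True
        Console.WriteLine(IsBalanced("(a(b)c"));   // False: one '(' is never closed
        Console.WriteLine(IsBalanced("a)b("));     // False: ')' arrives before any '('
    }
}
```

The atomic group (?>...) is there so that a failed match does not spend time retrying different ways of splitting the plain text.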

Microsoft don’t really go into this much and only show a small example of matching opening and closing parentheses.

In my case, I wanted to match a specific chunk of HTML code in a file and then find the closing tag that matches the opening tag.

For example:

  <div class="targetContent">
   Something in here
   <div> Something else in here</div>
  </div>

Using standard regular expressions, searching from <div class="targetContent"> to </div> can work in two ways. In non-greedy mode, the match stops at the </div> of the inner div. In greedy mode, it runs on to the last </div> it can find, which is the end of the outer div here but would overshoot if any further divs followed.
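The difference between the two modes is easy to see in a small sketch. The sample string is an assumption for illustration, with a sibling div added after the target so that the greedy overshoot shows up; the class and method names are likewise made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Assumed sample input: a target div with one nested div, followed by a sibling div.
    public const string Html =
        "<div class=\"targetContent\">A<div>B</div>C</div><div>D</div>";

    // Non-greedy: stops at the first </div>, which belongs to the inner div.
    public static string Lazy() =>
        Regex.Match(Html, "<div class=\"targetContent\">.*?</div>", RegexOptions.Singleline).Value;

    // Greedy: runs to the last </div> in the input, past the end of the target div.
    public static string Greedy() =>
        Regex.Match(Html, "<div class=\"targetContent\">.*</div>", RegexOptions.Singleline).Value;

    public static void Main()
    {
        Console.WriteLine(Lazy());    // <div class="targetContent">A<div>B</div>
        Console.WriteLine(Greedy());  // <div class="targetContent">A<div>B</div>C</div><div>D</div>
    }
}
```

Neither result is the target div on its own, which is the gap that balancing groups fill.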

What I wanted to do is match on the </div> that makes the tags balance, which can be done using balancing groups!

C# Code:

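// (?s) lets '.' span newlines; the repeated atomic group (?>...)* counts every nested <div> and </div>, not just the first pair.
pattern = "(?s)<div class=\"targetContent\">(?>(?<TAG><div\\b[^>]*>)|(?<-TAG></div>)|(?:(?!</?div\\b).)+)*(?(TAG)(?!))</div>";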

Effectively, what the expression does is:

  1. Start the match from the div with class="targetContent"
  2. Match any internal content
  3. Whenever it encounters another opening div tag, (?<TAG>...) increments the TAG count
  4. Match any nested content
  5. Whenever it encounters a closing div tag, (?<-TAG>...) decrements the TAG count
  6. (?(TAG)(?!)) rejects the match unless the TAG count is back to zero, i.e. every nested div has been closed
  7. Finally match on the closing tag of our outer div
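The steps above can be sketched as a small console program. The pattern is written with RegexOptions.IgnorePatternWhitespace so that each step can carry a comment, and it loops over the content with an atomic group so that several nested or sibling divs are all counted. The class name, the FindTarget helper and the sample HTML are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TargetContentDemo
{
    // Sketch only: the step numbers in the comments refer to the list above.
    private static readonly Regex Outer = new Regex(@"
        <div\ class=""targetContent"">  # 1. start at the target div
        (?>                             # atomic group, repeated for each piece of content
            (?<TAG> <div\b[^>]*> )      # 3. nested opening div: push onto TAG
          | (?<-TAG> </div> )           # 5. closing div: pop from TAG, fails when TAG is empty
          | (?: (?! </?div\b ) . )+     # 2, 4. any text that does not start a div tag
        )*
        (?(TAG)(?!))                    # 6. reject the match if a nested div is still open
        </div>                          # 7. closing tag of the outer div
        ", RegexOptions.IgnorePatternWhitespace | RegexOptions.Singleline);

    // Hypothetical helper: returns the balanced outer div, or null if none is found.
    public static string FindTarget(string html)
    {
        Match m = Outer.Match(html);
        return m.Success ? m.Value : null;
    }

    public static void Main()
    {
        string html =
            "<div class=\"targetContent\">\n" +
            " Something in here\n" +
            " <div> Something else in here</div>\n" +
            " <div> And a second nested div</div>\n" +
            "</div>\n" +
            "<div>Unrelated</div>";

        // Prints the outer div through its own closing tag, without the unrelated div.
        Console.WriteLine(FindTarget(html));
    }
}
```

When the closing-tag alternative meets the outer </div>, the TAG stack is already empty, so the pop fails, the loop ends, and the final </div> in the pattern consumes it.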

This can be applied to any XML-style markup that uses paired opening and closing tags.
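As a rough sketch of that generalisation, the pattern can be built for any element name. BalancedTag.For is a hypothetical helper, not part of .NET, and it makes some simplifying assumptions: it does not cope with self-closing tags, comments or CDATA sections, so a real parser is the safer choice for anything beyond tidy input.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancedTag
{
    // Hypothetical helper: builds a balanced-match regex for the given element name.
    public static Regex For(string tagName)
    {
        string t = Regex.Escape(tagName);
        string pattern =
            $@"<{t}\b[^>]*>" +                       // outer opening tag, with any attributes
            $@"(?>(?<TAG><{t}\b[^>]*>)" +            // nested opening tag: push
            $@"|(?<-TAG></{t}\s*>)" +                // nested closing tag: pop
            $@"|(?:(?!</?{t}\b).)+)*" +              // any other content
            @"(?(TAG)(?!))" +                        // fail if a nested tag is still open
            $@"</{t}\s*>";                           // outer closing tag
        return new Regex(pattern, RegexOptions.Singleline);
    }

    public static void Main()
    {
        string xml = "<section id=\"a\">x<section>y</section>z</section><section>w</section>";
        Console.WriteLine(For("section").Match(xml).Value);
        // <section id="a">x<section>y</section>z</section>
    }
}
```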


2 Comments to Matching Tags using Regular Expressions Balancing Groups

  1. Gary
    April 18, 2013 at 8:06 pm

    This doesn’t seem to be working for me. At least not when testing it using Rad Software Regular Expression Designer. Not sure if it is the pattern or the software but try your match on the following:

    Something in here
    Something else in here
    Something else in here
    Something else in here

    No balancing occurs and it will incorrectly end the match on the 4th line rather than the 6th line.

  2. Richard Edwards
    March 12, 2009 at 11:07 am

    Thanks Craig, just what I was looking for.
