Regex





The Regex class (in the System.Text.RegularExpressions namespace) represents a regular expression and provides methods for searching and manipulating strings. The most commonly used are:

  • IsMatch
  • Match
  • Matches
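
To make the difference between the three methods concrete, here is a minimal sketch. The class name RegexBasics, its method names, and the sample pattern \d+ are illustrative assumptions, not part of the original guide.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class RegexBasics
{
    // Assumed sample pattern: one or more decimal digits.
    private const string Digits = @"\d+";

    // IsMatch: reports only whether the pattern occurs anywhere in the input.
    public static bool HasNumber(string input)
    {
        return Regex.IsMatch(input, Digits);
    }

    // Match: returns the first occurrence; Success is false if there is none.
    public static string FirstNumber(string input)
    {
        Match m = Regex.Match(input, Digits);
        return m.Success ? m.Value : string.Empty;
    }

    // Matches: returns every non-overlapping occurrence as a MatchCollection.
    public static int CountNumbers(string input)
    {
        return Regex.Matches(input, Digits).Count;
    }
}
```

For the input "Go to {X=23, Y=43}", HasNumber returns true, FirstNumber returns "23", and CountNumbers returns 2.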



Matches

The Matches method can be one of the harder methods to use, because regular expression patterns look cryptic at first. The pattern syntax originated in UNIX tools such as grep and sed, so unless you are already familiar with it, the patterns can be difficult to decipher.
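
One way to make a pattern easier to read is to spread it over several lines with comments, which the RegexOptions.IgnorePatternWhitespace option permits. The sketch below annotates a pattern that picks the number out of text such as {X=23. The class name PatternAnatomy and the method FirstX are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PatternAnatomy
{
    // With IgnorePatternWhitespace, unescaped whitespace in the pattern is
    // ignored and everything from # to the end of the line is a comment.
    public const string XPattern = @"
        \{        # a literal opening brace (escaped for clarity)
        X=        # the literal characters X and =
        (         # start of capturing group 1
          \d{1,}  #   one or more decimal digits (equivalent to \d+)
        )         # end of capturing group 1
    ";

    // Returns the digits captured by group 1, or an empty string if no match.
    public static string FirstX(string input)
    {
        Match m = Regex.Match(input, XPattern, RegexOptions.IgnorePatternWhitespace);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Groups[0] always holds the whole match; Groups[1] holds only the text matched inside the first pair of parentheses, which is why the digits can be read without the surrounding {X= text.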

Scenario

You have the following string and your program must extract the coordinate data.

“You must first go to these 4 points: {X=23, Y=43} {X=1, Y=2} {X=4, Y=8} {X=0, Y=8} finally after going to all of those points you must go to {X=4, Y=1}”

You could decode this in two main ways as the following examples show.

Example - The Long Way

Although this approach solves the problem, it is designed for one task only. If the task changes even slightly at a later date, it will probably be easier to start over than to adapt the existing code. Note that using a state machine here keeps the code simpler than a series of nested if statements would.

// Fragment of a Windows Forms class; requires System, System.Collections.Generic,
// System.Drawing, and System.Text. textBox1 is assumed to be a text box on the form.
private enum states
{
  FINDING_X_VALUE,
  CONFIRM_X_VALUE,
  READING_X_VALUE,
  FINDING_Y_VALUE,
  CONFIRM_Y_VALUE,
  READING_Y_VALUE
}
private void longDecode(String toDecode)
{
  StringBuilder tempNumber = new StringBuilder();
  states currentState = states.FINDING_X_VALUE;
  List<Point> points = new List<Point>();
  Point currentPoint = new Point();
  Boolean error = false;
  Int32 i = 0;
 
  // Examine one character per iteration; the current state decides what it means.
  while ((i < toDecode.Length) && (error == false))
  {
    switch (currentState)
    {
      case states.FINDING_X_VALUE:
        {
          if (toDecode[i] == 'X')
          {
            currentState = states.CONFIRM_X_VALUE;
          }
        }
        break;
 
      case states.CONFIRM_X_VALUE:
        {
          if (toDecode[i] == '=')
          {
            currentPoint = new Point();
            tempNumber = new StringBuilder();
            currentState = states.READING_X_VALUE;
          }
          else
          {
            currentState = states.FINDING_X_VALUE;
          }
        }
        break;
      case states.READING_X_VALUE:
        {
          if (toDecode[i] != ',')
          {
            tempNumber.Append(toDecode[i]);
          }
          else
          {
            try
            {
              currentPoint.X = Convert.ToInt32(tempNumber.ToString());
            }
            catch
            {
              error = true;
            }
            currentState = states.FINDING_Y_VALUE;
          }
        }
        break;
 
      case states.FINDING_Y_VALUE:
        {
          if (toDecode[i] == 'Y')
          {
            currentState = states.CONFIRM_Y_VALUE;
          }
        }
        break;
 
      case states.CONFIRM_Y_VALUE:
        {
          if (toDecode[i] == '=')
          {
            tempNumber = new StringBuilder();
            currentState = states.READING_Y_VALUE;
          }
          else
          {
            //Only gets here if there has been an 'X='
            //Therefore there must be a 'Y=' following it
            error = true;
          }
        }
        break;
 
      case states.READING_Y_VALUE:
        {
          if (toDecode[i] != '}')
          {
            tempNumber.Append(toDecode[i]);
          }
          else
          {
            try
            {
              currentPoint.Y = Convert.ToInt32(tempNumber.ToString());
            }
            catch
            {
              error = true;
            }
 
            if (error == false)
            {
              currentState = states.FINDING_X_VALUE;
              points.Add(currentPoint);
            }
          }
        }
        break;
 
      default:
        //Should never get here
        break;
    }
 
    i++;
  }
 
  if (error == false)
  {
    for (int y = 0; y < points.Count; y++)
    {
      textBox1.AppendText(String.Format("Point: {0} | {1}\r\n", y, points[y].ToString()));
    }
  }
  else
  {
    textBox1.Text = "ERROR!";
  }
 
}


Example - The Short Way

private void shortDecode(String toDecode)
{
  Int32 i = 0;
  Boolean error = false;
  Point currentPoint = new Point();
  List<Point> points = new List<Point>();
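  // Each pattern captures the digits in group 1; \{ and \} match literal braces.
  MatchCollection XPoints = Regex.Matches(toDecode, @"\{X=(\d+)");
  MatchCollection YPoints = Regex.Matches(toDecode, @"Y=(\d+)\}");
 
  // Every X value needs a corresponding Y value.
  error = (XPoints.Count != YPoints.Count);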
 
  while ((i < XPoints.Count) && (error == false))
  {
    currentPoint = new Point();
 
    try
    {
      currentPoint.X = Convert.ToInt32( XPoints[i].Groups[1].ToString());
      currentPoint.Y = Convert.ToInt32( YPoints[i].Groups[1].ToString());
    }
    catch
    {
      error = true;
    }
 
    if (error == false)
    {
      points.Add(currentPoint);
    }
 
    i++;
  }
 
  if (error == false)
  {
    for (int y = 0; y < points.Count; y++)
    {
      textBox1.AppendText(String.Format("Point: {0} | {1}\r\n", y, points[y].ToString()));
    }
  }
  else
  {
    textBox1.Text = "ERROR!";
  }
}
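
The short version above pairs X and Y values by position in two separate match collections. As a possible variation (not the original author's code), a single pattern with named groups can capture both values of a point in one match, so they cannot get out of step. The class name CoordinateParser, the group names x and y, and the assumption that the comma may be followed by optional whitespace are all illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class CoordinateParser
{
    // Assumed format: {X=<digits>, Y=<digits>}, whitespace after the comma optional.
    private static readonly Regex PointPattern =
        new Regex(@"\{X=(?<x>\d+),\s*Y=(?<y>\d+)\}");

    // Returns every well-formed point found in the text, in order of appearance.
    // Text that does not fit the pattern is skipped rather than reported as an error.
    public static List<Point> Parse(string toDecode)
    {
        List<Point> points = new List<Point>();
        foreach (Match m in PointPattern.Matches(toDecode))
        {
            // int.Parse throws OverflowException if the digits exceed the Int32 range.
            int x = int.Parse(m.Groups["x"].Value);
            int y = int.Parse(m.Groups["y"].Value);
            points.Add(new Point(x, y));
        }
        return points;
    }
}
```

Applied to the scenario string, Parse returns five points, beginning with (23, 43) and ending with (4, 1). Unlike the examples above, this sketch silently ignores malformed input, so add a check if an error must be reported.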

Output

[Image: stringmatch.gif, screenshot of the program output]


Processing Comparison

Although the Regex version is short and easy to adapt to future requirements, it is also slower. If speed is an issue, prefer the state machine approach: in this comparison it was over 3 times faster than the Regex method.
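
Relative speed depends on the input, the pattern, and the runtime, so it is worth measuring on your own data. The sketch below shows one way to time a decoding routine with System.Diagnostics.Stopwatch. It also builds the Regex once with RegexOptions.Compiled, which may narrow the gap at the cost of a slower start-up. The class name DecodeTiming, its members, and the pattern are assumptions made for this illustration.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class DecodeTiming
{
    // Built once and reused; RegexOptions.Compiled trades a slower
    // construction for faster matching on repeated use.
    public static readonly Regex PointPattern =
        new Regex(@"\{X=(\d+),\s*Y=(\d+)\}", RegexOptions.Compiled);

    // Reading Count forces the lazily evaluated MatchCollection to run.
    public static int CountPoints(string input)
    {
        return PointPattern.Matches(input).Count;
    }

    // Runs 'work' the given number of times and returns elapsed milliseconds.
    public static long Measure(Action work, int iterations)
    {
        work(); // warm-up call so one-time JIT cost is left out of the timing
        Stopwatch timer = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            work();
        }
        timer.Stop();
        return timer.ElapsedMilliseconds;
    }
}
```

A call such as DecodeTiming.Measure(() => DecodeTiming.CountPoints(text), 100000) times the Regex approach; passing a delegate that runs the state machine on the same text gives the figure to compare it against.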



