Regex
The Regex class (in the System.Text.RegularExpressions namespace) provides methods for searching and manipulating strings with regular expressions, including:
- IsMatch
- Match
- Matches
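As a quick orientation, the sketch below calls the three methods named above on the same input. The sample text, the pattern, and the class name RegexMethodsDemo are assumptions made up for this illustration; only the Regex calls themselves come from the framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexMethodsDemo
{
    // Sample text and pattern are assumptions chosen for this illustration.
    public const string Input = "Order 66 shipped, order 7 pending";
    public const string Pattern = @"\d+"; // one or more decimal digits

    public static void Main()
    {
        // IsMatch: reports whether the pattern occurs anywhere in the input.
        bool found = Regex.IsMatch(Input, Pattern);
        Console.WriteLine(found);

        // Match: returns the first occurrence only.
        Match first = Regex.Match(Input, Pattern);
        Console.WriteLine(first.Value);

        // Matches: returns every non-overlapping occurrence, in order.
        MatchCollection all = Regex.Matches(Input, Pattern);
        foreach (Match m in all)
        {
            Console.WriteLine(m.Value);
        }
    }
}
```

IsMatch is enough when you only need a yes/no answer, Match when the first hit is all you care about, and Matches when every hit has to be processed.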
Matches
The Matches method can be one of the hardest methods to use because its search patterns look cryptic at first. The pattern syntax originated in UNIX tools, so unless you already know regular expressions, chances are you will struggle to decipher the patterns.
Scenario
You have the following string and your program must extract the coordinate data.
“You must first go to these 4 points: {X=23, Y=43} {X=1, Y=2} {X=4, Y=8} {X=0, Y=8} finally after going to all of those points you must go to {X=4, Y=1}”
You could decode this in two main ways as the following examples show.
Example - The Long Way
Although this solves the problem, it is designed for one task and one task only. If the task changes even slightly at a later date, it will probably be easier to start over than to adapt the existing code. Note that using a state machine here keeps the code simpler than a chain of if statements would.
private enum states
{
    FINDING_X_VALUE,
    CONFIRM_X_VALUE,
    READING_X_VALUE,
    FINDING_Y_VALUE,
    CONFIRM_Y_VALUE,
    READING_Y_VALUE
}

private void longDecode(String toDecode)
{
    StringBuilder tempNumber = new StringBuilder();
    states currentState = states.FINDING_X_VALUE;
    List<Point> points = new List<Point>();
    Point currentPoint = new Point();
    Boolean error = false;
    Int32 i = 0;

    while ((i < toDecode.Length) && (error == false))
    {
        switch (currentState)
        {
            case states.FINDING_X_VALUE:
                {
                    if (toDecode[i] == 'X')
                    {
                        currentState = states.CONFIRM_X_VALUE;
                    }
                }
                break;
            case states.CONFIRM_X_VALUE:
                {
                    if (toDecode[i] == '=')
                    {
                        currentPoint = new Point();
                        tempNumber = new StringBuilder();
                        currentState = states.READING_X_VALUE;
                    }
                    else
                    {
                        currentState = states.FINDING_X_VALUE;
                    }
                }
                break;
            case states.READING_X_VALUE:
                {
                    if (toDecode[i] != ',')
                    {
                        tempNumber.Append(toDecode[i]);
                    }
                    else
                    {
                        try
                        {
                            currentPoint.X = Convert.ToInt32(tempNumber.ToString());
                        }
                        catch
                        {
                            error = true;
                        }
                        currentState = states.FINDING_Y_VALUE;
                    }
                }
                break;
            case states.FINDING_Y_VALUE:
                {
                    if (toDecode[i] == 'Y')
                    {
                        currentState = states.CONFIRM_Y_VALUE;
                    }
                }
                break;
            case states.CONFIRM_Y_VALUE:
                {
                    if (toDecode[i] == '=')
                    {
                        tempNumber = new StringBuilder();
                        currentState = states.READING_Y_VALUE;
                    }
                    else
                    {
                        //Only gets here if there has been an 'X='
                        //Therefore there must be a 'Y=' following it
                        error = true;
                    }
                }
                break;
            case states.READING_Y_VALUE:
                {
                    if (toDecode[i] != '}')
                    {
                        tempNumber.Append(toDecode[i]);
                    }
                    else
                    {
                        try
                        {
                            currentPoint.Y = Convert.ToInt32(tempNumber.ToString());
                        }
                        catch
                        {
                            error = true;
                        }
                        if (error == false)
                        {
                            currentState = states.FINDING_X_VALUE;
                            points.Add(currentPoint);
                        }
                    }
                }
                break;
            default:
                //Should never get here
                break;
        }
        i++;
    }

    if (error == false)
    {
        for (int y = 0; y < points.Count; y++)
        {
            textBox1.AppendText(String.Format("Point: {0} | {1}\r\n", y, points[y].ToString()));
        }
    }
    else
    {
        textBox1.Text = "ERROR!";
    }
}
Example - The Short Way
private void shortDecode(String toDecode)
{
    Int32 i = 0;
    Boolean error = false;
    Point currentPoint = new Point();
    List<Point> points = new List<Point>();

    MatchCollection XPoints = Regex.Matches(toDecode, @"{X=(\d{1,})");
    MatchCollection YPoints = Regex.Matches(toDecode, @"Y=(\d{1,})}");

    error = (XPoints.Count != YPoints.Count) ? true : error;

    while ((i < XPoints.Count) && (error == false))
    {
        currentPoint = new Point();
        try
        {
            currentPoint.X = Convert.ToInt32(XPoints[i].Groups[1].ToString());
            currentPoint.Y = Convert.ToInt32(YPoints[i].Groups[1].ToString());
        }
        catch
        {
            error = true;
        }
        if (error == false)
        {
            points.Add(currentPoint);
        }
        i++;
    }

    if (error == false)
    {
        for (int y = 0; y < points.Count; y++)
        {
            textBox1.AppendText(String.Format("Point: {0} | {1}\r\n", y, points[y].ToString()));
        }
    }
    else
    {
        textBox1.Text = "ERROR!";
    }
}
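The short example runs two separate searches and pairs the X and Y results by index. One possible variation, sketched here rather than taken from the original code, is a single pattern with two capture groups, so each X stays with the Y from the same pair of braces. The names CoordinateParser, Parse and PointPattern are hypothetical, the sketch assumes the exact "{X=n, Y=n}" layout from the scenario string, and it uses value tuples (C# 7 or later) instead of Point to stay self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CoordinateParser
{
    // \{ and \} match literal braces; each (\d+) captures a run of digits.
    // \d+ means the same as the \d{1,} used in the short example.
    private const string PointPattern = @"\{X=(\d+), Y=(\d+)\}";

    // Hypothetical helper: returns every "{X=n, Y=n}" pair found in the text.
    public static List<(int X, int Y)> Parse(string text)
    {
        var points = new List<(int X, int Y)>();
        foreach (Match m in Regex.Matches(text, PointPattern))
        {
            // Groups[0] is the whole match; Groups[1] and Groups[2] are X and Y.
            // int.Parse throws OverflowException if the digits exceed the int range.
            int x = int.Parse(m.Groups[1].Value);
            int y = int.Parse(m.Groups[2].Value);
            points.Add((x, y));
        }
        return points;
    }

    public static void Main()
    {
        string input = "You must first go to these 4 points: {X=23, Y=43} {X=1, Y=2} "
                     + "{X=4, Y=8} {X=0, Y=8} finally after going to all of those "
                     + "points you must go to {X=4, Y=1}";

        List<(int X, int Y)> points = Parse(input);
        for (int i = 0; i < points.Count; i++)
        {
            Console.WriteLine("Point: {0} | X={1}, Y={2}", i, points[i].X, points[i].Y);
        }
    }
}
```

Because each match carries both values, there is no need to compare the counts of two collections, and a malformed pair is simply skipped instead of shifting every later X against the wrong Y.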
Output
Processing Comparison
Although the Regex version is small and easy to adapt to future requirements, it is also slower. If speed is an issue, the state machine approach should be used instead, as it was over 3 times faster than the Regex method.
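To check the speed difference on your own machine, a rough timing harness along the following lines could be used. It is only a sketch: DecodeTiming, CountWithRegex, CountByHand and TimeMs are hypothetical names, the hand-written scanner is a simplified stand-in for the full state machine above, and the figures it prints will vary with hardware, runtime version and input.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class DecodeTiming
{
    // Sample input is an assumption: the five points from the scenario string.
    public const string Sample = "{X=23, Y=43} {X=1, Y=2} {X=4, Y=8} {X=0, Y=8} {X=4, Y=1}";

    // Counts "{X=n, Y=n}" pairs with a regular expression.
    public static int CountWithRegex(string text)
    {
        return Regex.Matches(text, @"\{X=(\d+), Y=(\d+)\}").Count;
    }

    // Counts the same pairs with a hand-written scan (simplified stand-in
    // for the state machine in the long example).
    public static int CountByHand(string text)
    {
        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf("{X=", pos, StringComparison.Ordinal)) >= 0)
        {
            int p = pos + 3;
            int xStart = p;
            while (p < text.Length && char.IsDigit(text[p])) p++;
            // Require at least one digit followed by the literal ", Y=".
            bool ok = p > xStart && string.CompareOrdinal(text, p, ", Y=", 0, 4) == 0;
            if (ok)
            {
                p += 4;
                int yStart = p;
                while (p < text.Length && char.IsDigit(text[p])) p++;
                // Require at least one digit followed by the closing brace.
                ok = p > yStart && p < text.Length && text[p] == '}';
            }
            if (ok) count++;
            pos += 3; // continue searching after this "{X="
        }
        return count;
    }

    // Runs the given decoder repeatedly and returns the elapsed milliseconds.
    public static long TimeMs(Func<string, int> decode, int iterations)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            decode(Sample);
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Main()
    {
        const int iterations = 100000;

        // Warm-up calls so JIT compilation and regex caching are not timed.
        CountWithRegex(Sample);
        CountByHand(Sample);

        Console.WriteLine("Regex:   {0} ms", TimeMs(CountWithRegex, iterations));
        Console.WriteLine("By hand: {0} ms", TimeMs(CountByHand, iterations));
    }
}
```

Measuring in a release build with a realistic input size gives a fairer picture than a single run in the debugger.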
