Saturday 2 June 2012

Validating a numeric string

Today I wanted to validate a numeric string in the format "YYYYMMDD".  I wasn't particularly worried about checking that there were only 12 months in the year, 28-31 days in the month, or whether the year was in a sensible range; I just wanted to make sure that I'd found a string of exactly 8 numeric digits.


So my first thought was to check the length using $length, an easy place to start...



  if ( $length(str) = 8 )
    ;good start
  endif



Next I went on to think about checking it was numeric, $number being the first function that popped into my head which seemed relevant...



  if ( $length(str) = 8 & $number(str) = str )
    ;pretty good
  endif



But then I remembered that this matches even if the string contains spaces, which I didn't want (nor do I think it is correct, personally), so I had to add a check for that...



  if ( $length(str) = 8 & $number(str) = str & $scan(str," ") = 0 )
    ;even better
  endif



And then I thought about the other "numeric" characters that would be allowed.  Depending on your language settings, this could include plus (+), minus (-), full stop (.) and comma (,), and there may be more as well.


It was at this point that I thought I must be doing something wrong; this clearly wasn't the best way of doing it.


Then I remembered a suggestion a colleague of mine had made to me when I started this blog: syntax strings.  It's very easy to use a syntax string to pattern match for numbers...



  if ( $length(str) = 8 & str = '#*' )
    ;much better
  endif


I won't go into the details of syntax strings as they are well documented, but suffice it to say that a hash (#) means a numeric digit (0-9) and a star (*) means any number of them (0-n).  On its own '#*' would match a digit string of any length, so it needs the length check alongside it; combined, this seemed right.

But then it occurred to me that I could improve this even further...


  if ( str = '########' )
    ;perfect
  endif


In this case I have specified that I require 8 numeric digits, which is exactly what I wanted to check for.  Using a syntax string here has simplified my code and made it much more readable; it's clear to another developer what I'm trying to achieve.
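
For readers who don't work in Uniface, the same contrast can be sketched in Python.  This is only an analogy: Python's int() stands in for a lenient numeric conversion and does not follow the same rules as $number, and the function names are made up for illustration.  It shows why "the right length and converts to a number" lets through strings that a fixed pattern of 8 digits rejects.

```python
import re

# Analogy only: int() is a stand-in for a lenient numeric conversion.
# It is NOT Uniface's $number, and the exact set of accepted characters differs.
def looks_numeric_lenient(s: str) -> bool:
    """Length check plus 'does it convert to a number?'."""
    if len(s) != 8:
        return False
    try:
        int(s)  # tolerates surrounding whitespace and a leading sign
    except ValueError:
        return False
    return True

# Exactly eight ASCII digits, the counterpart of the '########' syntax string.
EIGHT_DIGITS = re.compile(r"[0-9]{8}")

def is_eight_digits(s: str) -> bool:
    """Pattern check: the whole string must be eight digits and nothing else."""
    return EIGHT_DIGITS.fullmatch(s) is not None

if __name__ == "__main__":
    for candidate in ["20120602", " 2012060", "+2012060", "2012-6-2"]:
        print(repr(candidate), looks_numeric_lenient(candidate), is_eight_digits(candidate))
```

The lenient check accepts the second and third candidates (a leading space and a leading plus sign), while the pattern check accepts only the first.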

Now usually I'd have settled for the code that was using $length, $number and $scan, which would have sufficed and seemed pretty logical as I was writing it.  But noticing how much more elegant this code was got me thinking... How much better does this code perform?  Surely better than all those function calls!

I took the last 3 of my code blocks and ran them 2,000,000 times, giving the following results...

  • $length+$number+$scan = 00:07.70, 00:07.70, 00:07.65 (almost 8 seconds)
  • $length+syntax string = 00:06.33, 00:06.25, 00:06.27 (over 6 seconds)
  • Just the syntax string = 00:05.00, 00:05.09, 00:05.00 (about 5 seconds)


As you can see, not only is this code more elegant and therefore more maintainable, but it also performs better.  Given the difference is only a second or two over 2 million iterations, it's not really much of a consideration by itself, but they say every little helps!

Thanks to Dave W for the inspiration which led to this post.

Summary: Don't always jump to using Uniface functions for validation and other checks; it can often be simpler and more efficient to use syntax strings to perform pattern matches.
