Tuesday, 25 November 2014

Syntax strings in different modes

Earlier I wrote a post about syntax strings versus literal strings, which involved me diving into the Uniface manuals to check my facts.  I talked about using $syntax to convert a literal string into a syntax string - very useful!  

However, what I didn't know before was that you can have different modes.  This is something that appears to have been added in Uniface 9.4.01, without me noticing.

There's a lovely table in the Uniface manuals to describe them, but I've tested their examples and I think there are some inaccuracies.  So I'm going to try and lay them out for you here...

Classic

This is the default behaviour, but is also represented by '%[X]'.  In this case, the usual pattern matching rules apply, with syntax codes being used as wildcards to represent patterns of characters.

  • Proc code: $syntax("D&G")
  • Syntax string: '%[X]%D&G'
  • Matches: "DOG", "DIG", "DUG", etc.

You can see that the syntax string starts with the mode marker.  The marker is optional for classic mode, because classic is the default.

Case Sensitive

In this mode, all characters will only match with characters of the same case.  Also, syntax codes will be treated as their literal characters, and not as wildcards.

  • Proc code: $syntax("D&G","CS")
  • Resulting syntax string: '%[CS]D%&G%[X]'
  • Matches: "D&G"

You can see that the syntax string starts with the mode marker, and this time it also ends with '%[X]', which exits the mode and switches back to the default (classic) mode.

The manuals suggest that in the proc code the mode is passed in without the double quotes, but I found this had to be a string for it to work.  They also suggest that the mode "S" can be used instead of "CS" for the same thing - I found that only "CS" works in proc code, but you can use '%[CS]' or '%[S]' in syntax strings you write yourself.


Case Insensitive

In this mode, all characters will match with characters of either upper or lower case.  Again, syntax codes will be treated as their literal characters, and not as wildcards.

  • Proc code: $syntax("D&G","CI")
  • Resulting syntax string: '%[CI]D%&G%[X]'
  • Matches: "D&G", "d&g", "D&g" and "d&G"

Again, I had to use a string to define the mode in the proc code, and only "CI" worked, but '%[CI]' and '%[I]' both worked in syntax strings I wrote myself.

NLS Locale

In this mode, characters will match with their upper or lower case equivalents, as defined by the National Language Support (NLS) locale that the system is in (this can be checked or set using $nlslocale).  Again, syntax codes will be treated as their literal characters, and not as wildcards.

  • Proc code: $syntax("i#B","NLS")
  • Resulting syntax string: '%[NLS]i%#B%[X]'
  • Matches: If your locale is Turkish (tr_TR) then "i#B" and "İ#b", but not "I#B"

Once more, I had to use a string to define the mode in the proc code, and only "NLS" worked, but '%[NLS]' and '%[N]' both worked in syntax strings I wrote myself.
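
To make the pattern in those three results easier to see, here is a rough Python model (Python, not Proc) of what $syntax appears to do when a mode is passed in: escape each syntax code with % and wrap the result in the mode marker and '%[X]'.  The function name to_syntax_string and the exact set of escaped characters are assumptions for illustration; the sketch only claims to reproduce the three results shown in this post.

```python
# Characters assumed to need a % escape: the syntax codes and operators
# mentioned in this blog. The real list used by Uniface may differ.
SPECIAL = set("#&@?*()%~")

# Only these mode names worked in proc code according to the post.
MODES = {"CS", "CI", "NLS"}


def to_syntax_string(literal, mode):
    """Hypothetical model of $syntax(literal, mode) for CS, CI and NLS."""
    if mode not in MODES:
        raise ValueError("unsupported mode: " + mode)
    # Prefix every assumed special character with %, leave the rest alone.
    body = "".join("%" + ch if ch in SPECIAL else ch for ch in literal)
    # Enter the mode at the start and return to classic mode at the end.
    return "%[" + mode + "]" + body + "%[X]"


print(to_syntax_string("D&G", "CS"))
print(to_syntax_string("i#B", "NLS"))
```

Classic mode is deliberately left out, because the result shown for it earlier does not follow the same escape-and-wrap shape.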

Mixing modes

You can also define a combination of modes in a single pattern, something like this...
  • Proc code: $syntax("%[CI]D%&%[CS]G%[X]")
  • Resulting syntax string: '%[CI]D%&%[CS]G%[X]'
  • Matches: "D&G" and "d&G"

There's a slight typo in the manuals with the proc code here, as they have missed the "%" from the beginning!  
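
Here is a small Python sketch (again a model, not Proc) of how a mixed-mode pattern like this one could be read: walk the string, switch mode at each '%[...]' marker, and match each character case-sensitively or case-insensitively depending on the current mode.  The name mixed_to_regex is made up for illustration.  Treating unescaped characters inside CS and CI as literals is an assumption, and classic-mode text and the NLS mode are not modelled at all.

```python
import re

# Mode names accepted in hand-written syntax strings, per the post.
ALIASES = {"CS": "CS", "S": "CS", "CI": "CI", "I": "CI", "X": "X"}


def mixed_to_regex(pattern):
    """Translate a CS/CI mixed-mode syntax string into a Python regex."""
    out = []
    mode = "X"  # classic mode until a marker says otherwise
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "%" and pattern.startswith("[", i + 1):
            end = pattern.index("]", i)  # ValueError if the marker is unterminated
            name = pattern[i + 2:end]
            if name not in ALIASES:
                raise ValueError("mode not modelled: " + name)
            mode = ALIASES[name]
            i = end + 1
            continue
        if ch == "%":
            # % escapes the next character
            if i + 1 >= len(pattern):
                raise ValueError("dangling % at end of pattern")
            ch = pattern[i + 1]
            i += 2
        else:
            i += 1
        if mode == "X":
            raise ValueError("classic-mode text is not modelled in this sketch")
        literal = re.escape(ch)
        # Scoped inline flag makes just this character case-insensitive.
        out.append("(?i:" + literal + ")" if mode == "CI" else literal)
    return "".join(out)


def matches(pattern, value):
    return re.fullmatch(mixed_to_regex(pattern), value) is not None


for candidate in ("D&G", "d&G", "D&g", "d&g"):
    print(candidate, matches("%[CI]D%&%[CS]G%[X]", candidate))
```

Under these assumptions only "D&G" and "d&G" match, because the D is read in case-insensitive mode and the G in case-sensitive mode.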

Summary: You can use different modes in syntax strings, either specifying the case-sensitivity or relying on the NLS locale, but be careful if you're using $syntax because you'll lose the wildcards.

Monday, 24 November 2014

Syntax strings versus literal strings

There are a number of very important differences between syntax strings and literal strings, which I will attempt to highlight for you here.

Firstly, the Uniface definition of a syntax string...
"Uniface enables you to determine if the data in a string value matches a desired pattern using syntax strings.  A syntax string is a group of characters and syntax codes enclosed in single quotation marks (')."
So that's the first big difference right there: literal strings are in double quotes (") and syntax strings are in single quotes (')...

  literalString = "Literal string"
  syntaxString = 'Syntax string'

Syntax strings are used for pattern matching, but they are much simpler than the regular expressions that you might be used to in other languages.  The syntax codes are quite straightforward...

  • # - one digit (0-9)
  • & - one letter (A-Z, a-z)
  • @ - one letter, digit or underscore (A-Z, a-z, 0-9, _)
  • ~& - one extended letter
  • ~@ - one extended letter, digit or underscore
  • ? - one ASCII character
  • A-Z - that letter, in uppercase
  • a-z - that letter, in uppercase or lowercase

On top of these are a few other syntax codes of note...

  • If you want to search for the literal version of a syntax code, e.g. you want to search for the hash character (#) not a digit (0-9), then you can escape the syntax code using % before it.  Therefore, to search for a hash character it would be '%#' and to search for a percent character it would be '%%'.
  • If you want to search for an unknown number of repetitions of a syntax code, e.g. you want to search for 2 or more digits, then you can use * after it.  Therefore, to search for 2 or more digits it would be '###*' - in this case the first two hash characters represent one digit each, and the third hash character is part of the syntax code '#*', which means zero or more (0-n) digits.
  • If you want part of the pattern to be optional (to match the pattern or blank) then you can put rounded brackets around it, using ( and ).  For example, '##(#)' would match with 2 or 3 digits (but not more than 3).
  • Any other character is treated literally as that character.  

Uniface also gives a handy function that can be used to convert a literal string into a syntax string, sensibly named $syntax...

  syntaxString = $syntax("Literal string")

This can be especially useful if you're storing the pattern in the database, or some other configuration.

Here are a few examples from the Uniface manuals...

  Proc with Syntax String            Result

  if ('#' = "123")                   FALSE
  if ('#*' = "123")                  TRUE

  if ('#*' = vValue)                 TRUE
  if ('#*' = "%%vValue")             TRUE

  if ('&###' = "1234")               FALSE
  if ('@###' = "1234")               TRUE

  if ('?' = "A")                     TRUE
  if ('??' = "A")                    FALSE
  if ('??*' = "A")                   TRUE
  if ('?' = "ABC")                   FALSE
  if ('??*' = "ABC")                 TRUE

  if ('(#(-))&&&' = "ABC")           TRUE
  if ('(#(-))&&&' = "1ABC")          TRUE
  if ('(#(-))&&&' = "1-ABC")         TRUE
  if ('(#(-))&&&' = "12ABC")         FALSE
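
If you're more at home with regular expressions, the examples above can be checked against a rough Python model (Python, not Proc) that translates a classic-mode syntax string into a regex.  This is only a sketch under the rules listed in this post: the names syntax_to_regex and matches are made up for illustration, the extended codes (~& and ~@) and mode markers are not handled, and it is an assumption that Uniface behaves exactly like this for anything beyond the examples.

```python
import re

# Syntax codes from the list in this post, mapped to regex character classes.
CODES = {
    "#": "[0-9]",            # one digit
    "&": "[A-Za-z]",         # one letter
    "@": "[A-Za-z0-9_]",     # one letter, digit or underscore
    "?": "[\\x00-\\x7f]",    # one ASCII character
}


def syntax_to_regex(pattern):
    """Translate a classic-mode syntax string (without its quotes) to a regex."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        i += 1
        if ch == "(":
            out.append("(?:")   # start of an optional part
            continue
        if ch == ")":
            out.append(")?")    # end of an optional part
            continue
        if ch == "~":
            raise ValueError("extended codes (~& and ~@) are not modelled")
        if ch == "%":
            if i >= len(pattern):
                raise ValueError("dangling % at end of pattern")
            if pattern[i] == "[":
                raise ValueError("mode markers such as %[CS] are not modelled")
            piece = re.escape(pattern[i])   # escaped character, taken literally
            i += 1
        elif ch in CODES:
            piece = CODES[ch]
        elif "a" <= ch <= "z":
            piece = "[" + ch + ch.upper() + "]"   # lowercase matches either case
        else:
            piece = re.escape(ch)   # uppercase letters and everything else
        if i < len(pattern) and pattern[i] == "*":
            piece += "*"            # zero or more of the preceding code
            i += 1
        out.append(piece)
    return "".join(out)


def matches(pattern, value):
    # The whole value has to fit the pattern, not just part of it.
    return re.fullmatch(syntax_to_regex(pattern), value) is not None


print(matches("#", "123"), matches("#*", "123"))
print(matches("(#(-))&&&", "1-ABC"), matches("(#(-))&&&", "12ABC"))
```

The key design choice is the full match: '#' does not match "123" because the pattern has to describe the entire value, which is what the first row of the table shows.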


I've had (occasionally heated!) discussions with developers before, when they have said that...

  if ( myVar = "Y" )

...and...

  if ( myVar = 'Y' )

...are interchangeable.  

Yes, I can see that the pattern 'Y' only matches with the string "Y" and nothing else, but they are entirely different ideas, and should not be used interchangeably.  If you want to only match with "Y", then use the literal string "Y", because that's what you mean.  

In my experience, developers who say that these are interchangeable are developers who don't understand what the differences are.

Summary: Syntax strings are very useful for pattern matching, but very different from literal strings.  Know the difference, and know when to use which.