POSIX regex Library HOWTO

From GLUG-BOM


We often use grep while working on the command-line. This nifty utility does a lot for its 90k. The POSIX regex library provides similar pattern-matching functionality with a C API.



Basics

This library works in two steps:

  • Compiling the regular expression

We start with a char[] buffer containing the regular expression, e.g. '[a-zA-Z0-9]*'. regcomp() compiles it into a regex_t structure, which is then used to match the pattern against candidate strings. regcomp() returns zero if compilation succeeds, or a non-zero error code on failure.

  • Executing the regular expression

The compiled regular expression is matched against target strings using regexec(). This function takes five arguments, the two most important being the compiled pattern buffer and the string to be matched (const char*). It returns zero for a successful match or REG_NOMATCH if the string does not match.


Further...

Compiling the regex

  • Here a pattern buffer is initialised from the input regular expression string and a set of flags that select the kind of matching required:
 1) basic or extended syntax (REG_EXTENDED),
 2) whether 'match any character' should match a newline (REG_NEWLINE),
 3) case-insensitive matching (REG_ICASE) and
 4) whether subexpression matches are reported (REG_NOSUB turns this off).

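1) Basic syntax is the default; REG_EXTENDED selects extended syntax. In basic regular expressions the metacharacters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \). (Verbatim from the man page). Note that POSIX itself only defines \(, \), \{ and \} for basic expressions; \?, \+ and \| are GNU extensions.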

2) If the flag REG_NEWLINE is used when compiling a regex with regcomp(), match-any-character operators don’t match a newline, and a negated character class that does not list a newline does not match a newline either. In addition, ^ then matches immediately after a newline and $ immediately before one.

       For context:
       [^abcd] is a regex that matches any single character other than 'a', 'b', 'c' or 'd'

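3) Whether HeLLo should match 'hello'.

4) Subexpressions created by grouping regex elements with parentheses '()' are sometimes useful. A subexpression can be referred to later in the pattern with a backreference (\1, \2, ...), and regexec() can report where each one matched.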

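  For example: to match strings with the same character repeated, say, three times, we can use:
       '(.)\1{2}'
  This is written in extended syntax. POSIX only specifies backreferences for basic expressions, where the equivalent pattern is '\(.\)\1\{2\}'; glibc accepts both forms.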

Executing the regex

TODO

Summary and Notes

  • The GNU regex library offers a slightly different approach to the problem of matching strings with regular expressions.

  • Source code illustrating the use of this library is uploaded here.