13. Regular expression library

The experimental REGEX module implements a regular expression parser and pattern matching functionality.

It is currently at a very early stage and implements only a few basic regex operations.

All functions and symbols are in the “regex” module; use require to get access to it.

require daslib/regex

13.1. Type aliases

CharSet = uint[8]

Character set for regex.
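An array of eight 32-bit unsigned integers holds 256 bits, which is enough for one membership bit per byte value. The Python sketch below models that layout. The helper names and the bit order inside each word are assumptions made for illustration, not taken from the module.

```python
# Python model of a 256-bit character set stored as eight 32-bit words,
# mirroring the shape of CharSet = uint[8]. All helper names are hypothetical.

def charset_new():
    """Return an empty set: 8 words x 32 bits = 256 bits."""
    return [0] * 8

def charset_set(cset, ch):
    """Mark a single-byte character as a member."""
    code = ord(ch)
    assert code < 256, "only byte-sized characters fit in 256 bits"
    cset[code >> 5] |= 1 << (code & 31)   # word index, then bit inside the word

def charset_has(cset, ch):
    """Test membership of a single-byte character."""
    code = ord(ch)
    return code < 256 and (cset[code >> 5] >> (code & 31)) & 1 == 1

def charset_range(cset, lo, hi):
    """Mark every character from lo to hi inclusive, as in [0-9]."""
    for code in range(ord(lo), ord(hi) + 1):
        charset_set(cset, chr(code))

digits = charset_new()
charset_range(digits, "0", "9")
```

With this layout a membership test costs one shift and one mask, regardless of how many characters the class contains.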

ReGenRandom = iterator<uint>

Regex generator random iterator.

variant MaybeReNode

Regex node or nothing.

Variants:
  • value : ReNode? - Node.

  • nothing : void? - Nothing.

13.2. Enumerations

ReOp

Type of regular expression operation.

Values:
  • Char = 0 - Matching a character

  • Set = 1 - Matching a character set

  • Any = 2 - Matches any character

  • Eos = 3 - Matches end of string

  • Group = 4 - Matching a group

  • Plus = 5 - Repetition: one or more

  • Star = 6 - Repetition: zero or more

  • Question = 7 - Repetition: zero or one

  • Concat = 8 - First followed by second

  • Union = 9 - Either first or second
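To make the operation list concrete, here is a small backtracking matcher written in Python over tuple-encoded nodes that carry the same operation names. It is a sketch of how such a node tree can be interpreted, under the assumption of greedy repetition. It is not the module's matcher, and the tuple encoding and function names are invented for the example.

```python
# Tuple-encoded regex nodes, e.g. ("Concat", left, right). Illustrative only.

def match(node, s, i):
    """Yield each end position where node can match s starting at index i."""
    op = node[0]
    if op == "Char":            # literal text fragment
        if s.startswith(node[1], i):
            yield i + len(node[1])
    elif op == "Set":           # any one character from a set
        if i < len(s) and s[i] in node[1]:
            yield i + 1
    elif op == "Any":           # any single character
        if i < len(s):
            yield i + 1
    elif op == "Eos":           # end of string, consumes nothing
        if i == len(s):
            yield i
    elif op == "Group":
        yield from match(node[1], s, i)
    elif op == "Plus":          # one occurrence, then zero or more
        for j in match(node[1], s, i):
            yield from match(("Star", node[1]), s, j)
    elif op == "Star":          # greedy: longer matches are tried first
        for j in match(node[1], s, i):
            if j > i:           # skip empty iterations to guarantee progress
                yield from match(node, s, j)
        yield i
    elif op == "Question":
        yield from match(node[1], s, i)
        yield i
    elif op == "Concat":        # first followed by second
        for j in match(node[1], s, i):
            yield from match(node[2], s, j)
    elif op == "Union":         # either first or second
        yield from match(node[1], s, i)
        yield from match(node[2], s, i)
    else:
        raise ValueError("unknown op: " + op)

def first_match(node, s):
    """Return the end of the first match anchored at index 0, or -1."""
    return next(match(node, s, 0), -1)

# Hand-built tree for the pattern a+b?
A_PLUS_B_OPT = ("Concat", ("Plus", ("Char", "a")), ("Question", ("Char", "b")))
```

For example, `first_match(A_PLUS_B_OPT, "aab")` consumes all three characters, while the same tree fails on `"b"` because Plus requires at least one `a`.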

13.3. Structures

ReNode

Regular expression node.

Fields:
  • op : ReOp - Regex operation

  • id : int - Unique node identifier

  • fun2 : function<(regex:Regex; node:ReNode?; str:uint8?):uint8?> - Matching function

  • gen2 : function<(node:ReNode?; rnd:ReGenRandom; str:StringBuilderWriter):void> - Generator function

  • at : range - Source range

  • text : string - Text fragment

  • textLen : int - Length of text fragment

  • all : array<ReNode?> - All child nodes

  • left : ReNode? - Left child node

  • right : ReNode? - Right child node

  • subexpr : ReNode? - Subexpression node

  • next : ReNode? - Next node in the list

  • cset : CharSet - Character set for character class matching

  • index : int - Index for character class matching

  • tail : uint8? - Tail of the string

Regex

Regular expression structure.

Fields:
  • root : ReNode? - Root node of the regex.

  • match : uint8? - Original source text.

  • groups : array<tuple<range;string>> - Captured groups.

  • earlyOut : CharSet - Character set for early out optimization.

  • canEarlyOut : bool - Whether early out optimization is enabled.
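The field list does not say how earlyOut is computed. One common way to build such a filter is to collect every character that can begin a match, and to enable the filter only when the pattern cannot match the empty string. The Python sketch below shows that approach on tuple-encoded nodes. It is an assumption about the general technique, not a description of the module, and all names are hypothetical.

```python
# First-character prefilter over tuple-encoded nodes such as ("Plus", child).

class NoEarlyOut(Exception):
    """Raised for nodes this sketch cannot summarize (Any, Eos)."""

def first_chars(node):
    """Return (set of possible first characters, can_match_empty)."""
    op = node[0]
    if op == "Char":
        return {node[1][0]}, False
    if op == "Set":
        return set(node[1]), False
    if op in ("Group", "Plus"):
        return first_chars(node[1])
    if op in ("Star", "Question"):
        chars, _ = first_chars(node[1])
        return chars, True
    if op == "Concat":
        left, left_empty = first_chars(node[1])
        if not left_empty:
            return left, False
        right, right_empty = first_chars(node[2])
        return left | right, right_empty
    if op == "Union":
        left, left_empty = first_chars(node[1])
        right, right_empty = first_chars(node[2])
        return left | right, left_empty or right_empty
    raise NoEarlyOut(op)

def early_out(tree):
    """Return (character set, enabled flag), modeled on earlyOut and canEarlyOut."""
    try:
        chars, can_be_empty = first_chars(tree)
    except NoEarlyOut:
        return set(), False
    return chars, not can_be_empty

def quick_reject(tree, s):
    """True when s can be rejected by looking at its first character only."""
    chars, enabled = early_out(tree)
    return enabled and (s == "" or s[0] not in chars)

# Hand-built tree for the pattern -?[0-9]+
SIGNED_INT = ("Concat", ("Question", ("Char", "-")), ("Plus", ("Set", "0123456789")))
```

When the filter is enabled, inputs such as `"abc"` are rejected for `SIGNED_INT` without running the matcher at all.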

13.4. Compilation and validation

visit_top_down(node: ReNode?; blk: block<(var n:ReNode?):void>)

Visitor for regex nodes in top-down manner.

Arguments:
  • node : ReNode?

  • blk : block<(var n:ReNode?):void>

is_valid(re: Regex): bool

Whether the regex is valid.

Arguments:
  • re : Regex

regex_compile(re: Regex; expr: string): bool

Precompiles a regular expression.

Arguments:
  • re : Regex

  • expr : string

regex_compile(expr: string): Regex

Precompiles a regular expression.

Arguments:
  • expr : string

regex_compile(re: Regex): Regex

Precompiles a regular expression.

Arguments:
  • re : Regex

regex_debug(regex: Regex)

Debugs regular expression by printing its structure.

Arguments:
  • regex : Regex

debug_set(cset: CharSet)

Debugs character set by printing all characters it contains.

Arguments:
  • cset : CharSet

13.5. Access

regex_group(regex: Regex; index: int; match: string): string

Returns the substring matched by the specified regex group.

Arguments:
  • regex : Regex

  • index : int

  • match : string

regex_foreach(regex: Regex; str: string; blk: block<(at:range):bool>)

Iterate over all matches of a regex in a string.

Arguments:
  • regex : Regex

  • str : string

  • blk : block<(at:range):bool>
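The callback shape, a block that receives the match range and returns a bool, can be modeled in Python as shown below. Python's `re` module stands in for the regex engine here and supports far more syntax than this module does. Treating a false return value as "stop iterating" is an assumption, since the description above does not say what the bool means.

```python
import re

def foreach_match(pattern, text, callback):
    """Call callback((start, end)) for each match; stop when it returns False."""
    for m in re.finditer(pattern, text):
        if not callback((m.start(), m.end())):
            break

found = []

def collect_two(at):
    """Record the range and ask to continue until two ranges are stored."""
    found.append(at)
    return len(found) < 2

foreach_match(r"[0-9]+", "a1 b22 c333", collect_two)
```

Here the third run of digits is never visited, because the callback asks to stop after the second match.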

13.6. Match & replace

regex_match(regex: Regex; str: string; offset: int = 0): int

Matches a regular expression against a string and returns the position of the match.

Arguments:
  • regex : Regex

  • str : string

  • offset : int
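The signature suggests a match attempt that starts at a given offset and reports an integer position. The Python sketch below models one plausible reading: the match is anchored at the offset, the result is the index just past the matched text, and -1 signals no match. All three points are assumptions that should be checked against the module's actual behavior. The function name is hypothetical.

```python
import re

def match_at(pattern, text, offset=0):
    """Try to match pattern at text[offset:]; return the end index or -1."""
    m = re.compile(pattern).match(text, offset)  # anchored at offset
    return m.end() if m else -1

end_first = match_at(r"[a-z]+", "hello world")      # starts at index 0
end_second = match_at(r"[a-z]+", "hello world", 6)  # starts at "world"
no_match = match_at(r"[0-9]+", "hello")
```

Under these assumptions, a caller can walk a string token by token by feeding each returned position back in as the next offset.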

regex_replace(regex: Regex; str: string; blk: block<(at:string):string>): string

Replaces substrings matched by the regex with the result of the provided block.

Arguments:
  • regex : Regex

  • str : string

  • blk : block<(at:string):string>
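A replacement driven by a block that maps each matched substring to new text corresponds to callback-based substitution. The Python sketch below models that contract with `re.sub`, which stands in for the regex engine. The wrapper name is hypothetical.

```python
import re

def replace_with(pattern, text, fn):
    """Replace every match of pattern in text with fn(matched substring)."""
    return re.sub(pattern, lambda m: fn(m.group(0)), text)

# Double every integer found in the input.
doubled = replace_with(r"[0-9]+", "x=7, y=12", lambda at: str(int(at) * 2))
```

Text outside the matches is copied through unchanged, so only the two numbers differ in the result.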

13.7. Generation

re_gen_get_rep_limit(): uint

Limit of repetitions for regex quantifiers.

re_gen(re: Regex; rnd: ReGenRandom): string

Generate a random string matching the regex.

Arguments:
  • re : Regex

  • rnd : ReGenRandom
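Generating a string from a regex amounts to walking the node tree and letting a stream of unsigned integers decide each choice, with a cap on how often unbounded quantifiers repeat. The Python sketch below models that idea on tuple-encoded nodes. The cap value, the encoding and the way each number is consumed are assumptions made for illustration, not the module's algorithm.

```python
import random

REP_LIMIT = 4  # hypothetical cap, standing in for re_gen_get_rep_limit()

def gen(node, rnd):
    """Build a string matching node, drawing every choice from iterator rnd."""
    op = node[0]
    if op == "Char":
        return node[1]
    if op == "Set":
        return node[1][next(rnd) % len(node[1])]
    if op == "Any":
        return chr(32 + next(rnd) % 95)          # some printable ASCII character
    if op == "Eos":
        return ""
    if op == "Group":
        return gen(node[1], rnd)
    if op == "Question":
        return gen(node[1], rnd) if next(rnd) % 2 else ""
    if op == "Star":
        count = next(rnd) % (REP_LIMIT + 1)      # 0..REP_LIMIT repetitions
        return "".join(gen(node[1], rnd) for _ in range(count))
    if op == "Plus":
        count = 1 + next(rnd) % REP_LIMIT        # 1..REP_LIMIT repetitions
        return "".join(gen(node[1], rnd) for _ in range(count))
    if op == "Concat":
        return gen(node[1], rnd) + gen(node[2], rnd)
    if op == "Union":
        return gen(node[1 + next(rnd) % 2], rnd)
    raise ValueError("unknown op: " + op)

def random_uints():
    """Endless iterator of 32-bit unsigned integers."""
    return iter(lambda: random.getrandbits(32), None)

# Hand-built tree for the pattern [ab]+!?
PATTERN = ("Concat", ("Plus", ("Set", "ab")), ("Question", ("Char", "!")))
sample = gen(PATTERN, iter([2, 0, 1, 3, 1]))  # fixed numbers give a fixed string
```

Passing a fixed sequence makes the output reproducible, which is convenient for tests, while `random_uints()` yields a fresh string on every call.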