Multiple Co-operating Grammars in Raku
Recently I’ve been using Raku’s grammars to parse a custom language for a project I’m dealing with at work. Grammars are a very powerful tool for quickly iterating on parsers and exploring how a language might work.
In the process of this experimentation I bumped into a corner that doesn’t seem to be documented anywhere: calling grammars from each other.
This might sound like a very strange thing to want to do, but in fact embedded languages crop up with surprising regularity in computing. As an obvious example, imagine you’re trying to parse a web page. Rather than trying to describe all the content in the page in a single grammar, it makes sense to have a grammar for HTML, another for CSS, a third for JS, and so on. This separation becomes essential if you also need to deal with some of the different templating languages that are used to generate HTML.
My use-case is similar; I have an input file that uses two different languages. One (the main language) is block-based and looks broadly like this:
    thing name {
        "Some text"
        foo {
        }
        bar {
        }
    }
Embedded at certain points in this primary language is another language that is, essentially, Lisp.
    (bind some-symbol "value")
I want to have a separate Raku grammar for each, and call the secondary grammar from the primary one when required. It turns out this is entirely possible, and pretty straightforward once you know the correct incantation.
There are two possible approaches, depending on whether the two grammars should share an action object.
Shared actions
The simplest way is to call a fully-qualified rule in the secondary grammar from a rule in the primary grammar.
    grammar SecondaryGrammar {
        rule TOP {
            | ...
            | <bar>
        }
        rule bar {
            ...
        }
    }

    grammar PrimaryGrammar {
        rule TOP {
            | ...
            | <foo>
        }
        rule foo {
            # Since my secondary language is Lisp I'm using a look-ahead to
            # recognise an opening paren and trigger the change into the secondary
            # grammar without consuming the '('
            <?before '('>
            <SecondaryGrammar::TOP>
            # or <SecondaryGrammar::bar>, etc.
        }
    }
This works fine, but if you are using an action object with the primary grammar then rules in the secondary grammar will trigger method calls in the same action object. This could be what you want, or it could lead to name collisions.
If you aren’t using action objects or want to share one between the grammars, this is the easiest approach.
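To make the shared-actions behaviour concrete, here is a minimal, self-contained sketch. The grammars `Blocks` and `Lisp`, their rules, and the `SharedActions` class are hypothetical stand-ins invented for this illustration, not the grammars from the project described above. Only the fully-qualified `<Lisp::TOP>` call mirrors the technique itself. Both grammars deliberately have a `TOP` rule, so the single `TOP` method should be called for each of them, which is exactly the kind of name collision to watch out for.

```raku
# Hypothetical secondary grammar: a tiny Lisp-like form such as "(bind x)".
grammar Lisp {
    token TOP  { '(' <atom>+ % \s+ ')' }
    token atom { <-[ \( \) \s ]>+ }
}

# Hypothetical primary grammar: "name { ... }" with one embedded form.
grammar Blocks {
    rule  TOP      { <name> '{' <embedded> '}' }
    token name     { \w+ }
    # Look ahead for '(' and hand over to the secondary grammar's TOP rule.
    token embedded { <?before '('> <Lisp::TOP> }
}

# One action object for both grammars. It logs every method call so we can
# see which rules (from either grammar) reached it.
class SharedActions {
    has @.seen;
    method TOP($/)  { @!seen.push: 'TOP:'  ~ $/ }
    method name($/) { @!seen.push: 'name:' ~ $/ }
    method atom($/) { @!seen.push: 'atom:' ~ $/ }
}

my $actions = SharedActions.new;
my $match   = Blocks.parse('thing { (bind x) }', :$actions);

say $match ?? 'parsed' !! 'failed';
.say for $actions.seen;
```

If the log contains two `TOP:` entries and the `atom:` entries from the Lisp grammar, the action object is being shared as described. Giving the secondary grammar's rules distinct names is one simple way to avoid collisions while keeping this approach.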
Separate actions
If you want your two grammars to use different action objects then life becomes a little more complicated. The secondary grammar can stay the same as above, but the way you call into it needs to change:
    unit grammar PrimaryGrammar;

    rule foo {
        # We need somewhere to hold the Match returned by the secondary grammar
        :my $inner;
        # Since my secondary language is Lisp I'm using a look-ahead to
        # recognise an opening paren and trigger the change into the secondary
        # grammar without consuming the '('
        <?before '('>
        # Now we know there is a chunk of the secondary language coming we can
        # parse it by passing the original source into the secondary grammar
        # and telling it to start at the current position ($/.to)
        # We can pass in whatever action object we like in the usual way, and
        # do whatever we need to with the result.
        {
            $inner = SecondaryGrammar.subparse(
                $/.orig,
                :pos($/.to),
                :actions(SecondaryActions.new)
            );
        }
        # Ensure that this rule fails if the inner match does
        <?{ ?$inner }>
        # Unlike the first approach, running a separate subparse means that our
        # current parse doesn't see the characters matched by the secondary grammar
        # as being consumed, and will try to match them again.
        # This matches and discards the number of characters matched by the
        # secondary grammar.
        .**{$inner.to - $/.pos}
    }
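For completeness, here is a small runnable sketch of the same idea end to end. As before, `Lisp`, `Blocks`, `LispActions`, `BlockActions` and the rule names are hypothetical names made up for the example. The final `{ make $inner.made }` step is my own assumption about one reasonable way to "do whatever we need to with the result": it attaches the secondary grammar's result to the primary rule, so the primary action object can pick it up.

```raku
# Hypothetical secondary grammar and its own action class.
grammar Lisp {
    token TOP  { '(' <atom>+ % \s+ ')' }
    token atom { <-[ \( \) \s ]>+ }
}

class LispActions {
    method TOP($/)  { make $<atom>.map(*.made).List }
    method atom($/) { make ~$/ }
}

# Hypothetical primary grammar that delegates to Lisp via subparse.
grammar Blocks {
    rule  TOP  { <name> '{' <embedded> '}' }
    token name { \w+ }
    token embedded {
        :my $inner;
        <?before '('>
        # Run the secondary grammar from the current position with its own actions
        { $inner = Lisp.subparse($/.orig, :pos($/.to), :actions(LispActions.new)) }
        # Fail this rule if the inner parse failed
        <?{ ?$inner }>
        # Skip over the characters the secondary grammar matched
        .**{ $inner.to - $/.pos }
        # Assumption: expose the inner result as this rule's .made value
        { make $inner.made }
    }
}

class BlockActions {
    method TOP($/) { make %( name => ~$<name>, lisp => $<embedded>.made ) }
}

my $match = Blocks.parse('thing { (bind x) }', :actions(BlockActions.new));
say $match.made;
```

Because each grammar gets its own action object here, both classes can define a `TOP` method without interfering with each other.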
So there we have it! Being able to define multiple grammars that are aware of each other and able to co-operate makes parsing complex combinations of embedded languages far cleaner.
Finally, many thanks to moritz on the #raku IRC channel for pointing me in the right direction with this.