In response to a feature request from some of the guys here, my current work on Deelang focusses on implementing proper equality and comparison operators as part of the language. The current release has no support for operators beyond basic arithmetic – equality and comparison have traditionally been implemented as methods hacked on top of the standard library. This results in code such as:
1.eql(2) => true
5.lt(10) => true
10.gt(20) => false
Leaving aside for a moment that the above isn’t actually that bad (in my opinion at least), implementing these as operators is quite simple for both the dex and deevm compilers. However, while putting it together I ran up against an old elephant that’s been sitting quietly in the corner for some time now – the current parser is a mess, and it might just be time to rewrite the grammar.
The problem is not new, but I’ve ignored it for a while. Basically, it boils down to the fact that the current parser cannot handle code such as:
(1+2).foo()
Not only does this not work, but it fails miserably with nothing more specific than a MismatchedSetException at the terminator, after a lot of backtracking and ultimately ignoring the method call completely. The above code parses to the following tree:
Notice all the abandoned trees (in red), before the (erroneous) final parse, and the MismatchedSetException up there on the right. The method call gets parsed at one point, but that tree was then abandoned in favour of a tree in which the call is quietly ignored. This is clearly one confused parser, and all over something as simple as (1+2).foo()! Clearly, this needs fixing.
As I say, I’ve ignored this for a while. It should be relatively simple to fix (and indeed it is) with a bit of rejigging in the grammar. However, this problem is actually symptomatic of something deeper – namely, that the Deelang grammar is a mess. In the past, as problems such as this have cropped up, they’ve been fixed by adding to the grammar. New productions, imaginary tokens, and syntactic predicates have all been added to cope with a specific case, with no real wider plan. As long as the tests still passed at the end, the additions stayed.
The result of all this is that things that should be handled in a unified way are actually handled in a variety of ways. My personal favourite example of this is the way chained method calls are handled – I won’t illustrate it here, but if you’re interested just debug something like “foo().bar().baz().quux()” in ANTLRWorks. Trust me, it’s not pretty. It’s inefficient, it’s inconsistent with other parts of the grammar, and it requires the compiler to jump through some pretty awkward hoops to keep track of who is using what target register. It worked well when the only target was the (stack-based) Dee vm, but as requirements have grown it’s become cumbersome – the only reason it still works this way is inertia.
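To give a feel for why a uniform tree shape matters for register tracking, here is a minimal Python sketch. It is not the Deelang compiler: the tuple node shape ("call", receiver, name, args), the emit function, and the pseudo-instruction text are all made up for illustration. It assumes a chained call is represented as a left-nested tree, so that each call's receiver is simply whatever register the previous call wrote to. Registers are handed out from a counter and never reused, so this is not a real allocator.

```python
from itertools import count

def emit(node, code, regs):
    """Append pseudo-instructions for node to code; return its result register.

    Hypothetical node shapes (not Deelang's real AST):
      int                              -> integer literal
      ("call", receiver, name, args)   -> method call; receiver None means self
    """
    if isinstance(node, int):
        dst = next(regs)
        code.append(f"r{dst} = const {node}")
        return dst
    _, receiver, name, args = node
    # Evaluate the receiver first; its result register becomes the call target.
    target = "self" if receiver is None else f"r{emit(receiver, code, regs)}"
    arg_regs = ", ".join(f"r{emit(a, code, regs)}" for a in args)
    dst = next(regs)
    code.append(f"r{dst} = {target}.{name}({arg_regs})")
    return dst

# foo().bar().baz() as a left-nested tree: the innermost call runs first.
chain = ("call", ("call", ("call", None, "foo", []), "bar", []), "baz", [])
code = []
result = emit(chain, code, count())
print("\n".join(code))
# r0 = self.foo()
# r1 = r0.bar()
# r2 = r1.baz()
```

With this shape nobody has to remember who owns which register across the chain: the walk returns it.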
With all this in mind, I’ve decided that now is the ideal time to rewrite the grammar to get rid of these issues. I’ve never been a big believer in planning to throw one away, but in this case it looks like I will, anyhow. To be fair, I’m not planning a complete rewrite – large parts of the grammar are fine as they are (literals, for example). But the meat of it – from atoms through function calls to method calls – will be rewritten in a way that’s more consistent, cleaner, and hopefully requires a lot less backtracking. I’m also aiming to reduce the lookahead where I can, although some of the actual language design makes this quite difficult.
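As a rough illustration of the shape I have in mind, here is a small hand-written recursive-descent sketch in Python. It is not the Deelang grammar and not ANTLR: the Parser class, tokenize, the tuple node shape, and the tiny expression subset (integers, + and -, function calls, method calls) are all assumptions made for the example. The point is the single postfix rule: a primary (literal, function call, or parenthesised expression) followed by zero or more method-call suffixes, with one token of lookahead and no backtracking.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+().,])")

def tokenize(src):
    src = src.strip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, src):
        self.toks, self.i = tokenize(src), 0

    def peek(self):
        return self.toks[self.i] if self.i < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def name(self):
        tok = self.take()
        if not tok.isidentifier():
            raise SyntaxError(f"expected a name, got {tok!r}")
        return tok

    def parse(self):
        node = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return node

    # expr : postfix (('+' | '-') postfix)*
    def expr(self):
        node = self.postfix()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.postfix())
        return node

    # postfix : primary ('.' NAME args)*
    def postfix(self):
        node = self.primary()
        while self.peek() == ".":
            self.take(".")
            node = ("call", node, self.name(), self.args())
        return node

    # primary : NUMBER | NAME args | '(' expr ')'
    def primary(self):
        tok = self.take()
        if tok == "(":
            node = self.expr()
            self.take(")")
            return node
        if tok.isdigit():
            return int(tok)
        if tok.isidentifier():
            return ("call", None, tok, self.args())  # receiver None: plain function call
        raise SyntaxError(f"unexpected token {tok!r}")

    # args : '(' (expr (',' expr)*)? ')'
    def args(self):
        self.take("(")
        out = []
        if self.peek() != ")":
            out.append(self.expr())
            while self.peek() == ",":
                self.take(",")
                out.append(self.expr())
        self.take(")")
        return out

print(Parser("(1+2).foo()").parse())
# ('call', ('+', 1, 2), 'foo', [])
```

Because the parenthesised expression is just another primary, (1+2).foo(), 1.eql(2) and foo().bar().baz() all go through the same loop and come out as left-nested call nodes.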
Unfortunately this grand plan must remain just that for now – I’m very short on time to work on this at the moment. Since it’s not a key requirement I can’t allocate any work time to it (even though it will make things easier and save time down the line). So provisionally, I’ve set aside Sunday for the rewrite.
Now I just need to hope nothing more pressing crops up between now and then. Wish me luck!