A recursive descent parser in Forth

栏目: IT技术 · 发布时间: 6年前

内容简介：I was reading through Schorre’sSo it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out ther

I was reading through Schorre’s paper on Metta II again (see also the Wikipedia article ). For those that don’t know, Meta II is a metacompiler, which is to say, a language that can compile itself. It can’t, of course, be turtles all the way down, otherwise you wouldn’t be able to bootstrap it.

So it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out there can’t permit this to go unchallenged given its grammar functionality.

A criticism of Meta II is that a fair bit of work has to go into writing a virtual machine. It’s not an excessive amount of work, though.

One weakness of Meta II, I think, is that you can’t perform arbitrary actions, only transform input to output. To give an example, Meta II might produce an instruction like:

TST ' .OUT'

which tests the input string against the given string (in this ‘ .OUT’). This gives us a problem: we cannot, for example, create assembly code output from Meta II, as assemblers don’t allow for the inlining of strings. They need to be collected and put in a separate area. Meta II has no mechanism that allows you to do that.

Raku grammars allow you to perform actions as it parses, but often they aren’t what you want. Take, for example, a rule that might look something like

rule my-if { 
    'IF' <expr> 'THEN' <statements> 'END' 
    { do-something } 
}

The problem is that <expr> and <statements> are processed before do-something, which is not what you want. You want to be able to “mix in” processing at different levels of parsing.

The solution in Raku’s case is either to bolt on an action class to the parsed tree, or parse the resultant tree manually yourself. The former approach is the standard one, but I think that the latter approach might be easier in a lot of circumstances.

It got me to thinking as to how Forth, a David compared to Raku’s Goliath, might compete in this arena.

As a small diversion, I tried to see if I could put together a recursive descent parser for arithmetic involving just numbers, parentheses, addition and multiplication. Here’s what I whipped up using gforth:

create yytext 80 allot
variable yylen
: .yytext yytext yylen @ type ; \ print yytext
: yylex parse-name yylen ! yytext yylen @ move ; \ scan next token
: yy0 yytext c@ ;
: IS? yy0 = ; \ ( c -- t ) is first char of lexeme c?
: NUM? yy0 '0 >= yy0 '9 <= and ;

\ NOW WE DEFINE OUR ACTUAL RULES

defer ex1

: BRA yylex ex1 yylex ;
: EX3 num? if ." LDA " .yytext cr yylex else BRA then ;
: EX2 EX3 begin '* is? while yylex EX3 ." MUL" cr repeat ;
: EX1' EX2 begin '+ is? while yylex EX2 ." ADD" cr repeat ;

' ex1' IS ex1

: RUN refill drop yylex ex1 cr .s bye ;
run
1 + ( 5 + ( 2 + 3 * 4 ) * 6 )

Running it produces the following code:

LDA 1
LDA 5
LDA 2
LDA 3
LDA 4
MUL
ADD
LDA 6
MUL
ADD
ADD

Now, there are severe limitations to the approach, and there’s a little bit of “cheating” going on, with some simplifying assumptions, but I was gratified with just how little code I needed to produce the desired result.

For comparison purposes, the equivalent rules in Metta II are:

EX3 = .NUMBER .OUT('LD ' *) / '(' EX1 ')' .,EX2 = EX3 $ ('*' EX3 .OUT('MUL')) .,EX1 = EX2 $ ('+' EX2 .OUT('ADD')) .,

Part of the cheating I adopted involves using Forth’s own parser. How could this be improved?

There are a few approaches I can think of.

One might be to implement the idealised assembly language exactly as it is stated by Schorre. The main problem as I see it is with the problem of branching instructions. It’s difficult to visualise the nesting structure of the outputted branches, but if the forward branches are strictly nested, then we should be able to cope .

A second approach might be that instead of writing a Forth that parses the idealised code, make Meta II output valid Forth code. We then don’t have to write anything. We just run the code.

A third might be to write a “messy Meta II” parser, in the style of my Forth solution above, which takes in a clean Metta II description and produces a messy one.

A fourth solution might be to write the grammars in a more Forth-friendly style, with something like:

: EX1 EX2 $ ( '+'EX2 => ."ADD" ) ;

You’d then define $ as an immediate word that scans the next token. ‘(‘ would also be an immediate word that does its own scanning up to and include the ‘)’. I’m sure there’d be many details to work out.

The coolness of all this is that it would then be “Forth all the way down” (well, maybe a couple of levels deep, but you wouldn’t need to implement an assembler). And you’d be able to perform any arbitrary actions you liked.

Interesting, no? I have to say, there’s something just way intriguing about Forth itself and how it can effectively reconfigure itself. Perhaps one can achieve similar expressiveness with Raku’s “slang” capabilities?

Maybe one could write a Forth as a Raku slang, and use that slang to create other slangs? Come on Rakuers, you know you want to :wink:

For more mind-blowing stuff, check out the page on creating a BNF parser in Forth. I haven’t studied it yet, but it looks interesting. I am constantly amazed as to how experienced Forthers manage to achieve so much with so little.

Your Forth program isn’t finished until it looks like a Haiku. :wink:

Postscript: Greetings Bulgaria! Looking at the analytics for this blog, I notice that an overwhelming number of readers today are from Bulgaria. I have no idea why they should find my blog interesting all of a sudden, but welcome anyway.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

A recursive descent parser in Forth

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

Linux/Unix设计思想

甘卡兹 / 漆犇 / 人民邮电出版社 / 2012-3-28 / 39.00元

《Linux\Unix设计思想/图灵程序设计丛书》内容简介：将Linux的开发方式与Unix的原理有效地结合起来，总结出Linux与Unix软件开发中的设计原则。《Linux\Unix设计思想/图灵程序设计丛书》前8章分别介绍了Linux与Unix中9条基本的哲学准则和10条次要准则。第9章和第10章将Unix系统的设计思想与其他系统的设计思想进行了对比。最后介绍了Unix哲学准则在其他领域中的应......一起来看看《Linux/Unix设计思想》这本书的介绍吧!

码农工具

A recursive descent parser in Forth

Linux/Unix设计思想

Base64 编码/解码

SHA 加密

RGB CMYK 转换工具