A recursive descent parser in Forth

栏目: IT技术 · 发布时间: 4年前

内容简介:I was reading through Schorre’sSo it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out ther

I was reading through Schorre’s paper on Metta II again (see also the Wikipedia article ).  For those that don’t know, Meta II is a metacompiler, which is to say, a language that can compile itself. It can’t, of course, be turtles all the way down, otherwise you wouldn’t be able to bootstrap it.

So it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out there can’t permit this to go unchallenged given its grammar functionality.

A criticism of Meta II is that a fair bit of work has to go into writing a virtual machine. It’s not an excessive amount of work, though.

One weakness of Meta II, I think, is that you can’t perform arbitrary actions, only transform input to output. To give an example, Meta II might produce an instruction like:

TST ' .OUT'

which tests the input string against the given string (in this ‘ .OUT’). This gives us a problem: we cannot, for example, create assembly code output from Meta II, as assemblers don’t allow for the inlining of strings. They need to be collected and put in a separate area. Meta II has no mechanism that allows you to do that.

Raku grammars allow you to perform actions as it parses, but often they aren’t what you want. Take, for example, a rule that might look something like

rule my-if { 
    'IF' <expr> 'THEN' <statements> 'END' 
    { do-something } 
}

The problem is that <expr> and <statements> are processed before do-something, which is not what you want. You want to be able to “mix in” processing at different levels of parsing.

The solution in Raku’s case is either to bolt on an action class to the parsed tree, or parse the resultant tree manually yourself. The former approach is the standard one, but I think that the latter approach might be easier in a lot of circumstances.

It got me to thinking as to how Forth, a David compared to Raku’s Goliath, might compete in this arena.

As a small diversion, I tried to see if I could put together a recursive descent parser for arithmetic involving just numbers, parentheses, addition and multiplication. Here’s what I whipped up using gforth:

create yytext 80 allot
variable yylen
: .yytext yytext yylen @ type ; \ print yytext
: yylex parse-name yylen ! yytext yylen @ move ; \ scan next token
: yy0 yytext c@ ;
: IS? yy0 = ; \ ( c -- t ) is first char of lexeme c?
: NUM? yy0 '0 >= yy0 '9 <= and ;

\ NOW WE DEFINE OUR ACTUAL RULES

defer ex1

: BRA yylex ex1 yylex ;
: EX3 num? if ." LDA " .yytext cr yylex else BRA then ;
: EX2 EX3 begin '* is? while yylex EX3 ." MUL" cr repeat ;
: EX1' EX2 begin '+ is? while yylex EX2 ." ADD" cr repeat ;

' ex1' IS ex1

: RUN refill drop yylex ex1 cr .s bye ;
run
1 + ( 5 + ( 2 + 3 * 4 ) * 6 )

Running it produces the following code:

LDA 1
LDA 5
LDA 2
LDA 3
LDA 4
MUL
ADD
LDA 6
MUL
ADD
ADD

Now, there are severe limitations to the approach, and there’s a little bit of “cheating” going on, with some simplifying assumptions, but I was gratified with just how little code I needed to produce the desired result.

For comparison purposes, the equivalent rules in Metta II are:

EX3 = .NUMBER .OUT('LD ' *) / '(' EX1 ')' .,EX2 = EX3 $ ('*' EX3 .OUT('MUL')) .,EX1 = EX2 $ ('+' EX2 .OUT('ADD')) .,

Part of the cheating I adopted involves using Forth’s own parser. How could this be improved?

There are a few approaches I can think of.

One might be to implement the idealised assembly language exactly as it is stated by Schorre. The main problem as I see it is with the problem of branching instructions. It’s difficult to visualise the nesting structure of the outputted branches, but if the forward branches are strictly nested, then we should be able to cope  .

A second approach might be that instead of writing a Forth that parses the idealised code, make Meta II output valid Forth code.  We then don’t have to write anything. We just run the code.

A third might be to write a “messy Meta II” parser, in the style of my Forth solution above, which takes in a clean Metta II description and produces a messy one.

A fourth solution might be to write the grammars in a more Forth-friendly style, with something like:

: EX1 EX2 $ ( '+'EX2 => ."ADD" ) ;

You’d then define $ as an immediate word that scans the next token. ‘(‘ would also be an immediate word that does its own scanning up to and include the ‘)’. I’m sure there’d be many details to work out.

The coolness of all this is that it would then be “Forth all the way down” (well, maybe a couple of levels deep, but you wouldn’t need to implement an assembler). And you’d be able to perform any arbitrary actions you liked.

Interesting, no? I have to say, there’s something just way intriguing about Forth itself and how it can effectively reconfigure itself. Perhaps one can achieve similar expressiveness with Raku’s “slang” capabilities?

Maybe one could write a Forth as a Raku slang, and use that slang to create other slangs? Come on Rakuers, you know you want to :wink:

For more mind-blowing stuff, check out the page on creating a BNF parser in Forth. I haven’t studied it yet, but it looks interesting. I am constantly amazed as to how experienced Forthers manage to achieve so much with so little.

Your Forth program isn’t finished until it looks like a Haiku. :wink:

Postscript: Greetings Bulgaria! Looking at the analytics for this blog, I notice that an overwhelming number of readers today are from Bulgaria. I have no idea why they should find my blog interesting all of a sudden, but welcome anyway.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

技术的本质

技术的本质

布莱恩•阿瑟(Brian Arthur) / 曹东溟、王健 / 浙江人民出版社 / 2014-4-1 / 62.90

★《技术的本质》是复杂性科学奠基人、首屈一指的技术思想家、“熊彼特奖”得主布莱恩•阿瑟所创建的一套关于技术产生和进化的系统性理论,本书是打开“技术黑箱”的钥匙,它用平实的语言将技术最本质的思想娓娓道来。 ★技术,是一个异常美丽的主题,它不动声色地创造了我们的财富,成就了经济的繁荣,改变了我们存在的方式。尽管技术如此重要,却少有人在快节奏的生活中停下来深入思考技术。我们了解技术的原理,却不知道......一起来看看 《技术的本质》 这本书的介绍吧!

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

图片转BASE64编码
图片转BASE64编码

在线图片转Base64编码工具

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码