A recursive descent parser in Forth

栏目: IT技术 · 发布时间: 4年前

内容简介:I was reading through Schorre’sSo it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out ther

I was reading through Schorre’s paper on Metta II again (see also the Wikipedia article ).  For those that don’t know, Meta II is a metacompiler, which is to say, a language that can compile itself. It can’t, of course, be turtles all the way down, otherwise you wouldn’t be able to bootstrap it.

So it compiles down to an idealised assembly language whose instructions have to be implemented in some other language. I know that there is at least one C and lua implementation, for example. No doubt the Raku folks out there can’t permit this to go unchallenged given its grammar functionality.

A criticism of Meta II is that a fair bit of work has to go into writing a virtual machine. It’s not an excessive amount of work, though.

One weakness of Meta II, I think, is that you can’t perform arbitrary actions, only transform input to output. To give an example, Meta II might produce an instruction like:

TST ' .OUT'

which tests the input string against the given string (in this ‘ .OUT’). This gives us a problem: we cannot, for example, create assembly code output from Meta II, as assemblers don’t allow for the inlining of strings. They need to be collected and put in a separate area. Meta II has no mechanism that allows you to do that.

Raku grammars allow you to perform actions as it parses, but often they aren’t what you want. Take, for example, a rule that might look something like

rule my-if { 
    'IF' <expr> 'THEN' <statements> 'END' 
    { do-something } 
}

The problem is that <expr> and <statements> are processed before do-something, which is not what you want. You want to be able to “mix in” processing at different levels of parsing.

The solution in Raku’s case is either to bolt on an action class to the parsed tree, or parse the resultant tree manually yourself. The former approach is the standard one, but I think that the latter approach might be easier in a lot of circumstances.

It got me to thinking as to how Forth, a David compared to Raku’s Goliath, might compete in this arena.

As a small diversion, I tried to see if I could put together a recursive descent parser for arithmetic involving just numbers, parentheses, addition and multiplication. Here’s what I whipped up using gforth:

create yytext 80 allot
variable yylen
: .yytext yytext yylen @ type ; \ print yytext
: yylex parse-name yylen ! yytext yylen @ move ; \ scan next token
: yy0 yytext c@ ;
: IS? yy0 = ; \ ( c -- t ) is first char of lexeme c?
: NUM? yy0 '0 >= yy0 '9 <= and ;

\ NOW WE DEFINE OUR ACTUAL RULES

defer ex1

: BRA yylex ex1 yylex ;
: EX3 num? if ." LDA " .yytext cr yylex else BRA then ;
: EX2 EX3 begin '* is? while yylex EX3 ." MUL" cr repeat ;
: EX1' EX2 begin '+ is? while yylex EX2 ." ADD" cr repeat ;

' ex1' IS ex1

: RUN refill drop yylex ex1 cr .s bye ;
run
1 + ( 5 + ( 2 + 3 * 4 ) * 6 )

Running it produces the following code:

LDA 1
LDA 5
LDA 2
LDA 3
LDA 4
MUL
ADD
LDA 6
MUL
ADD
ADD

Now, there are severe limitations to the approach, and there’s a little bit of “cheating” going on, with some simplifying assumptions, but I was gratified with just how little code I needed to produce the desired result.

For comparison purposes, the equivalent rules in Metta II are:

EX3 = .NUMBER .OUT('LD ' *) / '(' EX1 ')' .,EX2 = EX3 $ ('*' EX3 .OUT('MUL')) .,EX1 = EX2 $ ('+' EX2 .OUT('ADD')) .,

Part of the cheating I adopted involves using Forth’s own parser. How could this be improved?

There are a few approaches I can think of.

One might be to implement the idealised assembly language exactly as it is stated by Schorre. The main problem as I see it is with the problem of branching instructions. It’s difficult to visualise the nesting structure of the outputted branches, but if the forward branches are strictly nested, then we should be able to cope  .

A second approach might be that instead of writing a Forth that parses the idealised code, make Meta II output valid Forth code.  We then don’t have to write anything. We just run the code.

A third might be to write a “messy Meta II” parser, in the style of my Forth solution above, which takes in a clean Metta II description and produces a messy one.

A fourth solution might be to write the grammars in a more Forth-friendly style, with something like:

: EX1 EX2 $ ( '+'EX2 => ."ADD" ) ;

You’d then define $ as an immediate word that scans the next token. ‘(‘ would also be an immediate word that does its own scanning up to and include the ‘)’. I’m sure there’d be many details to work out.

The coolness of all this is that it would then be “Forth all the way down” (well, maybe a couple of levels deep, but you wouldn’t need to implement an assembler). And you’d be able to perform any arbitrary actions you liked.

Interesting, no? I have to say, there’s something just way intriguing about Forth itself and how it can effectively reconfigure itself. Perhaps one can achieve similar expressiveness with Raku’s “slang” capabilities?

Maybe one could write a Forth as a Raku slang, and use that slang to create other slangs? Come on Rakuers, you know you want to :wink:

For more mind-blowing stuff, check out the page on creating a BNF parser in Forth. I haven’t studied it yet, but it looks interesting. I am constantly amazed as to how experienced Forthers manage to achieve so much with so little.

Your Forth program isn’t finished until it looks like a Haiku. :wink:

Postscript: Greetings Bulgaria! Looking at the analytics for this blog, I notice that an overwhelming number of readers today are from Bulgaria. I have no idea why they should find my blog interesting all of a sudden, but welcome anyway.


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

中国创投简史

中国创投简史

投资界网站 / 人民邮电出版社 / 2017-1-1 / 55

《中国创投简史》系统梳理了自20世纪80年代开始的中国创投产业发展历程,回顾了各个时代中的代表人物、知名投资机构以及他们所创下的一个个投资奇迹。从熊晓鸽、徐新、沈南鹏等风险投资人的成长经历中,从搜狐、腾讯、百度、小米等一代代科技企业巨头的诞生与演变过程中,我们可以看到风险投资的力量、创业者的企业家精神以及科技创造伟大财富的神奇过程。 对于风险投资和私募股权行业的从业者以及有融资需求的创业者来......一起来看看 《中国创投简史》 这本书的介绍吧!

HTML 压缩/解压工具
HTML 压缩/解压工具

在线压缩/解压 HTML 代码

JS 压缩/解压工具
JS 压缩/解压工具

在线压缩/解压 JS 代码

正则表达式在线测试
正则表达式在线测试

正则表达式在线测试