Mon 25 May 2020
Filed underPosts
Tagsidris
I've just released a new version of Idris 2 , which is a significant release in that it's the first one which is able to compile itself. Thus, it's the first one where we can really see how the performance of Idris 2 improves on the performance of Idris 1.
A question people sometimes ask when discussing Idris on various parts of the Internet is "Why is Idris 2 so much faster than Idris 1?" Sometimes it involves the phrase "the author claims that..." so here are some numbers, as measured today on my Dell XPS 13:
- Idris 1 building Idris 2 (bootstrap version): 743s
- Idris 2 building Idris 2 (with the Chez Scheme runtime): 76s
This is for type checking and generating/optimising intermediate code only, and doesn't include generating and compiling the C/Chez Scheme output. The bootstrap version is a slightly earlier version, and a little bit smaller, but they are similar enough to be able to make a useful comparison, and certainly give you a feel of how much of an improvement Idris 2 is. That's nearly an order of magnitude!
So, why the difference? Idris 1 is implemented in Haskell, but that has little (if anything) to do with the difference. Rather, it's a combination of:
- my improved understanding of what it takes to implement a dependently typed language (or, indeed, any language)
- insights gained from other people's experiments, most notably Andras Kovacs' work on smalltt
- Lots of profiling, via gprof while bootstrapping, and the Chez Scheme profiling tools .
A lot of the improvements are down to avoiding make bad decisions in the first place! This is easy to say in hindsight, but Idris 1 was originally a vehicle for experimenting with some ideas in dependently typed language design that were hard to do just by modifying other systems and so I was more interested in having a system sooner, rather than having a better system. Unfortunately, some of those early decisions proved hard to undo as the system grew and as it gained users.
The rest of this post summarises some of the most significant internal differences.
The global context
The global context stores all of the information that might be needed for typechecking: function and data types and definitions, among other things. We need function definitions as well as types, even for imported modules, since we may need to do evaluation at the type level during type checking.
The big difference between Idris 1 and Idris 2 is that Idris 2 represents the context as an array, with constant time access and update. Yes, it's mutable! We recognise that there's going to be a lot of updates: mainly solving unification problems as type checking proceeds, but also during interface resolution. And, we only ever have one context (ignoring ambiguity resolution, but I won't go into that here). So, having a single mutable context with constant time access has been valuable. Yes, it's a bit ugly to a pure functional programmer, but I'm not going to let that get in the way of performance! At least the types are precise about where updates might happen.
This turns out to have another benefit, which is that on loading an imported module, we don't need to add definitions to the context until they are actually referenced. Most names from imports are never referenced - at least not all in the same file - so Idris 2 lazily decodes definitions and mutates the context when they are needed. In fact profiling Idris 1 suggests that it spends a huge amount of time unnecessarily loading definitions.
Unification
Unification is an important part of type checking a dependently language language, to resolve implicit arguments. For example, if you have a function
append : Vect n a -> Vect m a -> Vect (n + m) a
and an application
append [1,2,3] [4,5,6,7]
...then the typechecker needs to infer that n = 3 , m = 4 and a = Integer . Idris 2 implements dynamic pattern unification . In Idris 1, metavariables (that is, the implicits that need to be solved) are stored locally to a term, so solving them requires searching through the term and substituting the definition, which can be expensive. Idris 2, on the other hand, stores metavariables globally, in the mutable global context, following roughly the scheme described in Ulf Norell's thesis .
Furthermore, we don't substitute the solution of a metavariable in a term unless we really really need to (in practice, that's only while checking linear binders are used appropriately, so fairly rare). This preserves sharing as far as possible, which can be important in terms with dependent types. For example (with apologies for using a vector again):
[1,2,3]
elaborates into
((::) {a=Integer} {n=S (S Z))} 1 ((::) {a = Integer} {n=S Z} 2 ((::3) {a = Integer} {n=Z} NIl)))
That Z appears three times, the subexpression S Z appears twice, and this gets worse as the vector grows. So sharing is important! This is the main insight gained from Andras Kovacs' work on smalltt .
Compile-time evaluation
Type checking a dependently typed language involves evaluating expressions, possibly in the presence of unknown variables, to check whether two terms are equal or to infer values for metavariables. For example, we might need to solve for ?meta in the following:
plus (S (S x)) (S (S y)) = S ?meta
Idris 1 does this by evaluating the terms, leading to the equation
S (S (plus x (S (S y)))
...then deciding ?meta = S (plus (x (S (S y)))) . This is fine for small terms, but what if the expression is
plus 100 100 = S ?meta
...or something even larger? We don't need to evaluate fully to normal form to establish that ?meta = plus 99 100 . Idris 2 therefore reduces to head normal form during type checking, which is probably should have done all along! Cons should not evaluate its arguments .
Normally, you won't notice this. But, sometimes, in Idris 1, compile time evaluation really hurts performance, and a dependently typed language needs to recognise that compile time evaluation is going to happen a lot!
Chez Scheme runtime
Idris 2 benefits from a robust, well-engineering and optimised run time system, by compiling to Chez Scheme. One might expect that Chez Scheme would be slower than the C code generated by Idris 1, since it is dynamically typed, but this ignores two important points:
- It is possible to turn off the dynamic type checks in Chez Scheme. So, the run time never checks if a constructor argument is within bounds of the vector where we store it, and never checks whether a primitive is the right type.
- Even more importantly, there has never been significant effort put into the Idris 1 run time. Generating good C code corresponding to a functional program which might have lots of small (and higher order) definitions is extremely challenging, so it is better to take advantage of someone else's work here.
Abandoning tactic-based elaboration
Idris 1 used a tactic-based elaborator, much as you'd see in Coq proofs . This made implementing new high level features relatively easy, using an elaboration monad which was, in a sense, an embedded domain specific language for writing new tactics. This is usable directly by elaborator reflection meaning that users could try extending Idris in Idris itself. While this is a fun way to work on the language, it took a lot of effort to get it to perform even as well as it did (which, as you see above, is not that well). Idris 2 takes a much simpler approach, elaborating syntax directly into the core representation.
Non-reasons
One performance problem in Idris 1 was due to space leaks in the Haskell caused by using lazy data structures where they weren't needed. Despite our best efforts, we were never able to resolve these fully. No doubt it would have been better to consider where strict data structures would have been better. But, we didn't. This isn't to say that Idris has proved a better implementation choice because it's strict, just to say that we needed to put more thought, earlier, into where we needed strict data in Haskell.
以上所述就是小编给大家介绍的《Why is Idris 2 so much faster than Idris 1?》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
Lighttpd源码分析
高群凯 / 机械工业出版社 / 2010-3 / 59.00元
本书主要针对lighttpd源码进行了深度剖析。主要内容包括:lighttpd介绍与分析准备工作、lighttpd网络服务主模型、lighttpd数据结构、伸展树、日志系统、文件状态缓存器、配置信息加载、i/o多路复用技术模型、插件链、网络请求服务响应流程、请求响应数据快速传输方式,以及基本插件模块。本书针对的lighttpd项目版本为稳定版本1.4.20。 本书适合使用lighttpd的人......一起来看看 《Lighttpd源码分析》 这本书的介绍吧!
RGB转16进制工具
RGB HEX 互转工具
URL 编码/解码
URL 编码/解码