内容简介:The question why functions cannot be added to compact regions often comes up in GHC’s IRC channel, and because I don’t work on or with compact regions, every time I see the question I try to remember the reason for why functions can’t be moved to a compact
The question why functions cannot be added to compact regions often comes up in GHC’s IRC channel, and because I don’t work on or with compact regions, every time I see the question I try to remember the reason for why functions can’t be moved to a compact region, often coming up with incorrect answers on the way. There’s also a question about this in GHC’s issue tracker.
The problem is documented briefly in the GHC source code , but in the following, I want to explain things in more detail, and in my own words.
At the core of the problem are top-level tunks, which are called CAFs (constant applicative forms) in GHC source code.Here’s an example:
x :: [Int] x = fromEnumTo 1 1000000
When evaluated x
allocates a cons cell on the heap and becomes a pointer (an “indirection”) to it.
Because CAFs are top-level closures, you might expect them to be alive during the lifetime of a program, but that’s not ideal because they sometimes allocate large amounts. In the example above, when we fully evaluate it we’ll have a million Int
s and a million cons cells ( []
is a static closure so it’s not allocated on the heap). A cons cell is 3 words, an Int
is two words, so that’s 2M heap objects (which will have to be traversed by the GC in every major GC) and 40M of heap space.
So instead GHC tracks CAFs like any other heap-allocated object and reclaims the space when a CAF is no longer reachable from the program’s root set.
In the rest of this post, I will discuss how CAFs relate to compact regions, and in particular what this means for the possible inclusion of functions in compact regions.
CAFs and compact regions
From GC perspective a compact region is a single object, and the GC does not traverse objects within compact regions. If any object inside a compact region is reachable then the whole region is retained. This means that I can’t have a pointer from a compact region to outside if that pointer needs to tracked by the GC. So CAF references from compact regions are not allowed as they need to be tracked by the GC.
For constructors or when copying a top-level CAF directly this is still not an issue. Here’s an example where we move a CAF and a constructor that refers to a CAF to compact regions:
module Main where import GHC.Compact -- A CAF x :: [Int] x = fromEnumTo 1 1000000 data D = D [Int] main = do -- Adding a CAF to a compact region _ <- compact x -- Adding a constructor that refers to a CAF to a compact region _ <- compact (D x) return ()
This is fine because when copying a thunk (in this case, the CAF x
) we evaluate it and copy the value instead. So in compact x
above what’s copied is the fully evaluated value of x
. Similarly in compact (D x)
we copy the constructor, and for the first field we first evaluate the thunk and copy the value.
This process of evaluating thunks when copying effectively eliminates CAF references from objects when copying them to a compact region.
Why can we not do the same for functions? Because unlike constructors, functions refer to CAFs in their code , instead of payload .Here’s an example:
module Main where import GHC.Compact x :: [Int] x = fromEnumTo 1 1000000 f :: () -> Int f () = sum x main = do _ <- compact f -- This fails return ()
Here f
is a function with a CAF reference. Here’s its code in the final assembly:
.section .text ... .globl Main.f_info .type Main.f_info, @function Main.f_info: _c2vs: leaq -16(%rbp),%rax cmpq %r15,%rax jb _c2vt _c2vu: movq $block_c2v5_info,-8(%rbp) movq %r14,%rbx addq $-8,%rbp testb $7,%bl jne _c2v5 _c2v6: jmp *(%rbx) .align 8 .quad 0 .long 30 .long Main.x_closure-(block_c2v5_info)+0 block_c2v5_info: _c2v5: movl $stg_INTLIKE_closure+257,%eax movl $Main.x_closure,%ecx ...
Note the references $Main.x_closure
. If we wanted to copy this function to a compact region we’d have to update the code to replace these references to x
’s value in the compact region, which is quite hard to do.
To avoid dealing with this we simply don’t allow copying functions to compact regions.
What about functions with no CAF references?
Functions with no CAF references don’t have tracked references in their code so it’s fine to copy them to a compact region. I recently implemented a proof-of-concept here . Here’s an example from GHC’s test suite that would fail before, but passes with my patch:
module Main where import GHC.Compact data HiddenFunction = HiddenFunction (Int -> Int) main = do _ <- compact (HiddenFunction (+1)) return ()
While allowing functions with no CAF references works fine, it’s not too useful in practice. Before explaining why, here’s a definition:
A closure is CAFFY if it directly or transitively refers to a CAF.
CAFFY-ness is the main property we’re interested in. For example, we could have a function that doesn’t refer to a CAF directly, but if one of the functions it calls refers to a CAF then the function is CAFFY, and CAFFY functions can’t be copied to a compact region.
With this definition in mind, there are two problems with allowing non-CAFFY functions in compact regions:
- Most non-trivial functions are CAFFY, hence allowing non-CAFFY functions is not too useful
- CAFFY-ness is hard to control
As an evidence for the first point, here’s a simple function:
f :: IO () f = putStrLn "hi"
This trivial function is CAFFY, for the reasons related to code generation for string literals in GHC. It’s not hard to guess that if a function this simple is CAFFY then a lot of other function in practice will be CAFFY as well.
For the second point I’ll again just give an example:
f :: () -> Int f () = sum x where x :: [Int] x = fromEnumTo 1 1000000
Is this function CAFFY? The answer is complicated:
-
It’s CAFFY with
-O0
because-O0
implies-fignore-interface-pragmas
, which means imported values (theFoldable
dictionary in our example) are considered CAFFY. -
With
-O0 -fno-ignore-interface-pragmas
it’s not CAFFY. -
With
-O
it’s CAFFY again, because GHC generates this STG:Main.f1 :: GHC.Types.Int [GblId] = {} \u [] case Main.$wgo 1# 0# of ww_s2yX [Occ=Once] { __DEFAULT -> GHC.Types.I# [ww_s2yX]; }; Main.f :: () -> GHC.Types.Int [GblId, Arity=1, Str=<S,1*H>, Unf=OtherCon []] = {} \r [ds_s2yY] case ds_s2yY of { () -> Main.f1; };
The problem is
f
now refers tof1
, which is a CAF, sof
is now CAFFY.
In short, CAFFY-ness depends on things that are not in programmer’s control, like the CAFFY-ness of called functions, or how GHC’s simplifier will behave, which depends on many things, like optimization levels and code generation flags. Pretty much every GHC version will generate slightly different code, causing changes in CAFFY-ness. You’ll also get different CAFFY-ness properties in release builds compared to debug and test builds, because those are built with different GHC parameters.
So even if we allowed non-CAFFY functions in compact regions, it’d be a bad practice to rely on this.
Here’s another example, try to guess whether this is a CAF or not: (the language pragma should give a hint)
{-# LANGUAGE NoMonomorphismRestriction #-} x = fromEnumTo 1 1000000
Conclusion
While it’s possible to allow non-CAFFY functions in compact regions, I think it would be a bad practice to rely on this behavior, as it’s very difficult (even impossible in some cases) to predict and maintain CAFFY-ness.
There are ways to allow CAFFY functions in compact regions, like
-
Capturing global references from a function in the function’s payload and referring to the payload instead of the global value directly. For example, in the
f
above, instead of referring toMain.x_closure
directly, we could storeMain.x_closure
in the function’s payload, and refer to that instead. That way we could copy a function to a compact region similar to how we copy constructors.The problem is this is much less efficient than referring to the closure directly, and this is a cost that every function would have to pay, not just the ones that we want to add to a compact region.
We could think about a pragma, say
{-# COMPACTABLE f #-}
, to generate “compactable” code forf
wheref
’s top-level references would be implemented as I explained above. I think this is workable, but it’s still a lot of implementation effort. Also, any function thatf
directly or transitively refers to will have to beCOMPACTABLE
or non-CAFFY for this to work. -
Compact regions could have dynamic SRTs where CAFFY references of objects would be added as they’re moved to a compact region. The GC would then track SRTs of compact regions.
There may be others ways to make this work as well. The problem is certainly not unsolvable, but it requires significant time investment to get done, and so far there hasn’t been enough incentive for this.
Finally, there are techniques like defunctionalization that makes it possible to express the same program using ADTs instead of functions, so not being able to add functions to compact regions is usually not a roadblock.
以上所述就是小编给大家介绍的《The problem with adding functions to compact regions》,希望对大家有所帮助,如果大家有任何疑问请给我留言,小编会及时回复大家的。在此也非常感谢大家对 码农网 的支持!
猜你喜欢:本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
数据化运营速成手册
胡晨川 / 电子工业出版社 / 2017-4 / 55
《数据化运营速成手册》用于提升互联网公司员工的数据应用能力,即数据化运营能力。首先,从最常用的数据图表切入,帮助执行层正确地绘图,管理层正确地看图;接着,梳理运营中最基本的数据应用知识,涉及数据获取、数据清洗、数据认知、分析框架、指标体系、运营实验等内容。然后,介绍作者认为必要的统计学知识,包括假设检验、方差分析、回归分析和时间序列分解,并引入了管理科学中的规划求解方法。最后,介绍了数据分析工具的......一起来看看 《数据化运营速成手册》 这本书的介绍吧!
图片转BASE64编码
在线图片转Base64编码工具
MD5 加密
MD5 加密工具