2013-04-22
String vs. ByteString/Text
[a] vs. Vector
Size requirements of common data structures
(ignoring shared instances; sizes in Words and Bytes):
| Type | Size (to be rounded up to full Ws) |
|---|---|
| (ta,tb) | 3 W + size(ta) + size(tb) |
| [t] | 3N W + N size(t) |
| ByteString (strict) | 9 W + N B |
| Text (strict) | 6 W + 2N B |
| Map tk tv | 6N W + N (size(tk) + size(tv)) |
| Set t | 5N W + N size(t) |
| IntMap t | (3N + 5(N-1)) W + N size(t) |
| IntSet | (2N + 5(N-1)) W |
| HashMap tk tv | 4.5N W + N (size(tk) + size(tv)) |
| HashSet t | 4.5N W + N size(t) |
| Vector t (boxed) | (6 + N) W + N size(t) |
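The table's formulas are easy to turn into a small estimator. The following sketch (function names are illustrative, not from the slides) assumes a 64-bit GHC where one word is 8 bytes, no sharing, and size(Int) = 2 W (header + payload):

```haskell
-- Sketch: heap-size estimates (in bytes) from the table above,
-- assuming a 64-bit platform (1 word = 8 bytes) and no sharing.
wordSize :: Int
wordSize = 8

-- [Int]: 3N words of cons cells plus 2N words for the boxed Ints
listIntBytes :: Int -> Int
listIntBytes n = (3 * n + 2 * n) * wordSize

-- strict ByteString: 9 words of overhead plus N bytes of payload
byteStringBytes :: Int -> Int
byteStringBytes n = 9 * wordSize + n

-- strict Text: 6 words of overhead plus 2 bytes per UTF-16 code unit
textBytes :: Int -> Int
textBytes n = 6 * wordSize + 2 * n

main :: IO ()
main = mapM_ print
  [ listIntBytes 1000000      -- a 10^6-element [Int] costs ~40 MB
  , byteStringBytes 1000000   -- the same payload as a ByteString: ~1 MB
  , textBytes 1000000
  ]
```

The 40x gap between `[Int]` and `ByteString` for the same number of elements is the whole argument for the packed representations.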
GHC's frontend AST HsSyn is parameterized over the name type: HsSyn RdrName, HsSyn Name, HsSyn Id.

- HsSyn Id is desugared into the much simpler, typed CoreExpr representation (i.e. GHC Core).
- CoreExpr is transformed into the STG representation for code generation, and after that fed as "C--" into the selected code-generator backend.
- The intermediate representations can be inspected via the -ddump-… flags.
- Core is similar to the Haskell kernel sub-language shown in the 1st lecture.
- Formal foundation: "System F with Type Equality Coercions".

GHC's representation of CoreExpr:
type CoreExpr = Expr Var
data Expr b = Var Id
| Lit Literal
| App (Expr b) (Arg b)
| Lam b (Expr b)
| Let (Bind b) (Expr b)
| Case (Expr b) b Type [Alt b]
| Cast (Expr b) Coercion
| Tick (Tickish Id) (Expr b)
| Type Type
| Coercion Coercion
deriving (Data, Typeable)
For debugging purposes, there's a more convenient "External Representation for the GHC Core Language" documented in the GHC User's Guide.
Core (after all simplifier passes) can be easily extracted via
ghc -fforce-recomp -ddump-simpl -dsuppress-uniques \
-dsuppress-coercions -dsuppress-type-applications \
-O2 -c Foo.hs
(alternatively, -dsuppress-all removes most annotations)
Sometimes more convenient, and with syntax coloring: the ghc-core tool (install with cabal install ghc-core):

ghc-core --no-asm -- Foo.hs
The following -dsuppress-* options exist as of GHC 7.6.3:

- all: in Core dumps, suppress everything that is suppressable (except for uniques)
- uniques: suppress the printing of uniques in debug output (easier to use diff)
- idinfo: suppress extended information about identifiers where they are bound
- module-prefixes: suppress the printing of module qualification prefixes
- signatures: suppress type signatures
- applications: suppress type applications (i.e. all @-terms)
- coercions: suppress the printing of coercions in Core dumps to make them shorter

Fun fact: -ddump-simpl works in GHCi too!
The following simple Haskell example
{-# LANGUAGE BangPatterns #-}
mkPair a = (a,a)
mkPair' !a = (a,a) -- same as: mkPair' a = seq a (a,a)
gets compiled to Core (-ddump-simpl -dsuppress-all):
mkPair = \ @ t_aeT a_aeH -> (a_aeH, a_aeH)
mkPair' = \ @ t_aeM a_aeI ->
case a_aeI of a1_XeL { __DEFAULT -> (a1_XeL, a1_XeL) }
- @t_aeT and @t_aeM denote type-arguments.
- The case-expression has an additional binder a1_XeL.
- Even though the case-expression only matches the wildcard pattern __DEFAULT, the scrutinee a_aeI is forced to WHNF!

The case expression in Core
case a of a1 { __DEFAULT -> (a1,a1); pat1 -> exp1; pat2 -> exp2 }
can be thought of in terms of H98 as
case a of
a1 | seq a1 False -> undefined
a1@pat1 -> exp1
a1@pat2 -> exp2
a1@_ -> (a1,a1)
- __DEFAULT comes first but is matched last.
- Pattern failures are explicit in Core (and are attached to __DEFAULT).
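The forcing behaviour of the Core case binder can be observed directly. This sketch (the main wrapper is mine, not from the slides) shows that the lazy mkPair happily wraps ⊥, while the banged mkPair' forces its argument:

```haskell
{-# LANGUAGE BangPatterns #-}
import Control.Exception (SomeException, evaluate, try)

mkPair :: a -> (a, a)
mkPair a = (a, a)    -- lazy: never inspects a

mkPair' :: a -> (a, a)
mkPair' !a = (a, a)  -- the bang forces a to WHNF first

main :: IO ()
main = do
  -- lazy version: building the pair is fine even for undefined
  _ <- evaluate (mkPair (undefined :: Int))
  putStrLn "mkPair: ok"
  -- strict version: the Core case forces the argument, which throws
  r <- try (evaluate (mkPair' (undefined :: Int)))
  case (r :: Either SomeException (Int, Int)) of
    Left _  -> putStrLn "mkPair': exception"
    Right _ -> putStrLn "mkPair': ok"
```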
Let's try again with a non-polymorphic version:
{-# LANGUAGE BangPatterns #-}
mkPair, mkPair' :: Int -> (Int,Int)
mkPair a = (a,a)
mkPair' !a = (a,a)
Core now reads:
mkPair = \ a_aeI -> (a_aeI, a_aeI)
mkPair' = \ a_aeJ ->
case a_aeJ of a1_XeM { I# ipv_seR -> (a1_XeM, a1_XeM) }
- The @-arguments have vanished.
- __DEFAULT has been replaced by an I# _ pattern.

The simple class & instance definition:
class Monoid a where
(<>) :: a -> a -> a
ie :: a
instance Monoid () where
() <> () = ()
ie = ()
...results in the following Core definitions:
-- created by class definition:
<> = \ @a (d :: Monoid a) -> case d of _ { D:Monoid m1 _ -> m1 }
ie = \ @a (d :: Monoid a) -> case d of _ { D:Monoid _ m2 -> m2 }
-- created by instance definition:
$fMonoid()_$c<> = \ (v1 :: ()) (v2 :: ()) ->
case v1 of _ { () -> v2 }
$fMonoid() = D:Monoid @() $fMonoid()_$c<> ()
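The Core above shows that a class is compiled to a data type of methods and an instance to a value of that type. This can be modelled in plain Haskell with an explicit record (MonoidDict, dAppend, dIe and mconcatD are made-up names for illustration):

```haskell
-- Sketch: a class dictionary is just a record of methods.
data MonoidDict a = MonoidDict
  { dAppend :: a -> a -> a  -- corresponds to (<>); cf. the m1 selector
  , dIe     :: a            -- corresponds to ie;  cf. the m2 selector
  }

-- the "instance" is a concrete record value (cf. $fMonoid())
unitDict :: MonoidDict ()
unitDict = MonoidDict (\_ _ -> ()) ()

-- a Monoid-constrained function receives the dictionary explicitly
mconcatD :: MonoidDict a -> [a] -> a
mconcatD d = foldr (dAppend d) (dIe d)

main :: IO ()
main = print (mconcatD unitDict [(), (), ()])
```

Class methods like `<>` are then just field selectors applied to the dictionary argument, exactly as in the generated Core.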
Data constructors are not defined explicitly in Core unless strict fields occur
(in which case a constructor-wrapper is defined):
data Foo a = Foo a
data Bar a = Bar !a
data Doo = Doo { unDoo :: Int }
mkFoo = Foo
mkBar = Bar
results in
$WBar :: forall a. a -> Bar a
$WBar = \ @a (v :: a) -> case v of v2 { __DEFAULT -> Bar @a v2 }
unDoo = \ (d :: Doo) -> case d of _ { Doo d -> d }
mkBar = $WBar
mkFoo = Foo
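The wrapper's forcing behaviour is observable: constructing Bar with ⊥ throws as soon as the constructor application is evaluated, while Foo keeps the field as a thunk. A sketch (the main wrapper is mine):

```haskell
import Control.Exception (SomeException, evaluate, try)

data Foo a = Foo a   -- lazy field: the constructor is used directly
data Bar a = Bar !a  -- strict field: GHC generates the $WBar wrapper

main :: IO ()
main = do
  -- lazy field: the thunk is stored untouched
  _ <- evaluate (Foo (undefined :: Int))
  putStrLn "Foo: ok"
  -- strict field: $WBar's case forces the field, which throws
  r <- try (evaluate (Bar (undefined :: Int)))
  case (r :: Either SomeException (Bar Int)) of
    Left _  -> putStrLn "Bar: exception"
    Right _ -> putStrLn "Bar: ok"
```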
Newtypes result in coercions in Core:
newtype Foo = Foo Int
incFoo (Foo i) = Foo (i+1)
results in
incFoo :: Foo -> Foo
incFoo = a `cast` (<Foo> -> Sym <(NTCo:Foo)>
:: (Foo -> Int) ~# (Foo -> Foo))
a :: Foo -> Int
a = \ ds -> + @ Int $fNumInt (ds `cast`
(<NTCo:Foo> :: Foo ~# Int)) (I# 1)
Use -dsuppress-coercions to reduce the clutter caused by cast-terms.

Now let's try a more complex function:
sum :: [Int] -> Int
sum (x:xs) = x + sum xs
sum [] = 0
compiled to Core w/ -O0
(and keeping type-annotations):
sum = \ (ds :: [Int]) ->
case ds of _ {
[] -> I# 0;
: x xs -> + @Int $fNumInt x (sum xs)
}
- : and + are in prefix position.
- $fNumInt is the Num typeclass dictionary for Int, and + is its field-accessor.

When using -O2, GHC produces different Core:
sum = \ (w :: [Int]) ->
case $wsum w of ww { __DEFAULT -> I# ww }
$wsum = \ (w :: [Int]) ->
case w of _ {
[] -> 0;
: x xs -> case x of _ {
I# x1 -> case $wsum xs of ww {
__DEFAULT -> +# x1 ww
}
}
}
- The worker $wsum returns an unboxed value ww.
- (+#) :: Int# -> Int# -> Int# is primitive addition on unboxed integers.
- No boxed Int needs to be allocated per recursion step, since $wsum returns an unboxed Int# instead of a boxed Int (think of the heap!).

The previous definition of sum was equivalent to foldr (+) 0 and is not tail-recursive, thus causing a stack overflow.
Let's try again with a foldl-style accumulator pattern:
sum :: [Int] -> Int
sum = go 0
where
go s (x:xs) = go (s+x) xs
go s [] = s
...and the resulting -O0 Core:
go = \ s ds -> case ds of _ {
[] -> s;
: x xs -> go (+ $fNumInt s x) xs
}
sum = go (I# 0)
The -O0 Core above accumulates unevaluated (+) thunks in s, i.e. a thunk leak.
Compiling with -O1 eliminates the thunk leak magically:
$wgo :: Int# -> [Int] -> Int#
$wgo = \ ww w -> case w of _ {
[] -> ww;
: x xs -> case x of _ {
I# y -> $wgo (+# ww y) xs
}
}
sum = \ w -> case $wgo 0 w of ww { __DEFAULT -> I# ww }
However, if we switch to a polymorphic sum :: Num a => [a] -> a, the leak is back even with -O2:
sum = \ @a ($dNum :: Num a) ->
let { lvl = + @a $dNum } in
letrec { go = \ (s :: a) (ds :: [a]) ->
case ds of _ {
[] -> s;
: x xs -> go (lvl s x) xs
};
} in go (fromInteger @a $dNum (__integer 0))
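A bang pattern on the accumulator avoids relying on the strictness analyser altogether, so the polymorphic version stays leak-free at any optimization level. A sketch (sumStrict is an illustrative name):

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch: forcing the accumulator ourselves instead of relying on
-- GHC's strictness analyser; works even at -O0 and for Num-polymorphic sums.
sumStrict :: Num a => [a] -> a
sumStrict = go 0
  where
    go !s (x:xs) = go (s + x) xs  -- s is in WHNF before each recursive call
    go !s []     = s

main :: IO ()
main = print (sumStrict [1 .. 10000000 :: Int])
```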
Let's look at another example:
count :: [Int] -> (Int,Int) -> (Int,Int)
count [] np = np
count (x:xs) (n,p) = count xs (if x<0 then (n+1,p) else (n,p+1))
$wcount :: [Int] -> Int -> Int -> (# Int, Int #)
$wcount = \ w n p ->
case w of _ {
[] -> (# n, p #);
: x xs -> case x of _ {
I# x1 -> case <# x1 0 of _ {
False -> $wcount xs n (case p of _ { I# x2 -> I# (+# x2 1) }) ;
True -> $wcount xs (case n of _ { I# x2 -> I# (+# x2 1) }) p
}
}
}
count = \ w w1 ->
case w1 of _ {
(n, p) -> case $wcount w n p of _ {
(# n', p' #) -> (n', p')
}
}
The (# Int, Int #) in the Core output represents GHC's unboxed tuples:

- Their primary use is for functions that return multiple values.
- They cannot be bound to a variable, and thus need to be deconstructed immediately.
- They avoid the heap allocation normally associated with ordinary tuples.
- However, (# Int, Int #) is not the same as (# Int#, Int# #).
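Unboxed tuples can also be used directly with the UnboxedTuples extension. A minimal sketch (minMax is a made-up example function):

```haskell
{-# LANGUAGE UnboxedTuples #-}
-- Sketch: returning two results without allocating a heap pair.
minMax :: Int -> Int -> (# Int, Int #)
minMax a b = if a <= b then (# a, b #) else (# b, a #)

main :: IO ()
main =
  -- an unboxed tuple cannot be bound to a variable;
  -- it must be deconstructed immediately with case
  case minMax 5 3 of
    (# lo, hi #) -> print (lo, hi)
```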
In summary:

- CoreExpr consists of variables, literals, applications, lambdas, let, and case.
- @ is used to denote explicit type-arguments/applications in Core.
- The case d of b { __DEFAULT -> e } expression is different from H98's.
- let(rec) bindings perform heap allocations.
- case expressions perform/force evaluations.
- Unboxed types (such as Int#) are implicitly strict.
- Unboxed tuples (# … #) eliminate heap allocation for the tuple-constructor.
- Newtype wrapping/unwrapping shows up as cast expressions.

The next intermediate representation, STG, can be dumped via -ddump-stg.
It is similar to Core but with more explicit allocation semantics:

- let is the only place where heap allocations occur.
- This is useful for seeing more clearly where thunks are created.
Let's evaluate the expression
main = evaluate $ count [1..12345678] (0,0)
with +RTS -s -hT -i0.01.
This generates summary GC statistics:
1,283,991,912 bytes allocated in the heap
3,062,980,520 bytes copied during GC
261,147,880 bytes maximum residency (26 sample(s))
4,701,992 bytes maximum slop
515 MB total memory in use (0 MB lost due to fragmentation)
...
Productivity 4.0% of total user, 3.8% of total elapsed
...so something's definitely wrong here!
The -hT -i0.01 RTS options cause a heap residency profile to be written to the file <program>.hp.
The .hp file contains heap samplings which look like:
BEGIN_SAMPLE 0.02
base:GHC.Conc.Sync.ThreadId 16
FUN_1_0 32
base:Data.Dynamic.Dynamic 24
ghc-prim:GHC.Tuple.(,) 24
base:Data.Maybe.Just 16
MUT_ARR_PTRS_CLEAN 552
base:GHC.Arr.STArray 40
FUN_0_1 16
MVAR_CLEAN 32
STACK 912
base:GHC.MVar.MVar 16
WEAK 48
TSO 112
BLACKHOLE 32
THUNK_1_0 64305024
END_SAMPLE 0.02
.hp files can be plotted by e.g. hp2ps, hp2pretty, or hp2html.
This looks like a thunk leak.
Let's try adding strictness to force evaluation:
count :: [Int] -> (Int,Int) -> (Int,Int)
count [] !np = np
count (x:xs) !(n,p) = count xs (if x<0 then (n+1,p) else (n,p+1))
...but this doesn't seem to make any difference at all!
The bang only forces the pair constructor itself: forcing a value like (⊥,⊥) to WHNF is a no-op, so the thunks inside the pair survive.
Let's try again, adding strictness to force evaluation:
count :: [Int] -> (Int,Int) -> (Int,Int)
count [] np = np
count (x:xs) (!n,!p) = count xs (if x<0 then (n+1,p) else (n,p+1))
Core with -O2:
count_$s$wcount :: [Int] -> Int# -> Int# -> (# Int, Int #)
count_$s$wcount = \ w n p ->
case w of _ {
[] -> (# I# n, I# p #);
: x xs -> case x of _ {
I# x1 -> case <# x1 0 of _ {
False -> count_$s$wcount xs n (+# p 1) ;
True -> count_$s$wcount xs (+# n 1) p
}
}
}
…
Now, the curried n and p function arguments are unboxed Int#s.
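For reference, here is the fixed version as a complete runnable program (the sample list is made up for illustration; n counts negatives, p counts non-negatives):

```haskell
{-# LANGUAGE BangPatterns #-}
-- the bang-pattern version from above, runnable end-to-end
count :: [Int] -> (Int, Int) -> (Int, Int)
count []     np      = np
count (x:xs) (!n,!p) = count xs (if x < 0 then (n+1, p) else (n, p+1))

main :: IO ()
main = print (count [-3, 1, 4, -1, 5] (0, 0))  -- 2 negatives, 3 non-negatives
```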
New statistics:
987,695,672 bytes allocated in the heap
84,648 bytes copied during GC
36,080 bytes maximum residency (11 sample(s))
21,264 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
…
Productivity 100.0% of total user, 92.6% of total elapsed
...versus the previous statistics:
1,283,991,912 bytes allocated in the heap
3,062,980,520 bytes copied during GC
261,147,880 bytes maximum residency (26 sample(s))
4,701,992 bytes maximum slop
515 MB total memory in use (0 MB lost due to fragmentation)
…
Productivity 4.0% of total user, 3.8% of total elapsed
From 4% to 100% productivity by adding just two !
annotations!
The first version of sum merely caused a stack overflow (not a heap leak), because it wasn't tail-recursive.
The count
example had a thunk leak (which often goes hand in hand with stack overflows), because thunks were accumulating on the heap.
And then there's the (ordinary) space leak...
Consider
let xs = [1..10000000::Int] -- that's 1e7 items
in sum xs * product xs
which results in
800,041,392 bytes allocated in the heap
2,777,788,352 bytes copied during GC
399,735,000 bytes maximum residency (19 sample(s))
68,504,360 bytes maximum slop
908 MB total memory in use (0 MB lost due to fragmentation)
…
Productivity 4.1% of total user, 3.8% of total elapsed
Remember that [Int] requires up to (3+2)N words (3 per cons cell, 2 per boxed Int).
The reason can be explained by looking at the Core output for
let xs = [1..10000000::Int] -- that's 1e7 items
in sum xs * product xs
which is
result = case $wsum' xs 0 of ww {
__DEFAULT -> case $wprod xs 1 of ww1 {
__DEFAULT -> I# (*# ww ww1)
}
}
xs = eftInt 1 10000000 -- [1..10000000]
Here xs
is allocated and kept alive until $wprod
has been evaluated.
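One common fix (not shown on the slides) is to traverse the list only once, so that consumed cons cells become garbage immediately instead of being retained for the second traversal; sumProd is an illustrative name:

```haskell
{-# LANGUAGE BangPatterns #-}
-- Sketch: computing sum and product in a single pass, so the list
-- can be garbage-collected incrementally as it is consumed.
sumProd :: [Int] -> (Int, Int)
sumProd = go 0 1
  where
    go !s !p (x:xs) = go (s + x) (p * x) xs
    go !s !p []     = (s, p)

main :: IO ()
main = print (sumProd [1 .. 5])  -- prints (15,120)
```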
CSE can be harmful: the code below produces the same Core output!
let x = product [1..10^7::Int]
y = sum [1..10^7::Int]
in x*y
Further reading:

- "Profiling Memory Usage" chapter in the GHC User's Guide
- "Profiling and Optimization" chapter in Real World Haskell
- E.Z. Yang's "Space Leak"-tagged blog posts