Last Time

GHC & Haskell Platform
Haskell 98/2010 & Haskell Language Report
WHNF & ⊥
Partial/Total Functions
Types, Kinds, Polymorphism & Typeclasses
Module System
The IO type & do syntax-sugar
Haskell kernel sub-language

Today

GHC's Memory Management
Value representation on the heap
- String vs. ByteString/Text
- [a] vs. Vector

Garbage

factorial :: Integer -> Integer
factorial n = go (n `max` 0) 1
  where
    go 0 a = a
    go i a = go (i-1) $! (a*i)

$! is used to make sure the accumulator value a is strictly evaluated.
But values in Haskell are (usually) immutable, and therefore go constructs two new Integer values on each recursion step.
But only the final value of a is of interest after factorial terminates, every temporary value constructed during the computation is Garbage now.
So in Haskell we create a lot of Garbage due to data immutability.

Garbage Collection

GHC uses a so-called generational, copying collector
- "generational hypothesis": Most objects die young
- Heap is divided into Generations
- Whenever generation n is collected, so are all younger generations m < n
- Objects get promoted to the next generation if they survive the GC cycle(s)
- defragmentation is achieved by copying the live data

Garbage Collection (cont.)

GHC's runtime-system (RTS) exposes has a couple of parameters via special CLI options. When compiling a Haskell program via
```
echo 'main = putStrLn "Hello World"' > HelloWorld.hs
ghc --make -O -rtsopts HelloWorld.hs
```
you can query the executable via ./Helloworld +RTS --help to get a list of currently customizable RTS parameters.
Default configuration:
- Rather small 512k "Nursery" area (+RTS -A<size> option)
- 2 Generations (+RTS -G<n> option)
When "Nursery" area is exhausted a GC is triggered to free up memory!

Garbage Collection (cont.)

It's often beneficial to increase the allocation/nursery area (+RTS -A<size>) to match today's cpu cache-sizes.
Using +RTS -qg1 which allows parallel GC only for major collections sometimes helps for multicore programs.
+RTS -I<seconds> configures the idle-timer for triggering major collections. In interactive applications the default 300ms setting might not be sensible.
+RTS -M<size> allows to set the maximum heap size (unlimited by default). Note: when heap size nears configured maximum, a compaction algorithm is used instead of the default copying collection algorithm.

Garbage Collection (cont.)

Runtime statistics can be generated by +RTS -s:

    36,169,392 bytes allocated in the heap
     4,057,632 bytes copied during GC
     1,065,272 bytes maximum residency (2 sample(s))
        54,312 bytes maximum slop
             3 MB total memory in use (0 MB lost due to fragmentation)

Generation 0:    67 collections,     0 parallel,  0.04s,  0.03s elapsed
Generation 1:     2 collections,     0 parallel,  0.03s,  0.04s elapsed

SPARKS: 359207 (557 converted, 149591 pruned)

INIT  time    0.00s  (  0.00s elapsed)
MUT   time    0.01s  (  0.02s elapsed)
GC    time    0.07s  (  0.07s elapsed)
EXIT  time    0.00s  (  0.00s elapsed)
Total time    0.08s  (  0.09s elapsed)

%GC time      89.5%  (75.3% elapsed)

Alloc rate    4,520,608,923 bytes per MUT second

Productivity  10.5% of total user, 9.1% of total elapsed

Heap Objects

Heap objects on the heap have the following basic structure:
Heap objects sizes are multiples of word-size (i.e. 32bit or 64bit)
The "Info Table" contains information needed by the GC such as:
- size of the payload
- where pointers to other heap objects are located in the payload

`Int` on the Heap

Let's start with Int:

> :info Int
data Int = GHC.Types.I# GHC.Prim.Int#   -- Defined in `GHC.Types'

Types with the MagicHash, i.e. Int# are special primitive unboxed types provided by GHC.
Unboxed types represent "raw" values for no "info table" exists.
Unboxed types have the special kind # (so that you can't mix them up with normal * kinded types by accident).

`Int` on the Heap (cont.)

Using record-style diagrams, the Int type
```
data Int = I# Int#
```
can be visualized as
Each rectangular field represents word-size (i.e. 32bit or 64bit)
Empty fields contain pointers
Rounded nodes represent a constructor's info-table/entry-code

`Char` on the Heap

Similiar to Int#:
```
data Char = C# Char#
```
leads to
Optimization in GHC: pre-allocated shared heap-objects for latin-1 subset of Char (likewise for Int: − 17 < n < 17)

`Bool` on the Heap

The two possible values of the enumeration type
```
data Bool = False | True
```
are represented by
However, field-less constructors are allocated only once on the heap!

`(Bool,Bool,Int)` on the Heap

Consequently, the tuple value
```
(True,True,42) :: (Bool,Bool,Int)
```
is represented on the heap as
Note the shared True heap object

`String` on the Heap

Strings are represented by lists
```
type String = [Char]
data [a] = [] | (:) a [a]
```
therefore the string "foo" (= 'f':'o':'o':[]) becomes

`String` on the Heap (cont.)

The memory usage for a N-character latin-1 String is 3N words.
The same latin-1 string that needs just N + 1 bytes in C, requires 24N bytes on the Haskell heap on x86-64!
The worst case memory usage for a general Unicode String is 5N words.
Conclusion: Try avoiding String whenever sensible
(and use ByteString or Text instead)

`newtype` on the Heap

newtypes are invisible on the heap, i.e. B 42 :: Bar and 42 :: Int look exactly the same on the Heap, whereas F 23 :: Foo adds an indirection:
```
newtype Bar = B Int
data Foo = F Int
```

`UNPACK` Pragma (cont.)

By using UNPACK in the strict version of (Int,Int)

data IntPair = IP {-# UNPACK #-} !Int
                  {-# UNPACK #-} !Int

GHC is instructed to embed the field into IntPair, i.e.

data IntPair = IP Int# Int#

and the value IP 23 42 becomes

The strict `ByteString` type

The internal type definition of strict bytestrings:

-- Copied from "bytestring:Data.ByteString.Internal":
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length

-- Copied from "base:GHC.ForeignPtr":
data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents

data ForeignPtrContents = ... | PlainPtr (MutableByteArray# RealWorld)

The strict `ByteString` type (cont.)

ByteStrings are essentially unboxed Word8 character arrays
Data.ByteString.Char8 provides functions using latin-1 Char instead of Word8

The pointer/offset/length representation allows for efficient slicing:

import Data.ByteString.Internal (ByteString(..))

take :: Int -> ByteString -> ByteString
take n ps@(PS fp off len)
    | n <= 0    = empty
    | n >= len  = ps
    | otherwise = PS fp off n
{-# INLINE take #-}

However, there's the danger of resource leaks when unintended sharing occurs, e.g.
take 5 (replicate 1048576 'X')

The strict `ByteString` type (cont.)

The ForeignPtrContents type more in detail:

data ForeignPtrContents
  = PlainForeignPtr  !(IORef (Finalizers, [IO ()]))
  | MallocPtr        (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
  | PlainPtr         (MutableByteArray# RealWorld)

For normal ByteStrings, we care only about the PlainPtr

The primitive MutableByteArray# type is represented by

typedef struct {
    StgHeader  header; // one word for non-profiled build
    StgWord    bytes;
    StgWord    payload[FLEXIBLE_ARRAY];
} StgArrWords;

The strict `ByteString` type (cont.)

To wrap it all up, the unpacked PS constructor:

data ByteString = PS Addr# ForeignPtrContents
                     Int#   -- offset
                     Int#   -- length

Space Usage of strict `ByteString`s

9 word fixed overhead for new ByteStrings or
5 word fixed overhead for shared copies existing ByteStrings, and
⌈N / wordsize⌉ words variable overhead for N-length string

Summary of strict `ByteString`s

Good:
- compact representation of byte arrays
- provides pure, Data.List-like API
- supports zero-copy slicing
- O(1) indexing
Not so good:
- cons/append requires allocating a new ByteArray and copying everything over
- unintended sharing can lead to resource leaks
- fixed overhead is significant for short strings
- Not suitable for handling Unicode strings

Lazy `ByteString`s

Basically, lazy list of strict ByteStrings:

import qualified Data.ByteString as S

data ByteString
    = Empty 
    | Chunk {-# UNPACK #-} !S.ByteString ByteString

append can share data from original lazy ByteStrings (c.f. (++)/[a])
can be used for lazy I/O (but beware of the dangers of lazy I/O)
converting from lazy ByteStrings to strict ByteStrings is expensive in general
useful for buffering during serialization (-> bytestring/text builder concept)

`Text`

Bytestrings are perfect for 8bit character sets such as ASCII or ISO-8859-1
But often need to operate on the full Unicode character set:
- The Glyph "€" is described by (21-bit) code point U+20AC.
- In UTF-8 "€" is serialized to three 8-bit code-units "0xE2 0x82 0xAC".
- In UTF-16 "€" is serialized to one 16-bit code-unit "0x20AC".
- In UTF-8 "$" (U+0024) is serialized to one 8-bit code-unit "0x24".
- UTF-32 is constant-length encoding in 32-bit code-units.
  (whereas UTF-8 and UTF-16 are variable-length encodings.)

`Text` (cont.)

Similiar concept as for ByteString but uses UTF-16 representation

data Text = Text {-# UNPACK #-} !Array    -- payload
                 {-# UNPACK #-} !Int      -- offset
                 {-# UNPACK #-} !Int      -- length
    
data Array = Array ByteArray#   -- UTF-16 representation

Due to the UTF-16 representation (-> surrogate pairs), operations such as Text.length becomes O(n) as it counts the number of code points (as opposed to number of UTF-16 code units)
Fixed overhead of 6 words for new Text, 4 words for shared copies.

`Integer` on the Heap

The Integer type has internally two constructors which optimizes for "small" integers (S#) which fit into an Int#:
```
-- copied from GHC.Integer.Type
data Integer
    = S# Int#
    | J# Int# ByteArray#
```
Large integers (J#) are handled by the GNU Multiple Precision Arithmetic Library (GMP)
Integer can't be unboxed or UNPACKed

`Data.Vector`

A different way to represent a sequence of values instead of single-linked list [a] is to use an packed array of Pointers to a.

That's what Data.Vector essentially does:

data Vector a = Vector {-# UNPACK #-} !Int
                       {-# UNPACK #-} !Int
                       {-# UNPACK #-} !(Array a)

data Array a = Array (GHC.Prim.Array# a) -- array of pointers

`Data.Vector` (cont.)

Representing Vector Int on the heap takes up 6 + 3N words as Vector just stores an array of pointers to I# objects:

Like for Text fixed overhead 6 words for new Vectors, 4 words for shared copies.

`Data.Vector.Unboxed`

Representing [Int] takes up 5N words, Vector Int still needs up to 6 + 3N words.
With Data.Vector.Unboxed the Ints are stored directly in the ByteArray#, thus U.Vector has an overhead of 6 + N words!

VU 183.653 Methodical, industrial Software-Engineering using the Haskell functional programming language

Last Time

Today

Garbage

Garbage Collection

Garbage Collection (cont.)

Garbage Collection (cont.)

Garbage Collection (cont.)

Heap Objects

Int on the Heap

Int on the Heap (cont.)

Char on the Heap

Bool on the Heap

(Bool,Bool,Int) on the Heap

String on the Heap

String on the Heap (cont.)

newtype on the Heap

UNPACK Pragma (cont.)

The strict ByteString type

The strict ByteString type (cont.)

The strict ByteString type (cont.)

The strict ByteString type (cont.)

Space Usage of strict ByteStrings

Summary of strict ByteStrings

Lazy ByteStrings

Text

Text (cont.)

Integer on the Heap

Data.Vector

Data.Vector (cont.)

Data.Vector.Unboxed

`Int` on the Heap

`Int` on the Heap (cont.)

`Char` on the Heap

`Bool` on the Heap

`(Bool,Bool,Int)` on the Heap

`String` on the Heap

`String` on the Heap (cont.)

`newtype` on the Heap

`UNPACK` Pragma (cont.)

The strict `ByteString` type

The strict `ByteString` type (cont.)

The strict `ByteString` type (cont.)

The strict `ByteString` type (cont.)

Space Usage of strict `ByteString`s

Summary of strict `ByteString`s

Lazy `ByteString`s

`Text`

`Text` (cont.)

`Integer` on the Heap

`Data.Vector`

`Data.Vector` (cont.)

`Data.Vector.Unboxed`