2013-04-08

- IO type & do syntax-sugar
- String vs. ByteString/Text
- [a] vs. Vector
factorial :: Integer -> Integer
factorial n = go (n `max` 0) 1
  where
    go 0 a = a
    go i a = go (i-1) $! (a*i)
$! is used to make sure the accumulator value a is strictly evaluated.
Values in Haskell are (usually) immutable, so go constructs two new Integer values on each recursion step. Only the final value of a is of interest after factorial terminates; every temporary value constructed during the computation is garbage by then. So in Haskell we create a lot of garbage due to data immutability.
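For contrast, a lazy variant (my naming, for illustration) that omits $! would first build a long chain of unevaluated multiplication thunks on the heap, which only gets forced at the very end:

factorialLazy :: Integer -> Integer
factorialLazy n = go (n `max` 0) 1
  where
    go 0 a = a
    go i a = go (i-1) (a*i)   -- a*i stays unevaluated: one thunk per step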
GHC's runtime system (RTS) exposes a couple of parameters via special CLI options. When compiling a Haskell program via
echo 'main = putStrLn "Hello World"' > HelloWorld.hs
ghc --make -O -rtsopts HelloWorld.hs
you can query the executable via ./HelloWorld +RTS --help to get a list of the currently customizable RTS parameters.
- The allocation ("nursery") area size can be set via the +RTS -A<size> option
- The number of GC generations can be set via the +RTS -G<n> option
- When the nursery area is exhausted, a GC is triggered to free up memory!
It's often beneficial to increase the allocation/nursery area (+RTS -A<size>) to match today's CPU cache sizes.
Using +RTS -qg1, which allows parallel GC only for major collections, sometimes helps for multicore programs.
+RTS -I<seconds> configures the idle timer for triggering major collections. In interactive applications the default 300ms setting might not be sensible.
+RTS -M<size> sets the maximum heap size (unlimited by default). Note: when the heap size nears the configured maximum, a compacting algorithm is used instead of the default copying collection algorithm.
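Assuming the HelloWorld executable from above (built with -rtsopts), several of these flags can be combined on a single invocation; the concrete values below are just illustrative:

./HelloWorld +RTS -A8m -M1g -I0 -s -RTS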
Runtime statistics can be generated by +RTS -s:
      36,169,392 bytes allocated in the heap
       4,057,632 bytes copied during GC
       1,065,272 bytes maximum residency (2 sample(s))
          54,312 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

  Generation 0:    67 collections,     0 parallel,  0.04s,  0.03s elapsed
  Generation 1:     2 collections,     0 parallel,  0.03s,  0.04s elapsed

  SPARKS: 359207 (557 converted, 149591 pruned)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    0.01s  (  0.02s elapsed)
  GC    time    0.07s  (  0.07s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    0.08s  (  0.09s elapsed)

  %GC time      89.5%  (75.3% elapsed)

  Alloc rate    4,520,608,923 bytes per MUT second

  Productivity  10.5% of total user, 9.1% of total elapsed
Heap objects have the following basic structure:
Heap object sizes are multiples of the word size (i.e. 32 bit or 64 bit)
Int on the Heap

Let's start with Int:
> :info Int
data Int = GHC.Types.I# GHC.Prim.Int# -- Defined in `GHC.Types'
Types with the MagicHash, such as Int#, are special primitive unboxed types provided by GHC.
Unboxed types represent "raw" values for which no "info table" exists.
Unboxed types have the special kind # (so that you can't mix them up with normal *-kinded types by accident).
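A minimal sketch (function name mine) of working with unboxed values directly, using the MagicHash extension and the primitives re-exported from GHC.Exts:

{-# LANGUAGE MagicHash #-}
import GHC.Exts (Int (I#), Int#, (+#))

-- unwrap the boxed Int, add 1 on the unboxed level, re-box the result
plusOne :: Int -> Int
plusOne (I# i#) = I# (i# +# 1#)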
Int on the Heap (cont.)

Using record-style diagrams, the Int type

data Int = I# Int#

can be visualized as
Rounded nodes represent a constructor's info-table/entry-code
Char on the Heap

Similar to Int#:

data Char = C# Char#

leads to
Optimization in GHC: pre-allocated shared heap objects for the latin-1 subset of Char (likewise for Int: −17 < n < 17)
Bool on the Heap

The two possible values of the enumeration type

data Bool = False | True

are represented by
However, field-less constructors are allocated only once on the heap!
(Bool,Bool,Int) on the Heap

Consequently, the tuple value

(True,True,42) :: (Bool,Bool,Int)

is represented on the heap as
Note the shared True heap object.
String on the Heap

Strings are represented by lists:

type String = [Char]
data [a] = [] | (:) a [a]

therefore the string "foo" (= 'f':'o':'o':[]) becomes
String on the Heap (cont.)

The memory usage for an N-character latin-1 String is 3N words.
The same latin-1 string that needs just N + 1 bytes in C requires 24N bytes on the Haskell heap on x86-64!
The worst-case memory usage for a general Unicode String is 5N words.

Conclusion: try to avoid String whenever sensible (and use ByteString or Text instead).
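As a small usage sketch (bindings mine), switching away from String is often just a pack away; note that Data.ByteString.Char8 silently truncates characters outside latin-1:

import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T

str :: String
str = "foo"            -- cons-cell chain, several words per character

txt :: T.Text
txt = T.pack str       -- compact UTF-16 array

bs :: B8.ByteString
bs = B8.pack str       -- 8-bit bytes, only safe for latin-1 content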
newtype on the Heap

newtypes are invisible on the heap, i.e. B 42 :: Bar and 42 :: Int look exactly the same on the heap, whereas F 23 :: Foo adds an indirection:

newtype Bar = B Int
data Foo = F Int
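A small sketch (mine) that makes the difference observable: since the newtype constructor adds no heap object, forcing B undefined forces the wrapped undefined itself, while F undefined is already a constructor in WHNF:

import Control.Exception (evaluate)

newtype Bar = B Int
data    Foo = F Int

main :: IO ()
main = do
  _ <- evaluate (F undefined)   -- fine: the F heap object itself is in WHNF
  _ <- evaluate (B undefined)   -- throws: B is erased, so we force undefined directly
  return ()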
UNPACK Pragma (cont.)

By using UNPACK in the strict version of (Int,Int)

data IntPair = IP {-# UNPACK #-} !Int
                  {-# UNPACK #-} !Int

GHC is instructed to embed the fields into IntPair, i.e.

data IntPair = IP Int# Int#

and the value IP 23 42 becomes
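A short usage sketch (function name mine): pattern matching on IP yields ordinary Int values again, i.e. GHC re-boxes the unpacked Int# fields on demand (and with optimizations often avoids the re-boxing entirely):

sumPair :: IntPair -> Int
sumPair (IP x y) = x + y   -- x and y are re-boxed from the unpacked fields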
ByteString type

The internal type definition of strict bytestrings:

-- Copied from "bytestring:Data.ByteString.Internal":
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
                     {-# UNPACK #-} !Int                -- offset
                     {-# UNPACK #-} !Int                -- length
-- Copied from "base:GHC.ForeignPtr":
data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents
data ForeignPtrContents = ... | PlainPtr (MutableByteArray# RealWorld)
ByteString type (cont.)

- ByteStrings are essentially unboxed Word8 character arrays
- Data.ByteString.Char8 provides functions using latin-1 Char instead of Word8
- The pointer/offset/length representation allows for efficient slicing:
import Data.ByteString.Internal (ByteString(..))

take :: Int -> ByteString -> ByteString
take n ps@(PS fp off len)
  | n <= 0    = empty
  | n >= len  = ps
  | otherwise = PS fp off n
{-# INLINE take #-}
However, there's a danger of memory leaks when unintended sharing occurs, e.g.
take 5 (replicate 1048576 'X')
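One way to break such unintended sharing (sketch, name mine) is Data.ByteString.copy, which allocates a fresh buffer of exactly the slice's size so that the large original buffer can be garbage collected:

import qualified Data.ByteString.Char8 as B8

firstFive :: B8.ByteString -> B8.ByteString
firstFive = B8.copy . B8.take 5   -- copy drops the reference to the original buffer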
ByteString type (cont.)

The ForeignPtrContents type in more detail:

data ForeignPtrContents
  = PlainForeignPtr !(IORef (Finalizers, [IO ()]))
  | MallocPtr (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
  | PlainPtr (MutableByteArray# RealWorld)

For normal ByteStrings, we care only about the PlainPtr constructor.
The primitive MutableByteArray# type is represented by
typedef struct {
    StgHeader header;  // one word for non-profiled build
    StgWord   bytes;
    StgWord   payload[FLEXIBLE_ARRAY];
} StgArrWords;
ByteString type (cont.)

To wrap it all up, the unpacked PS constructor:

data ByteString = PS Addr# ForeignPtrContents
                     Int# -- offset
                     Int# -- length
Lazy ByteStrings

Strict ByteStrings offer a Data.List-like API, but combining or growing strict ByteStrings means allocating a new ByteArray and copying everything over.

Lazy ByteStrings are basically a lazy list of strict ByteStrings:
import qualified Data.ByteString as S

data ByteString
  = Empty
  | Chunk {-# UNPACK #-} !S.ByteString ByteString
- append can share data from the original lazy ByteStrings (c.f. (++) on [a])
- converting lazy ByteStrings to strict ByteStrings is expensive in general
- useful for buffering during serialization (-> bytestring/text builder concept)
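A small sketch (bindings mine) of the builder concept mentioned above: a Builder accumulates output fragments and is rendered into a chunked lazy ByteString at the end, avoiding repeated copying:

import Data.Monoid ((<>))
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- builds "answer=42\n" without quadratic copying
message :: L.ByteString
message = B.toLazyByteString (B.string7 "answer=" <> B.intDec 42 <> B.word8 0x0a)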
Text

- ByteStrings are perfect for 8-bit character sets such as ASCII or ISO-8859-1
- The character "€" is described by the (21-bit) code point U+20AC.
- In UTF-8, "€" is serialized to three 8-bit code units "0xE2 0x82 0xAC".
- In UTF-16, "€" is serialized to one 16-bit code unit "0x20AC".
- In UTF-8, "$" (U+0024) is serialized to one 8-bit code unit "0x24".

Text (cont.)

- similar layout to strict ByteString, but uses a UTF-16 representation:

data Text = Text {-# UNPACK #-} !Array -- payload
                 {-# UNPACK #-} !Int   -- offset
                 {-# UNPACK #-} !Int   -- length

data Array = Array ByteArray# -- UTF-16 representation
Due to the UTF-16 representation (-> surrogate pairs), operations such as Text.length become O(n), as they count the number of code points (as opposed to the number of UTF-16 code units).
Fixed overhead of 6 words for a new Text, 4 words for shared copies.
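A tiny illustration (binding mine): U+1D11E needs a surrogate pair, i.e. two UTF-16 code units, but Text.length still counts it as a single character, which is why it has to walk the whole array:

import qualified Data.Text as T

-- 2 code points, although the underlying array holds 3 UTF-16 code units
len :: Int
len = T.length (T.pack ['a', '\x1D11E'])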
Integer on the Heap

The Integer type internally has two constructors; the S# constructor optimizes for "small" integers that fit into an Int#:

-- copied from GHC.Integer.Type
data Integer
  = S# Int#
  | J# Int# ByteArray#
Large integers (J#) are handled by the GNU Multiple Precision Arithmetic Library (GMP).
Integer can't be unboxed or UNPACKed.
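A tiny illustration (bindings mine) of the two representations described above:

small, big :: Integer
small = 42        -- fits into an Int#, so the S# constructor is used
big   = 2 ^ 100   -- exceeds Int# range, stored as J# with GMP limbs in a ByteArray#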
Data.Vector

A different way to represent a sequence of values, instead of the singly-linked list [a], is to use a packed array of pointers to values of type a.
That's what Data.Vector essentially does:
data Vector a = Vector {-# UNPACK #-} !Int
                       {-# UNPACK #-} !Int
                       {-# UNPACK #-} !(Array a)

data Array a = Array (GHC.Prim.Array# a) -- array of pointers
Data.Vector (cont.)

Representing Vector Int on the heap takes up 6 + 3N words, as Vector just stores an array of pointers to I# objects:

Like for Text, there is a fixed overhead of 6 words for new Vectors and 4 words for shared copies.
Data.Vector.Unboxed

Representing [Int] takes up 5N words, and Vector Int still needs up to 6 + 3N words.
With Data.Vector.Unboxed the Ints are stored directly in the ByteArray#, thus a U.Vector Int takes up only 6 + N words!
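A minimal sketch (bindings mine) contrasting the two flavours; both expose nearly the same API, only the element representation on the heap differs:

import qualified Data.Vector as V
import qualified Data.Vector.Unboxed as U

boxed :: V.Vector Int
boxed = V.fromList [1..10]     -- Array# of pointers to I# heap objects

unboxed :: U.Vector Int
unboxed = U.fromList [1..10]   -- raw Int#s stored directly in a ByteArray#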