2013-04-08
IO type & do syntax-sugarString vs. ByteString/Text[a] vs. Vectorfactorial :: Integer -> Integer
factorial n = go (n `max` 0) 1
where
go 0 a = a
go i a = go (i-1) $! (a*i)
$! is used to make sure the accumulator value a is strictly evaluated.
But values in Haskell are (usually) immutable, and therefore go constructs two new Integer values on each recursion step.
But only the final value of a is of interest after factorial terminates, every temporary value constructed during the computation is Garbage now.
So in Haskell we create a lot of Garbage due to data immutability.
GHC's runtime-system (RTS) exposes has a couple of parameters via special CLI options. When compiling a Haskell program via
echo 'main = putStrLn "Hello World"' > HelloWorld.hs
ghc --make -O -rtsopts HelloWorld.hs
you can query the executable via ./Helloworld +RTS --help to get a list of currently customizable RTS parameters.
+RTS -A<size> option)+RTS -G<n> option)When "Nursery" area is exhausted a GC is triggered to free up memory!
It's often beneficial to increase the allocation/nursery area (+RTS -A<size>) to match today's cpu cache-sizes.
Using +RTS -qg1 which allows parallel GC only for major collections sometimes helps for multicore programs.
+RTS -I<seconds> configures the idle-timer for triggering major collections. In interactive applications the default 300ms setting might not be sensible.
+RTS -M<size> allows to set the maximum heap size (unlimited by default). Note: when heap size nears configured maximum, a compaction algorithm is used instead of the default copying collection algorithm.
Runtime statistics can be generated by +RTS -s:
36,169,392 bytes allocated in the heap
4,057,632 bytes copied during GC
1,065,272 bytes maximum residency (2 sample(s))
54,312 bytes maximum slop
3 MB total memory in use (0 MB lost due to fragmentation)
Generation 0: 67 collections, 0 parallel, 0.04s, 0.03s elapsed
Generation 1: 2 collections, 0 parallel, 0.03s, 0.04s elapsed
SPARKS: 359207 (557 converted, 149591 pruned)
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.01s ( 0.02s elapsed)
GC time 0.07s ( 0.07s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.08s ( 0.09s elapsed)
%GC time 89.5% (75.3% elapsed)
Alloc rate 4,520,608,923 bytes per MUT second
Productivity 10.5% of total user, 9.1% of total elapsedHeap objects on the heap have the following basic structure:
Heap objects sizes are multiples of word-size (i.e. 32bit or 64bit)
Int on the HeapLet's start with Int:
> :info Int
data Int = GHC.Types.I# GHC.Prim.Int# -- Defined in `GHC.Types'Types with the MagicHash, i.e. Int# are special primitive unboxed types provided by GHC.
Unboxed types represent "raw" values for no "info table" exists.
Unboxed types have the special kind # (so that you can't mix them up with normal * kinded types by accident).
Int on the Heap (cont.)Using record-style diagrams, the Int type
data Int = I# Int#
can be visualized as
Rounded nodes represent a constructor's info-table/entry-code
Char on the HeapSimiliar to Int#:
data Char = C# Char#
leads to
Optimization in GHC: pre-allocated shared heap-objects for latin-1 subset of Char (likewise for Int: − 17 < n < 17)
Bool on the HeapThe two possible values of the enumeration type
data Bool = False | True
are represented by
However, field-less constructors are allocated only once on the heap!
(Bool,Bool,Int) on the HeapConsequently, the tuple value
(True,True,42) :: (Bool,Bool,Int)
is represented on the heap as
Note the shared True heap object
String on the HeapStrings are represented by lists
type String = [Char]
data [a] = [] | (:) a [a]
therefore the string "foo" (= 'f':'o':'o':[]) becomes
String on the Heap (cont.)The memory usage for a N-character latin-1 String is 3N words.
The same latin-1 string that needs just N + 1 bytes in C, requires 24N bytes on the Haskell heap on x86-64!
The worst case memory usage for a general Unicode String is 5N words.
Conclusion: Try avoiding String whenever sensible
(and use ByteString or Text instead)
newtype on the Heapnewtypes are invisible on the heap, i.e. B 42 :: Bar and 42 :: Int look exactly the same on the Heap, whereas F 23 :: Foo adds an indirection:
newtype Bar = B Int
data Foo = F Int
UNPACK Pragma (cont.)By using UNPACK in the strict version of (Int,Int)
data IntPair = IP {-# UNPACK #-} !Int
{-# UNPACK #-} !Int
GHC is instructed to embed the field into IntPair, i.e.
data IntPair = IP Int# Int#
and the value IP 23 42 becomes
ByteString typeThe internal type definition of strict bytestrings:
-- Copied from "bytestring:Data.ByteString.Internal":
data ByteString = PS {-# UNPACK #-} !(ForeignPtr Word8) -- payload
{-# UNPACK #-} !Int -- offset
{-# UNPACK #-} !Int -- length
-- Copied from "base:GHC.ForeignPtr":
data ForeignPtr a = ForeignPtr Addr# ForeignPtrContents
data ForeignPtrContents = ... | PlainPtr (MutableByteArray# RealWorld)ByteString type (cont.)ByteStrings are essentially unboxed Word8 character arrays
Data.ByteString.Char8 provides functions using latin-1 Char instead of Word8
The pointer/offset/length representation allows for efficient slicing:
import Data.ByteString.Internal (ByteString(..))
take :: Int -> ByteString -> ByteString
take n ps@(PS fp off len)
| n <= 0 = empty
| n >= len = ps
| otherwise = PS fp off n
{-# INLINE take #-}However, there's the danger of resource leaks when unintended sharing occurs, e.g.
take 5 (replicate 1048576 'X')
ByteString type (cont.)The ForeignPtrContents type more in detail:
data ForeignPtrContents
= PlainForeignPtr !(IORef (Finalizers, [IO ()]))
| MallocPtr (MutableByteArray# RealWorld) !(IORef (Finalizers, [IO ()]))
| PlainPtr (MutableByteArray# RealWorld)For normal ByteStrings, we care only about the PlainPtr
The primitive MutableByteArray# type is represented by
typedef struct {
StgHeader header; // one word for non-profiled build
StgWord bytes;
StgWord payload[FLEXIBLE_ARRAY];
} StgArrWords;ByteString type (cont.)To wrap it all up, the unpacked PS constructor:
data ByteString = PS Addr# ForeignPtrContents
Int# -- offset
Int# -- length
ByteStringsByteStrings orByteStrings, andByteStringsData.List-like APIByteArray and copying everything overByteStringsBasically, lazy list of strict ByteStrings:
import qualified Data.ByteString as S
data ByteString
= Empty
| Chunk {-# UNPACK #-} !S.ByteString ByteStringappend can share data from original lazy ByteStrings (c.f. (++)/[a])ByteStrings to strict ByteStrings is expensive in generaluseful for buffering during serialization (-> bytestring/text builder concept)
TextBytestrings are perfect for 8bit character sets such as ASCII or ISO-8859-1
€" is described by (21-bit) code point U+20AC.€" is serialized to three 8-bit code-units "0xE2 0x82 0xAC".€" is serialized to one 16-bit code-unit "0x20AC".$" (U+0024) is serialized to one 8-bit code-unit "0x24".Text (cont.)ByteString but uses UTF-16 representationdata Text = Text {-# UNPACK #-} !Array -- payload
{-# UNPACK #-} !Int -- offset
{-# UNPACK #-} !Int -- length
data Array = Array ByteArray# -- UTF-16 representation
Due to the UTF-16 representation (-> surrogate pairs), operations such as Text.length becomes O(n) as it counts the number of code points (as opposed to number of UTF-16 code units)
Fixed overhead of 6 words for new Text, 4 words for shared copies.
Integer on the HeapThe Integer type has internally two constructors which optimizes for "small" integers (S#) which fit into an Int#:
-- copied from GHC.Integer.Type
data Integer
= S# Int#
| J# Int# ByteArray#Large integers (J#) are handled by the GNU Multiple Precision Arithmetic Library (GMP)
Integer can't be unboxed or UNPACKed
Data.VectorA different way to represent a sequence of values instead of single-linked list [a] is to use an packed array of Pointers to a.
That's what Data.Vector essentially does:
data Vector a = Vector {-# UNPACK #-} !Int
{-# UNPACK #-} !Int
{-# UNPACK #-} !(Array a)
data Array a = Array (GHC.Prim.Array# a) -- array of pointersData.Vector (cont.)Representing Vector Int on the heap takes up 6 + 3N words as Vector just stores an array of pointers to I# objects:
Like for Text fixed overhead 6 words for new Vectors, 4 words for shared copies.
Data.Vector.UnboxedRepresenting [Int] takes up 5N words, Vector Int still needs up to 6 + 3N words.
With Data.Vector.Unboxed the Ints are stored directly in the ByteArray#, thus U.Vector has an overhead of 6 + N words!