Exceptions

Haskell 98/2010 defines a simple exception handling infrastructure in the IO-monad with the following primitives:

userError :: String  -> IOError 
ioError   :: IOError -> IO a  
catch     :: IO a    -> (IOError -> IO a) -> IO a

instance Monad IO where
    -- … bindings for return, (>>=) and (>>) 
    fail s = ioError (userError s)

However, in state-of-the-art Haskell An Extensible Dynamically-Typed Hierarchy of Exceptions is used.

Extensible Exceptions

Typeclass-based framework:

data SomeException = ∀e . Exception e => SomeException e

class (Typeable e, Show e) => Exception e where
    toException   :: e -> SomeException
    fromException :: SomeException -> Maybe e

New primitives such as:

throwIO :: Exception e => e -> IO a
throw :: Exception e => e -> a
catch :: Exception e => IO a -> (e -> IO a) -> IO a
try :: Exception e => IO a -> IO (Either e a)

Extensible Exceptions (cont.)

bracket is a particularly useful abstraction:

bracket :: IO a -> (a -> IO b) -> (a -> IO c) -> IO c

The type a is for representing a resource, and therefore the first two arguments are for acquiring/releasing a resource, e.g.:
```
bracket (openFile "filename" ReadMode)
        (hClose)
        (\fileHandle -> do { … })
```

Extensible Exceptions (cont.)

Control.Exception provides several functions for catching exceptions. From the module documentation:
Here's a rule of thumb for deciding which catch-style function to use:
- If you want to do some cleanup in the event that an exception is raised, use finally, bracket or onException.
- To recover after an exception and do something else, the best choice is to use one of the try family.
- ... unless you are recovering from an asynchronous exception, in which case use catch or catchJust.

Extensible Exceptions (cont.)

Pre-defined exception types (i.e. instances of Exception class):

data IOException -- abstract; IOError is type-synomym of this
data ArithException = Overflow | Underflow | DivideByZero | …
data AsyncException = UserInterrupt | ThreadKilled | …
data ErrorCall = ErrorCall String -- raised by 'error' function
…

See Control.Exception for more details.

Parallelism vs. Concurrency

Concurrency = dealing with lots of things at once
- Allow interleaved execution of multiple tasks
- Useful for dealing with I/O
- Affects program structure
Parallelism = doing lots of things at once
- Simultaneous execution of multiple tasks
- Useful for speeding up CPU bound computations
- Requires multiple execution units (→ multicore).

Process vs. (OS-)Threads vs. Green-Threads

Process: executed instance of a program including its state
- Heavy-weight due to context(-switches)
(OS-)Thread: (aka light-weight process) execution thread inside process scheduled by OS
- Slightly less heavy-weight context(switches) between threads due to shared context
Green-Threads: (aka microthreads or user-space threads) really light-weight threads scheduled by process itself
- Significantly less overhead (& context) compared to OS-threads

Concurrent Haskell

Provides means to create interleaved threads of exectuion.
Most important new primitive (from Control.Concurrent):
```
forkIO :: IO () -> IO ThreadId
```
forkIO spawns a new thread for executing the given IO () action.
- The ThreadId handle provides a way to identify the new thread for throwing asynchronous exceptions to. We'll get back to that in a minute.

Concurrent Haskell

A very ~~stupid~~simple single-threaded HTTP server:

main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 8000
    loop sock
  where
    loop sock = do
        (h,_,_) <- accept sock
        body h
        loop sock

    body h = hPutStr h msg >> hFlush h >> hClose h

    msg = "HTTP/1.0 200 OK\r\nContent-Length: 7\r\n\r\nPong!\r\n"

Concurrent Haskell

A very ~~stupid~~simple concurrent HTTP server:

main = withSocketsDo $ do
    sock <- listenOn $ PortNumber 8000
    loop sock
  where
    loop sock = do
        (h,_,_) <- accept sock
        forkIO $ body h
        loop sock

    body h = hPutStr h msg >> hFlush h >> hClose h

    msg = "HTTP/1.0 200 OK\r\nContent-Length: 7\r\n\r\nPong!\r\n"

Asynchronous Exceptions

Exceptions thrown via throw are considered synchronous, as they occur explictly at well-defined points in the control flow.
However, there are also asynchronous exception such as StackOverflow or HeapOverflow, which can potentially occur "at any point" during execution.

Also, Asynchronous exceptions can be thrown by user-code via

throwTo :: ThreadId -> Exception -> IO ()

killThread :: ThreadId -> IO ()
killThread tid = throwTo tid ThreadKilled

Asynchronous Exceptions (cont.)

throwTo returns normally only after the exception has been delivered to the target.

Critical section can be protected from asynchronous exceptions by mask:

bracket acquire release action = mask $ \restore -> do
    h <- acquire
    r <- restore (action h) `onException` release h
    _ <- release h
    return r

However, some (usually) blocking operations are interruptible which means they can receive asynchronous exception even in the scope of a mask!

The Software Crises

The 1^st Software Crisis: Assembly languages didn't scale

The 2^nd Software Crisis: Imperative languages (& tools) didn't scale

The 3^rd Software Crisis: OO languages don't scale
- CPU manufacturers are hitting a wall increasing clock frequencies and turn to increasing number of cores instead to provide more performance
- Automatic parallelization (even in a pure language) is a hard problem (→ ongoing research)
- Need better languages & tools for exploiting multicore processors

Amdahl's Law

Speedup $S (N) = \frac{1}{1 - P + \frac{P}{N}}$ with $N$ umber of processors and parallel $P$ ortion:

(source: Wikimedia Commons / CC-BY-SA-3.0)

Challenges of Parallelism

Reasoning about parallel execution is hard
Parallelism difficult to abstract away
Not all Problems are suited to be parallelized
Different data-structures needed sometimes
Debugging parallel programs is more difficult
(non-determinism of bug occurence, combinatorial state-explosion, race conditions, deadlocks, …)

Parallel Haskell

Immutable data, purity and explicitness of side-effects reduces freedom of programmer

C.f. programming constructs providing language support for structured programming
Parallel programming still isn't trivial, but it becomes a bit easier with more structure.
In Parallel Haskell the language is minimally augmented with a few primitives.

Parallel Haskell

Two new primitives for annotating pure code:

infixr 0 `par`
par :: a -> b -> b
infixr 1 `pseq`
pseq :: a -> b -> b

par x y is semantically equivalent to y
…but indicates that WHNF of x may be computed in parallel

pseq is semantically equivalent to seq
…but with a stronger guarantee w.r.t. execution order

Parallel Haskell (cont.)

Requires linkage with -threaded runtime and executaion with +RTS -N2 or more to be of any use
+RTS -N $n$ controls the number of so-called Haskell Execution Contexts (HEC)
Recent GHCs allows to change the number of HECs at runtime via setNumCapabilities :: Int -> IO ()
Threads spawned via forkIO are distributed to available HECs and thus may exhibit parallelism

Parallel Haskell (cont.)

Which of the following expressions exploits parallelism?

x `par` x+y
x `par` y+x
x `par` y `par` x+y
x `par` y `par` y+x

Parallel Haskell (cont.)

Which of the following expressions exploits parallelism?

x `par` x+y
x `par` y+x
x `par` y `par` x+y
x `par` y `par` y+x

Parallelism depends on evaluation order of (+), better to use pseq to make evaluation order explicit:
```
x `par` y `pseq` x+y
```

Parallel Haskell (cont.)

Obligatory fibonacci-based usage example:

fib 0 = 0
fib 1 = 1
fib n = f1 + f2
  where
    f1 = fib (n-1)
    f2 = fib (n-2)

parfib n
  | n < 11    = fib n
  | otherwise = f1 `par` (f2 `pseq` (f1+f2))
  where
    f1 = parfib (n-1)
    f2 = parfib (n-2)

Parallel Haskell (cont.)

par creates sparks which may or may not result in a new thread
It's easy to use par ineffectively (e.g. by forgetting pseqs)
The previous example falls back to fib for $n < 11$ to avoid parallelization overhead
each HEC has an associated spark queue
idle HECs steal work from other HEC's spark queues

Parallel Haskell: `+RTS -s`

Spark firing can be monitored via runtime statistics +RTS -s:
```
SPARKS: 9369514 (215 converted, 1058180 overflowed, 0 dud, 8078160 GC'd, 232959 fizzled)
```
- converted sparks mean they were succesfully computed
- overflowed sparks couldn't be created because the spark-queue was full
- dud sparks were not created because they were already evaluated
- GCed sparks were declared dead before conversion
- fizzled sparks were already evaluated when they were about to be converted

Parallel Haskell: Threadscope

Compile with -rtsopts -eventlog to enable runtime event tracing

Run program with the following +RTS option:

-l[flags]  Log events in binary format to the file <program>.eventlog
           where [flags] can contain:
              s    scheduler events
              g    GC and heap events
              p    par spark events (sampled)
              f    par spark events (full detail)
              u    user events (emitted from Haskell code)
              a    all event classes above
             -x    disable an event class, for any flag above
           the initial enabled event classes are 'sgpu'

Startup up ThreadScope with threadscope <program>.eventlog …

Parallel Haskell: FFI

unsafe FFI calls may block other threads (c.f. non-allocating computations)

safe FFI calls add about 100 ns overhead:

-- int square(int a) { return a*a; }
foreign import ccall unsafe "square" c_sq_unsafe :: CInt -> CInt
foreign import ccall   safe "square" c_sq_safe :: CInt -> CInt
hs_sq x = x*x :: CInt

On Intel i7-3770 with GHC 7.6.3/Linux/64bit:

about 10 ns for hs_sq and c_sq_unsafe
about 120 ns for c_sq_safe

safe FFI calls spawn additional os-threads and don't eat up HECs

Parallel Haskell: Strategies

Simple abstraction for specifying the evaluation degree:

type Strategy a = a -> ()

using :: a -> Strategy a -> a
using x s = s x `pseq` x

Some trivial strategies:

r0, rseq :: Strategy a
r0   x = ()
rseq x = x `pseq` ()

rdeepseq :: NFData a => Strategy a

Parallel Haskell: Strategies (cont.)

A less trivial strategy:

parList :: Strategy a -> Strategy [a]
parList strat []     = ()
parList strat (x:xs) = strat x `par` parList strat xs

Usage:

parMap :: Strategy b -> (a -> b) -> [a] -> [b]
parMap strat f xs = map f xs `using` parList strat

provides parallel mapping with strategy as parameter

Parallel Haskell: Strategies (cont.)

Strategies exploit lazyness to provide compositionality/modularity
Reasoning about performance is obfuscated by exploited lazyness
Strategies rely on data-structures with unevaluated components to create parallelism

However, strategies are just one possible abstraction for providing parallelism

Parallel Haskell: `Eval` monad

A more explicit abstraction:

data Eval a

instance Monad Eval

runEval :: Eval a -> a

rseq :: a -> Eval a
rpar :: a -> Eval a

Eval is a strict identity monad

Parallel Haskell: `Eval` monad (cont.)

Allows to write

a `par` (b `pseq` a + b)

with a more explicit control flow syntax as

runEval $ do 
   a' <- rpar a     -- start evaluation of a in parallel
   b' <- rseq b     -- evaluate b
   return $ a' + b' -- return result a+b

Parallel Haskell: `Eval` monad (cont.)

New strategies implementations based on Eval:

type Strategy a = a -> Eval a

using :: a -> Strategy a -> a
using x strat = runEval (strat x)

rpar and rseq are strategies too:
```
rpar, rseq :: Strategy a
```
Use rparWith to set the evaluation degree of parallel computations:
```
rparWith :: Strategy a -> Strategy a
```

Communication & State

Let's go back to task-based concurrency with forkIO and friends
So far we've only discussed independent tasks which did not communicate with each other
Remember forkIO's type signature:
```
forkIO :: IO () -> IO ThreadId
```
Knowing the ThreadId does not even provide a way to wait on thread completion!

`MVar` communication primitive

Somewhat like a mutable & synchronized Maybe container:

data MVar a

newEmptyMVar :: IO (MVar a)
newMVar      :: a -> IO (MVar a)
takeMVar     :: MVar a -> IO a
putMVar      :: MVar a -> a -> IO ()

takeMVar takes the MVar's item (but blocks if MVar empty)
putMVar puts an item into the MVar (but blocks if MVar full)

`MVar` communication primitive (cont.)

Instead of using takeMVar and putMVar directly, there's are useful wrappers which combine takeMVar/putMVar transactions in an atomic (or rather exception-safe) way, e.g.:

readMVar   :: MVar a -> IO a
swapMVar   :: MVar a -> a -> IO a  
withMVar   :: MVar a -> (a -> IO b) -> IO b
modifyMVar :: MVar a -> (a -> IO (a, b)) -> IO b

However, care must be taken to avoid issuing putMVars (w/o prior takeMVars) while using the wrappers above

Locks

MVars can be used for emulating mutexes:

type Lock = MVar ()

newLock :: IO Lock
newLock = newMVar ()

withLock :: Lock -> IO a -> IO a
withLock x = withMVar x . const

Similar to context-manager syntax in Python:

logMsg msg = withLock loggingLock $ do
    print =<< getPOSIXTime
    putStr " | "
    putStrLn msg

Waiting on Thread Completion

With MVars we can now wait on thread completion:

main = do
    done <- newEmptyMVar
    forkIO $ do
        -- ...do stuff...
        putMVar done ()

    -- ...do other stuff...
    () <- takeMVar done  -- blocks until threads completes
    return ()

Instead of () we could also communicate back a result value

`async` Package: Futures & Promises

The previous example hints at a common pattern, namely Futures

The async package provides ready-to-use (exception-aware) abstractions:

data Async a

async :: IO a -> IO (Async a)
withAsync :: IO a -> (Async a -> IO b) -> IO b
wait :: Async a -> IO a
cancel :: Async a -> IO ()

C.f. Eval monad from parallel package

`async` Package: Convenience Functions

The async package provides other useful helpers as well, e.g.:

concurrently :: IO a -> IO b -> IO (a, b)
mapConcurrently :: Traversable t => (a -> IO b) -> t a -> IO (t b)
race :: IO a -> IO b -> IO (Either a b)

Linking threads together (w.r.t. cancellation):

link :: Async a -> IO ()
link2 :: Async a -> Async b -> IO ()

`monad-par` Package: The `Par` Monad

monad-par provides an even more explicit abstraction avoiding lazyness

The pure interface:

data Par a
instance Monad Par

runPar :: Par a -> a
fork :: Par () -> Par ()

data IVar
new :: Par (IVar a)
get :: IVar a -> Par a
put :: NFData a => IVar a -> a -> Par ()

`monad-par` Package: The `Par` Monad (cont.)

Emphasis on representing computation dataflows:

runPar $ do
  [a,b,c,d] <- sequence [new,new,new,new]
  fork $ do x <- get a; put b (x+1)
  fork $ do x <- get a; put c (x+2)
  fork $ do x <- get b; y <- get c; put d (x+y)
  fork $ do put a (3 :: Int)
  get d

IVars can be seen as vertices in a graph and each IVar has (at most) one incoming edge.
Alternatively, G-style blockdiagrams where IVars represent nodes' inputs/outputs and edges correspond to get/put pairs.

`monad-par` Package: The `Par` Monad (cont.)

Convenience operation spawn providing future/promise pattern:

spawn :: NFData a => Par a -> Par (IVar a)
spawn p = do
    r <- new
    fork (p >>= put r)
    return r

`monad-par` Package: The `Par` Monad (cont.)

Parallel Fibonacci example, again:

parfib :: Int -> Par Int
parfib n
  | n <= 2     = return 1
  | otherwise  = do
     f1 <- spawn $ parfib (n - 1)
     f2 <- spawn $ parfib (n - 2)
     f1' <- get f1
     f2' <- get f2
     return (f1' + f2')

VU 183.653 Methodical, industrial Software-Engineering using the Haskell functional programming language

Last Time

Today's Topics

Exceptions

Extensible Exceptions

Extensible Exceptions (cont.)

Extensible Exceptions (cont.)

Extensible Exceptions (cont.)

Parallelism vs. Concurrency

Process vs. (OS-)Threads vs. Green-Threads

Concurrent Haskell

Concurrent Haskell

Concurrent Haskell

Asynchronous Exceptions

Asynchronous Exceptions (cont.)

The Software Crises

Amdahl's Law

Challenges of Parallelism

Parallel Haskell

Parallel Haskell

Parallel Haskell (cont.)

Parallel Haskell (cont.)

Parallel Haskell (cont.)

Parallel Haskell (cont.)

Parallel Haskell (cont.)

Parallel Haskell: +RTS -s

Parallel Haskell: Threadscope

Parallel Haskell: FFI

Parallel Haskell: Strategies

Parallel Haskell: Strategies (cont.)

Parallel Haskell: Strategies (cont.)

Parallel Haskell: Eval monad

Parallel Haskell: Eval monad (cont.)

Parallel Haskell: Eval monad (cont.)

Communication & State

MVar communication primitive

MVar communication primitive (cont.)

Locks

Waiting on Thread Completion

async Package: Futures & Promises

async Package: Convenience Functions

monad-par Package: The Par Monad

monad-par Package: The Par Monad (cont.)

monad-par Package: The Par Monad (cont.)

monad-par Package: The Par Monad (cont.)

Reading Material

Parallel Haskell: `+RTS -s`

Parallel Haskell: `Eval` monad

Parallel Haskell: `Eval` monad (cont.)

Parallel Haskell: `Eval` monad (cont.)

`MVar` communication primitive

`MVar` communication primitive (cont.)

`async` Package: Futures & Promises

`async` Package: Convenience Functions

`monad-par` Package: The `Par` Monad

`monad-par` Package: The `Par` Monad (cont.)

`monad-par` Package: The `Par` Monad (cont.)

`monad-par` Package: The `Par` Monad (cont.)