Benefits of a (Good) Typesystem

Helps a lot with refactoring
Helps avoiding errors at compile-time
(c.f. unit tests)

Provides documentation via type-signatures:

... :: (a -> b) -> [a] -> [b]
... :: Monad m => (a -> m Bool) -> [a] -> m [a]

(c.f. doc-strings in LISP, Python, et al.)

Security & Access-control

Exhaustive Case Analysis

What's the problem with the code below?

data CObjKind = CObjPerson | CObjBike | CObjDog

isHuman CObjPerson = True
isHuman _          = False

Exhaustive Case Analysis

More defensive style:

data CObjKind = CObjPerson | CObjBike | CObjDog

isHuman CObjPerson = True
isHuman CObjBike   = False
isHuman Cobjdog    = False

Use explicit cases to have compiler help you when refactoring CObjKind!

Typeclasses

Type-classes allow to attach uniform interfaces to types
- We humans don't cope well with complexity
- Common structure/properties of different "things" (e.g. monoids)
Type-classes should not be used for name-spacing
- Bad cost/benefit: Type-signatures & errors get more noisy/complex
- Modules provide name-spacing

Type-classes should have associated laws

mempty <> x   = x
x <> mempty   = x
x <> (y <> z) = (x <> y) <> z

Typeclasses

Every Haskell programmer should read the Typeclassopedia

Make Illegal States Unrepresentable

"A contact shall have either an email or a postal address"

1st Attempt:

data Contact = Contact
    { cName   :: Name,
    , cEmail  :: EmailInfo,
    , cPostal :: PostalInfo,
    }

Can't represent required states ☹

Make Illegal States Unrepresentable

"A contact shall have either an email or a postal address"

2nd Attempt:

data Contact = Contact
    { cName   :: Name,
    , cEmail  :: Maybe EmailInfo,
    , cPostal :: Maybe PostalInfo,
    }

Can represent required states ☺

…as well as undesirable states ☹

What's so bad about the undesired states?
(Hint: what's so annoying about nullable types?)

Make Illegal States Unrepresentable

"A contact shall have either an email or a postal address"

Final Attempt:

data Contact = Contact
    { cName           :: Name,
    , cEmailOrPostal  :: Either EmailInfo PostalInfo
    }

Can represent only required states! ☺

Make Illegal States Unrepresentable

In languages with "weaker" typesystems:

typedef struct {
    cobj_kind_t   kind; // person, bike, dog, ...
    cobj_age_t    age;
    cobj_weight_t weight;
    cobj_size_t   size;
    cobj_gender_t gender;
} cobj_t;

Fields exist even though not sensible → error-prone

Make Illegal States Unrepresentable

With algebraic data types:

data CObjKind = CObjPerson Age Gender
              | CObjBike Weight
              | CObjDog Size

Alternatively (as used in some ML-dialects):

data CObjKind = CObjPerson CObjPersonProps
              | CObjBike   CObjBikeProps
              | CObjDog    CobjDogProbs

data CObjPersonProps = CObjPersonProps
    { copAge :: Age
    , copGender :: Gender
    }
…

CS240h: Functional Systems in Haskell

Note: the following slides have been adapted (with permission) from Stanford's CS240h: Functional Systems in Haskell class.
Highly recommended: Go over the lecture notes to read up on everything we couldn't cover in this course!

ADTs: Interpreter

Let's create a very small fragment of a programming language:

data Expr = Num Int             -- atom
          | Str String          -- atom
          | Op BinOp Expr Expr  -- compound
          deriving (Show)

data BinOp = Add | Concat
             deriving (Show)

And an interpreter for it:

interp x@(Num _)                     = x
interp x@(Str _)                     = x
interp (Op Add a b)                  = Num (i a + i b)
  where i x = case interp x of Num a -> a
interp (Op Concat (Str a) (Str b))   = Str (a ++ b)

ADTs: Does it work?

Our very quick round of prototyping gave us a tiny interpreter that actually seems to work:

>> interp (Op Add (Num 2) (Num 3))
Num 5

Please help me to spot some problems with my interpreter!

ADTs: Two sides of the same problem

We can construct ill-formed expressions ("add a Num to a Str").
Our interpreter crashes on these expressions, because we (quite reasonably) didn't take their possible existence into account.

ADTs: Watch your language!

Here's a slightly modified version of our language:

data Expr a = Num Int
            | Str String
            | Op BinOp (Expr a) (Expr a)
              deriving (Show)

-- This is unchanged.
data BinOp = Add | Concat
             deriving (Show)

We've introduced a type parameter here...

...But we never actually use it to represent a value of whatever type a is.

Let's see where that takes us.

Phantoms: Some modifications to our interpreter

Here is our modified interpreter.

interp x@(Num _)       = x
interp x@(Str _)       = x
interp (Op Add a b)    = Num (i a + i b)
  where i x = case interp x of Num a -> a
interp (Op Concat a b) = Str (i a ++ i b)
  where i x = case interp x of Str y -> y

Our only change is to apply interp recursively if we're asked to perform a Concat.

We could have done this in our original interpreter, so that can't be the real fix. But what is?

What's the type of the rewritten interp?

Phantoms: Our new type

The interpreter function now has this type:

interp :: Expr a -> Expr a

But we know from the definitions of Expr and BinOp that we never use a value of type a. Then what purpose does this type parameter serve?

Recall the type of Expr:

data Expr a = ...
            | Op BinOp (Expr a) (Expr a)

Phantoms: Some context

Let's think of that a parameter as expressing our intent that:

The operands of an Op expression should have the same types.
The resulting Expr value should also have this type.

data Expr a = ...
            | Op BinOp (Expr a) (Expr a)

In fact, the type system will enforce these constraints for us.

Phantoms: Building blocks

The first step in making all of this machinery work is to define some functions with the right types.

These two functions will construct atoms in our language:

num :: Int -> Expr Int
num = Num

str :: String -> Expr String
str = Str

Phantoms: Applying operators safely

These two functions construct compound expressions:

add :: Expr Int -> Expr Int -> Expr Int
add = Op Add

cat :: Expr String -> Expr String -> Expr String
cat = Op Concat

Notice that each one enforces the restriction that its parameters must be compatible.

Export Control: A trusted computing base

One we have our functions defined, the last step is to lock our world down.

Here's what the beginning of my module looks like:

module Interp
    ( Expr,       -- type constructor
      interp,     -- interpreter
      num, str,   -- atom constructors
      add, cat,   -- expression constructors
    ) where

Notice that we've exercised careful control over what we're exporting.

We export the Expr type constructor, but none of its value constructors.
Users of our module don't need BinOp, so we don't export that at all.

Export Control: More about our type and export choices

Consequences of exporting only the type constructor for Expr:

Clients cannot use the value constructors to create new values.
The only way for a client to construct expressions is using our handwritten "smart constructor" functions with their carefully chosen types.
Clients cannot pattern-match on an Expr value. Our internals are opaque; we could change our implementation without clients being able to tell.

These are in fact the completely standard techniques for creating abstract data types in Haskell. So where does the type parameter come in?

Phantoms: Consequences of that type parameter

Due to our judicious use of both abstraction and that type parameter:

Clients cannot construct ill-formed expressions. Any attempts will be rejected by the type checker.

This additional safety comes "for free":

We don't need runtime checks for ill-formed expressions, because they cannot occur.
Our added type parameter never represents data at runtime, so it has zero cost when the program runs.

Phantoms: Definition

When we refer to a type parameter on the left of a type definition, without ever using values of that type on the right, we call it a phantom type.

We're essentially encoding compile-time data using types, and the compiler computes with this data before our program is ever run.

GADTs: Alternative Approach

Take the following Expr type:

data Expr a = I Int
            | B Bool
            | Add (Expr a) (Expr a)
            | Eq  (Expr a) (Expr a)

…and we'd like to have:

eval :: Expr a -> a

But alas, this does not work.

GADTs: Alternative Approach

However, that's what Generalized Algebraic Data Types are for:

data Expr a where
    I   :: Int  -> Expr Int
    B   :: Bool -> Expr Bool
    Add :: Expr Int -> Expr Int -> Expr Int
    Eq  :: Expr Int -> Expr Int -> Expr Bool

eval :: Expr a -> a
   eval (I n) = n
   eval (B b) = b
   eval (Add e1 e2) = eval e1 + eval e2
   eval (Eq  e1 e2) = eval e1 == eval e2

Why does this work?

Access Control: IORefs

We've already seen the IORef type, which gives us mutable references:

import Data.IORef

newIORef    :: a -> IO (IORef a)

readIORef   :: IORef a -> IO a
writeIORef  :: IORef a -> a -> IO ()

modifyIORef :: IORef a -> (a -> a) -> IO ()

Access Control: Managing mutation

Application writers are often faced with a question like this:

I have a big app, and parts of it need their behaviour tweaked by an administrator at runtime.

There are of course many ways to address this sort of problem.

Let's consider one where we use a reference to a piece of config data.

Any code that's executing in the IO monad can, if it knows the name of the config reference, retrieve the current config:

curCfg <- readIORef cfgRef

The trouble is, ill-behaved code could clearly also modify the current configuration, and leave us with a debugging nightmare.

Access Control: Phantom types to the rescue!

Let's create a new type of mutable reference.

We use a phantom type t to statically track whether a piece of code is allowed to modify the reference or not.

import Data.IORef

newtype Ref t a = Ref (IORef a)

Remember, our use of newtype here means that the Ref type only exists at compile time: it imposes no runtime cost.

Since we are using a phantom type, we don't even need values of our access control types:

data ReadOnly
data ReadWrite

We're already in a good spot! Not only are we creating compiler-enforced access control, but it will have zero runtime cost.

Access Control: Creating a mutable reference

To create a new reference, we just have to ensure that it has the right type.

newRef :: a -> IO (Ref ReadWrite a)
newRef a = Ref `fmap` newIORef a

Access Control: Reading and writing a mutable reference

Since we want to be able to read both read-only and read-write references, we don't need to mention the access mode when writing a type signature for readRef.

readRef :: Ref t a -> IO a
readRef (Ref ref) = readIORef ref

Of course, code can only write to a reference if the compiler can statically prove (via the type system) that it has write access.

writeRef :: Ref ReadWrite a -> a -> IO ()
writeRef (Ref ref) v = writeIORef ref v

Access Control: Converting a reference to read-only

This function allows us to convert any kind of reference into a read-only reference:

readOnly :: Ref t a -> Ref ReadOnly a
readOnly (Ref ref) = Ref ref

In order to prevent clients from promoting a reference from read-only to read-write, we do not provide a function that goes in the opposite direction.

We also use the familiar technique of constructor hiding at the top of our source file:

module Ref
    ( Ref, -- export type ctor, but not value ctor
      newRef, readOnly,
      readRef, writeRef
    ) where

Forall: Implicit quantifier

What does this type signature mean?

something :: a -> a

We know that for all possible types a, this function accepts a value of that type, and returns a value of that type.

We clearly cannot enumerate all possible types, so we equally clearly cannot create all (or indeed any) values of these types.

Therefore, if we exclude crashes and infinite loops, the only possible behaviour for this function is to return its input.

Forall: Being even more explicit: quantifiers

In fact, Haskell provides a keyword, forall, to make this quantification over type parameters more explicit:

something :: forall a. a -> a

The same "universal quantification" syntax works with typeclass constraints:

something :: forall a. (Show a) -> String

Here, our quantifier is "for all types a, where the only thing we know about a is what the Show typeclass tells us we can do".

These forall keywords are implied if they're not explicitly written.

Containing Handles: Building blocks

Here are some typical functions that a low-level database library will provide, for clients that have to modify data concurrently:

begin    :: Connection -> IO Transaction
commit   :: Transaction -> IO ()
rollback :: Transaction -> IO ()

We can create a new transaction with begin, finish an existing one with commit, or cancel one with rollback.

Typically, once a transaction has been committed or rolled back, accessing it afterwards will result in an exception.

Containing Handles: Shaky foundations build a shaky house

Clearly, these constructs make it easy to inadvertantly write bad code.

oops conn = do
  txn <- begin conn
  throwIO (AssertionFailed "forgot to roll back!")
  -- also forgot to commit!

We can avoid rollback and commit forgetfulness with a suitable combinator:

withTxn :: Connection -> IO a -> IO a
withTxn conn act = do
  txn <- begin conn
  r <- act `onException` rollback txn
  commit txn
  return r

All right! The code running in act never sees a Transaction value, so it can't leak a committed or rolled back transaction.

Containing Handles: But still...

We're not out of the woods yet!

High-performance web apps typically use a dynamically managed pool of database connections.

getConn :: Pool -> IO Connection
returnConn :: Pool -> Connection -> IO ()

It's a major bug if a database connection is not returned to the pool at the end of a handler.

So we write a combinator to handle this for us:

withConn :: Pool -> (Connection -> IO a) -> IO a
withConn pool act =
  bracket (getConn pool) (returnConn pool) act

Nice and elegant. But correct? Read on!

Containing Handles: Connections vs transactions

In a typical database API, once we enter a transaction, we don't need to refer to the handle we got until we either commit or roll back the transaction.

So it was fine for us to write a transaction wrapper like this:

withTxn :: Connection -> IO a -> IO a

On other other hand, if we're talking to a database, we definitely need a connection handle.

query :: Connection -> String -> IO [String]

So we have to pass that handle into our combinator:

withConn :: Pool -> (Connection -> IO a) -> IO a

Containing Handles: Handle Leakage

Unfortunately, since withConn gives us a connection handle, we can defeat the intention of the combinator (sometimes accidentally).

What is the type of this function?

evil pool = withConn pool return

Containing Handles: Phantom types to the rescue!

Here, we are using the newtype keyword to associate a phantom type with the IO monad.

newtype DB c a = DB {
      fromDB :: IO a
    }

We're going to run some code in the IO monad, and pass around a little extra bit of type information at compile time.

Let's create a phantom-typed wrapper for our earlier Connection type:

newtype SafeConn c = Safe Connection

Where are these phantom types taking us?

Containing Handles: Safe querying

The easiest place to start to understand with a little use of our new code, in the form of a function we'll export to clients.

This is just a wrapper around the query function we saw earlier, making sure that our newtype machinery is in the right places to keep the type checker happy.

safeQuery :: SafeConn c -> String -> DB c [String]
safeQuery (Safe conn) str = DB (query conn str)

Notice that our phantom type c is mentioned in both our uses of SafeConn c and DB c: we're treating it as a token that we have to pass around.

Our library will not be exporting the value constructors for SafeConn or DB to clients. Once again, this newtype machinery is internal to us!

Containing Handles: Giving a client a connection from a pool

Here, we'll use our earlier exception-safe withConn combinator. Recall its type:

withConn :: Pool -> (Connection -> IO a) -> IO a

To make it useful in our new setting, we have to wrap the Connection, and unwrap the DB c that is our act to get an action in the IO monad.

withSafeConn pool act =
  withConn pool $ \conn ->
    fromDB (act (Safe conn))

It's not at all obvious what this is doing for us until we see the type of withSafeConn.

Containing Handles: higher-rank `forall`

{-# LANGUAGE Rank2Types #-}

withConnection :: Pool
               -> (forall c. SafeConn c -> DB c a) 
               -> IO a

We've introduced a universal quantifier (that forall) into our type signature. And we've added a LANGUAGE pragma!

Let's not worry about those details just yet. What does our signature seem to want to tell us?

We accept a Pool.
And an "I have a connection, so I can talk to the database now" action that accepts a SafeConn c, returning a value a embedded in the type DB c.

Not so scary after all. Well, except for the details we're ignoring.

Containing Handles: Universal quantification to the rescue!

Let's start with the obviously bothersome part of the type signature.

(forall c. SafeConn c -> DB c a)

This is the same universal quantification we've seen before, meaning:

Our "I can haz connection" action must work over all types c.
The scope of c extends only to the rightmost parenthesis here.

Putting it back into context:

withConnection :: Pool
               -> (forall c. SafeConn c -> DB c a) 
               -> IO a

The type variable c can't escape from its scope, so a cannot be related to c.

Containing Handles: No Escape

withConnection :: Pool
               -> (forall c. SafeConn c -> DB c a) 
               -> IO a

Because SafeConn c shares the same phantom type as DB c, and the quantified c type cannot escape to the outer IO, there is no way for a SafeConn c value to escape, either!

In other words, we have ensured that a user of withConnection cannot either accidentally allow or force a connection to escape from the place where we've deemed them legal to use.

Forall: Rank-2 types

Standard Haskell types and functions have just one scope for universal quantification.

foo :: forall a b. a -> b -> a

When an extra level of scoping for universal quantification is introduced, this is called a rank-2 type.

fnord :: forall b. (forall a. a -> a) -> b

(Normal types are thus called rank-1 types.)

Although widely used, rank-2 types are not yet a part of the Haskell standard, hence our use of a pragma earlier:

{-# LANGUAGE Rank2Types #-}

`ST` Monad: Containing Mutations in Pure Computations

import Control.Monad.ST

This defines for us a function with a glorious rank-2 type:

>> :t runST
runST :: (forall s. ST s a) -> a

Since we've only just been introduced to rank-2 types, we know exactly what this implies:

What happens in the ST monad stays in the ST monad.
Nevertheles, we can obtain a pure result when we run an action in this monad. That's an exciting prospect!

`ST` Monad: Mutable references, `ST` style

The STRef type gives us the same mutable references as IORef, but in the ST monad.

import Control.Monad.ST
import Data.STRef

whee :: ST s Int
whee z = do
  r <- newSTRef z
  modifySTRef r (+1)
  readSTRef r

Let's try this in ghci:

>> runST (whee 1)
2

Thanks to chaining of the universally quantified s, there is no way for an STRef to escape from the ST monad, save by the approved route of reading its current value with readSTRef.

newSTRef  :: a -> ST s (STRef s a)
readSTRef :: STRef s a -> ST s a

`ST` Monad: Arrays and vectors

For working with large collections of uniform data, the usual representation in most languages is an array.

You can use the vector package for that in Haskell:

vector provides a vastly richer API than array.
A Vector is one-dimensional and indexed by Ints counting from zero, so it's easy to reason about.
An Array is indexed by an instance of the Ix class, can have arbitrary bounds, and makes my brain hurt.

`ST` Monad: Families and flavours of vectors

The vector package provides two "flavours" of vector type:

Vector types are immutable.
MVector types can be modified in either the ST or IO monad, and cannot be read by purely functional code.

Within these flavours, there are two "families" of vector type:

Unboxed vectors are tightly packed in contiguous memory. They are very fast, but it is only possible to create unboxed vectors of certain types, and an unboxed vector can't store thunks.
Normal vectors are boxed, just like ordinary Haskell values. Any value can be stored in a plain old vector, at the cost of an additional level of indirection.

We can thus have an immutable unboxed vector, a mutable boxed vector, and so on.

`ST` Monad: Mutable vectors in action

The classic Haskell implementation of a "quicksort":

import Data.List (partition)

qsort (p:xs) = qsort lt ++ [p] ++ qsort ge
  where (lt,ge) = partition (<p) xs
qsort _      = []

This isn't really a quicksort, because it doesn't operate in-place.

We can apply our newfound knowledge to this problem:

import qualified Data.Vector.Unboxed.Mutable as V
import Control.Monad.ST (ST)

quicksort :: V.MVector s Int -> ST s ()
quicksort vec = go 0 (V.length vec)
  where
    {- ... -}

`ST` Monad: The recursive step

    recur left right
      | left >= right = return ()
      | otherwise     = do
      idx <- partition left right
             (left + (right-left) `div` 2)
      recur left (idx-1)
      recur (idx+1) right

`ST` Monad: Partitioning the vector

(Remember, vec is in scope here.)

    partition left right pivotIdx = do
      pivot <- V.read vec pivotIdx
      V.swap vec pivotIdx right
      let loop i k
        | i == right = V.swap vec k right >> return k
        | otherwise  = do
          v <- V.read vec i
          if v < pivot
            then V.swap vec i k >>
                 loop (i+1) (k+1)
            else loop (i+1) k
      loop left left

`ST` Monad: From immutable to mutable, and back

We can even use this in-place sort to efficiently perform an in-place sort of an immutable array!

Our building blocks:

thaw   :: Vector a -> ST s (MVector s a)
create :: (forall s. ST s (MVector s a)) -> Vector a

thaw creates a new mutable vector, and copies the contents of the immutable vector into it.
create runs an ST action that returns a mutable vector, and "freezes" its result to be immutable, and hence usable in pure code.

import qualified Data.Vector.Unboxed as U

vsort :: U.Vector Int -> U.Vector Int
vsort v = U.create $ do
            vec <- U.thaw v
            quicksort vec
            return vec

`ST` Monad: Mutability, purity, and determinism

The big advantage of the ST monad is that it gives us the ability to efficiently run computations that require mutability, while both the inputs to and results of our computations remain pure.

In order to achieve this, we sacrifice some power:

We can't run arbitrary IO actions. No database accesses, no filesystem, etc.
Other potential sources of nondeterminism (e.g. threads) are thus also off limits.

VU 183.653 Methodical, industrial Software-Engineering using the Haskell functional programming language

Last Time

Today's Topics

Code Styling Guidelines

Benefits of a (Good) Typesystem

Exhaustive Case Analysis

Exhaustive Case Analysis

Typeclasses

Typeclasses

Make Illegal States Unrepresentable

Make Illegal States Unrepresentable

Make Illegal States Unrepresentable

Make Illegal States Unrepresentable

Make Illegal States Unrepresentable

CS240h: Functional Systems in Haskell

ADTs: Interpreter

ADTs: Does it work?

ADTs: Two sides of the same problem

ADTs: Watch your language!

Phantoms: Some modifications to our interpreter

Phantoms: Our new type

Phantoms: Some context

Phantoms: Building blocks

Phantoms: Applying operators safely

Export Control: A trusted computing base

Export Control: More about our type and export choices

Phantoms: Consequences of that type parameter

Phantoms: Definition

GADTs: Alternative Approach

GADTs: Alternative Approach

Access Control: IORefs

Access Control: Managing mutation

Access Control: Phantom types to the rescue!

Access Control: Creating a mutable reference

Access Control: Reading and writing a mutable reference

Access Control: Converting a reference to read-only

Forall: Implicit quantifier

Forall: Being even more explicit: quantifiers

Containing Handles: Building blocks

Containing Handles: Shaky foundations build a shaky house

Containing Handles: But still...

Containing Handles: Connections vs transactions

Containing Handles: Handle Leakage

Containing Handles: Phantom types to the rescue!

Containing Handles: Safe querying

Containing Handles: Giving a client a connection from a pool

Containing Handles: higher-rank forall

Containing Handles: Universal quantification to the rescue!

Containing Handles: No Escape

Forall: Rank-2 types

ST Monad: Containing Mutations in Pure Computations

ST Monad: Mutable references, ST style

ST Monad: Arrays and vectors

ST Monad: Families and flavours of vectors

ST Monad: Mutable vectors in action

ST Monad: The recursive step

ST Monad: Partitioning the vector

ST Monad: From immutable to mutable, and back

ST Monad: Mutability, purity, and determinism

Further Reading

Containing Handles: higher-rank `forall`

`ST` Monad: Containing Mutations in Pure Computations

`ST` Monad: Mutable references, `ST` style

`ST` Monad: Arrays and vectors

`ST` Monad: Families and flavours of vectors

`ST` Monad: Mutable vectors in action

`ST` Monad: The recursive step

`ST` Monad: Partitioning the vector

`ST` Monad: From immutable to mutable, and back

`ST` Monad: Mutability, purity, and determinism