Static types are dangerously interesting

栏目: IT技术 · 发布时间: 6年前

内容简介：Static type systems allow you to forbid certain bad behaviours from happening at runtime.That’s a good thing.However, I consistently underestimate just how difficult they can be to wield, even with cases as simple as the mismatched units in the funny signp

Static type systems allow you to forbid certain bad behaviours from happening at runtime.

That’s a good thing.

However, I consistently underestimate just how difficult they can be to wield, even with cases as simple as the mismatched units in the funny signpost above.

This post will take you through a tiny, real-world example, showing how you can use static types to swap one problem for another.

There will be code. Not much, but some. It’s in Haskell though, so apologies to 95% of you. And it’s in my ugly beginner’s Haskell, so apologies to the other 5% too.

The context

I work in finance. That means I go to the office, take off my coat, unlock my PC, then twist the legs off puppiesuntil my hands cramp. I then use what time in the day remains to write programs that handle money.

Within those programs I really don’t want to accidentally add $3 to £4. If you logged on to your internet banking and saw a balance of £$7, you’d probably throw your laptop out of the window and Forrest Gump your way to a cave in the mountains, never to return.

I would hazard a guess that most - if not all - Haskellites would tell you that with static types, preventing these kinds of errors is a walk in the park. And on that walk in the park you should avoid eye contact with any disheveled python programmers you come across, sitting cross-legged in puddles eating clods of mud with one hand and counting their dollarypounds with the other.

Anyway - I tried to use Haskell to write some pure functional code which prevents amounts of different currencies from being added together. It sounds easy but I couldn’t do it.

This is my first attempt at writing Haskell so the problem is quite likely that I suck. But I’m not ruling out the possibility that I suck and also this is far more difficult than it first seems.

Let’s begin.

The problem

I’d like model a list of trades. The data looks like this:

Quantity, Ticker, Price, Currency
100,      VOD.L,  1,     GBP
200,      VOD.L,  2,     GBP
300,      AAPL.O, 3,     USD
50,       4151.T, 5,     JPY

The first line here means: a trade of 100 shares of VOD.L were bought at a price of 1 GBP .

Then, I want to write a function which calculates the total notional in each currency . The word notional is a fancy way of saying price * quantity . Think of it as “value of the thing that changed hands”.

For illustration, the function signature might look something like this:

sumNotionals :: [Trade] -> Map Currency Rational

In English, it’s a function that takes a list of trades and returns a map from currency to quantity.

In this example the output would be:

Currency, Notional
GBP,      500
USD,      900
JPY,      250

Where GBP notional was calculated as 100*1 + 200*2 = 500 , USD is 300*3 = 900 , and JPY is 50*5 = 250

Nothing too complicated there, so let’s model our data. Recall a line of our input:

Quantity, Price, Currency, Ticker
100,      1,     GBP,      VOD.L

All fields aside from price are quite straightforward:

type Quantity = Int

data Currency
  = USD
  | GBP
  | JPY
  deriving (Eq, Ord, Show)

newtype Ticker = Ticker String

The only thing left is the price: money. I’m going to implement a money type, along with addition, multiplication, and the sumNotionals function in three different ways.

Each time I’ll use stronger typing than the last in an effort to reduce the space of possible runtime errors and ascend to typing nirvana.

Attempt 1 - partially correct

My first attempt is what I imagine the majority of (non-Haskell) programmers would come up with - each Money instance carries around with it a Currency , and when you try add two Money s together there is a runtime check that they have the same currency, and if there’s a mismatch an error is thrown. In Haskell it looks like this:

data Money = Money
    { mQty :: Rational
    , mCcy :: Currency
    }

plus :: Money -> Money -> Money
plus (Money qty1 ccy1) (Money qty2 ccy2)
  | ccy1 == ccy2 = Money (qty1 + qty2) ccy1
  | ccy2 /= ccy2 = error $ "Currency mismatch"

mul :: Money -> Rational -> Money
mul m1 v = Money (mQty m1 * v) (mCcy m1)

-- This will compile and run fine:
x = Money 1 GBP `plus` Money 2 GBP == Money 3 GBP

-- This will compile but will throw an error when evaluated
x = Money 1 GBP `plus` Money 2 USD

That looks pretty good - the code won’t produce incorrect values. And it’s easy to model our trades with this too:

data Trade = Trade
    { tQty    :: Rational
    , tPrice  :: Money
    , tTicker :: Ticker
    }

And only slightly harder to write our final function, sumNotionals :

import qualified Data.Map.Strict as M

sumNotionals :: [Trade] -> M.Map Currency Money
sumNotionals trades =
  let notional trade = tPrice trade `mul` tQty trade
      notionals = fmap notional trades
      currencies = fmap mCcy notionals
  in M.fromListWith plus $ zip currencies notionals

This is very concise, and if you’re comfortable reading Haskell, quite legible. It maps the list of trades to their individual notionals, zips that list with the currencies, then builds that into a map with the currency as the key, with any duplicate values are combined with our plus function.

This is all well and good aside from one problem: we’ve used a plus function which might throw an exception. That’s pretty poor - we set out to write a program without errors but have written one which might explode.

So can we do better? Lets see.

Attempt 2 - either works for me

The problem with our implementation so far is that the return type of the plus function is a lie. It says the function will return a Money , but we know that in the currency-mismatch case there might not be a Money that can be returned. So we can improve the return type by explicitly modelling this:

data Mismatch =
  Mismatch Currency Currency

type Result = Either Mismatch Money

Now we can write a plus function which will never fail at runtime:

plus :: Money -> Money -> Result
plus (Money qty1 ccy1) (Money qty2 ccy2)
  | ccy1 == ccy2 = Right $ Money (qty1 + qty2) ccy1
  | ccy1 /= ccy2 = Left $ Mismatch ccy1 ccy2

That’s great, right? Well, not quite, because when we try implement sumNotionals we end up with a bit of a mess:

eitherPlus :: Result -> Result -> Result
eitherPlus (Right m1) (Right m2) = plus m1 m2
eitherPlus (Left e) _            = Left e
eitherPlus _ (Left e)            = Left e

sumNotionals :: [Trade] -> Either Mismatch (M.Map Currency Money)
sumNotionals trades =
  let notional trade = Right $ tPrice trade `mul` tQty trade
      notionals = fmap notional trades
      currencies = fmap (mCcy . tPrice) trades
  in sequence $ M.fromListWith eitherPlus $ zip currencies notionals

That’s pretty ugly so let’s figure out why.

Our plus :: Money -> Money -> Result function wasn’t easy to compose because its output type is different to its input type. So we needed to define our own function- eitherPlus - which uses our Result in both the input and the output.

With that done, the implementation is quite similar to the previous one, with the main difference being the addition of a call to sequence which hoists the errors out of the map.

But something quite subtle has gone wrong. Look closely at the function signature:

[Trade] -> Either Mismatch (M.Map Currency Money)

We are forcing the caller to handle a currency mismatch even though there is no possible combination of inputs to our function which can cause it to happen .

This is really interesting: with our Attempt 1, plus was lying but sumNotionals was telling the truth. Now, plus is telling the truth but sumNotionals is lying.

When wrote plus the compiler was our friend, helpfully making sure we write code to handle all inputs. But as soon as we slightly scaled up our problem to sumNotionals , it turned against us. It isn’t smart enough to realise that grouping by currency means all the values under a given key have the same currency, and so dogmatically insists we write code which will never be executed.

Tradeoffs in action!

But let’s continue. Perhaps we can require that anyone calling our plus function first proves that the two currencies match. That way these ugly Either s should disappear entirely and we’re home free.

Attempt 3 - kindness begets kindness

As we just saw, the aim now is to prove to the compiler that any time we add two Money ’s together, they definitely have the same currency. We’ll need to somehow bring the currency into the type - perhaps something which looks like MoneyGBP and MoneyUSD . Luckily there’s some prior art to draw on here - the safe-money library. This approach relies on some extensions:

{-# LANGUAGE DataKinds      #-}
{-# LANGUAGE KindSignatures #-}

If you’re unfamiliar with what we mean by “kind”, it’s just the type of a type. If you want to learn more then read this excellent blog post.

Anyway - now you know all about kinds, we can define a Money type like so:

newtype Money (currency :: Currency) =
  Money { unMoney :: Rational }

The kind syntax is a bit unfamiliar, but all it is saying is that the Money type needs an argument (a type which is a Currency ) before it can be used. In other words, its kind is Currency -> * .

This Money type can be used like so:

-- This compiles!
x = Money 42 :: Money GBP

-- This doesn't compile, because Int isn't a Currency
y = Money 42 :: Money Int

Easy! And the definition of type-safe plus and multiply is exactly as you’d expect:

plus :: Money ccy -> Money ccy -> Money ccy
plus m1 m2 = Money (unMoney m1 + unMoney m2)

multiply :: Money ccy -> Rational -> Money ccy
multiply m1 v = Money (unMoney m1 * v)

And we finally have the compile-time check we wanted!

gbp = Money 123 :: Money GBP
usd = Money 345 :: Money USD

thisCompiles = gbp `plus` gbp

thisDoesNotCompile = gbp `plus` usd

--     * Couldn't match type 'USD with 'GBP
--      Expected type: Money 'GBP
--        Actual type: Money 'USD
--    * In the second argument of `plus', namely `usd'
--      In the expression: gbp `plus` usd
--      In an equation for `thisDoesNotCompile': thisDoesNotCompile = gbp `plus` usd

This is all fine and dandy…until we try define our Trade data type again, using this new Money definition. We realise that we need to pull the currency up into the type signature of the trade too:

data Trade (ccy :: Currency) =
  Trade
    { tQty    :: Rational
    , tPrice  :: Money ccy
    , tTicker :: Ticker
    }

Anywhere in our program that we have a trade, we need to statically prove its currency. How can you possibly do that, when you might be reading unknown data from the outside world? Well, we could make a new type for that case:

data UnknownTrade
  = UsdTrade (Trade USD)
  | GbpTrade (Trade GBP)
  | JpyTrade (Trade JPY)

This seems reasonable, as we now have a way to represent trades where the currency is unknown. But if we then want to do anything with it, we’ll need to pattern-match into each individual trade+currency type. Let’s put this aside for now.

Anyway, let’s implement our sumNotionals function one last time.

The first thing to do is figure out the type signature. The input side is easy - it should take a [UnknownTrade] . But the output side is more difficult - we need a “map where the keys are Currency s and the values are Money ccy s, where ccy is the same currency as is in the key”. This is an example of dependent types , which aren’t well supported in Haskell.

Unfortunately, this is where my Haskell skills fall short. If I were to guess the appropriate syntax, it would be something like this:

sumNotionals :: [UnknownTrade] -> Map (ccy :: Currency) (Money ccy)

That isn’t valid Haskell, though I’d be curious as to whether the dependent-map package could be used to build a substitute. My guess would be that it could, but with my writing-production-code hat on I can feel myself beginning to tremor at the prospect of introducing something this advanced when other reasonable alternatives exist.

And returning to our UnknownTrade type - it has as many cases as the number of currencies we’re modelling. In the real world there are a couple of hundred, and pattern matches that long are obviously unpalateable.

Is there a way of abstracting over this? I don’t know, but am secretely hoping someone will tell me!

Wrapping up

Well for a start we didn’t make it to nirvana. But it’s the journey that counts, right?

We started off implementing Money naively with a partial function. The implementation was concise and easy to read but would allow us to write code which could break at runtime.

We then made the error explicit in the return type, making the function total - it can no longer break as it forces the caller to handle the possibility that bad data was passed in. However, this lead us to forcing the caller into handling errors which can never happen.

Finally, we dipped our toes into a more advanced corner of the type system by using data kinds. It’s incredibly neat when dealing with hard-coded data, but as soon as you pull in data from the outside world and process it in interesting ways, the type-level proofs quickly become cumbersome.

So - if you came across this problem in production, which approach would you take? Personally, I think I would use the Money definition from #1 and #2, and implement both of their addition functions:

data Money =
  Money
    { mQty :: Rational
    , mCcy :: Currency
    }

unsafePlus :: Money -> Money -> Money
unsafePlus (Money qty1 ccy1) (Money qty2 ccy2)
  | ccy1 == ccy2 = Money (qty1 + qty2) ccy1
  | ccy2 /= ccy2 = error $ "Currency mismatch"

plus :: Money -> Money -> Either Mismatch Money
plus (Money qty1 ccy1) (Money qty2 ccy2)
  | ccy1 == ccy2 = Right $ Money (qty1 + qty2) ccy1
  | ccy1 /= ccy2 = Left $ Mismatch ccy1 ccy2

Then the caller can choose which method to use. Are they absolutely certain the currencies match, like in our sumNotionals example? Use unsafePlus . Is there some doubt? Use plus .

The general theme here - which you’ll recognise if you’ve used static typing yourself - is that choosing how to model your problem is hard . There are lotsof sensible ways of doing it and they each have their own tradeoffs.

And here we come onto the danger that static types pose to practical software engineering.

The challenge of finding the perfect types isn’t just hard, it’s also really interesting . This makes it all too easy to get sucked in to solving the meta-problem of writing beautiful, type-safe code, losing sight of the business problem that is, ultimately, all that matters.

We should all learn about purity, exceptions, monads, data kinds and dependent types - they will make us better programmers.

But don’t forget - we’re here to build systems, not to write proofs.

以上就是本文的全部内容，希望对大家的学习有所帮助，也希望大家多多支持码农网

查看所有标签

猜你喜欢:

Static types are dangerously interesting

本站部分资源来源于网络，本站转载出于传递更多信息之目的，版权归原作者或者来源机构所有，如转载稿涉及版权问题，请联系我们。

码农书籍

分布式机器学习：算法、理论与实践

刘铁岩、陈薇、王太峰、高飞 / 机械工业出版社 / 2018-10-20 / 89.00

人工智能和大数据时代，解决最有挑战性问题的主流方案是分布式机器学习！本书旨在全面介绍分布式机器学习的现状，深入分析其中的核心技术问题，并且讨论该领域未来的发展方向。由微软亚洲研究院机器学习核心团队潜心力作！鄂维南院士、周志华教授倾心撰写推荐序！本书旨在全面介绍分布式机器学习的现状，深入分析其中的核心技术问题，并且讨论该领域未来的发展方向。全书共12章。第1章是绪论，向大家展......一起来看看《分布式机器学习：算法、理论与实践》这本书的介绍吧!

码农工具