Dynamic Programming – basic examples

The key idea in Dynamic Programming (DP) is to break a larger problem into a set of subproblems, and then use recursion or a bottom-up approach to combine the subproblem results to solve the original problem.

Recursion lends itself nicely to DP, as the program stack itself becomes the mechanism to generate, traverse and combine the subproblems, with memoization used to prevent re-solving the same subproblem where subproblems overlap.

Let’s think about the classic recursive Fibonacci algorithm.

def fib_rec(n):
    if n<2: return n
    return fib_rec(n-1) + fib_rec(n-2)

The subproblem is calculating each number as the sum of the prior two. We use the program stack to generate these nodes depth-first, with the base cases being the numbers in positions 0 and 1. Combining these sub-results is straightforward – we add them all the way back up the stack.

Now we just have to optimise with memoization to prevent re-calculating nodes.

def fib_dyn(n,memo={}):
    if n<2: return n
    if n in memo:
        return memo[n]
    else: 
        fibn = fib_dyn(n-1,memo) + fib_dyn(n-2,memo)
        memo[n] = fibn
        return fibn
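As an aside, the hand-rolled memo dict can also be replaced by functools.lru_cache from the standard library, which does the same caching automatically. A sketch of the same idea (fib_cached is my own name for it):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_cached(n):
    # same recursion as fib_rec; the decorator caches each result
    if n < 2:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)
```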

DP can also be done without recursion, via a bottom-up approach that adheres to the same principles.

def fib_bot_up(n):
    f=[]
    for k in range(n+1):
        if k<2: f.append(k)
        else: f.append(f[k-1]+f[k-2])
    return f[n]

You can see that in the bottom-up approach we start with the base cases, and move up through the sub-results until we get to the desired position in the sequence. We could optimize further by keeping track of only the prior two numbers in our loop.
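That last optimization might look like this (a sketch keeping only the prior two values):

```python
def fib_two_vars(n):
    # a holds fib(k), b holds fib(k+1); slide the pair up n times
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a
```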

Let’s think about another simple problem: Let’s say you can climb a staircase of n steps by one or two or three steps at a time. How many ways are there to get to the top?

First, let’s think about the subproblems. We’ll call step zero the top step – the base case. Once I’m there, there’s only one way to climb the staircase. From any step, I can get to the top by adding the ways to get to the top from 1 step up, 2 steps up and 3 steps up from where I am. Like the Fibonacci sequence, these subproblems can be nicely generated by the program stack using recursion.

def countways_rec(n):
    if n<0: return 0
    if n==0: return 1
    return countways_rec(n-1)+countways_rec(n-2)+countways_rec(n-3)

Now we add memoization:

def countways_dyn(n,memo={}):
    if n<0: return 0
    if n==0: return 1
    if n in memo:
        return memo[n]
    else:
        memo[n] = countways_dyn(n-1,memo)+countways_dyn(n-2,memo)+countways_dyn(n-3,memo)
        return memo[n]

Now a bottom-up approach. Again, we start with the base cases and build up from there. We keep a lookup table of the previously calculated values that our final result builds on.

def countways_bottomsup(n):
    lookup = {}
    lookup[0]=1
    lookup[1]=1
    lookup[2]=2
    for step in range(3,n+1):
        lookup[step]=lookup[step-1]+lookup[step-2]+lookup[step-3]
    return lookup[n]

These are the basic principles of all DP problems. Figuring out the subproblems, and how to generate and combine them can get trickier in some problems. I’ll explore some of these in subsequent posts.

For completeness, here’s a solution to the steps problem where the number of steps that can be jumped is passed in as a parameter.

def countways_bottomsup_varjumps(n,jumps):
    lookup = {}
    lookup[0]=1
    for step in range(1,n+1):
        lookup[step]=0
        for jump in jumps:
            if step-jump >= 0: lookup[step] += lookup[step-jump]
    return lookup[n]

A simple Monte Carlo simulation in Python

Monte Carlo is a simulation method that can be useful in solving problems that are difficult to solve analytically.

Here’s an interesting application of the technique to estimate the value of pi.

Consider a circular dartboard placed against a square backing, with the sides of the square perfectly touching the circle tangentially at the top, bottom, left and right.

If you throw a dart blindly so it lands inside of the square, what are the chances the dart has landed inside the circle?

Given a circle of radius r, the chances are:


= area of circle / area of square
= (pi * r ^ 2) / (4 * r ^ 2)
= pi / 4

So the chances of hitting the circle are pi divided by 4.   In other words:

pi = 4 * the probability of hitting the circle.

So we can set up a Monte Carlo simulation that will give us the probability of getting the dart inside of the circle, and then we’ll be able to derive an approximation of the value of pi.

To do this, let’s think of a quarter circle of radius 1 inscribed in a unit square.

Note that the chances of hitting the quarter circle inside its square are the same as the chances of hitting the larger circle inside the bigger square.

How can we easily simulate hitting the quarter circle?  We can obtain two random numbers between  0 and 1, representing the x and y coordinates of a point with the origin at the circle’s center.  If the distance between this point and the origin is less than 1, then we’re inside the quarter circle.  Otherwise we’re outside it.  

If we simulate this enough times, and multiply the fraction of times we’re inside the circle by 4, we should converge on the value of pi. Here’s Python 2 shell code that simulates this:

>>> from random import random
>>> len(filter(lambda x:(random()**2+random()**2)**.5<1,range(1000000)))/1000000.0*4
3.14064
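The one-liner above relies on Python 2’s filter returning a list. In Python 3 the same simulation reads more naturally as a function – and the square root can be dropped, since a distance is under 1 exactly when its square is (estimate_pi is my own name for it):

```python
import random

def estimate_pi(trials=1000000):
    # a hit is a random point landing inside the quarter circle
    hits = sum(1 for _ in range(trials)
               if random.random()**2 + random.random()**2 < 1)
    return 4 * hits / trials
```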

Generating prime numbers using Numpy

I’ve been looking at generating primes, and using various element-wise operations in Numpy arrays to do so. To cut to the chase, prime6 below is the fastest implementation. The ones before that tell the story of how I got there.

The following function uses a for loop to generate primes less than 100k and takes about 7 secs on my laptop:

import numpy
import math
def prime2(upto=100000):
    return filter(lambda num: numpy.array([num % factor for factor in range(2,1+int(math.sqrt(num)))]).all(), range(2,upto+1))

If we remove the for loop we cut the time to about 1.3 secs:

import numpy
import math
def prime(upto=100000):
    return filter(lambda num: (num % numpy.arange(2,1+int(math.sqrt(num)))).all(), range(2,upto+1))

By skipping even numbers, we halve the time again to about .7 secs:

import numpy
import math
def prime3(upto=100000):
    return [2]+filter(lambda num: (num % numpy.arange(3,1+int(math.sqrt(num)),2)).all(), range(3,upto+1,2))

The above illustrates the efficiency of element-wise operations on Numpy arrays compared with explicit looping.

A good way of speeding up the above code even further is to break out of the loop as soon as the first factor divides evenly. The code below runs in about .4 seconds:

import math
def prime4(upto=100000):
    primes=[2]
    for num in range(3,upto+1,2):
        isprime=True
        for factor in range(3,1+int(math.sqrt(num)),2):
             if not num % factor: isprime=False; break
        if isprime: primes.append(num)
    return primes

Ok, Scruffy Pete has laid down the challenge in the comments section, so I need to get serious about optimization. The code below uses a few Numpy tricks for a zippy implementation and generates the primes up to 1 million in about .03 secs, about 6X faster than Scruffy Pete’s.

import math
import numpy
def prime5(upto=100000):
    primes=numpy.arange(2,upto+1)
    isprime=numpy.ones(upto-1,dtype=bool)
    for factor in primes[:int(math.sqrt(upto))]:
        if isprime[factor-2]: isprime[factor*2-2::factor]=0
    return primes[isprime]

UPDATE 9/3 – did a half-sieve style implementation of the above (see comments), which runs a little faster – under .02 secs for primes up to 1 million. Updated for Python 3, incorporating a suggestion from Paul Weemaes in the comments.

import math
import numpy
def prime6(upto=1000000):
    primes=numpy.arange(3,upto+1,2)
    isprime=numpy.ones((upto-1)//2,dtype=bool)
    for factor in primes[:int(math.sqrt(upto))//2]:
        if isprime[(factor-2)//2]: isprime[(factor*3-2)//2::factor]=0
    return numpy.insert(primes[isprime],0,2)
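As a sanity check, prime6 can be compared against plain trial division over a small range (the cross-check below is mine, not from the original post):

```python
import math
import numpy

def prime6(upto=1000000):
    primes = numpy.arange(3, upto + 1, 2)
    isprime = numpy.ones((upto - 1) // 2, dtype=bool)
    for factor in primes[:int(math.sqrt(upto)) // 2]:
        if isprime[(factor - 2) // 2]:
            isprime[(factor * 3 - 2) // 2::factor] = 0
    return numpy.insert(primes[isprime], 0, 2)

def naive_primes(upto):
    # plain trial division, only for cross-checking the sieve
    return [n for n in range(2, upto + 1)
            if all(n % d for d in range(2, int(math.sqrt(n)) + 1))]
```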

Birthday simulations using Python and Numpy

I’ve written previously about the probability of finding a shared birthday in a room full of people. I wanted to run some simulations on this using Python. As an aside, you’ll find some of the techniques below bear a similarity to that old interview question of finding whether a list has duplicate elements.

The following code generates birthday samples for a given number of people, then works out the proportion of these samples that contain at least one shared birthday:

import numpy
def sharedbday1(people,simulations=50000):
    samples = numpy.random.randint(0,365, size=(simulations, people))
    hasnoshared=map(lambda x: len(x)==len(set(x)),samples)
    return 1-float(sum(hasnoshared))/simulations

Timing this in IPython on my laptop, 50k simulations take about .5 secs. For 23 people, it returns the expected result of around .5. As we saw in a previous post on this blog, it takes 23 people in a room to have at least a 50% chance of a shared birthday.

In [579]: time sharedbday1(23)
CPU times: user 0.56 s, sys: 0.00 s, total: 0.56 s
Wall time: 0.56 s
Out[580]: 0.50678000000000001


Since we love Numpy, there’s likely a cleverer and faster way of doing this. We can sort each sample, then compare it element-wise against a slice of itself offset by one – if we find any equal adjacent elements, we’ve found a sample with a shared birthday. With Numpy arrays this is achieved very neatly, as operations on Numpy arrays are performed element-wise by default:

import numpy
def sharedbday2(people, simulations=50000):
    samples = numpy.random.randint(0,365, size=(simulations, people))
    samples.sort()
    hasshared = ((sample[1:]==sample[:-1]).any() for sample in samples)
    return float(sum(hasshared))/simulations

Well, unfortunately and somewhat surprisingly, this runs twice as slow as the original code:

In [582]: time sharedbday2(23)
CPU times: user 1.01 s, sys: 0.00 s, total: 1.01 s
Wall time: 1.02 s
Out[583]: 0.50988


I exchanged some emails with Travis Oliphant of Numpy fame, who suggested the following code which is really neat and worth spending time to understand – I think it shows off the real power of Numpy:

import numpy
def sharedbday3(people, simulations=50000):
    samples = numpy.random.randint(0,365, size=(simulations, people))
    samples.sort(axis=-1)
    hasshared = (samples[:,1:] == samples[:,:-1]).any(axis=-1)
    return float(sum(hasshared))/simulations

This is fast and shows a speed-up of at least 5X on my original code – with a wall time of .09s:

In [585]: time sharedbday3(23)
CPU times: user 0.10 s, sys: 0.00 s, total: 0.10 s
Wall time: 0.09 s
Out[586]: 0.50119999999999998

As Travis explains it:
You create a 2-d array up-front with the simulations along one axis (i.e. every row of the 2-d array is a separate simulation). Then you can use the same approach used above, but this time letting NumPy do the loop over the rows (down in C). The sort and any methods on NumPy arrays take an axis keyword which determines the dimension over which the operation works. The operation is then repeated implicitly over the other dimensions. So, in the 3rd version shared birthday there are no for loops. The penalty paid is memory usage: a 2-d array instead of throw-away 1-d arrays. But, of course, less memory management is also a major reason for the speed gain.


For completeness, I made some minor modifications to the original code above to simulate the probability that at least one person in a group shares your birthday (rather than finding a shared birthday between any two of them).

import numpy
def sharemybday(people,simulations=50000):
    mybday=0
    samples = numpy.random.randint(0,365, size=(simulations, people))
    hasshared=map(lambda x: mybday in x,samples)
    return float(sum(hasshared))/simulations
In [1509]: sharemybday(253)
Out[1509]: 0.50431999999999999

Dijkstra Shortest Path using Python

This post uses python and Dijkstra’s algorithm to calculate the shortest path given a start node (or vertex), an end node and a graph. The function will return the distance from the start node to the end node, as well as the path taken to get there.

The implementation below sticks pretty closely to the algorithm description in the wikipedia entry, which I turned into something a little more pseudo-code-ish to help me implement it:

Initial state:

1. give nodes two properties - node.visited and node.distance
2. set node.distance = infinity for all nodes except set start node to zero
3. set node.visited = false for all nodes
4. set current node = start node.

Current node loop:

1. if current node = end node, finish and return current.distance & path
2. for all unvisited neighbors, calc their tentative distance (current.distance + edge to neighbor).
3. if tentative distance < neighbor's set distance, overwrite it.
4. set current.visited = true.
5. set current = remaining unvisited node with smallest node.distance

Here’s my implementation – it’s recursive, as suggested by the algorithm description:

import sys
def shortestpath(graph,start,end,visited=None,distances=None,predecessors=None):
    """Find the shortest path btw start & end nodes in a graph"""
    # fresh bookkeeping on the first call; mutable default args would leak state between calls
    if visited is None: visited,distances,predecessors=[],{},{}
    # detect if first time through, set current distance to zero
    if not visited: distances[start]=0
    # if we've found our end node, find the path to it, and return
    if start==end:
        path=[]
        while end != None:
            path.append(end)
            end=predecessors.get(end,None)
        return distances[start], path[::-1]
    # process neighbors as per algorithm, keep track of predecessors
    for neighbor in graph[start]:
        if neighbor not in visited:
            neighbordist = distances.get(neighbor,sys.maxint)
            tentativedist = distances[start] + graph[start][neighbor]
            if tentativedist < neighbordist:
                distances[neighbor] = tentativedist
                predecessors[neighbor]=start
    # neighbors processed, now mark the current node as visited 
    visited.append(start)
    # finds the closest unvisited node to the start 
    unvisiteds = dict((k, distances.get(k,sys.maxint)) for k in graph if k not in visited)
    closestnode = min(unvisiteds, key=unvisiteds.get)
    # now take the closest node and recurse, making it current 
    return shortestpath(graph,closestnode,end,visited,distances,predecessors)

graph = {'a': {'w': 14, 'x': 7, 'y': 9},
        'b': {'w': 9, 'z': 6},
        'w': {'a': 14, 'b': 9, 'y': 2},
        'x': {'a': 7, 'y': 10, 'z': 15},
        'y': {'a': 9, 'w': 2, 'x': 10, 'z': 11},
        'z': {'b': 6, 'x': 15, 'y': 11}}
print shortestpath(graph,'a','a')
print shortestpath(graph,'a','b')
print "\nExpected Result:\n(0, ['a'])\n(20, ['a', 'y', 'w', 'b'])"

As you can see, I turned the example in the Wikipedia entry into the graph used for the test case in the code above.

I also found this useful about how to represent and process graphs in python: http://ftp.ntua.gr/mirror/python/doc/essays/graphs.html. I think this was written by Guido himself.

There’s a bunch of interesting implementations on the web, including two here. Faster and more elegant perhaps, but it took me a little longer to understand them.
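For reference, here’s a minimal sketch of the priority-queue approach those faster versions tend to use, built on Python’s heapq (my own sketch, not one of the linked implementations):

```python
import heapq

def shortestpath_heap(graph, start, end):
    # the heap holds (distance so far, node, path taken) tuples,
    # so the closest frontier node is always popped first
    heap = [(0, start, [start])]
    visited = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == end:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(heap, (dist + weight, neighbor, path + [neighbor]))
    return None
```

With the test graph above, shortestpath_heap(graph, 'a', 'b') returns (20, ['a', 'y', 'w', 'b']).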

Cyclic Linked List – finding the start of the loop using Python

I’d blogged previously about writing a python function to find out whether a linked list is cyclic or acyclic – it’s worth a read before tackling this one.

The challenge here is to return the node that is at the start of the loop given the head node of a cyclic linked list.

A cyclic linked list would look something like this:

A->B->C->D->E->F->C

In the case above, A is the head node, and C is the node at the start of the loop.

Solution

Let’s call x the distance from the head to the start of the loop, and c the number of nodes in the loop.

We kick off a fast and a slow pointer from the head, with the fast pointer going twice as fast as the slow pointer (skipping every other node). By the time the slow pointer gets to the start of the loop, the fast pointer will have a head start of x (the fast pointer is always the same distance from the slow pointer as the slow pointer is from the head).

Given its head start of x, when will the fast pointer catch up to the slow pointer? The fast pointer always needs twice the initial distance from the slow pointer to catch up. Moving forward in the loop, the fast pointer is c-x nodes from the slow pointer, so it needs 2(c-x) nodes to catch up. In that time, the slow pointer will have managed to move c-x nodes away from the start (half what the fast pointer travelled).

The key thing to note here is that both pointers are now x nodes away from the start of the loop looking in a forwards direction. So you’ll see that to find the start of the loop the code below gets the fast and slow pointer to meet in the loop, then walks the slow pointer forward x nodes – the distance from the head to the start.

def startOfCyclicLoop(head):

    """
    >>> e1=Element(1)
    >>> e2=Element(2)
    >>> e3=Element(3)
    >>> e4=Element(4)
    >>> e5=Element(5)
    >>> e1.next=e2
    >>> print startOfCyclicLoop(e1)
    None
    >>> e2.next=e3
    >>> e3.next=e4
    >>> e4.next=e5
    >>> print startOfCyclicLoop(e1)
    None
    >>> e5.next=e3
    >>> print startOfCyclicLoop(e1)
    3
    >>> e5.next=e4
    >>> print startOfCyclicLoop(e1)
    4
    >>> e5.next=e2
    >>> print startOfCyclicLoop(e1)
    2
    """

    slow=head
    fast=head
    while fast!=None and fast.next!=None:
        slow=slow.next
        fast=fast.next.next
        if (fast==slow):
            slow=head
            while (fast!=slow):
                slow=slow.next
                fast=fast.next
            return fast
    return None

class Element:
    def __init__(self,data,next=None):
        self.data=data
        self.next=next
    def __str__(self):
        return str(self.data)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Anagrams using Python

Python can be an elegant language. This is an example – a Python function that finds the anagrams for a supplied word. For the word dictionary I found one in OS X at /usr/share/dict/words.

def anagrams(wordIn):
    f=open('/usr/share/dict/words')                                                                                                                                                                                 
    ana=dict()
    for word in f.readlines():
        ana.setdefault(''.join(sorted(word.rstrip())),[]).append(word.rstrip())
    return ana.get(''.join(sorted(wordIn)))
>>> anagrams("cat")
['act', 'cat']
>>> 

Birthday pairings

Here’s two classic birthday puzzles:

  1. How many people do you need in a room to have a better than 50% chance of finding at least one shared birthday among them?
  2. How many people do you need in a room with you to have a better than 50% chance of finding someone who shares your birthday?

Finding a solution to these is conceptually quite straightforward, but also quite instructive on various useful probabilistic calculation techniques.

Let’s take these one at a time.

  1. How many people do you need in a room to have a better than 50% chance of finding at least one shared birthday among them?

Like a lot of these kinds of problems, it can be easier to first find the chances of the proposition not happening. In this case, finding the chances of not finding a shared birthday.

I have a tendency to think of card decks in these situations – so in this case I think of having a bunch of 365-card decks. I deal one card from each deck. What are the chances that I haven’t dealt the same card after dealing from r decks? You can see that this is an analogy for our problem. Looking back at the previous post on dealing a perfect hand may be a good primer for what follows.

The answer to our card analogy is to take the number of ways I can deal the cards (one from each deck) and get no duplicates divided by the total number of ways I can deal the cards.

From the first deck, I have 365 ways to deal one card. From the second deck, I have 364 ways to deal one card and not have duplicates. From the third deck, there are 363 ways, and so on until I get to 365-r+1 ways. So the number of ways I can deal cards without duplicates is:

= 365 * 364 * ... * (365-r+1)
= 365! / (365-r)!

Now lets work out how many ways there are to deal the cards without restrictions. The answer is simply:

= 365 ^ r

So now, the probability of not dealing any duplicate cards (and also the chances of not finding anyone with a shared birthday) is:

= (365! / (365-r)!) / (365 ^ r)

I was a little stumped at how to solve this, as 365! will make any calculator blow up. But we can use Stirling’s approximation to find that:

ln(365!) = 1792.33142
ln((365-22)!) = 1663.17935
ln((365-23)!) = 1657.34162

So for r=22, our probability of no shared birthdays:

= e ^ (1792.33142 - 1663.17935) / (365 ^ 22) 
= 0.524310203

For r=23:

= e ^ (1792.33142 - 1657.34162) / (365 ^ 23) 
= 0.492707724

Now remember that this is for finding no shared birthdays. To get at least 1 shared birthday, we need to subtract from 1.

So for r=22, the chances of finding at least 1 shared birthday are:

= 1 - 0.524310203 
= 0.475689797

For r=23, the chances are:

= 1 - 0.492707724 
= 0.507292276

So you can see that it takes at least 23 people in a room to have a greater than 50/50 chance of finding a shared birthday.
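Stirling gets us there by hand, but the product form 365/365 * 364/365 * … is also easy to evaluate directly in code, since it never overflows (this check is mine, not from the original post):

```python
def prob_shared_birthday(r):
    # 1 minus the probability that all r birthdays are distinct
    prob_none = 1.0
    for i in range(r):
        prob_none *= (365 - i) / 365
    return 1 - prob_none
```

prob_shared_birthday(22) comes out around 0.476 and prob_shared_birthday(23) around 0.507, matching the figures above.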

By the way, to get the Stirling approximation of ln(365!) I typed the following into google:

365 * ln(365) - (365) + ln(365)/2 + ln (2*pi) / 2)

Ok so let’s move on to our second problem:

2. How many people do you need in a room with you to have a better than 50% chance of finding someone who shares your birthday?

This one’s a little simpler than the first. Again, it’s easier to first find the probability of no one having your birthday. Let’s imagine people walking in to a room where you are. The first person has a 364/365 chance of not having your birthday. And it’s the same for all subsequent people who walk into the room. So the chances of no one sharing your birthday after r people have walked in are:

= (364/365)^r

So the chance that at least one person shares your birthday is:

= 1 - (364/365) ^ r

Let’s find r when the equation is 1/2, representing the at least 50% chance we were looking for:

1/2 = 1 - (364/365) ^ r
(364/365) ^ r = 1/2
r * log (364/365) = log (1/2)
r = log (1/2) / log (364/365)
r = 252.651989

So you need to have at least 253 people with you in a room to have a better than 50/50 chance of finding someone sharing your birthday. It’s quite amazing how different this answer is to the answer for the first part.
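The last step of the arithmetic, checked in code:

```python
import math

# r = log(1/2) / log(364/365), from the derivation above
r = math.log(0.5) / math.log(364 / 365)
people_needed = math.ceil(r)  # 253
```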

A perfect hand, with some combinatorial tricks

Question: You deal the first 13 cards from a well shuffled, 52-card deck. What are the chances the 13 cards you deal are of the same suit?

There’s a couple of ways to do this – both incorporate useful techniques for solving these kind of problems.

First Answer – simple probability

The first way is just to look at the first 13 cards dealt and work out how many ways there are to deal a single-suit hand, and divide this by the number of ways there are to deal any hand.

There are 52 ways you can deal a first card. There are then 12 ways you can deal a second card if we restrict it to the suit of the first card. So we now have 52 * 12 permutations. With the third card, we’re up to 52 * 12 * 11 permutations. So we can see that the ways to deal a single-suit 13 card hand is:

= 52 * 12!

Using the same logic, we can see that:

Ways to deal any 13 card hand = 52 * 51 * 50 * … * 40. This equals:

= 52! / 39!

(Hopefully this is clear – the remaining sequence to finish making 52! is 39!).

So the probability of dealing a single-suit 13 card hand:

= (52 * 12!) / (52! / 39!) 
= 12! * 39! / 51!

Is there a way to work this out without blowing up our calculator? We can go old school and get out our factorial log tables.

The log of the answer equals the log of (12! * 39! / 51!). If you remember your log rules, this is the same as log(12!) + log (39!) – log(51!). If you don’t have log of factorial tables you can type that equation into google and it magically gives you the answer. So you get:

= log(12!) + log (39!) - log(51!) 
= -11.200723

So the log of the probability is -11.200723. Now let’s take the antilog to get our answer:

= 10 ^ -11.200723
= 6.299078 * 10 ^ -12

This is the probability of dealing a perfect hand. Take the reciprocal to get the odds against:

1 in 158,753,389,900

(The rounded log figures above land slightly off, at 1 in 158,753,387,100; exact integer arithmetic gives the number shown.)

Holy cow – that’s what in Australia we call between Buckley’s and none.

Second Answer – let’s use some combinatorial tricks!

The first thing you have to know is: if I have a group of things of size n, how many ways are there to select a subset of size r? The answer is: n! / (r! * (n – r)!).

So for example if I had 5 balls, I could choose 2 balls out of the 5 in: 5! / (2! * 3!) = 10 ways.

As a side note – and I cover this more at the end of this post – I’m treating this as an order doesn’t matter, repetitions not allowed arrangement.

How does this help us in this problem? Let’s think of the number of ways that I can select 13 cards from the deck. The order in which the cards are selected doesn’t matter. Following our example above, the answer would be 52! / (13! * 39!). Out of all of these possibilities, how many give us 13 cards of a particular chosen suit? The answer is only 1: there are only 13 cards of the chosen suit, and the arrangement I’m looking for has all of them. And since order doesn’t matter, there’s only one arrangement that has all of these 13 particular cards.

So dividing the number of arrangements with 13 cards being our chosen suit by the total number of arrangements gives us: 1 / (52! / (13! * 39!))

Now we have to multiply this by 4, as we have 4 suits.

= 4 * 1 / (52! / (13! * 39!))
= 4 * 13! * 39! / 52!

We can see that this becomes our original answer above:

12! * 39! / 51!
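Since Python’s integers are arbitrary precision, we can also compute the exact odds without logs (a quick check; note that the log-table route loses a little precision in the final digits):

```python
from fractions import Fraction
from math import factorial

# probability of a single-suit 13-card hand, exactly
prob = Fraction(4 * factorial(13) * factorial(39), factorial(52))
odds_against = 1 / prob  # exactly 158,753,389,900 to 1
```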

Some useful combinatorial definitions

It’s probably a good idea at this point to formalize a little the definitions of permutations and combinations.

Permutations are when the order of the arrangement does matter – the order is a factor in making the arrangement unique.

Combinations are when we want to get the number of arrangements without regard to order. So we can treat the arrangements as buckets we can shake up and it’s still the same arrangement.

Within these, we can also allow repetitions or not. In other words, in a given arrangement, can a particular item be used more than once?

Some examples will help:

Combinations

Repetitions allowed: Let’s say you’re ordering a sandwich at Subway and can choose 3 meats out of a possible 5. If you love salami, you could chose salami, salami and salami for your meats. And we don’t really care what order we get the meats in the sandwich – for example, bologna, salami, bologna is the same as salami, bologna, bologna.

No Repetitions: Let’s say Subway had only one of each of the meats left – so you couldn’t choose the same meat twice. Another good example is a lottery where you need to match 6 numbers drawn from a pool of 40. Once a number is drawn it doesn’t go back in to the pool.

Permutations

Repetitions: the classic example is that of a combination lock (which should really be called a permutation lock!). Order does matter, and repetitions are allowed as your combination could be 7777.


No Repetitions: How many ways are there to get the first 3 positions in a 100 person race? Order matters, and you can’t have the same person in more than one position.

The formulas:

Permutation, repetition: n ^ r
Permutation, no repetition: n! / (n-r)!
Combination, repetition: (n + r – 1)! / (r! * (n – 1)!)
Combination, no repetition: n! / (r! * (n-r)!)

It’s interesting to note that in the above, the combination no repetition case is just the permutation no repetition case divided by r!. You can see why: for arrangements of 3 elements, there are 3! times as many arrangements when order matters (the permutation case) as when it doesn’t (the combination case).

The combination no repetition case is probably the most common and comes up in card-dealing problems and the like. It is sometimes called the “n choose r” case – or nCr, and also known as the “Binomial Coefficient”.
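For what it’s worth, Python 3.8+ ships the two no-repetition formulas as math.perm and math.comb, and the repetition cases follow from them. Here are the examples above, checked (the variable names are mine):

```python
from math import comb, perm

lock = 10 ** 4                 # permutation, repetition: 4-digit lock
race = perm(100, 3)            # permutation, no repetition: race podium
lottery = comb(40, 6)          # combination, no repetition: 6 from 40
sandwich = comb(5 + 3 - 1, 3)  # combination, repetition: (n+r-1)! / (r! * (n-1)!)
```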

Puzzle – A safe place to drop an egg

This is a problem where it’s fairly easy to find the solution via trial and error, but not so easy to generalize.

Question

Let’s say you need to find out the highest floor in a 100 story building from which you could safely drop an egg without it breaking. What’s your strategy to minimize the worst case number of drops you have to try?

If you just had one egg with which to conduct your experiment, you’d have to start at floor one, and go up floor by floor looking for the first floor at which your egg broke until you reached floor 100. So your maximum number of drops would be 100.

What would be the worst case number of drops if you could use two eggs?

Answer

With two eggs you can be cleverer, and drop the first egg at floor 50, and effectively divide the floor space in half for your second egg.

You could also drop the first egg at floor 10, then, assuming it doesn’t break, at floors 20, 30, and so on. As soon as it broke, you’d try the 9 floors underneath with your second egg. The worst case is the first egg surviving all the way to floor 100 (10 drops), with the second egg then climbing from floor 91 to floor 99 (9 more), for a total of 19 drops.

Can we do better?

Picking 10 as number of floors to skip with the first egg is not particularly efficient in dividing the floor space because if the first egg breaks on floor 10, the worst case is 10. But if the first egg breaks on floor 20, then the worst case is 11. As we saw, by the time you get to floor 100 your worst case is 19. The most efficient solution will be when no matter where the first egg breaks, our worst case stays the same.

To do that we could say that we’ll start with our first egg at floor 10, but the second drop will be on floor 19, preserving our 10 drop worst case. The problem you’ll find is that you end up reducing the spacing to 1 before you get anywhere close to floor 100. By trial and error you can find the best number to start with, and it turns out to be 14, giving us a 14 drop worst case:

14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100.

Generalization

Is there a formula we can use to work this out, so we could do it for any number of floors?

Let’s look at the number of floors we skip at each turn:

n, n-1, n-2, n-3 … 1

Flipping that around, we have:

1, 2, 3, 4….. n.

We also know that the sum of the number of floors skipped will equal the total number of floors.

We know that the sum of a sequence like the above is (n+1) * n/2. You can see this if you pair the first and last numbers, then the second and second to last, etc.

So if we say h is the total number of floors in the building, we have:

h = (n+1) * n/2

So:

n^2 + n = 2h.

To solve for n, let’s use the completing the square trick:

n^2 + n + .25 = 2h + .25
(n+.5)(n+.5) = 2h + .25
n+.5 = sqrt (2h+.25)
n = sqrt(2h+.25) – .5

If we try this on our 100 floor problem, we get:

n = sqrt(200.25)-.5
n=13.65

So, seeing as we need a whole number of drops, we round up to n = 14, which corresponds with our trial and error result.
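The formula as a function, with the trial and error result as a check (min_drops is my own name for it):

```python
import math

def min_drops(floors):
    # n = sqrt(2h + 0.25) - 0.5, rounded up to whole drops
    return math.ceil(math.sqrt(2 * floors + 0.25) - 0.5)
```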

I have to give props to Scruffy Pete for helping out on this one.

Alternating coin toss – redux

I want to revisit a probability puzzle from a previous post: see Alternating coin toss game.

You play a game where you alternate tossing a coin with a friend, and the first person to toss heads wins. But let’s make it a little more interesting by making the coin weighted, or biased, so that it lands heads only 25% of the time. If you were to start, what would be your chances of winning?

Let’s call the probability of tossing heads p and the probability of tossing tails q.

On the starter’s first toss, his chances of tossing heads are p. The chances he’ll toss heads on his second toss are q^2\times p, on his third toss q^4\times p, and so on.

So his chances of winning the game is the sum s of the geometric progression p\times q^0\;+\;p\times q^2\;+\;p\times q^4\;+\;\cdots

Let’s use the shifting approach to solve this. The idea is to subtract out the first element, and then shift the rest of the elements to the left. So first subtract:
s-p\times q^0=p\times q^2\;+\;p\times q^4\;+\;\cdots

Now to shift the terms to the left, we need to divide by q^2
\frac{s-p\times q^0}{q^2}=p\times q^0\;+\;p\times q^2\;+\;\cdots

So…
\frac{s-p\times q^0}{q^2}=s
s=\frac{p}{1-q^2}

This gives us a general solution to the alternating coin toss problem. If we plug in the unweighted coin probabilities, we get s=\frac{2}{3} which is what we’d worked out before. For the case where the probability of tossing a head is 1/4, then we get s=\frac{4}{7}, which is the probability of winning for the starter.
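As a sanity check, here’s a throwaway sketch of mine (names and trial counts arbitrary) comparing the closed form s = p/(1-q^2) against simulated games:

```python
import random

def starter_win_probability(p):
    """Closed form s = p / (1 - q^2) for the starter, with q = 1 - p."""
    q = 1 - p
    return p / (1 - q * q)

def simulate(p, games=100_000, seed=1):
    """Play the alternating game many times; return the starter's win rate."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(games):
        starter = True
        while rng.random() >= p:   # tails: pass the coin to the other player
            starter = not starter
        wins += starter
    return wins / games

print(starter_win_probability(0.5))   # 2/3 for a fair coin
print(starter_win_probability(0.25))  # 4/7 for the biased coin
```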

Puzzle – Take your seats

Another problem that has an immediate intuitive answer, but can be tricky if you don’t have that insight.

Question

Let’s say there are 100 people, including you, waiting to board a fully booked flight, each with an assigned seat. The first person to board has seat #1 assigned but doesn’t pay any attention to this and sits somewhere at random. Each of the subsequent people will sit in their assigned seat if they can, but if they find someone sitting there will pick another seat at random. You’re the last to board with seat #100 assigned. What’s the probability you’ll get to sit in your assigned seat?

Answer

OK so if you’re really smart you may be starting to toy with conditional probability.

But an insight that can solve this quickly is that as soon as anyone sits in seat #1, I’ll be guaranteed my seat. Conversely, as soon as anyone sits in seat #100, my seat is eliminated and I won’t get to sit in it. In other words, there will be one passenger that seals my fate, and all other passengers are irrelevant.

The passenger that sealed my fate had a 50/50 chance of either guaranteeing or eliminating my seat. That applies for passenger #1, who had a 1/100 chance of choosing his own seat #1, and a 1/100 chance of choosing my seat #100. If he’s the one that gets to seal my fate, it’s 50/50 which way he’ll do it. If it gets to passenger #99 without my fate being sealed, #99 will have the pleasure of definitely doing so with only seats #1 and #100 left, which he’ll choose at random as neither is his.

So you can see that the chance you get to sit in your own seat is 50/50, because the passenger that sealed your fate had a 50/50 chance of doing it either way, and all other passengers who chose seats other than #1 or #100 are irrelevant. And your fate will definitely be sealed somewhere between passenger #1 and #99.

Anybody have a different way to get the answer – or can get the answer by looking at the sum of conditional probabilities?
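Not a conditional-probability derivation, but a brute-force check is easy. Here’s a rough Monte Carlo sketch of the boarding process (names and trial counts are mine):

```python
import random

def you_get_your_seat(n=100):
    """Simulate one boarding; True if passenger n ends up in seat n."""
    free = list(range(1, n + 1))
    # Passenger 1 ignores his assignment and sits at random.
    free.remove(random.choice(free))
    # Passengers 2..n-1 take their own seat if free, else a random free one.
    for p in range(2, n):
        if p in free:
            free.remove(p)
        else:
            free.remove(random.choice(free))
    # You are passenger n: you win only if seat n is the one left.
    return free == [n]

random.seed(0)
trials = 20_000
p = sum(you_get_your_seat() for _ in range(trials)) / trials
print(p)  # ≈ 0.5
```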

Puzzle – Leaning Tower of Pisa

Let’s say you drop a ball from the Leaning Tower of Pisa, which is 179 ft high, and it bounces back 10% of the dropped height – 17.9 ft. Then on the second bounce it bounces up 10% again – 1.79 ft, and so on for ever. How far will the ball travel?

The first thing to realize is that this is a geometric progression which will yield a finite sum, even though it has an infinite number of terms. This idea of finite sums is the reason Zeno can walk across the room: http://en.wikipedia.org/wiki/Geometric_series#Zeno.27s_paradoxes.

Before answering the question it may be useful to recap geometric progressions from my previous post see the coin-toss puzzle answer.

In this tower problem, the total distance travelled by the ball will be: 179 + 179 * 2 * (1/10 + 1/100 + 1/1000…)

You may be able to see straight away that the sum of the progression is 1/9, and so the total distance is 218 7/9 ft.

But let’s do it using the standard approach:

Given that n in the progression will start at 1, we need to get our progression into the form:

\sum_{n=1}^\infty cr^{n-1} = \frac{c}{1-r}

When we do this, we get the same answer as above:

distance =\;179\;+\;179\;*\;2\;*\;1/10\;*\;\frac{1}{1-\frac{1}{10}}\;=\;218\;\frac{7}{9}
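A couple of lines confirm this numerically (my own sketch, not part of the original solution):

```python
# Closed form: total = h + 2*h*r/(1-r), with h = 179 ft and r = 1/10.
h, r = 179.0, 0.1
closed = h + 2 * h * r / (1 - r)
print(closed)  # 218.777... = 218 7/9

# The partial sums of the bounces converge to the same value.
total, bounce = h, h * r
for _ in range(50):
    total += 2 * bounce
    bounce *= r
print(total)
```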

Puzzle – Break a stick to get a triangle

I read somewhere this is a Google interview question:

Let’s say you drop a stick, breaking it randomly in 2 places, leaving you with 3 smaller sticks. What’s the probability you can make a triangle out of the 3 resulting sticks?

There are a number of ways to tackle this, including this very elegant solution: http://www.cut-the-knot.org/Curriculum/Probability/TriProbability.shtml. For the less elegantly inclined (and those as unlikely as me to be working at Google any time soon), here’s another way:

The first insight is that we can think of the breaks as two separate events. The first break can happen anywhere. To make a triangle, the second break has to happen on the opposite half from the first break. Otherwise we’d end up with one stick being more than half the original length, and we can’t make a triangle if that’s the case. The probability of the 2nd break being on the opposite half is 50%.

But even with this first condition met, we may still end up with one stick being longer than half the original, if the two breaks are close to the ends. We also need the sum of the distances of the two breaks from the ends to be greater than 1/2 the length of the original stick. What’s the probability of that? The probability that two random numbers from 0 to .5 add up to more than .5 is 50%.

So that’s a slightly long-winded way of arriving at the answer – the 2nd break has to satisfy two independent 50% chances, so we have a 1 in 4 chance of being able to make a triangle.
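Here’s a quick Monte Carlo sketch (assuming, as above, two independent uniform break points on a unit stick) to check the 1/4:

```python
import random

def can_form_triangle(u, v):
    """Break points u, v on a unit stick; triangle iff no piece is half or more."""
    x, y = min(u, v), max(u, v)
    pieces = (x, y - x, 1 - y)
    return max(pieces) < 0.5

random.seed(0)
trials = 200_000
p = sum(can_form_triangle(random.random(), random.random())
        for _ in range(trials)) / trials
print(p)  # ≈ 0.25
```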

Google, you have my number.

Puzzle – 4 Bugs chasing each other

The answer to this problem is either really obvious, or it’s not.

Say there are four bugs standing at the corners of an imaginary 1m * 1m square.  Each of the bugs is facing the bug that is on the adjacent, clockwise corner from where they are.

At the same instant, each of the four bugs starts walking directly towards the bug they are facing at 1m per hour, and keeps walking directly towards that bug as that bug itself starts moving.  So the four bugs spiral towards the center of the square in a clockwise direction.

How long does it take for the four bugs to meet?

Answer

There are two different insights one could have in thinking about this problem. 

The first insight – let’s call it the “square formation” insight is that the bugs will effectively always stay in a square formation in their path to the center. A shrinking and spiraling square, but a square that is always centered on the original center. This means that regardless of where the bugs are in their spiral towards the center, they will always be moving 45 degrees to a line connecting where they are to the center.

So one can calculate their effective velocity towards the center:

V_{center} = \cos\;45^\circ\;*\; V_{absolute}\\ V_{center} = \frac{1}{\sqrt{2}}\;m/hr

Since the total distance towards the center from the starting position is \frac{1}{\sqrt{2}}\;m, it will take the bugs 1 hr to meet in the center.

Those that have the second insight – let’s call it the “perpendicular trajectory” insight – will think this problem is so obvious they’ll wonder what the fuss is about.  A bug’s movement towards the bug it is following is always perpendicular to the motion of the followed bug.  So in effect the movement of the followed bug has no effect on how far the follower has to travel to reach it or how long it will take, so the follower has to travel 1 meter, and will catch up to the followed bug in 1 hr.  A-ha!

What if three bugs started at the corners of an equilateral triangle with sides 1m?

Same principle as the above: they stay in formation as they spiral to the center. One bug is heading towards the other at 1 meter per hour. In this case, the followed bug is making effective progress towards the follower at cos(60°), or .5 meters per hour. So they have a combined speed of 1.5 meters per hour towards each other, and they should cover the 1 meter between them in 40 minutes.
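For the skeptical, both answers can be checked numerically. This is my own rough Euler-stepping sketch (the step size and stopping tolerance are arbitrary), not part of the analysis above:

```python
import math

def pursuit_time(n_bugs, side=1.0, speed=1.0, dt=1e-5):
    """Step each bug toward its neighbour until all bugs (nearly) meet."""
    # Start at the corners of a regular polygon with the given side length.
    r = side / (2 * math.sin(math.pi / n_bugs))
    bugs = [(r * math.cos(2 * math.pi * k / n_bugs),
             r * math.sin(2 * math.pi * k / n_bugs)) for k in range(n_bugs)]
    t = 0.0
    while True:
        nxt = []
        for k, (x, y) in enumerate(bugs):
            tx, ty = bugs[(k + 1) % n_bugs]
            dx, dy = tx - x, ty - y
            d = math.hypot(dx, dy)
            if d < 1e-3:          # close enough: call it a meeting
                return t
            nxt.append((x + speed * dt * dx / d, y + speed * dt * dy / d))
        bugs = nxt
        t += dt

t_square = pursuit_time(4)
t_triangle = pursuit_time(3)
print(t_square)    # ≈ 1 hour
print(t_triangle)  # ≈ 2/3 hour, i.e. 40 minutes
```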

Puzzle – Liars and Truthtellers

This is another probability question that seems straightforward but can be deceptive…

You’re an FBI agent and have been monitoring a large gang, and you know that 1/4 of its members always tell the truth, the rest always lie.  Then you get taken hostage by the gang, and two gang members are assigned to guard you.  You ask the first: “Am I going to live?”  He answers “Yes.”  Then you ask the second the same question.  He also answers “Yes.”  What are your odds of living?

It can be tempting to say it’s the probability the first is telling the truth multiplied by the probability the second is telling the truth, which would mean you have a 1/16th chance of living.

It’s not correct, however, because there’s another piece of information you have which is that they both gave the same answer.

Let’s look at the probability tree:

GUARD 1   GUARD 2   PROBABILITY
T         T         1/16
F         F         9/16
T         F         3/16
F         T         3/16

The only two rows that apply to this problem are the first two – they can either both be telling the truth, or both be lying.  The chances they’re both telling the truth is:

\frac{\frac{1}{16}}{\frac{1}{16}+\frac{9}{16}}=\frac{1}{10}

So, using the same table, what would your chances of living be if the first guard told you that you would live, but the second that you would die?

Looking at the table you’ll see that your chances are 50/50.
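Both answers drop out of a few lines enumerating the table (a sketch of mine):

```python
# Prior: each guard tells the truth with probability 1/4, independently.
p = 0.25
joint = {
    ("T", "T"): p * p,              # 1/16 - both truthful, you live
    ("T", "F"): p * (1 - p),        # 3/16
    ("F", "T"): (1 - p) * p,        # 3/16
    ("F", "F"): (1 - p) * (1 - p),  # 9/16 - both lying, you die
}

# Both answered "yes": only TT and FF are consistent with matching answers.
matching = joint[("T", "T")] + joint[("F", "F")]
p_live_same = joint[("T", "T")] / matching
print(p_live_same)  # 0.1

# First says live, second says die: TF and FT carry equal weight, so 50/50.
p_live_diff = joint[("T", "F")] / (joint[("T", "F")] + joint[("F", "T")])
print(p_live_diff)  # 0.5
```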


Puzzle – shortest road system

Let’s say you’re contracted to build roads to connect four towns that are at the corners of an imaginary square with sides 1 mile. What’s the shortest length of road system you can build so that every town can get to every other town?

The first answers that come to mind for most folks are either to do a perimeter road with one side left off – which is 3 miles – or to build roads in the shape of an X, which Pythagoras tells us would be 2 * sqrt(2) = 2.828 miles.

We can do better. Click on this link to see what the road system would look like, and the theory behind it.

To work out the total length we can use some basic calculus:

Call x the length of shared highway in the middle. So the total length of the road will be:

l\;=\;4\sqrt{(\frac{1}{2})^2 + (\frac{(1-x)}{2})^2}\;+\;x

Simplifying:

l\;=\;\sqrt{4 + 4(1-x)^2}\;+\;x

We’re trying to find the x where l is shortest. So we can differentiate l with respect to x, and set this equal to zero. Recall the chain rule from high school?

\frac{dl}{dx}\;=\;\frac{1}{2}\;*\;(4 + 4(1-x)^2)^{-\frac{1}{2}}\;*\;8(1-x)\;*\;-1\;+\;1\;=\;0

\frac{1}{\sqrt{4 + 4(1-x)^2}}\;=\;\frac{1}{4(1-x)}

4 + 4(1-x)^2\;=\;16(1-x)^2

12(1-x)^2\;=\;4

1-x\;=\;\sqrt{\frac{1}{3}}

x\;=\;1-\sqrt{\frac{1}{3}}

And now we can substitute to find the total length of the road:

l\;=\;4 * \sqrt{\frac{1}{4} + \frac{1}{12}}\;+\;1\;-\;\sqrt{\frac{1}{3}}

l\;=\;4 * \sqrt{\frac{1}{3}}\;+\;1\;-\;\sqrt{\frac{1}{3}}

l\;=\;3 * \sqrt{\frac{1}{3}}\;+\;1

l\;=\;\sqrt{3}\;+\;1

This works out to be 2.732 miles, a little better than our original X approach.

It’s neat that the angles turn out to be 120 degrees (as is shown in the link above). A colleague declared to me that this was ‘obvious’, but it’s not so obvious to me. Can anyone find a trigonometric way to solve this problem that doesn’t involve the calculus? Or at least provide an explanation for why the 120 degrees makes sense (other than that it’s a beautiful symmetry)?

Now, only because I’m a maniac and am weirdly enjoying doing high school Calculus for the first time in a while, I’m going to do this again, this time picking x to be the distance from the shared highway to the edge of the square. So the shared highway length will be 1-2x.

l\;=\;4\sqrt{\frac{1}{4} + x^2}\;+\;1\;-\;2x

Simplifying:

l\;=\;\sqrt{4 + 16x^2}\;+\;1\;-\;2x

Now we differentiate:

\frac{dl}{dx}\;=\;\frac{1}{2}\;*\;(4 + 16x^2)^{-\frac{1}{2}}\;*\;32x\;-\;2\;=\;0

\sqrt{4 + 16x^2}\;=\;8x

4+16x^2\;=\;64x^2

4\;=\;48x^2

x\;=\;\sqrt{\frac{1}{12}}

Now we can substitute for the length:

l\;=\;4 * \sqrt{\frac{1}{4} + \frac{1}{12}}\;+\;1\;-\;\sqrt{\frac{1}{3}}

Which we can see is identical to what we had above, so our length is again:

l\;=\;\sqrt{3}\;+\;1
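If you don’t trust the calculus, a brute-force check over the first parameterization (my sketch; the grid resolution is arbitrary) lands on the same numbers:

```python
import math

def road_length(x):
    """Total road length with a shared middle segment of length x."""
    return 4 * math.sqrt(0.25 + ((1 - x) / 2) ** 2) + x

# Scan a fine grid of x values for the minimum.
best_x = min((k / 100_000 for k in range(100_001)), key=road_length)
length = road_length(best_x)
print(best_x)  # ≈ 1 - sqrt(1/3) ≈ 0.42265
print(length)  # ≈ sqrt(3) + 1 ≈ 2.73205
```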

Puzzle – Clock

It’s 3 O’Clock.

What will be the time when the minute and hour hands are next in the same position (coincident)?

One way to solve this problem: start with the minute hand at the 12 and hour hand at the 3. Let’s call x the number of minute markers from 12 O’clock to where the hands meet. The minute hand will take x minutes to get there (it travels a minute marker a minute). The hour hand will need to travel (x-15) minute markers and we know it will take (x-15)*12 minutes to get there. When the hands meet, the same time will have elapsed for both so we can solve for x:

x = (x-15)*12
x=12x-180
11x=180
x=180/11
x=16 and 4/11 minutes

There’s a smarter way to solve this problem: realize that there are 11 points at which the minute hand and the hour hand coincide over the course of a 12 hour period, so each is 60/11 minute markers apart on the clock face. The one after 3 O’Clock is the third, so it’s 60*3/11 minutes after the hour, which is 180/11 minutes.
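Both routes are easy to check exactly (a sketch of mine using exact fractions):

```python
from fractions import Fraction

# The minute hand is at marker x after x minutes; the hour hand at 5h + x/12.
# Setting them equal: x = 5h + x/12, which gives x = 60h/11.
def coincidence_after(hour):
    """Minutes past `hour` o'clock until the hands next coincide (1 <= hour <= 10)."""
    return Fraction(60 * hour, 11)

print(coincidence_after(3))         # 180/11, i.e. 16 and 4/11 minutes
print(float(coincidence_after(3)))  # ≈ 16.36
```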

More puzzles…

Some of these are from Heard on the Street – a book of interview questions for Quants I’m trying to get through. Well worth a read if you’re into this sort of thing…

1. You have two ropes of different lengths, and you know that either of them, lit at one end, burns through in an hour. How can you use them to measure 45 mins?

Light the first of them at both ends, and at the same time the second at one end. When both lit ends of the first meet (after 30 mins), light the second end of the second rope (which should take another 15mins to meet the other lit end).

Bonus points on this one: is it a condition for this approach to work that the ropes burn at a constant rate?

2. Two identical jugs, one with water, one with vodka. You pour a bit of vodka into the water, mix it, then pour the mix back to get the jugs to their original volumes. What’s the relationship between the new concentration of vodka in the vodka jug and water in the water jug?

Like a lot of these, getting straight into the algebra is not a good idea. The key here is that the volume in each jug ends up the same. Any displacement of (say) vodka from the vodka jug must have been replaced by an identical amount of water – so in effect the same amounts of vodka and water have just swapped places, and the concentrations are the same.

3. Imagine you’re an ant (you can walk on walls but not fly) and you want to get from a bottom corner of a cubic room (1*1*1) to the extreme opposite corner (farthest from you). What’s your shortest path to get there?

If you said walk diagonally along the floor to the opposite floor corner, then up the joint in the walls to the ceiling corner – a distance of 1 + sqrt(2) – think again. Think of the room as a box that’s been laid flat, with the sides folded out so it’s like the cross on the Swiss flag. The distance between the two corners in question is sqrt(5).

4. Let’s say you have a bunch of 1*1*1 mini-cubes that you’ve assembled into a 10*10*10 big cube. Now let’s say the whole outer layer of the big cube becomes damaged and has to be replaced – how many new mini-cubes do you need?

Two ways of doing this – the Plodder’s way and the smart way. A Plodder would say you need a top and bottom layer (10*10 + 10*10 = 200) and 8 ring layers (8 * (10+10+8+8) = 288). Total 488.

The smart way is to think about the inner cube that’s left, which is 8*8*8, or 8^3. So your answer is 10^3 – 8^3. You should be able to do this in your head: 10^3 is 1000. 8^3 is the same as (2^3)^3, or 2^9. Everyone knows 2^10 which is 1024. 2^9 is half that, 512. 1000-512 is 488.

5. Say there are 100 lights with switches (initially off), and 100 people. The first person goes through and flips every switch (so all the lights are on). The second person flips every second switch, so at this point half the lights are off again. The third person every third switch, and so on. By the time the 100th person goes through he just flips switch #100. At the end, how many light bulbs are turned on?

The trick here is to understand that the number of times a switch gets flipped depends on the number of factors that switch number has. So for example switch #6 has an even number of factors – 1,2,3 and 6 – so it ends up off. You’ll see that most numbers have an even number of factors because they have product pairs (for example, 1 and 6, and 2 and 3). Which numbers don’t have product pairs? Well, they all have product pairs, but in some cases both numbers in the product pair are the same – when the number has an integer square root. So for example the product pairs for 9 are 1 and 9, and 3 and 3. So the people who flip the #9 switch are 1, 3 and 9 – an odd number that leaves the light turned on. So the lights left on are these “perfect squares” of 1, 4, 9, 16, 25, 36, 49, 64, 81 and 100.
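A direct simulation of the 100 people (my quick sketch) confirms it:

```python
def lights_on(n=100):
    """Person k flips every kth switch; return the switches left on."""
    state = [False] * (n + 1)              # index 0 unused
    for person in range(1, n + 1):
        for k in range(person, n + 1, person):
            state[k] = not state[k]        # flip every multiple of person's number
    return [k for k in range(1, n + 1) if state[k]]

print(lights_on())  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```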

Mth to last – a recursive approach in Python

A colleague of mine was pondering the mthToLast problem for a LinkedList I’d blogged about previously and sent me the following IM message:

just a guess - but can you use  recursion to solve your link list m problem? 
a method to track through the list which calls itself, 
then when you get to the end start incrementing a counter, 
and when it equals M return the value?

I think this is what he meant – and it seems to work. Not as scalable as the trailing pointer approach in the previous post, but an interesting exercise in recursion. You find the size at the end, and return it all the way back. Simple, but recursion can blow your mind sometimes.

class Element:
    def __init__(self,value,next=None):
        self.value=value
        self.next=next

def findMToLast(element,m,counter=1):
    """ 
    >>> e1=Element(1)
    >>> e2=Element(2)
    >>> e3=Element(3)
    >>> e4=Element(4)
    >>> e1.next=e2
    >>> e2.next=e3
    >>> e3.next=e4
    >>> findMToLast(e1,1).value
    4
    >>> findMToLast(e1,2).value
    3
    >>> findMToLast(e1,3).value
    2
    >>> findMToLast(e1,4).value
    1
    """
    if element.next==None:
        if m==1: mtolast=element
        else: mtolast=None
        if counter==1: return mtolast
        return (counter,mtolast)
    (size,mtolast)=findMToLast(element.next,m,counter+1)
    if size-counter+1==m: mtolast=element
    if counter==1: return mtolast
    return (size,mtolast)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Reverse a string using recursion in Python

I’ve always found recursion somewhat unnatural to think about, but somehow satisfying when I can see it unfolding in my mind – this simple exercise in reversing a string using recursion is a case in point…

>>> def reverse(str):
    if len(str)<=1:
        return str
    else:
        return reverse(str[1:])+str[0]

>>> reverse('hello')
'olleh'
>>> 

Of course, this is just a thought experiment. In real-world Python, you’d do this:

>>> 'abcdef'[::-1]
'fedcba'
>>>

Deep copy and recursive references

Quick one about coding a deep copy and avoiding recursive references…

Let’s say we have an instance a which has a reference to an instance b and we have to do a deep copy of a. To do this, we need a deep copy of b as well.

But let’s say b had a reference back to a.

Here it is in Python (I know Python has its own copy.deepcopy() – the below is for illustrative purposes).

class A:                                                                                                                                                                                                            
    def __init__(self,b=None):
        self.b=b
    def deepcopy(self):
        newA=A()
        newA.b=self.b.deepcopy()
        return newA
class B:
    def __init__(self,a=None):
        self.a=a
    def deepcopy(self):
        newB=B()
        newB.a=self.a.deepcopy()
        return newB

Now let’s execute this:

>>> a=A()
>>> b=B()
>>> a.b=b
>>> b.a=a
>>> newA=a.deepcopy()

What happens? Not pretty – a recursion limit error.

To fix this, we can keep track in a of whether that instance has already created a copy of itself, so when it is called back from b it can just return that copy. This is sometimes called the “memo” approach. See the amended code below.

class A:                                                                                                                                                                                                            
    def __init__(self,b=None):
        self.b=b
        self.memo=None
    def deepcopy(self):
        if self.memo==None:
            newA=A()
            self.memo=newA
            newA.b=self.b.deepcopy()
            self.memo=None
            return newA
        else:
            return self.memo
class B:
    def __init__(self,a=None):
        self.a=a
    def deepcopy(self):
        newB=B()
        newB.a=self.a.deepcopy()
        return newB

Now when we execute:

>>> a=A()
>>> b=B()
>>> a.b=b
>>> b.a=a
>>> newA=a.deepcopy()
>>> newA
<__main__.A instance at 0x00D53A30>
>>> newA.b.a
<__main__.A instance at 0x00D53A30>
>>> newA2=a.deepcopy()
>>> newA2
<__main__.A instance at 0x00D53AF8>
>>> newA2.b.a
<__main__.A instance at 0x00D53AF8>
>>> 

Note that the memo is removed just before the copy is handed back so that if you call deepcopy again you’ll get a new copy of a.

Bootstrapping Zero Curves

What is a yield curve

A yield curve is a representation of what interest rates you could lock in today for investments over different periods. It’s effectively a set of yields for securities of different maturities (typically cash rates at the short end, futures and then swaps at the longer maturities – see the wikipedia entry). These yields are typically calculated from market prices using standard price/yield formulas.

Why yield curves can’t directly be used in PV calculations

The problem is that the quoted rates are from coupon paying securities and tell us the value of a series of cash flows, but they don’t tell us the rate for a cash flow at the maturity point independently of the other cash flows. So these rates don’t directly imply the present value of a dollar to be received at the maturity points – that is, they can’t be used directly as discount rates.

The Zero curve to the rescue

A zero curve represents the set of interest rates assuming that there are no periodic cash flows – as though the rate reflects a single payment of interest and principal made at that maturity point on the curve. As such, the rate can directly provide the present value of a dollar received at these maturity points.

Let’s take a very simple example:

A yield curve with 3 points:
Year 1: 5%
Year 2: 6%
Year 3: 7%

This means if you invest $1:

  • For 1 year, you’ll receive 5% interest plus your principal – $1.05 – at the end of year 1.
  • For 2 years, you’ll get 6% annually, with your principal included at the end of year 2. So you will get $.06 at the end of year 1, and $1.06 at the end of year 2.
  • For 3 years, you’ll get $.07 at the end of year 1, $.07 at the end of year 2 and $1.07 at the end of year 3.

The key point here is that the 2 and 3 year rates effectively represent a coupon paying security. In the 2 year rate, $1 invested now for two years represents the combined PV of two cash flows. But what is the PV of just the second cash flow? If we can work this out, we’ll have a zero (or discount) rate at 2 years, enabling us to PV any money received 2 years in the future.

Deriving Zero curves

We know the PV of $1 received at the end of the first year using our basic time value of money formula:

PV = \frac{FV}{(1 + r)^n} = \frac{1}{(1 + .05)^1} = \frac{1}{1.05}

This is effectively the 1 year discount factor. So any monies received at the end of 1 year can be multiplied by this to get the PV. Building (or “bootstrapping”) the zero curve is obtaining these discount factors for all maturity points.

To get the zero rate (and hence the discount factor) for the 2 year point, we can say that $1 invested today will equal the PV of the 1st payment plus the PV of the 2nd payment. So:

PV = 1 = (\frac{.06}{1.05}) + (\frac{1.06}{(1+r)^2})

(where r equals the 2 year zero rate)

Working this out, we end up with:

r_{\tiny{Z2}}=.0603

or 6.03%

For the 3rd year zero rate, we build on the above:

PV = 1 = (\frac{.07}{1.05}) + (\frac{.07}{(1+.0603)^2}) + (\frac{1.07}{(1+r)^3})

(where r equals the 3 year zero rate)

This gives us:

r_{\tiny{Z3}}=.07097

or 7.097%

Now we have our zero curve that we can use as discount rates for cash flows at the corresponding maturities.

So the corresponding zero curve:
Year 1: 5%
Year 2: 6.03%
Year 3: 7.097%
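Here’s a short sketch of the bootstrap (my code; it assumes annual coupons and that $1 invested at the quoted yield is worth par, exactly as in the example above):

```python
def bootstrap_zero_rates(par_yields):
    """par_yields[i] is the quoted yield for maturity i+1 years.
    Returns the zero rate at each maturity."""
    discount_factors, zeros = [], []
    for n, y in enumerate(par_yields, start=1):
        # 1 = y * sum(earlier DFs) + (1 + y) * DF_n  =>  solve for DF_n
        df = (1 - y * sum(discount_factors)) / (1 + y)
        discount_factors.append(df)
        zeros.append(df ** (-1 / n) - 1)
    return zeros

zeros = bootstrap_zero_rates([0.05, 0.06, 0.07])
print([round(z, 5) for z in zeros])  # [0.05, 0.0603, 0.07097]
```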

An observation

It’s interesting to note that the zero rates are slightly higher than the yield curve rates when the curve is sloping upwards. If you have trouble seeing why, think of an extreme case of a yield curve with a 0% rate in year 1 and a 100% rate in year 2.

Implied forward rates

Another concept worth touching on here is the implied forward rate. Let’s say we invest $1 for 2 years at our 2 year zero rate, which is 6.03%. This would give us a return of:

FV = 1 * (1 + .0603)^2 = 1.124236

We could also have invested for 1 year, then take that return and lock in a forward rate for the second year (the rate at the end of year 1 for investing for year 2). Under the rules of arbitrage, this forward rate for year 2 needs to give us the same return as investing for 2 years using the 2 year zero rate. So:

FV = 1.124236 = 1.05 * (1 + r_{\tiny{1,2}})

where r_{\tiny{1,2}} is the forward rate for the second year

This gives an implied forward rate r_{\tiny{1,2}} = .070707 = 7.07%

You can keep building these implied forward rates for all future years using this technique.
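A sketch of that technique (my code, fed with the zero rates from above):

```python
def implied_forwards(zero_rates):
    """One-year forward rates implied by annual zero rates (no-arbitrage)."""
    forwards = [zero_rates[0]]  # year 1 forward is just the 1 year zero rate
    for n in range(1, len(zero_rates)):
        # Growing for n+1 years at the long zero must equal growing for n
        # years at the shorter zero, then one year at the forward rate.
        growth_n = (1 + zero_rates[n]) ** (n + 1)
        growth_prev = (1 + zero_rates[n - 1]) ** n
        forwards.append(growth_n / growth_prev - 1)
    return forwards

fwds = implied_forwards([0.05, 0.0603, 0.07097])
print([round(f, 4) for f in fwds])  # [0.05, 0.0707, 0.0926]
```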

The “Mathematical Constant” and Continuously Compounding Interest

One of the (many) aspects of the “Mathematical Constant” e is that:

\lim_{x\to\infty} (1+\frac{1}{x})^x = e

This property makes e very useful for working on compounding interest problems. How so?

Let’s start with the basic time value of money formula giving the relationship between the PV (present value) and FV (future value) given R (the rate of interest):

FV = PV * (1 + R)^n

In the formula above n represents the number of compounding periods. By convention this is written as m * Y, where m is the number of compounding periods per year (how many chunks we divide the year into for receiving interest payments) and Y is the number of years. Also, since R is normally quoted as an annual rate, the rate per period is R/m.

So now we have:

FV = PV * (1 + \frac{R}{m})^{mY}

We can make use of the e formula above if we want continuously compounding interest – that is where m is as high as possible, converging on \infty.

Let’s substitute x=\frac{m}{R}, which gives us:

FV = PV * (1 + \frac{1}{x})^{x * YR}

We can now drop this into our formula for e given that we want m (or x, which is effectively our proxy for m) to approach \infty:

FV = PV * [\lim_{x\to\infty} (1+\frac{1}{x})^{x}]^{RY}

We can write this simply as:

FV = PV * e^{RY}

So if we invest $100 at 6.5% continuously compounding interest for a year, we will end up with:

FV = 100 * e^{.065*1}
FV = \$106.72
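You can watch the discrete formula converge on the continuous one (a quick sketch of mine):

```python
import math

pv, rate, years = 100.0, 0.065, 1

# Discrete compounding creeps up on the continuous limit as m grows.
for m in (1, 12, 365, 1_000_000):
    print(m, round(pv * (1 + rate / m) ** (m * years), 4))

continuous = pv * math.exp(rate * years)
print(round(continuous, 2))  # 106.72
```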

Puzzle – Alternating coin toss game

A really simple problem, but needs a trick or two to solve quickly.

Question: You take it in turns tossing a coin with a friend – you having the first toss – and keep going until one of you tosses heads and wins. What are the chances you’re the winner?

Answer: The chances of me winning are the sum of my chances at each toss which is: 1/2 + 1/8 + 1/32… The chances of my friend winning are 1/4 + 1/16 + 1/64…

It may be immediately obvious to you that I’ll have twice my friend’s chances of winning. So that would mean I have a 2/3 chance, and my friend a 1/3 chance.

Let’s also think about this in terms of geometric progressions. First, some basics:

when r<1:

\sum_{n=0}^\infty cr^n = \frac{c}{1-r}

and:

\sum_{n=1}^\infty cr^{n-1} = \frac{c}{1-r}

So if you can get your progression into one of the two forms above, you can just apply the formula on the right hand side.

Now start looking at this from the point of view of the guy starting second. His first toss has a probability of \small(1/2)^2, his second toss \small(1/2)^4, his third \small(1/2)^6. So we can see that on his nth toss, his probability will be \small(1/2)^{2n} with n starting at 1.

We see from the above formulas that when we have n starting at 1 we need to get our progression into the form cr^{n-1}. So we need to expand as such: (\frac{1}{2})^{2n} = (\frac{1}{4})^n = \frac{1}{4} * (\frac{1}{4})^{n-1}. Now we have c=1/4 and r=1/4 to plug in: \frac{c}{1-r} = \frac{\frac{1}{4}} {1 - \frac{1}{4}} = \frac{1}{3}.

So of course, since someone has to win, the probability for the starter is 2/3. But let’s work this out. The starter’s probability at toss n is \small(1/2)^{2n+1} with n starting at 0. Since n starts at 0, we need to get this into the form cr^{n}. Expanding as above: (\frac{1}{2})^{2n+1} = \frac{1}{2} * (\frac{1}{4})^{n}. Now we have c=1/2 and r=1/4 to plug in: \frac{c}{1-r} = \frac{\frac{1}{2}} {1 - \frac{1}{4}} = \frac{2}{3}.

Puzzle – Roll a die for $

This is an interesting puzzle in that it shows a basic technique used in solving these kinds of probability puzzles, as well as in models for some derivatives pricing, without the need for conditional probability calculations. Hint: start with the last event where we know the expected winnings, and work backwards.

Question: It’s your birthday, and a generous uncle tells you that he’ll give you in dollars whatever number comes up when you roll a die – so you will win between $1 and $6. He makes it more interesting – if you don’t like what you roll the first time, you can roll again. And if you don’t like that, you can roll one more time – for a maximum of three rolls. Note that if you choose to throw again, you can’t go back to an earlier better roll. What’s your strategy to maximize your winnings?

Answer: The trick here is to realize that on the third roll (if you get that far) you’re really stuck with whatever comes up, and that on average on your third roll you’ll win $3.50. So on your second roll, you should stick with anything 4 or higher, anything less you go on to the third roll. So, on average, your winnings on the second roll are ($6 + $5 + $4 + $3.50 + $3.50 + $3.50)/6 which equals $4.25. So then on your first roll, you should stick with a 5 or a 6, but go on to your second roll with anything less. On average, your winnings overall if you stick with this strategy will be ($6 + $5 + $4.25 + $4.25 + $4.25 + $4.25) /6 which equals $4.67.
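The backward induction is a one-liner per stage. Here’s a sketch (the function name is mine) that generalizes to any number of rolls:

```python
def expected_winnings(rolls):
    """Expected value with `rolls` rolls left, keeping any face that beats
    the expected value of continuing."""
    if rolls == 1:
        return 3.5                      # last roll: stuck with whatever comes up
    cont = expected_winnings(rolls - 1)
    return sum(max(face, cont) for face in range(1, 7)) / 6

print(expected_winnings(1))  # 3.5
print(expected_winnings(2))  # 4.25
print(expected_winnings(3))  # 4.666... ≈ $4.67
```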

Classic Probability Puzzles

1. I have two children. One of my children is a girl. What are the chances I have two girls?

The answer to this question is not 50/50. After being told that one of the children is a girl, you know there are 3 equally likely options: GB, BG & GG. So the answer is a 1/3 chance. Counterintuitive perhaps because people assume the question stated that the first child is a girl – in which case it would be 50/50.

2. There is a sixgun with two bullets in consecutive chambers pointed at your head. The bad guy spins it, then pulls the trigger. It clicks on an empty chamber. He then tells you that you have a choice: he pulls the trigger again or spins it first before pulling the trigger. Which to choose?

My immediate intuition was to spin again – it would seem that if you were lucky enough for it not to fire once, you’re pushing your luck pulling the trigger again straight away. But now let’s think about it.

The probability of getting shot if you spin again is clearly 2/6, or 1/3. What if you don’t spin? The first pull landed on one of the four empty chambers, and only one of those – the empty chamber just before the bullets – is followed by a loaded one, so the chance is 1/4. So, perhaps counterintuitively, you’re better off just having the trigger pulled again with no spin.
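The chamber arithmetic can be checked by brute force. A sketch of my own, which just enumerates the six chambers (assuming each pull advances the cylinder one chamber):

```python
# Chambers 0 and 1 hold the bullets; pulling again moves to (chamber + 1) mod 6.
loaded = {0, 1}
empty = [c for c in range(6) if c not in loaded]

# Given the first pull clicked, we were on one of the four empty chambers,
# each equally likely. Pulling again fires iff the next chamber is loaded.
shot_no_spin = sum((c + 1) % 6 in loaded for c in empty) / len(empty)
shot_after_spin = len(loaded) / 6

print(shot_no_spin)     # 0.25
print(shot_after_spin)  # ~0.333
```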

3. OK, this is a really hoary old chestnut – the Monty Hall problem. You’re at a game show, and the host tells you the car is behind one of the three closed doors – and to choose a door. Before he opens your chosen door, he opens one of the two remaining doors that he tells you he knows the car is not behind, and then gives you the choice of sticking with your original pick or switching to the other unopened door. Which do you do?

It’s not 50/50, even though there are two doors left (although the probability across both equals 1). There are many ways to think about it, but to me the one that makes the most sense is that no matter what, the probability that you picked the right door in the first place remains 1 in 3. So if you were to pick out of a million doors, the host opening a million minus 2 doors – leaving your pick and one remaining unopened door – doesn’t suddenly up your chances of having picked the right door from 1 in a million to 1 in 2. In the 3 door case, if your pick still has a 1/3 chance of having the car, the other door must have a 2/3 chance – so you in effect double your chances by switching doors.
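A simulation settles it quickly. My own sketch (the `play` helper is hypothetical), modelling the host’s forced behaviour of never revealing the car:

```python
import random

def play(switch, doors=3):
    # One round: pick a door; the host opens all but one of the others,
    # never revealing the car; optionally switch to the last unopened door.
    car = random.randrange(doors)
    pick = random.randrange(doors)
    if car != pick:
        remaining = car  # the host must leave the car door closed
    else:
        remaining = random.choice([d for d in range(doors) if d != pick])
    return (remaining if switch else pick) == car

trials = 100_000
print(sum(play(switch=True) for _ in range(trials)) / trials)   # ~2/3
print(sum(play(switch=False) for _ in range(trials)) / trials)  # ~1/3
```

Note that switching wins exactly when your first pick was wrong, which happens 2/3 of the time.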

Binary Search Tree – Lowest Common Ancestor – Python

The last problem in the Trees chapter of Programming Interviews Exposed was about finding the lowest common ancestor of two nodes in a binary search tree. Starting from the head, if the two nodes you’re looking for straddle the node you’re on, you’ve found your lowest common ancestor. If they’re both on one side or the other, grab your child on that side, and recurse. The key is the BST property: every value in a node’s left subtree is smaller than the node’s value and every value in its right subtree is larger, so once the two values straddle a node, their paths diverge there.

def commonAncestor(node1,node2,head):
    """ 
    >>> node1=Node(1)
    >>> node4=Node(4)
    >>> node3=Node(3,node1,node4)
    >>> node7=Node(7)
    >>> node5=Node(5,node3,node7)
    >>> commonAncestor(node1,node7,node5).value
    5
    >>> commonAncestor(node7,node1,node5).value
    5
    >>> commonAncestor(node1,node5,node5).value
    5
    >>> commonAncestor(node1,node4,node5).value
    3
    >>> commonAncestor(node7,node7,node5).value
    7
    """
    if head is None: return None
    maxNode=max([node1,node2], key=lambda item: item.value)
    minNode=min([node1,node2], key=lambda item: item.value)
    if minNode.value <= head.value <= maxNode.value:
        return head
    if maxNode.value < head.value:
        return commonAncestor(node1,node2,head.left)
    return commonAncestor(node1,node2,head.right)

class Node:
    def __init__(self,value,left=None,right=None):
        self.value=value
        self.left=left
        self.right=right

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Tree Traversal – Python

Another question posed in the Programming Interviews Exposed book. A pre-order traversal of a binary tree (counterclockwise starting at the root, printing nodes as you encounter them) is pretty straightforward, and a very natural thing to implement with recursion:

def traverse(node):
    """ 
    >>> node1=Node(1)
    >>> node2=Node(2)
    >>> node3=Node(3,node1,node2)
    >>> node4=Node(4)
    >>> node5=Node(5,node3,node4)
    >>> traverse(node5)
    5
    3
    1
    2
    4
    """
    if node is None: return
    print(node.value)
    traverse(node.left)
    traverse(node.right)
 
class Node:
    def __init__(self,value,left=None,right=None):
        self.value=value
        self.left=left
        self.right=right


if __name__ == "__main__":
    import doctest
    doctest.testmod()

A little trickier is an implementation without recursion:

def treeWalker(node):
    """
    >>> node1=Node(1)
    >>> node2=Node(2)
    >>> node3=Node(3,node1,node2)
    >>> node4=Node(4)
    >>> node5=Node(5,node3,node4)
    >>> treeWalker(node5)
    5
    3
    1
    2
    4
    """
    lifo=[]
    while node is not None:
        print(node.value)
        if node.left is not None:
            # Remember this node so we can come back for its right subtree.
            lifo.append(node)
            node=node.left
        elif node.right is not None:
            node=node.right
        else:
            # Backtrack to the nearest ancestor with an unvisited right child.
            node=None
            while lifo:
                parent=lifo.pop()
                if parent.right is not None:
                    node=parent.right
                    break
class Node:
    def __init__(self,value,left=None,right=None):
        self.value=value
        self.left=left
        self.right=right
        
if __name__ == "__main__":
    import doctest
    doctest.testmod()

Linked List – Cyclic or Acyclic?

Another problem in the Programming Interviews Exposed book (see previous post) is to determine whether a Linked List is cyclic or not.

The most obvious way to do this is to iterate over the list, checking at each element whether its next pointer points back at an element you’ve already seen. So as you reach each element, you sweep from the head up to the current element.

def isAcyclic1(head):
    """ 
    >>> e1=Element(1)
    >>> e2=Element(2)
    >>> e3=Element(3)
    >>> e4=Element(4)
    >>> e5=Element(5)
    >>> e1.next=e2
    >>> e2.next=e3
    >>> e3.next=e4
    >>> e4.next=e5
    >>> isAcyclic1(e1)
    True
    >>> e5.next=e3
    >>> isAcyclic1(e1)
    False
    >>> e5.next=e4
    >>> isAcyclic1(e1)
    False
    >>> e5.next=e2
    >>> isAcyclic1(e1)
    False
    """
    curr=head
    while curr is not None:
        # Sweep from the head up to and including curr, so that a
        # self-loop (curr.next is curr) is also caught.
        sweeper=head
        while True:
            if curr.next is sweeper:
                return False
            if sweeper is curr:
                break
            sweeper=sweeper.next
        curr=curr.next
    return True

class Element:
    def __init__(self,data,next=None):
        self.data=data
        self.next=next

if __name__ == "__main__":
    import doctest
    doctest.testmod()

This is an O(n²) solution. There’s a better way, of course: run a slow pointer one step at a time and a fast pointer two steps at a time. If the list is acyclic, the fast pointer reaches the end quickly; if it’s cyclic, the fast pointer eventually laps the slow one and they meet. This is Floyd’s “tortoise and hare” cycle detection, and it runs in O(n) time with O(1) space. Smart.

def isAcyclic(head):
    """ 
    >>> e1=Element(1)
    >>> e2=Element(2)
    >>> e3=Element(3)
    >>> e4=Element(4)
    >>> e5=Element(5)
    >>> e1.next=e2
    >>> isAcyclic(e1)
    True
    >>> e2.next=e3
    >>> e3.next=e4
    >>> e4.next=e5
    >>> isAcyclic(e1)
    True
    >>> e5.next=e3
    >>> isAcyclic(e1)
    False
    >>> e5.next=e4
    >>> isAcyclic(e1)
    False
    >>> e5.next=e2
    >>> isAcyclic(e1)
    False
    """
    slow=head
    fast=head
    while fast is not None and fast.next is not None:
        slow=slow.next
        fast=fast.next.next
        if fast is slow:
            return False
    return True

class Element:
    def __init__(self,data,next=None):
        self.data=data
        self.next=next

if __name__ == "__main__":
    import doctest
    doctest.testmod()

More on Linked Lists – flattening and unflattening

More from the Programming Interviews Exposed book (see last post). In this problem you take a doubly linked list (where each node knows its previous and next nodes) whose nodes may also have child lists, and flatten it, then unflatten it back to its original structure. It took me a few minutes to see why recursion works for unflattening.

class Node:
    def __init__(self,data,prev=None,next=None,child=None):
        self.data=data
        self.prev=prev
        self.next=next
        self.child=child

class ListFlattener:
    """
    >>> node1=Node(1)
    >>> node5=Node(5)
    >>> node1.child=node5
    >>> node2=Node(2)
    >>> node3=Node(3,node1,None,node2)
    >>> node1.next=node3
    >>> node4=Node(4,node2)
    >>> node2.next=node4
    >>> ListFlattener.listAsArray(node1)
    [1, 3]
    >>> ListFlattener.flattenList(node1)
    >>> ListFlattener.listAsArray(node1)
    [1, 3, 5, 2, 4]
    >>> ListFlattener.unflattenList(node1)
    >>> ListFlattener.listAsArray(node1)
    [1, 3]
    >>> ListFlattener.flattenList(node1)
    >>> ListFlattener.listAsArray(node1)
    [1, 3, 5, 2, 4]
    >>> ListFlattener.listAsArray(ListFlattener.findTail(node1),False)
    [4, 2, 5, 3, 1]
    """
    @staticmethod
    def flattenList(head):
        curr=head
        tail=ListFlattener.findTail(head)
        while curr is not None:
            if curr.child is not None:
                # Splice the child list onto the end of the flat list;
                # the tail moves to the end of the appended child list.
                tail=ListFlattener.appendChildToEnd(curr,curr.child,tail)
            curr=curr.next

    @staticmethod
    def unflattenList(head):
        curr=head
        while curr is not None:
            if curr.child is not None:
                # Sever the child list from its flattened predecessor,
                # then recursively unflatten the child level.
                curr.child.prev.next=None
                curr.child.prev=None
                ListFlattener.unflattenList(curr.child)
            curr=curr.next

    @staticmethod
    def appendChildToEnd(node,child,tail):
        tail.next=child
        child.prev=tail
        return ListFlattener.findTail(child)

    @staticmethod
    def findTail(node):
        while node.next is not None:
            node=node.next
        return node

    @staticmethod
    def listAsArray(node,forward=True):
        result=[]
        while node is not None:
            result.append(node.data)
            if forward:
                node=node.next
            else:
                node=node.prev
        return result

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Linked List Interview questions

I hadn’t done much programming for a while and thought I’d best brush up on some basic data structures and algorithms.

The following is from Programming Interviews Exposed. I implemented some of the Linked List stuff from the first section in Python. The code came out pretty close to the solutions in the book, which were mainly in C++.

class Stack:
    """
    Adapted from Programming Interviews Exposed problems
    LinkedList implementation that keeps track of head and tail

    Basic tests - push,pop,remove
    >>> s=Stack()
    >>> s.push(1)
    >>> s.push(2)
    >>> s.push(3)
    >>> s.push(4)
    >>> s.pop()
    4
    >>> e=s.head.next
    >>> s.remove(e)
    True
    >>> s.head.data
    3
    >>> s.tail.data
    1
    >>> e=s.head.next
    >>> s.remove(e)
    True
    >>> s.head.data
    3
    >>> s.tail.data
    3
    >>> e=s.head
    >>> s.remove(e)
    True
    >>> s.head
    >>> s.tail
    """

    def __init__(self):
        self.head=None
        self.tail=None

    def push(self,data):
        if self.head is None:
            self.head=Element(data,None)
            self.tail=self.head
        else:
            self.head=Element(data,self.head)

    def pop(self):
        oldhead=self.head
        if oldhead is None:
            return None
        self.head=oldhead.next
        if self.head is None:
            self.tail=None
        return oldhead.data

    def remove(self,element):
        if self.head is None:
            return False
        if self.head is element:
            self.head=self.head.next
            if self.head is None:
                self.tail=None
            return True
        curr=self.head
        while curr is not None:
            if curr.next is element:
                curr.next=curr.next.next
                if curr.next is None:
                    self.tail=curr
                return True
            curr=curr.next
        # Element not found anywhere in the list.
        return False

    def insertAfter(self,element,data):
        """
        >>> s=Stack()
        >>> s.insertAfter(None,1)
        True
        >>> s.head.data
        1
        >>> s.tail.data
        1
        >>> s.insertAfter(None,2)
        True
        >>> s.head.data
        2
        >>> s.tail.data
        1
        >>> e=s.head
        >>> s.insertAfter(e,3)
        True
        >>> s.head.data
        2
        >>> s.tail.data
        1
        >>> s.head.next.data
        3
        >>> e=s.tail
        >>> s.insertAfter(e,4)
        True
        >>> s.head.data
        2
        >>> s.tail.data
        4
        """
        if element is None or self.head is None:
            self.head=Element(data,self.head)
            if self.head.next is None:
                self.tail=self.head
            return True
        curr=self.head
        while curr is not None:
            if curr is element:
                curr.next=Element(data,curr.next)
                if curr.next.next is None:
                    self.tail=curr.next
                return True
            curr=curr.next
        return False

    def removeHead(self):
        """
        >>> s=Stack()
        >>> s.removeHead()
        >>> s.head
        >>> s.tail
        >>> s.push(1)
        >>> s.removeHead()
        >>> s.head
        >>> s.tail
        >>> s.push(1)
        >>> s.push(2)
        >>> s.removeHead()
        >>> s.head.data
        1
        >>> s.tail.data
        1
        """
        if self.head is not None:
            self.head=self.head.next
        if self.head is None:
            self.tail=None

    def findMToLastElement(self,m):
        """
        >>> s=Stack()
        >>> s.push(1)
        >>> s.push(2)
        >>> s.push(3)
        >>> s.findMToLastElement(0).data
        1
        >>> s.findMToLastElement(1).data
        2
        >>> s.findMToLastElement(2).data
        3
        >>> s.findMToLastElement(3)
        """
        mToLast=None
        curr=self.head
        count=0
        while curr is not None:
            if count==m:
                mToLast=self.head
            if count>m:
                mToLast=mToLast.next
                if mToLast is None:
                    return None
            curr=curr.next
            count+=1
        return mToLast

class Element:
    def __init__(self,data,next=None):
        self.data=data
        self.next=next

if __name__ == "__main__":
    import doctest
    doctest.testmod()