The Not Coke Machine

Posted on September 20th, 2011 in Amazon AWS, Programming | 1 Comment »

The Not Coke Machine is the product of a realization that a mini-fridge provides little benefit beyond a store for cold, cheap beer. It is an ethernet enabled kegerator with a built in flow meter that monitors how much beer has been poured through the tap. It also shows in real time how much beer is being poured as it is poured. Brohemia. (or would this be Brogeneering?)

How It Works

The CO2 tank in a kegerator keeps the keg pressurized so that the beer can keep for a longer period of time than if you were just using a hand pump. The CO2 also provides the push so that when you open the tap, beer flows automatically without needing any pumping. So, if you insert a flow sensor in the beer line, you can monitor every pour of beer through the tap and keep track of how much is left in the keg.

A small turbine provides the monitoring in a flow sensor. As liquid passes through the sensor the turbine spins and outputs an electrical signal that can be monitored using a microcontroller, the Arduino. The Arduino has attachments that allow it to be networked so that when a reading occurs a request can be sent to a server to record that reading.

Building the Kegerator

The first step in this whole ordeal was building the kegerator. Parts needed:

  • 5lb CO2 Tank
  • CO2 Regulator
  • 5/8-inch CO2 Tube
  • Keg Tap
  • 3/8-inch Beer line
  • 4 x 1/2-inch or less hose clamps
  • Keg faucet
  • Keg tap

Making it Fit in the Fridge

I went about this in kind of a roundabout, and admittedly stupid, way. I didn’t properly measure the mini-fridge before hand and just kind of assumed that I would be able to fit a pony keg in it. A pony keg is half the size of a full keg, but has the same diameter. Both are about 17 inches wide and the pony keg is half the height of a normal keg. I measured my fridge from the outside, saw that it was about 20 inches across and assumed that it would be fine so I bought this kegerator conversion kit.

Well, it turns out that the inner dimensions are much less than the outer. It’s about 17 inches exactly across on the inside, but three things really burst my bubble once I considered the inner dimensions (after already buying the conversion kit). First the temperature control was on the right-hand wall:

and it turns out that instead of being ambiently cooled, the fridge is cooled internally from the freezer compartment. The freezer compartment shelf is actually what provides all the cooling:

Through it flows the coolant that is recycled by the fridge’s motor. This design is exactly why the freezer comparment always collects frost..and..is frozen. It’s the coldest part of the fridge so you can’t get rid of that shelf. Third, the motor in the back at the bottom subtracts some of the internal volume of the fridge..that’s why there’s only a small drawer at the bottom.

Kegs come in a variety of dimensions so I thought I might be able to find one that would work with all these variables. The pony keg is just a little too wide and I didn’t want to deal with trying to move and rewire the temperature control. A 5-gallon keg has a smaller diameter (only about 9 inches) but is much taller, rising to between 23 to 26 inches. So the pony keg wouldn’t fit because of the temperature control and the five gallon keg wouldn’t fit because of the freezer compartment. The tap needed to extend up into that shelving area where the freezer was.

I figured I wasn’t the first to try converting a mini-fridge of that size so I got on the Googles and found this video:

I followed his guide pretty closely and was able to bend the freezer compartment down in the same way:

This was one of the more difficult pieces of the project because I had to be very careful not to force the main coolant line too much. Apparently it’s very easy to crack that line and it will totally ruin the fridge if you do. A pair of pliers and a flathead screw driver were enough to bend it down though. The fridge makes some funky noises now, but it’s been running fine for a few days so I guess it worked.

I also removed the door shelving in the same way shown in the video,

With the shelf bent and the door shelving gone, I had enough space to fit the keg and CO2 tank inside so I went about filling the CO2 and getting the lines connected.

Connecting and Testing the Kegerator

I knew I needed to fill my CO2 tank, but the only experience I had had with it up to that point was in playing paintball. I had always gotten my paintball tank filled at Sports Authority so I got my 5lb tank (which is significantly larger than any paintball tank) and lugged it over to the store. It turns out that Sports Authority does not have enough pressure to fill a tank of that size. The guy looked at me like I was crazy when I asked so I left the store and went back to the Googles to figure out where I could get the tank filled.

Welding supplies were my answer. It turns out you can take these big tanks to welding supply companies and trade the empty tank for a full one and just pay about $20 for the CO2. I found Madco and took my tank over and had a full one in about 10 minutes. The guy was really friendly and taught me that the thread on any gas tank actually determines what kind of gas is inside. So he could tell it was pure CO2 just by the type of fitting on the outside of the tank and that all CO2 regulators would fit those threads (God I wish programmers could be that consistent..everything would be compatible!).

Once I had the full CO2 tank I decided that I needed to get a keg. Everything I read online about kegging beer mentioned the use of so called cornelius (corny) kegs — 5 gallon soda kegs that have a special kind of tap. I actually found nothing contrary to this while I was researching so I wrongly assumed that all 5 gallon kegs were these corny kegs. This assumption would come back to bite me later, but for now I just assumed that I could get a corny keg and use it to test the kegerator.



I went to a local home brew supply store and bought a corny keg and the fittings needed to hook it up in the kegerator. The corny keg and taps and CO2 tank all fit in the fridge so I was happy with that so I decided to go about mounting the faucet.

To do this all you really need to do is saw a hole in the fridge (easier said than done). My dad had given me a drill while I was home so I had that covered and just needed to get a hole saw of the right diameter. I went to home depot and picked up a carbon hole saw which was cheaper than the alternative metal saw rated for drilling through metal. The fridge door was metal though. Doh.. well I tried it anyway and the saw was able to get through the door with lots of forcing and expletives. Now I’m just not sure it’s going to be able to get through anything else. Oh well, it was only a couple dollars so no big deal.

Alright so my tap was mounted and it was time to test the lines. I filled the corny keg with water and tapped it and was able to get water flowing through. Figuring this was good enough, I decided to hook in the flow sensor.

Arduino and the Flow Meter

I first needed to make sure I could read from the flow sensor using the Arduino so I setup a test where I could pour water in a measuring cup through the line, have it run through the sensor, and come out into a bottle. The code monitoring the flow could then be easily and repeatedly debugged.

The flow sensor turbine spins as liquid flows through it. Each time it spins, a change in its magnetic field causes an electrical pulse that, when wired to the arduino, can be detected with software running on the microcontroller. The number of pulses is proportional to the amount of liquid that passes through the tube and, with some math, you can find the volume of liquid passing through. I followed this guide on reading the liquid flow rate using the arduino.

Networking the Arduino

The next step was to get the Arduino networked so the data could be saved somewhere meaningful. To do this I used the Arduino ethernet shield, which comes with some great libraries for TCP connections. All Arduinos operate with a simple loop function that runs over and over which can be made to listen for events. At this point I was using that loop to monitor the flow so I just needed to add some code that would send a request to a server when the flow was detected to be greater than 0.

When you connect to a web server using a browser, you almost always establish a keep-alive connection. This type of connection tells the machine that you’re likely going to keep requesting pages so it should stay open to avoid the latency of having to reconnect. This connection also comes with a timeout where the connection will close if no pages are requested within some interval.

Since the sensor needed to have fairly good request resolution (i.e. make a lot of requests in a short period of time) I decided to use a keep alive connection. Every iteration of the arduino loop looks to see if the connection has closed, and if so, it reopens the connection so that when there is flow in the sensor it can be sent to the server.

When the sensor detects flow, a request telling the server the number of pulses in the last one second period is sent to the server. If this request is subsequent to a previous request within 2 seconds, then the requests together are identified as part of one pour and the recorded number of pulses can be used to incrementally show how much liquid has been poured.

Here’s the code running on the arduino:

/*
This sketch monitors a flow sensor and sends any readings greater
than 0 to a server.
 
 Circuit:
 * Ethernet shield attached to pins 10, 11, 12, 13
 * Flow sensor attached to pin 2
 
 Written by Eric Conner
 
 Adapted from code by
 David A. Mellis
 Charles Gantt - http://www.seeedstudio.com/forum/viewtopic.php?f=4&t=989&p=3632
 */
 
#include <SPI.h>
#include <Ethernet.h>
 
// Enter a MAC address and IP address for your controller below.
// The IP address will be dependent on your local network:
byte mac[] = { 0xXX, 0xXX, 0xXX, 0xXX, 0xXX, 0xXX };
byte ip[] = { 192, 168, 1, 10 };
byte server[] = { 50, 19, 92, 1 }; // notcokem
 
// Initialize the Ethernet client library
// with the IP address and port of the server
// that you want to connect to (port 80 is default for HTTP):
Client client(server, 80);
 
volatile int NbTopsFan; //measuring the rising edges of the signal
int Calc;
int hallsensor = 2;    //The pin location of the sensor
 
void rpm ()     //This is the function that the interupt calls
{
  NbTopsFan++;  //This function measures the rising and falling edge of the hall effect sensors signal
}
 
 
void setup() {
  // start the Ethernet connection:
  Ethernet.begin(mac, ip);
  delay(1000);
 
 
  pinMode(hallsensor, INPUT); //initializes digital pin 2 as an input
  Serial.begin(9600); //This is the setup function where the serial port is initialised,
  attachInterrupt(0, rpm, RISING); //and the interrupt is attached
 
  if (client.connect()) {
    Serial.println("connected");
  } else {
    // kf you didn't get a connection to the server:
    Serial.println("connection failed");
  }
}
 
void loop()
{
  NbTopsFan = 0;	//Set NbTops to 0 ready for calculations
  sei();		//Enables interrupts
  delay (1000);	//Wait 1 second
  cli();		//Disable interrupts
  Calc = (NbTopsFan / 7.5); //(Pulse frequency x 60) / 7.5Q, = flow rate in L/hour
  Serial.print (Calc, DEC); //Prints the number calculated above
  Serial.print (" L/hour\r\n"); //Prints "L/hour" and returns a  new line
 
  // consume any data on the connection
  while(client.available()) {
    char c = client.read();
    Serial.print(c);
  }
 
  if(!client.connected()) {
      client.stop();
      Serial.println("Not connected...");
      if(client.connect()) {
        Serial.println("Reconnected to server.");
      }
   }
 
  if(NbTopsFan > 0) {
    Serial.println("MAKING REQUEST!");
    Serial.println(NbTopsFan, DEC);
 
    // Make a HTTP request:
    client.print("GET /flow?freq=");
    client.print(NbTopsFan, DEC);
    client.println(" HTTP/1.1");
    client.println("Host: notcokem.com");
    client.println("Connection: keep-alive");
    client.println("Cache-Control: max-age=0");       
    client.println();
  }

Yea, it’s a GET request with side effects. And anyone can post to that URL. Meh..it’s a kegerator I don’t care much about security. Also, for what it’s worth, I’m using Django on Amazon’s EC2 (just in case i need to scale my keg up, ya know ;-P) on the backend.

At this point the keg was basically setup (only using water) so I made a video showing some of the basics:

Making it Realtime

The final thing I wanted it to be able to do was show, in real time, how much was being poured as it was poured. Here’s the finished product:

This real time push uses a technology called comet. If you use twitter, comet is used when you’re browsing tweets and you see the notification “23 new tweets” or whatever without refreshing the page. Those tweets occurred, in real time, since you started browsing. Twitter is able to push a notification to you that more tweets have been posted.

To accomplish this, I used an event driven server that interfaces well with Django called Orbited and the Stomp protocol for the socket connections that handle the real time back-and-forth. Every connection to a website uses what’s called a socket. You can think of a socket like a mailbox. You put a letter in at one end and it comes out the other. To accomplish real time interaction, multiple mail boxes (sockets) are used so that one can receive any new data.

Here is a bit of the code that handles the real time push if you are curious:

On the server side,

conn = stomp.Connection()
conn.start()
conn.connect()
conn.subscribe(destination='/pours', ack='auto')
...
pourjson = json.dumps({'pk': cur_pour.pk, 'size': cur_pour.size})
conn.send(pourjson, destination='/pours')

On the client,

 
$(document).ready(function() {
    stomp = new STOMPClient();
 
	var pourTimeout;
 
    stomp.onopen = function(){
        //console.log("opening stomp client");
    };
    stomp.onclose = function(c){
        //alert('Lost Connection, Code: ' + c);
    };
    stomp.onerror = function(error){
        alert("Error: " + error);
    };
    stomp.onerrorframe = function(frame){
        alert("Error: " + frame.body);
    };
    stomp.onconnectedframe = function() {
        console.log("Connected. Subscribing");
        //alert("subscribing");
 
        stomp.subscribe("/pours");
    };
    stomp.onmessageframe = function(frame){
        var curPour = $.parseJSON(frame.body);
	$("#messages").html("<h1>Pouring..." + curPour.size + "</h1>")
 
	// clear the previous timeout..we are still pouring
	clearTimeout(pourTimeout);
	pourTimeout = setTimeout(function() { 
	    $("#messages").html(""); 
	}, 3000)
    };
    stomp.connect('localhost', 61613);
});

Alright.. so at this point the kegerator poured, updates were recorded, and everything seemed great. So I went out at got a keg of Blue Moon to test it with.

Woe the Corny Keg

Remember earlier how I said all my research suggested a corny keg is what I would get? Well I was wrong. It turns out that breweries sell 5 gallon kegs with the normal, standard type of keg tap you’d expect. Corny kegs use a completely different system so my taps were all wrong. Luckily the tap wasn’t too hard to replace.

Woe the Foam

Kegs are finicky and the amount of foam in beer is all about temperature and pressure. Foam is just CO2 gas and while the beer is under pressure you’d like most of that CO2 to stay in solution. That is, you want it dissolved in the beer. If you have pockets of CO2 in your keg and lines then you end up with foam everywhere. As temperature is reduced, less pressure is needed to keep the right amount of CO2 in solution. You therefore need the pressure and temperature of the keg to be balanced and the beer line needs to be at the same temperature as the keg.

I had a lot of problems with foam (and am still having some). The first culprit was that my CO2 line wasn’t screwed on all the way. I found that out when I got up this morning and half of the tank was gone (a full tank should last for at least five or more kegs). I also lowered the temperature and have been playing with the pressure (and wasting a lot of beer…). It’s pretty good now, but not the best it could be.

That’s It. That’s All.

All in all it was a really fun project. The next step is probably to add a temperature monitor. I’ll get there eventually. Check it out at http://notcokem.com/.

Migrating an EC2 Image between AWS Availability Regions

Posted on August 5th, 2011 in Amazon AWS, Programming | 3 Comments »

Yesterday I got to experience the joy of migrating an ec2 image from the us-east region to the us-west region on ec2. The best way to do this was to migrate the images through S3. You’ll find some guides about using rsync and copying everything over through ssh, but I think those were written before AWS introduced ec2-bundle-vol and ec2-migrate-image.

Here are the steps we’ll go through to migrate the image:

  1. Create and attach new volume to store bundled image
  2. Bundle the current running volume into the newly attached volume
  3. Upload bundle to S3 in source region
  4. Migrate bundle to S3 in destination region
  5. Register instance in destination region
  6. Launch instance in destination region


Preparation

You’ll need a running instance of your AMI to do this, so launch one if you do not have one up. The next thing to check is to make sure the ec2 command line tools are installed on your instance (they should be there by default). Try the command,

> ec2-bundle-vol --help

to be sure.

You’ll also need the private key and certificate files for your AWS account on the server so that you can issue admin commands using the ec2 command line tools. These have to be regenerated each time you want to download them, so be sure to save them the first time you generate them. They can be found by going to My Account –> Security Credentials and clicking the X.509 Certificates tab under the Access Credentials heading.

Go ahead and upload these to your running instance:

> scp -i mykey.pem pk-xxxx.pem ubuntu@ec2-xxx-xxx-xxx-xxx.compute.amazonaws.com:
> scp -i mykey.pem cert-xxxx.pem ubuntu@ec2-xxx-xxx-xxx-xxx.compute.amazonaws.com:


Create and attach new volume

Cool. Now that everything is prepared for the migration, let’s create and attach a new volume to hold our bundled volume. Go ahead and create an empty volume and be sure to specify the same availability zone as the one in which your instance is running,

The max size of the bundle will be 10GiB so just be sure that you leave enough space for the bundled volume.

Next attach it to the running instance of the image,

Now we need to determine where the newly attached image was added. The comment in red tells us that our linux kernel may rename our device so it won’t be at /dev/sdg necessarily. SSH into the instance and type the command,

> cat /proc/partitions
 202        1    8388608 xvda1
 202       80   12582912 xvdf

Ok, so it was attached then at /dev/xdvf. Let’s mount the new volume:

  1. Create the filesystem on the partition

    > sudo mkfs -t ext3 /dev/xvdf
    
  2. Create a mount point for the new volume,

    > sudo mkdir /mnt/backup
    
  3. Mount the new volume

    > sudo mount /dev/xvdf /mnt/backup
    
  4. Check that everything worked,

    > df
    Filesystem           1K-blocks      Used Available Use% Mounted on
    /dev/xvda1             8256952   5048364   2789160  65% /
    none                    295996       112    295884   1% /dev
    none                    303236         0    303236   0% /dev/shm
    none                    303236        48    303188   1% /var/run
    none                    303236         0    303236   0% /var/lock
    /dev/xvdf             12385456    161908  11594404   2% /mnt/backup
    

Perfect. We’ve mounted a new partition to store our bundled volume.

Bundle Current Volume

Before bundling the volume, check on the architecture as we’ll need to provide this to the bundle-vol command.

> uname -a
(...etc...) x86_64 x86_64 x86_64 GNU/Linux

Ok, now we just need to issue the ec2-bundle-vol command, (the -u option is your amazon aws account number)

> sudo ec2-bundle-vol -d /mnt/backup -k pk-xxxx.pem -u xxxx-xxxx-xxxx -c cert-xxx.pem
Please specify a value for arch [x86_64]:

If the default architecture value is correct, go ahead and hit return. You’ll see output about the initialization of the bundling process which will take quite awhile to complete.

Uploading the Bundle to S3

The next step is to transfer the bundle to S3 so that it can be migrated. In this step you’ll need your aws access key and secret which can be found under Account –> Security Credentials. By default, the ec2-bundle-vol command creates a manifest file describing the image named image.manifest.xml in /mnt/backup. We’ll need this filename for the next command,

> ec2-upload-bundle -b my-us-standard-bucket -m /mnt/backup/image.manifest.xml -a access_key_id -s secret_key

All of the parts that were just created by bundling the volume will be uploaded to S3. In the meantime, figure out the region location you’d like to transfer the image to,

> ec2-describe-regions -C cert-xxxx.pem -K pk-xxxx.pem
REGION     eu-west-1          ec2.eu-west-1.amazonaws.com
REGION     us-east-1          ec2.us-east-1.amazonaws.com
REGION     ap-northeast-1     ec2.ap-northeast-1.amazonaws.com
REGION     us-west-1          ec2.us-west-1.amazonaws.com
REGION     ap-southeast-1     ec2.ap-southeast-1.amazonaws.com

In this tutorial, we’ll use us-west-1.

Migrate Bundle to S3 Bucket in Destination Region

Now that the bundle has been uploaded to S3 in your source region, ensure that you have a bucket in S3 in your destination region to receive the image. I’ll call mine my-us-norcal-bucket. Now, all we need to do is migrate the image from the source bucket to the destination bucket,

> ec2-migrate-image -K pk-xxxx.pem -C cert-xxxx.pem -o access_key_id -w secret_key --bucket my-us-standard-bucket --destination-bucket my-us-norcal-bucket --manifest image.manifest.xml --location us-west-1
Downloading manifest image.manifest.xml from my-us-standard-bucket... OK
Copying 'image.part.000' to 'my-us-standard-bucket/image.part.000'... OK
Copying 'image.part.001' to 'my-us-standard-bucket/image.part.001'... OK
Copying 'image.part.002' to 'my-us-standard-bucket/image.part.002'... OK
...


Register image in the destination region

Finally we just need to register the image:

> ec2-register -K pk-xxxx.pem -C cert-xxxx.pem --region us-west-1 my-us-norcal-bucket/image.manifest.xml --name myimage

That’s it. That’s all.

Markov Chains and PageRank

Posted on May 18th, 2011 in Programming | No Comments »

Introduction

Most modern search engines operate on a three step process,

  1. User types in a query
  2. Engine locates all pages matching that query in its index
  3. The results are ordered in a useful way and returned to the user [1]

The Google search engine’s main improvement over others was a novel method for ranking these results called PageRank. Consider a page u which has hyperlinks to some set of pages F_u. Brin and Page considered each outgoing link from u to a page in F_u as a recommendation by page u for a page in F_u (or as a citation of the page in F_u by page u). Presumably authors of important pages on a topic would only link to other important pages [2].

Every page on the internet has a set of forward links, F_u, and a set of back links, B_u. Forward links are those pages to which the page links and back links are the set of pages which link to u. The figure below illustrates this relationship.

Original Definition of PageRank

The first version of PageRank described by Brin and Page in their seminal 1998 paper is a recursive relationship in which the rank of u, R(u), is calculated based on each of the pages v in B_u,

R(u) = c \sum_{v\in B_u}\frac{R(v)}{N_v}

where N_v = |F_v|, the number of forward links from v and the constant c is a normalizing factor so that the total rank of all web pages is constant. In this way, the rank of u depends on the importance of as well as the number of pages recommended by each page in u’s set of back links. Therefore, the page u will gain a lot of importance from an important page in B_u which has just a few outward links. Conversely, a page with many forward links that isn’t very important will not contribute much to u’s rank. This ingenious definition prevents page authors from gaming the algorithm since a page’s importance depends on the network of all of its recommendations [2].

There are two problems with this simple definition. First, it is unclear how to distribute the weight of pages that are dead ends and, second, it is possible for certain loops in the graph to become rank sinks. As an example, imagine that two pages form a loop consisting of one forward link from each page to the other. Then, during an iterative calculation of the PageRank, the loop will accumulate weight since it has no outbound edges.

To overcome this problem, in their original formulation, Brin and Page define a source of rank E, a vector over pages on the internet, which is usually initialized as a uniform probability distribution over the number of pages (e.g. each entry E(i) = frac{1}{|E|}).
The resultant formula then becomes,

R(u) = c \sum_{v \in B_u} \frac{R(v)}{N_v} + cE(u)

such that c is maximized and the sum of all entries in R is 1. Intuitively, the vector E represents the probability that instead of going from u to v, we jump to a random page on the internet [2].

Markov Chain Interpretation of PageRank

The vector E also points to another way to interpret the PageRank formulation. Imagine a web surfer who starts from a random page and clicks a link at random on each new page he or she encounters. This random clicking can be modeled by a Markov chain whose states, i = 1,\dots,N are pages where the probability of going from state j to state i is given by,


p_{j,i} = \frac{1}{N_j}  \text{ if j links to i}
p_{j,i} = 0 \text{ otherwise}

The PageRank of page u is then defined by the stationary distribution of this random walk. We gauge the importance of a page by determining the probability of finding the random surfer on that page at any given time.

To handle rank sinks, we introduce a restart state i = 0 and some parameter r. Modify the random walk so that, instead of following links, the surfer moves to the restart state with probability p_{i,0} = (1-r). The probability of going from state 0 to state i is then p_{0,i} = r since we have a fixed probability 1-r of remaining in the restart state. Introducing this restart state ensures that the Markov chain is irreducible and aperiodic since any state can be reached from the restart state. Intuitively, this state corresponds to a random surfer who periodically gets bored and jumps to a URL at random on the web [5].

To handle dead ends, we assume that any page which has no forward links instead links to every page on the internet. Assuming this structure ensures that the page’s weight will be distributed evenly over all pages on the web and allow the walk to continue [5].

Now, since the chain is irreducible, aperiodic, and has a finite number of states, there exists a stationary distribution \pi on the chain that satisfies the following two equations,

\pi_i = \sum_{j = 0}^{N} \pi_j p_{j,i}
\sum_{i = 0}^{N} \pi_i = 1

Expanding this using the Markov chain we set up,

\pi_0 = \sum_{j = 0}^{N} \pi_j p_{j,0} = (1-r) \sum_{j = 0}^{N} \pi_j = (1-r)
and, for \pi_i,
\pi_i = \pi_0 p_{0,i} + \sum_{j=1}^N \pi_j p_{j,i}
= (1-r) * \frac{r}{N} + r\sum_{j \in B_i} \pi_j p_{j,i}
= (1-r) * \frac{r}{N} + r\sum_{j \in B_i} \pi_j \frac{1}{N_j}

Divide both sides by r,

\frac{\pi_i}{r} = \frac{1-r}{N} + \sum_{j \in B_i} \frac{\pi_j}{r} \frac{r}{N_j}

Now, let the page rank R(i) = \frac{\pi_i}{r}

R(i) = \frac{1-r}{N} + \sum_{j \in B_i} R(j) \frac{r}{N_j}

Hence, our Markov chain approximates the original formulation of the PageRank algorithm when E is chosen to be uniform over all web pages. The rank is proportional to the stationary distribution of the chain [5].

A Simple Example

Consider a network consisting of 3 pages A, B, and C such that A and B are linked to C which is a dead end. Here we can see that the random surfer will end up in state C and be stuck there. The stationary probability \pi_C would thus be 1 while that of A and B would be 0 as state C had accumulated all of the weight. Also, if there were an additional rank sink (such as a loop) then weight would accumulate there as well.

To remedy this, we introduce the restart state D as well as two green edges which transform C from a dead end into a universal linker. The random surfer model assumes that C, which was a dead end, instead links to every page in the network so that its weight will be distributed fairly among those pages.

In this example, we let the parameter r = 0.85, which is the value widely thought to be used at Google, so that, with probability (1 − r) = 0.15, the random walk goes to the restart state D. Otherwise, with probability r = 0.85, the random walk continues along an outgoing edge from the current page. We get a system of equations for the ranks of the pages,

R(A) = \frac{1 - 0.85}{3} + 0.85 * \frac{R(C)}{2}
R(B) = \frac{1 - 0.85}{3} + 0.85 * \frac{R(C)}{2}
R(C) = \frac{1 - 0.85}{3} + 0.85(R(A) + R(B))

and we get R(A) = R(B) = 0.29 and R(C) = 0.42, which sum to 1 and which make intuitive sense. Since both A and B link to page C, it makes sense that page C should have double the weight of both A and B.

Calculating PageRank

For larger and larger numbers of pages it is not straightforward to setup and solve a system of equations for the rankings. Performing Gaussian elimination with millions of variables would take eons, even for a very fast computer [1]. Instead, define the matrix P so that the ij entry is frac{1}{n_j} if page j has a link to page i and 0 otherwise. We can then define a vector over the rankings which satisfies the equation,

{\bf v} = P{\bf v}

Thus, the ranking vector v is an eigenvector of P with eigenvalue 1. Since some of the values in P are 0, however, this eigenvector may not be unique and there also may be many pages tied with a score of 0. To fix this, we introduce an N by N teleportation matrix each of whose entries is 1/N that corresponds to a surfer randomly choosing a URL as discussed above. With some probability r the surfer continues the random walk and with (1 − r) he chooses to teleport. We can thus define a matrix Q,

Q = rP + (1-r)T

This new matrix will not have any zeros and will have a unique eigenvector v with eigenvalue 1.

References

1] Brian White, Math 51 Lectore Notes: How Google Ranks Web Page. Stanford University, 2006.
[2] Larry Page and Sergey Brin, The PageRank Citation Raking: Bringing Order to the Web. Stanford University, January 29, 1998.
[3] Amy N. Langville and Carl D. Meyer, Deeper Inside PageRank. October 20, 2004.
[4] Nelly Litvak, Markov Chain Analysis of the PageRank Problem. University of Twente.
[5] Jia Li, Markov Chain Interpretation of Google Page Rank. December 1, 2005.
[6] Weijun Xu, Tutorial Notes, Markov Chains. January 30, 2011.

Implementing rand7() using only rand5()

Posted on April 25th, 2011 in Uncategorized | 3 Comments »

Implement rand7(), a method that returns a uniform distribution over the integers 1 to 7, using only rand5().

This question is pretty interesting and not as straightforward as it first appears. For example, a first attempt might be a simple one-liner:

int rand7() {
    return (rand5()/5) * 7;
}

The problem with this approach is that imprecision in integer division will cause us to never get certain values between 1 and 7. For example, the method above will only return 0 or 7 since anything between 1 and 4 divided by 5 will give an integer result of 0. To fix this we want to avoid any kind of integer division and instead focus on using multiplication for the transformation. One approach is to generate rand25() and then repeatedly use it until we get a value between 1 and 7:

int rand25() {
    return (rand5() - 1)*5 + rand5()
}
int rand7() {
    int n = rand25();
    while(n > 7)
         n = rand25();
    return n;
}

In the implementation of rand25() we can’t simply multiply rand5() by 5 since that would only give us the multiples of 5 between 1 and 25. Adding a second call to rand5() allows us to map the digits appropriately.

Amazon EC2, Django, MySQL from the Ground Up

Posted on April 19th, 2011 in Programming | 7 Comments »

Introduction

Here I am going to show you how to get a server set up on EC2 running mysql and django from the ground up.  I often find these kinds of guides difficult to follow because they tend to focus only on one component of the process and use setups that maybe incompatible with the tools you’re working with.  To simplify the process I am going to show a complete setup from start to finish.  There are some finer details I won’t go into like securing the server and setting up iptables, but you can head over to Slicehost for some great articles on the subject.  Ok, let’s do it.

Amazon EC2

The first step you need to take is get an Amazon AWS account.  Once you’ve done that, sign into the management console and select the EC2 tab.  Along the left you’ll see four headlines: Instances, Images, Elastic Block Store, and Networking & Security.

For now we’ll concern ourselves with the first two: instances and images.  The images tab has a sub-option: AMIs.  These Amazon Machine Images are disk images of operating systems with some useful utilities that you can launch as an instance.  An instance is just a machine image running in the elastic cloud.

Go ahead and select AMIs.  On that page from the ‘Viewing’ box select ‘EBS Images’ and filter to the platform ‘Ubuntu’.  We definitely want to set up an Ubuntu server and we chose EBS (Elastic Block Storage) because Amazon allows you to run really small instances, micro instances, that are virtually free. Let’s select the latest Ubuntu Hardy image and launch it as instance.

Select “micro” from the Instance Details modal dialog that pops up and hit Continue.

Nice.  You can hit continue on the next couple of dialogs.  These settings can be configured later and aren’t important for our purposes now.  When you get to the ‘Create Key Pair’ dialog go ahead and create and download a new key.  This pair gives you a private key that you can use to authenticate over ssh to your server the first time without a password.  I’ll call mine eric:

We need to add rules to our security group that allow access to SSH and HTTP.  The ones defined in the image below allow ssh and http access from anywhere.

On the last tab we can review the changes and then hit Launch. Wooo. Now we are getting somewhere.

It will take a minute for the new instance to launch, but once up it will show in your instances pane. This page shows the details we just configured for the instance as well as a public DNS we can use to access it.  Let’s go ahead and fire up an ssh session to our server.  Navigate to wherever you downloaded the private key and initialize the session like so:

> ssh -i eric.pem ubuntu@ec2-##-##-##-##.compute-1.amazonaws.com

Cool. Now that we have a session we’ll want to first make sure aptitude is up to date:

> sudo aptitude update

Once we’ve done that let’s get cracking and install some utilities:

> sudo aptitude install python-django
> sudo aptitude install mysql-server
> sudo aptitude install python-mysqldb
> sudo aptitude install libapache2-mod-wsgi

Create a Django Application
Run django-admin to create a new project in a directory under your home directory:

> mkdir srv
> cd srv
> django-admin startproject mysite

You’ll need to configure your django settings to suit your database and application needs.

Configuring mod_wsgi
Alright now that we have a starter django app going let’s configure mod_wsgi to serve our application. We need to enable a site on apache to do this so let’s create a virtual host.

> sudo nano /etc/apache2/sites-available/mysite.com

Copy the following into the new file:

<VirtualHost *:80>
        ServerName mysite.com
        ServerAlias www.mysite.com
        WSGIScriptAlias / /home/demo/public_html/mysite.com/mysite.wsgi
</VirtualHost>

Now that we’ve defined a wsgi script to serve our app we need to create that script,

> nano /home/demo/public_html/mysite.com/mysite.wsgi

Copy the following configuration to that file

import os
import sys

sys.path.append('/home/demo/public_html/mysite.com/')

os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'

import django.core.handlers.wsgi

application = django.core.handlers.wsgi.WSGIHandler()

Finally we need to enable the site and reload the apache configuration

> sudo a2ensite mysite.com
> sudo /etc/init.d/apache2 reload

That’s it! You should now see the django start page when you navigate to your site.
 

C++ String Concatenation

Posted on April 18th, 2011 in C++ | No Comments »

This is an interesting note about how strings are handled in C++. It turns out that if you attempt to concatenate a string literal and a character, you don’t get the result you expect.

string str = "ab" + 'c';
cout << str << endl;
 
char ch = 'c';
string str1 = "ab";
string str2 = str1 + ch;
cout << str2 << endl;

This code produces the following output:

ed before SaveGraphicsState
abc

The first value printed is obviously not the concatenation we expected. Now, this difference occurs because the C++ string class overloads the + operator. When we use the string literal "ab" the compiler interprets this value as a const char* (a C-style string). When we attempt to add 'c' to it, the compiler converts the character to its integer equivalent and adds it to the char* pointer. Our string now points at some different location in memory and when we print it out it simply prints whatever it found there up to a null value.

Phenomenal Advice from AskHN

Posted on December 22nd, 2010 in Life | No Comments »

I’m reblogging this because I think it is incredible advice:

Ask HN: What were your naivetes in your twenties?

http://news.ycombinator.com/item?id=1474094

Reply by DanielBMarkham:

- Shiny things are nowhere as much fun after you get them as before, even if they have some value. So yes, that Kindle or iPad or whatever will have a real use, and you will be marginally happier with it than without, but not as much as you think

- You can talk yourself into (or out of) anything. The only difference between smart people and other people is that smart people do this with bigger words and more complex arguments. Be confident, but also assume that you are broken in ways you can never spot. Find some ways to get a checksum on life decisions every now and then.

- You don’t need very much at all. Maybe a laptop computer and a couple changes of clothes. Pictures and videos of your life. That’s about it.

- Nothing will ever replace experiences. No matter how big the car, nice the house, or professional-looking the suit, it’s never going to be as much fun or mean as much later as the experiences you have in life. And it’s not just having the experience, it’s looking forward to them, and planning them, and making pictures, movies, and blogs out of them. The best part, oddly, may be the planning. So planning a 200-dollar trip to the beach in the Fall with people you love may give you many hours of happiness this summer — along with the fun of the trip itself.

- Learn to keep picking topics and immersing yourself in them. Most everybody will say to drop out and become part of the system — 9-5 job and TV/games/internet in the evening. If you want a life you could sleep through, that’s fine. But if you want a life you can tell stories about, keep reinventing yourself. And that means constantly learning.

- Lots of shit in life that once looked dumb or stupid opens up into this huge panorama of beauty once you learn the rules. In so many things you are like the guy who never saw a baseball game going to the world series. You kind of get it, but it all seems silly. You don’t know the rules. Decide to learn how to appreciate music, for instance. Get a few college lectures on tape, get some good music to listen to, hang out with folks who are music connoisseurs. The more you know about various art forms, the richer your life is.

- Forget philosophy and meaning-of-life shit. You’re too young. For now, you are what you do. Go do something worthwhile

- Stick to a daily exercise routine at all costs

- If you are changing and getting better, that means you are changing friends too. This was very difficult for me, but you can’t hang out with the same folks and expect to become a better person. There are exceptions, of course, but to a large degree your life is controlled by whom you choose to be friends and hang out with. Be aware that you don’t want to be the same person at 30 as you were at 20. I’m not saying be an asshole — keep being friendly by all means — but be very careful who you hold yourself up against as “normal”

- Dating is a numbers game, like a lot of other things. Learn the skills of dating and don’t sweat picking up chicks (or guys)

- Concentrate on your weaknesses. Make them stronger. When you get to your 30s you can work from your strengths, but there has to be some time in your life to work on shit you suck at, and for me it was when I had the most motivation, my 20s.

- Speaking of which, you have to learn management. No matter what you do, there will be a manager. Even if you don’t want to be one, you have to understand what the job is like to help out your manager. Being a good leader means being a good servant. This concept sounded easy (or facile) to me in my 20s, but proved hard to apply in practice.

- You are never ready for kids. Have them early while you have energy. Read all the books about kids if you must, but realize that creating a replacement is about the most biologically easy thing you could do. After all, evolution has been working on making you a great gene transferral and primate-raising machine, so don’t get paranoid and neurotic about all the latest parenting fashion. Use some sense.

- Everybody wants to be a rock star and win the lottery. Nobody ever does, and the ones that do end up destroying their life. Realize slow success is a million times better than overnight success.

- Much of the stuff in life that normal people do is geared around killing time by distracting you with shiny things of no value. You may never be able to fight this completely, but you should at least deeply understand it and how it affects your goals

- Create. With a passion. There are two major kinds of people in this world, consumers and creators. The herd will push you to consume, life will push you to consume, consumption is the easy and default path, but true joy and a full life come from creating. It does not matter one bit how many people like what you create, just create. Write. Blog. Make videos. Make a movie. Write a program. The longer the format and the more creativity involved, the more you are going to turn on and exercise key parts of your brain. Nobody wants to be 80 and only have stories of being at the office, but fuck, if you were at the office creating something at least you tried to make a difference. I’d rather be that guy than the one who watched Sumo wrestling everyday (or played 20,000 hours of WoW during his 20s) The only thing you’re going to have at the end of your life are the decisions you made, the things you created, and memories. Learn to maximize these things.

Obsession with Grades

Posted on August 7th, 2010 in Life | No Comments »

I recently read Erica Goldson’s speech and it made me think about my own speech. The quote, author anonymous, on which I centered my message valued obstacles:

For a long time it seemed to me that life was about to begin– real life.
But there was always some obstacle in the way, something to be gotten through first,
Some unfinished business, time still to be served, a debt to be paid.
At last it dawned on me that these obstacles were my life.
This perspective has helped me to see that there is no way to happiness.
Happiness is the way.

I chose this quote because, for me, most of high school seemed like waiting and I was desperately searching for a way to reconcile that feeling with my experience.   In a lot of ways, I identified very strongly with Erica Goldson’s message: high school crippled our ability to learn what our actual interests were.  And my high school was more guilty of this handicap than most.  I went to a school that focused heavily on goal-based incremental learning.  We used a system called Saxon Math that stresses incremental development through thirty question problem sets at the end of every chapter.  Each day the student learns one new concept and solves a few problems based on that concept.  The other twenty or so problems review older material.  Under each question number too the book indicates which section to go back to in order to review the concept covered by the problem.  Almost all the problems covered by the book are exact copies of previous problems with only the numbers changed.  The program acts as if no creativity exists in math, as if no insight or passion led to the development of the methods.  The books focus wholly on what you do to solve a problem rather than the elegance and intelligence of the approach.  Saxon divorces the method from the passion that created it.

When education works this way kids quickly find anything intellectual stale.  They learn to focus on goals (i.e. getting into a good college) or fixate on some vague definition of success.  They will pursue the subjects that come easily for them and go through the motions to get A’s in those that don’t.  Getting A’s feels good (I don’t care who you are) and, when it is the only emotion tied to secondary education, kids will become obsessed with the rewarding feeling they get from earning A’s.

Teachers are guilty of perpetuating this cycle. The best teachers always have two traits: they’re personable and passionate.  The best teacher I know would get genuinely giddy about Newtonian mechanics six times a day.  You could see the excitement in his face during every class at the chance to share the subject he loved with students.  These types of people are very hard to come by.  Many other teachers have found their passion but have trouble communicating it to a large audience.  Still others never find that passion.  Most of these teachers (which are most in schools today) like students who keep quiet and get A’s.  These students do not challenge the teacher’s knowledge (it takes a very special type of person to say “I don’t know” in front of a classroom) while looking good on teacher evaluations.

Is keeping quiet and getting good grades always destructive?  My sense is no.  I think that, with the right teacher, a desire to get A’s can cause a student to discover a hidden passion. My feeling, however, is that the onus should not fall on teachers to show passion to students.  The ability to grapple emotionally with intellectual topics needs to start before high school.  There needs to be active effort to give kids both the language and practice necessary to question their feelings for the subjects they study.  This ability must develop before the social pressures of fitting in take shape.  I believe that an emotional need to question would lead to the development of passion for study.

Facebook Graph API Friends Photos and Python SDK

Posted on July 20th, 2010 in Programming | 2 Comments »

I discovered today that you actually need to request four permissions to be able to get photos of a user’s friends otherwise you’ll just get an empty array back from the graph API.
With just friends_photos you’ll get

{'data': []}

when you request photos using the graph api via:

https://graph.facebook.com/[uid]/photos?access_token=[access_token]

Instead you have to get four permissions:
friends_photos,user_photos,user_photo_video_tags,friends_photo_video_tags
and then you will get back the photos you need.

With the Facebook python SDK this looks like:

facebook_obj.graph.get_connections([uid], "photos")

A Resubmittable Form with Ajax and Django

Posted on July 19th, 2010 in Uncategorized | 1 Comment »

Overview

Tonight I wanted to put together a simple form in the sidebar of a Django page that submits via ajax and then reloads with a new version of the same form.  Hunch.com sort of does this (they don’t use a form), but they have a question that, when answered, comes back with a new question so that the site can continuously learn new tidbits about you:

hunch_side

At first it seemed I would need to first pass an object to my index view and load the form template by passing along the context.  This seemed roundabout though as the index didn’t need to see the object at all.  The solution I settled on was to create one form view that returns a json object containing information that would fill the form.  I would request this via ajax at page load and with subsequent form submissions.  The code remains reusable and no efforts are duplicated.

The View
The view is pretty simple. Handle the post request if it is a post then return a json object representing our form information.

def item_form(request):
     if request.method == "POST":
         form = ItemForm(request.POST)
         ... handle form validation ...
 
     itemset = Item.objects.all()
     data = serializers.serialize("json", itemset)
     return HttpResponse(data, mimetype="application/javascript")

The Template
First we’ll write the html and then fill in the javascript. Notice that I’ve left some of the elements empty as these will be filled in when we get passed back a json object. We’ll write a javascript function submit_item_form to handle posting the appropriate form fields via ajax. It is advisable to hide the sidebar-item element to begin with and then show it after loading content so that the form appears gracefully with the page load.

<div class="sidebar-item">
	<form action="/ajax/item_form" method="post" id="item_form">
		<h2></h2>
		<img src="" class="small-item-image" />
		<input type="text" name="name" value="" />
		<input type="submit" value="Submit" 
                      onclick="submit_item_form('item_form'); return false;"/>
	</form>
</div>

Next the jquery that loads the json object. Jquery’s getJSON method automatically parses the response string into a json object.

$(function() {
	$.getJSON("/ajax/item_form", function(data) {
		fill_item_form(data);
	});
});

And then the method that loads the data returned into our prefilled form:

function fill_item_form(json)
{
	var item = json[0]; // we'll only use 1 item for this example
 
	$(".sidebar-item h2").html(item.fields.title);
	$(".sidebar-item .small-item-image").attr("src", item.fields.photo_url);
 
	$(".sidebar-item").show();
}

Now we need to handle the form submission and we’re golden:

function submit_item_form(form_id)
{
	$.ajax({
		type: "POST",
		url: "/ajax/item_form",
		data: $('#' + form_id).serialize(),
		dataType: "json",
		success: function(json){
		     fill_item_form(json);
	        }
	});
}

Now the same function that loads our content reloads the content we get back after submitting the form so that a new form can be submitted.