"so I says to Mable..."
 

Visualizing My Facebook Data

November 21, 2010

As I said in the intro to this blog, I started it as an alternative to my Facebook account, which I would be deleting. However, it seemed sort of anticlimactic to simply delete the account without some type of closure. The idea for this closure came to me after I found out that you can download all of your Facebook data.

Remembering an assignment I had for my visualization class, I figured that a fitting end for my Facebook life would be to sum up all the text on my wall in some sort of visualization. So, while I don't intend this to be a step-by-step guide on how to do the same, I will document the mini-project that was visualizing my Facebook wall data.

The first step was to get all of my data from Facebook. In your account settings you will see an option to do this.
Account Settings
After you assure Facebook that you are who you say you are and that you want your data, it will take some time to compile it all. Once this is done, you will get an email with a link to your data. After downloading this file, you will find that it comes in HTML form. Basically, Facebook gives you a local version of your pages.

The file of interest here is wall.html. However, you can't really use the file as-is because it contains all the extra HTML markup that you don't need if you just want the actual text that you see on your wall. For example, for the simple status message "will be moving to ramosisms.blogspot.com once he deletes his facebook account", the actual HTML file will contain:

<div class="feedentry">
<span class="profile">Alan 'Salvacion' Ramos</span>
will be moving to ramosisms.blogspot.com once he deletes his facebook account
<div class="timerow">
<span class="time">November 18, 2010 at 10:12 am</span>


So in order to really start seeing what's been on my wall, I had to get rid of all the HTML tags. My initial idea was to use Python and regular expressions (RE) to extract only the text from replies and posts. However, after much failure and insult to my coding skills, I decided I needed an easier solution. Luckily, I had a light bulb moment and realized that if I knew what I didn't want, I could use find and replace in TextWrangler to replace whatever I didn't need with an empty space.

TextWrangler does support RE searches, which made my life a bit easier. I decided I wanted to remove all the tags, the timestamps, and the random special HTML entities using the following searches:

<.*?> - tags
&.*?; - special entities
\d\d?, \d\d\d\d at \d\d?:\d\d \w\w - timestamp
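For anyone who would rather script the cleanup than run it by hand, the same three searches can be sketched in Python. This is only a rough illustration, not what I actually ran: the function name clean_wall_html and the sample string are made up for the example, and it assumes the timestamps always look like the one in the snippet above. Like the TextWrangler version, it replaces each match with a space and leaves the month name behind.

```python
import re

# The three searches from the post, applied in the same order.
TAG = re.compile(r"<.*?>")                                 # tags
ENTITY = re.compile(r"&.*?;")                              # special entities
TIMESTAMP = re.compile(r"\d\d?, \d\d\d\d at \d\d?:\d\d \w\w")  # timestamp (month is kept)


def clean_wall_html(html: str) -> str:
    """Replace tags, entities, and timestamps with a space, then collapse whitespace.

    Hypothetical helper for illustration only.
    """
    text = TAG.sub(" ", html)
    text = ENTITY.sub(" ", text)
    text = TIMESTAMP.sub(" ", text)
    # Collapse the runs of spaces and newlines left behind by the replacements.
    return " ".join(text.split())


# Sample input copied from the wall.html excerpt shown earlier.
sample = """<div class="feedentry">
<span class="profile">Alan 'Salvacion' Ramos</span>
will be moving to ramosisms.blogspot.com once he deletes his facebook account
<div class="timerow">
<span class="time">November 18, 2010 at 10:12 am</span>"""

print(clean_wall_html(sample))
```

One caveat with the entity pattern: because it matches anything between an ampersand and the next semicolon on the same line, a stray "&" in ordinary text can swallow real words, so it is worth spot-checking the output.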


With my data file now cleaned up, I could proceed with making the visualization. IBM has a pretty cool site that lets you upload data and make a visualization using the styles they offer, and it's all free. Each visualization also has multiple options. For example, the one I used, "word cloud", lets you remove common English words, which is very helpful. After customizing my visualization, I had the following:
Words Appearing on My Wall
Originally, I decided to leave in the month portion of the timestamp because I wanted to see which months were more active for my wall. However, after seeing the result, I thought that the months overshadowed everything else, so I decided to remove them and create the visualization again.
Words Appearing on My Wall Sans Months

I was happier with this result than the first. While I intended this to be more of a tech case study than an analysis of the results, I will say that the "alan" instance reflects the number of times other people wrote my name on my wall, since I removed all instances of my name as the author of a post.
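A word cloud is essentially a picture of word counts with the common words filtered out, so the numbers behind it can be approximated with a few lines of Python. This is a minimal sketch under stated assumptions: the stop-word list is a tiny made-up one (the list the IBM site uses is not known here), and word_counts and the sample sentence are hypothetical names and data for illustration.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not the site's actual list).
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "it", "you"}


def word_counts(text: str, stop_words=STOP_WORDS) -> Counter:
    """Count lowercase words in text, skipping anything in stop_words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Made-up sample text standing in for the cleaned wall data.
counts = word_counts(
    "Happy birthday Alan! Alan, see you in November. The party is in November"
)
print(counts.most_common(3))
```

Adding the month names to the stop-word set would give the same effect as the second cloud, where the months no longer crowd out everything else.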

I still think there is more value I can get out of my Facebook data. Perhaps next time I'll make a graph of the number of comments over time or something like that.
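As a starting point for that idea, here is a rough Python sketch that tallies wall entries per month from the timestamps. It assumes every timestamp sits in a span with class "time" and follows the "November 18, 2010 at 10:12 am" format seen in the excerpt above, and it would need to run on the original wall.html, before the timestamps are stripped. The function name entries_per_month and the sample data are made up for the example.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes timestamps appear exactly as in the excerpt: <span class="time">...</span>
TIME_SPAN = re.compile(r'<span class="time">(.*?)</span>')


def entries_per_month(html: str) -> Counter:
    """Return a Counter mapping 'YYYY-MM' to the number of timestamps in that month."""
    counts = Counter()
    for stamp in TIME_SPAN.findall(html):
        # Parse e.g. "November 18, 2010 at 10:12 am" (English month names assumed).
        when = datetime.strptime(stamp, "%B %d, %Y at %I:%M %p")
        counts[when.strftime("%Y-%m")] += 1
    return counts


# Made-up sample with three entries across two months.
sample_html = """
<span class="time">November 18, 2010 at 10:12 am</span>
<span class="time">November 2, 2010 at 9:05 pm</span>
<span class="time">October 30, 2010 at 1:45 pm</span>
"""

for month, n in sorted(entries_per_month(sample_html).items()):
    print(month, n)
```

The resulting month-to-count pairs could then be fed to any charting tool to get the graph over time.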