Remembering an assignment I had for my visualization class, I figured that a fitting end for my Facebook life would be to sum up all the text on my wall in some sort of visualization. So, while I don't intend this to be a step-by-step guide on how to do the same, I will document the mini-project that was visualizing my Facebook wall data.
The first step was to get all of my data from Facebook. In your account settings you will see an option to download it.
[Image: Account Settings]
The file of interest here is wall.html. However, you can't really use the file as-is, because it contains all the extra HTML that you don't need if you just want the actual text that you see on your wall. For example, for the simple status message "will be moving to ramosisms.blogspot.com once he deletes his facebook account" the actual HTML file will contain:
<div class="feedentry">
<span class="profile">Alan 'Salvacion' Ramos</span>
will be moving to ramosisms.blogspot.com once he deletes his facebook account
<div class="timerow">
<span class="time">November 18, 2010 at 10:12 am</span>
So in order to really start seeing what's been on my wall, I had to get rid of all the HTML tags. My initial idea was to use Python and regular expressions to extract only the text from replies and posts. However, after much failure and insult to my coding skills, I decided I needed an easier solution. Luckily I had a light bulb moment and realized that if I knew what I didn't want, I could use find and replace in TextWrangler to replace whatever I didn't need with an empty space.
TextWrangler supports regular expression searches, which made my life a bit easier. I decided I wanted to remove all the tags, the timestamps, and the random special HTML entities using the following searches:
<.*?> - tags
&.*?; - special entities
\d\d?, \d\d\d\d at \d\d?:\d\d \w\w - timestamp
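For anyone who would rather script this than use an editor, here is a minimal sketch of the same three searches in Python. The function name `strip_wall_markup` and the sample string are just for illustration, and replacing each match with a single space (then collapsing whitespace) is my assumption about how to mirror the find-and-replace step.

```python
import re

# The three searches listed above, copied as-is.
TAG = re.compile(r"<.*?>")
ENTITY = re.compile(r"&.*?;")
TIMESTAMP = re.compile(r"\d\d?, \d\d\d\d at \d\d?:\d\d \w\w")


def strip_wall_markup(html: str) -> str:
    """Replace tags, entities, and timestamps with a space, then tidy whitespace."""
    text = TAG.sub(" ", html)
    text = ENTITY.sub(" ", text)
    text = TIMESTAMP.sub(" ", text)
    # Collapse the runs of spaces and newlines left behind by the replacements.
    return " ".join(text.split())


# The status message excerpt shown earlier in this post.
sample = """<div class="feedentry">
<span class="profile">Alan 'Salvacion' Ramos</span>
will be moving to ramosisms.blogspot.com once he deletes his facebook account
<div class="timerow">
<span class="time">November 18, 2010 at 10:12 am</span>"""

cleaned = strip_wall_markup(sample)
print(cleaned)
```

Note that the timestamp search starts at the day number, so the month name ("November" here) is left in the text. The entity search is also fairly blunt: a stray "&" followed later on the same line by a ";" would take everything in between with it.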
With my data file now cleaned up, I could proceed with making the visualization. IBM has a pretty cool site that lets you upload data and make a visualization using the styles they offer, and it's all free. Each visualization also has multiple options. For example, the one I used, "word cloud," lets you remove common English words, which is very helpful. After customizing my visualization I had the following:
[Image: Words Appearing on My Wall]
[Image: Words Appearing on My Wall Sans Months]
I was happier with this result than with the first. While I intended this to be more of a tech case study than an analysis of the results, I will say that the "alan" instance is the number of times other people wrote my name on my wall, since I removed all instances of my name as the author of a post.
I still think there is more value I can get out of my Facebook data. Perhaps next time I'll make a graph of the number of comments over time or something like that.
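I don't know how IBM's tool works internally, but the idea behind a word cloud is just word counting, which can be sketched in a few lines of Python. Everything here is an assumption for illustration: the tiny `COMMON_WORDS` list stands in for a real list of common English words, `MONTHS` mimics the "sans months" version, and the sample text is made up.

```python
import re
from collections import Counter

# Hypothetical, very short stand-in for a real list of common English words.
COMMON_WORDS = {"the", "a", "an", "to", "of", "and", "in", "on", "is", "you"}
MONTHS = {
    "january", "february", "march", "april", "may", "june", "july",
    "august", "september", "october", "november", "december",
}


def word_counts(text: str, exclude: frozenset = frozenset()) -> Counter:
    """Count lowercase words in text, skipping anything in exclude."""
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(w for w in words if w not in exclude)


# Made-up sample standing in for the cleaned wall text.
text = "Happy birthday Alan November see you in November Alan"

with_months = word_counts(text, frozenset(COMMON_WORDS))
sans_months = word_counts(text, frozenset(COMMON_WORDS | MONTHS))
print(with_months.most_common(3))
print(sans_months.most_common(3))
```

A word cloud then simply draws each word at a size proportional to its count, which is why leftover month names can crowd out everything else until they are excluded.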