Facebook data PCA


I wanted to do this exploration because I have noticed that some posts get a huge reach with very few shares, likes, and comments, while other posts get a large number of shares, likes, and comments but reach only a small number of people.

I did a principal component analysis (PCA) on Facebook data from our last 2,000 posts (4/16/2015 – 7/7/2015). Principal components can reveal structure in data that is hard to see otherwise. PCA refresher here: http://setosa.io/ev/principal-component-analysis/

There are four variables: number of likes, number of comments, number of shares, and reach. Reach is the number of unique people who saw the post.
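To make the setup concrete, here is a minimal Python/NumPy sketch of PCA on these four variables. It assumes the variables were standardized before the PCA (the post does not say; this is what R's `prcomp(..., scale. = TRUE)` does). The column order, the function name `pca`, and the random data in the usage lines are hypothetical stand-ins, not our actual post data.

```python
import numpy as np

# Hypothetical column order for the four variables described above.
VARIABLES = ["likes", "comments", "shares", "reach"]

def pca(X):
    """PCA of an (n_posts, 4) array via SVD of the standardized data.

    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column so no variable dominates purely by its scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Rows of Vt are the principal axes (unit-length loading vectors).
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T          # coordinates of each post on PC1..PC4
    loadings = Vt.T            # column k = weights of the 4 variables on PC k
    var = s**2 / (Z.shape[0] - 1)   # variance captured by each component
    return scores, loadings, var / var.sum()

# Usage with synthetic stand-in data (NOT real Facebook data).
rng = np.random.default_rng(0)
engagement = rng.lognormal(3, 1, size=(200, 1))
X = np.hstack([engagement * rng.uniform(0.5, 1.5, size=(200, 3)),  # likes, comments, shares
               rng.lognormal(8, 1, size=(200, 1))])                # reach
scores, loadings, ratio = pca(X)
print(ratio.round(3))
```

Standardizing is a judgment call: without it, reach, which counts people and typically runs far larger than the engagement counts, would tend to dominate PC1 on scale alone.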

I broke the data down by post type: link, photo, status update, or video.

Below are biplots for three pairs of principal components. The most striking observation (to me) is that video posts (purple) can get a huge reach with very few shares, likes, or comments. However, being a video post does not by itself guarantee a large reach.

I know this is fairly well known about Facebook videos, but it is neat to see it show up in the structure of the data.

[Figures: biplots of PC1 vs. PC2, PC1 vs. PC4, and PC3 vs. PC4]


Code for the ggplot2 biplot: https://github.com/vqv/ggbiplot/blob/master/R/ggbiplot.r
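Each biplot shows two principal components at a time, with posts as points colored by type and the four variables as arrows. Below is a rough Python sketch of pulling out those pieces, assuming you already have a score matrix and a loading matrix from a standard PCA. `PAIRS`, `biplot_coords`, and `mean_scores_by_type` are hypothetical names, and the coordinates are left unscaled (ggbiplot applies its own scaling to points and arrows).

```python
import numpy as np

# Component pairs to plot (1-based, as in "PC1 vs. PC2").
PAIRS = [(1, 2), (1, 4), (3, 4)]

def biplot_coords(scores, loadings, i, j):
    """Return (points, arrows) for PCs i and j (1-based), unscaled.

    points: (n_posts, 2) scores; arrows: (n_variables, 2) loadings.
    """
    cols = [i - 1, j - 1]
    return scores[:, cols], loadings[:, cols]

def mean_scores_by_type(scores, post_types):
    """Average PC scores per post type (e.g. link, photo, status, video)."""
    post_types = np.asarray(post_types)
    return {str(t): scores[post_types == t].mean(axis=0)
            for t in np.unique(post_types)}

# Usage with synthetic stand-in arrays (NOT real post data).
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 4))
loadings = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # any orthonormal 4x4
types = ["link", "photo", "status", "video"] * 2
for i, j in PAIRS:
    points, arrows = biplot_coords(scores, loadings, i, j)
    print(f"PC{i} vs PC{j}:", points.shape, arrows.shape)
print(mean_scores_by_type(scores, types))
```

Comparing the per-type mean scores is a crude numeric companion to eyeballing where the colored clusters sit in each biplot.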
