Chris Albon and I co-host a podcast about data science called Partially Derivative. Some of our episodes have been submitted to Hacker News, but we’ve never made it to the front page. I wanted to figure out why.
Ah, the soft, warm glow of Hacker News fame. Coveted by all of geekdom – and for good reason. For those unfamiliar with Hacker News, it’s a simple list of links and forum posts popular with the startup community. Similar to reddit or the old Digg, it’s a place where geeks share, stumble upon and discuss interesting content.
A blog post that climbs the ranks to the Hacker News front page can mean the difference between notoriety and obscurity for the post’s author, so competition is fierce. Only stories voted on by the community – or, maybe, commented on by the community, or perhaps submitted by really popular community members, or something – ever reach the top. And there’s the rub: no one actually knows why some posts make it to the front page while others don’t. Is it quality? Is it relevance? Is it luck?
Let’s see if we can figure it out.
We scraped news.ycombinator.com and grabbed data via their API over three days. In that time we collected:
483 front-page posts
Hacker News users interact with content in a bunch of different ways, and we’ll start with the assumption that any of these might be relevant. People can vote stories up or down (posts get +1 for an upvote, -1 for a downvote), comment on stories, vote on comments and comment on other comments. Users also get “karma,” which is earned by participating in the community, with bonus points for being well-liked (i.e., people vote for your posts, respond to your comments, and other stuff like that). It’s not entirely clear how karma is assigned, but it’s safe to assume it’s a measure of status. Lastly, timeliness seems important, so we’ll also keep tabs on the rate at which posts acquire points and comments.
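To make "the rate at which posts acquire points" concrete, here is a minimal sketch of how points per minute could be computed from repeated observations of one post. The `Snapshot` tuple and the `points_per_minute` helper are hypothetical names, not the code used for this analysis, and the sample values are made up for illustration.

```python
from typing import NamedTuple, Sequence


class Snapshot(NamedTuple):
    """One observation of a post (hypothetical structure)."""
    minute: float  # minutes since the post was submitted
    points: int    # score shown at that moment


def points_per_minute(snapshots: Sequence[Snapshot]) -> float:
    """Average points gained per minute between the first and last snapshot."""
    if len(snapshots) < 2:
        return 0.0  # a single observation carries no rate information
    ordered = sorted(snapshots, key=lambda s: s.minute)
    first, last = ordered[0], ordered[-1]
    elapsed = last.minute - first.minute
    if elapsed <= 0:
        return 0.0
    return (last.points - first.points) / elapsed


# Made-up observations of one post, sampled once per minute.
example = [Snapshot(1, 2), Snapshot(2, 5), Snapshot(3, 9), Snapshot(4, 14)]
print(f"{points_per_minute(example):.1f} points per minute")
```

The same calculation works for comments per minute if the comment count is stored in place of the score.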
A common perception of Hacker News is that it’s an echo chamber for Bay Area startup hipsters: mostly young, white and Asian men chatting about trendy programming languages and a technology-enabled libertarian utopia. While our sample size is limited, a quick analysis of the available post titles and user profile descriptions suggests both the content and community are not, in fact, totally homogeneous.
If the same types of posts dominated the front page, we’d expect to see post titles self-organize into distinct categories – maybe about Apple, smart phones, or popular programming languages. Similarly, if the Hacker News community was limited solely to cliques promoting each other’s content, it’d be safe to assume their profile descriptions would clump together in the same way. However, at least amongst the user profiles and post titles we collected, these patterns don’t emerge (see the notes about methodology for more about our approach to topic discovery).
More to the point, neither group dynamics nor post topic appears to affect a post’s chances of making the front page, at least in our sample.
A more thorough analysis might look at the implied relationships between users (like which users often respond to each other’s comments), but all we’re interested in is how posts make the front page, so let’s leave this (probably interesting) sociological investigation to someone else.
Democracy After All?
Even if there’s no topical litmus test, it’s still possible that a small cabal of powerful community members acts as de facto gatekeepers for the Hacker News front page. After all, what’s the point of amassing karma if you can’t use your power to shape the community?
Karma isn’t meaningless. While plenty of users with low karma submitted posts that ascended to the front page, it looks like posts submitted by influential users get to the front page more quickly (though the correlation is pretty weak).
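For readers who want to check this kind of relationship on their own data, here is a minimal sketch of a Pearson correlation between submitter karma and time to the front page. The `pearson_r` helper is written out by hand for illustration, and the numbers are made up, not taken from our sample.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


# Made-up values: submitter karma and minutes until the post reached page one.
karma = [120, 950, 4300, 15000, 800]
minutes_to_front_page = [95, 60, 70, 40, 85]
print(f"r = {pearson_r(karma, minutes_to_front_page):.2f}")
```

A negative value means higher-karma submitters tend to reach the front page sooner; a value close to zero is what "pretty weak" looks like.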
On the other hand, posts that earn points quickly clearly have an advantage over those that gradually acquire points over time.
This seems too obvious. Could it possibly be that simple? Just to be sure, let’s look at all the possible factors that might help a post escape from the forgotten wasteland of page two or three and ascend to front-page glory. We can look at the number of direct responses (comments on the original post), the total number of comments (comments on the original post plus responses to those comments), max and average commenter karma, and of course submitter karma and points per minute. We’ll compare these values for posts that made the front page to those for posts that didn’t, and evaluate which (if any) of them is a good predictor of success.
Our initial discovery seems to be confirmed. The authority of the user submitting the post has a small impact on that post’s success, but the best predictor of a post’s success is the rate at which it acquires points.
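As a rough illustration of this kind of comparison, the sketch below ranks features by how strongly they separate front-page posts from the rest. To be clear, this is not the Random Forest feature selection we actually used; it swaps in a much simpler measure (difference in group means divided by the pooled standard deviation, often called Cohen's d). The field names and all values are made up.

```python
import math
from typing import Dict, List


def effect_size(group_a: List[float], group_b: List[float]) -> float:
    """Difference in means divided by the pooled standard deviation."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each group needs at least 2 values")
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    var_a = sum((v - mean_a) ** 2 for v in group_a) / (len(group_a) - 1)
    var_b = sum((v - mean_b) ** 2 for v in group_b) / (len(group_b) - 1)
    pooled = math.sqrt(
        ((len(group_a) - 1) * var_a + (len(group_b) - 1) * var_b)
        / (len(group_a) + len(group_b) - 2)
    )
    return (mean_a - mean_b) / pooled if pooled else 0.0


def rank_features(front: List[Dict[str, float]],
                  other: List[Dict[str, float]]) -> List[str]:
    """Feature names ordered from most to least separating."""
    scores = {
        name: abs(effect_size([p[name] for p in front],
                              [p[name] for p in other]))
        for name in front[0]
    }
    return sorted(scores, key=scores.get, reverse=True)


# Made-up posts; the field names are hypothetical.
front_page = [
    {"points_per_minute": 1.8, "submitter_karma": 5200, "total_comments": 14},
    {"points_per_minute": 2.4, "submitter_karma": 300, "total_comments": 9},
    {"points_per_minute": 1.5, "submitter_karma": 12000, "total_comments": 22},
]
not_front_page = [
    {"points_per_minute": 0.1, "submitter_karma": 800, "total_comments": 10},
    {"points_per_minute": 0.3, "submitter_karma": 4100, "total_comments": 18},
    {"points_per_minute": 0.2, "submitter_karma": 150, "total_comments": 8},
]
print(rank_features(front_page, not_front_page))
```

Unlike a Random Forest, this looks at one feature at a time, so it can't pick up interactions between features. It's only meant to show the shape of the comparison.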
Which is basically to say that, as best we can tell, Hacker News is a democracy. Your karma might give you a boost, but unless you’re a super famous community member, it’s not enough.
Stories that score points the fastest make it to the front page. Period.
Quick Notes About Methodology
Time permitting, I’ll write a more in-depth post about how I collected, processed and explored the data. Here’s the short version:
Topic discovery on post titles and user profiles was tried using both Latent Semantic Indexing and K-means clustering.
Feature selection on calculated values was performed using a Random Forest classifier.
Data was collected by scraping pages 1-4 of news.ycombinator.com once per minute over three days, with periodic interruptions due to connectivity. User profiles and comment content were collected via the Hacker News API.
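For the API part of the collection step, a lookup might look something like the sketch below. It assumes the official Firebase-hosted Hacker News API and its `item` and `user` endpoints; the helper names (`item_url`, `user_url`, `fetch_json`, `submitter_karma`) are hypothetical and this is not the script we ran.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# Assumption: the official Hacker News API hosted on Firebase.
API_ROOT = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """URL for a story or comment."""
    return f"{API_ROOT}/item/{item_id}.json"


def user_url(username: str) -> str:
    """URL for a user profile (includes karma and the 'about' text)."""
    return f"{API_ROOT}/user/{urllib.parse.quote(username, safe='')}.json"


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Download and decode one JSON document."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def submitter_karma(item_id: int) -> Optional[int]:
    """Karma of the user who submitted the given item, if available."""
    item = fetch_json(item_url(item_id))
    if not item or "by" not in item:
        return None  # deleted or missing items come back without an author
    user = fetch_json(user_url(item["by"]))
    return user.get("karma") if user else None


# Example call (makes two network requests):
# print(submitter_karma(8863))
```

Collecting once per minute, as described above, would mean running lookups like these on a timer and handling dropped connections, for example by catching `urllib.error.URLError` and retrying on the next tick.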