Counting unofficial retweets

One way of finding out who’s influential on Twitter is to count how often people are retweeted. I did so when analysing the Twitter discussion on the election of the new president of the Dutch trade union confederation FNV.

I counted both ‘official’ retweets – retweets acknowledged by Twitter – and ‘unofficial’ retweets. Unofficial retweets may have been generated by unofficial Twitter apps (I think) or users may have typed them manually. They may have the pattern RT@username:text (which is also the pattern of official retweets), the pattern "@username:text", or the pattern text via @username (this pattern wasn’t in my original analysis). Perhaps there are more flavours around that I don’t know of.

When looking for background information, I came across a comment by an SEO analyst explaining why they don’t count unofficial retweets:

To try to count non-official RTs is a messy business, as it would require a lot more Twitter API calls for possibly negligible benefit. Why negligible? We make an assumption that non-official RTs correlate strongly with official RTs. We can then use the latter as a proxy for the former. This assumption may not be true, of course. That is, by not using non-official RTs, we may ignore pockets of users who generate many more unofficial RTs... perhaps those who ask a question, or invite a response? (comment by Pete Bray on this article)

Below is some information on how often users in my FNV sample were retweeted within that same sample.

Prevalence of types of retweets
| | Official retweet | RT@username:text | "@username:text" | text via @username |
|---|---|---|---|---|
| Sum | 3,544 | 113 | 98 | 60 |

At least within this sample, unofficial retweets are not very common: together they make up about seven percent of all retweets (271 of 3,815). And here’s some information on how official and unofficial retweets are correlated:

Correlations between types of retweets (Spearman)
| | Official retweet | RT@username:text | "@username:text" | text via @username |
|---|---|---|---|---|
| Official retweet | 1 | 0.28 | 0.25 | 0.13 |
| RT@username:text | 0.28 | 1 | 0.14 | 0.10 |
| "@username:text" | 0.25 | 0.14 | 1 | 0.13 |
| text via @username | 0.13 | 0.10 | 0.13 | 1 |

Users who generate more official retweets also tend to generate more unofficial retweets, but the correlation is not particularly strong. So based on this sample, it would seem conceivable that there are indeed ‘pockets of users who generate many more unofficial RTs’ – as suggested by Bray.


The sample contains close to 11,000 tweets containing the string FNV, collected between 26 April and 16 May. For background see this article; the analysis of retweets in the FNV debate is here. The code I used for the analysis above is here.

If you have a sample of tweets and you want to know how often users in that sample have been retweeted, you can only find that out for retweets that are also in the same sample. In my case that wasn’t a problem, because I was interested in who was influential within a specific discussion. However, if you were interested in constructing a general measure of how influential Twitter users are, you’d probably need a pretty large sample of tweets.
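To illustrate the point about counting within a sample, here is a minimal Python sketch. It assumes each tweet has already been reduced to a dict with an 'author' and a 'retweeted_user' (None if the tweet is not a retweet); those field names and the function name are made up for this example and are not taken from the linked code.

```python
from collections import Counter

def count_in_sample_retweets(tweets):
    """Count how often each author in the sample is retweeted within the sample.

    tweets: list of dicts with the hypothetical keys 'author' and
    'retweeted_user' (None when the tweet is not a retweet).
    """
    # Everyone who tweeted at least once in the sample.
    authors = {t['author'].lower() for t in tweets}
    # Start every author at 0, so users who were never retweeted are kept.
    counts = Counter({author: 0 for author in authors})
    for t in tweets:
        target = t['retweeted_user']
        # Retweets of users outside the sample are ignored.
        if target and target.lower() in authors:
            counts[target.lower()] += 1
    return counts

# Made-up example data.
sample = [
    {'author': 'alice', 'retweeted_user': None},
    {'author': 'bob', 'retweeted_user': 'alice'},
    {'author': 'carol', 'retweeted_user': 'Alice'},
    {'author': 'bob', 'retweeted_user': 'dave'},
]
counts = count_in_sample_retweets(sample)
```

Keeping the zero counts matters for any later correlation, since many users in a sample like this are never retweeted at all.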

The messiest type of retweet is probably text via @username. Often these aren’t real retweets but are added by services like ShareThis or AddThis, or by news websites that have their own share service. I only included users if they were already in the sample, i.e. had tweeted texts containing FNV; this eliminates ShareThis and AddThis tweets. I looked for the pattern via @ followed by any number of non-whitespace characters, running either up to the first whitespace or to the end of the line. This method may not be 100% accurate, but I think it’ll do. The regex patterns used to find the different types of retweets are in the code.
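As a rough illustration of this kind of matching, here is a minimal Python sketch. The three regexes and all names are assumptions for this example and may well differ from the patterns in the linked code. It also only looks at the tweet text, so separating official from unofficial RT@username retweets would still need Twitter's own metadata.

```python
import re

# Hypothetical patterns, tried in this order.
PATTERNS = [
    ('rt', re.compile(r'\bRT\s*@(\w+)')),      # RT@username:text
    ('quoted', re.compile(r'"@(\w+)\s*:')),    # "@username:text"
    ('via', re.compile(r'via @(\S+)')),        # text via @username
]

def find_retweeted_user(text, sample_users):
    """Return (kind, username) for the first pattern that matches a user
    who is in sample_users (a set of lowercase names), else None."""
    for kind, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # Drop trailing punctuation that \S+ may have swallowed.
            user = match.group(1).rstrip('.,:;!?').lower()
            if user in sample_users:
                return kind, user
    return None

# Made-up users and tweets.
users = {'alice', 'bob', 'carol'}
r1 = find_retweeted_user('RT @alice: FNV kiest voorzitter', users)
r2 = find_retweeted_user('"@bob: FNV nieuws"', users)
r3 = find_retweeted_user('FNV nieuws via @Carol.', users)
r4 = find_retweeted_user('FNV nieuws via @sharethis', users)
```

The check against sample_users is what filters out share-button tweets: an account such as a sharing service that never tweeted about FNV itself is simply not in the set.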

Because the retweet counts are not normally distributed (many have a value of 0), I used Spearman’s rank correlation; Pearson’s correlation would have yielded stronger, but still not particularly strong, correlations of up to 0.5.
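For readers who want to see the difference between the two measures, here is a small self-contained Python sketch. It computes Spearman's correlation as Pearson's correlation on ranks, with tied values (such as the many zeros) sharing their average rank. The function names are made up for this example and the numbers are toy data, not my retweet counts; in practice you would probably use a statistics package instead.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of equal values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson applied to the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Toy data: y always rises with x, but not linearly.
x = [1, 2, 3, 4]
y = [1, 4, 9, 100]
rho = spearman(x, y)
r = pearson(x, y)
```

Because Spearman only looks at the ordering, a few users with very high counts influence it much less than they influence Pearson's correlation.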