A while ago, Open Culture wrote about a 1955 US Army manual entitled How to spot a communist. According to the manual, communists have a preference for long sentences and tend to use expressions like:
integrative thinking, vanguard, comrade, hootenanny, chauvinism, book-burning, syncretistic faith, bourgeois-nationalism, jingoism, colonialism, hooliganism, ruling class, progressive, demagogy, dialectical, witch-hunt, reactionary, exploitation, oppressive, materialist.
What happened in the 1950s is pretty terrible, but that doesn’t mean we can’t have a bit of fun with the manual. I used the New York Times Article Search API to look up which of its writers actually use terms like hootenanny, book-burning and jingoism. The results are summarised below.
Interestingly, many of the users of «communist» terms are either foreign correspondents or art, music and film critics. While it’s possible that people who have an affinity with the arts tend to sympathise with communism, an alternative explanation would be that critics have more freedom than «regular» journalists to use somewhat exotic and expressive terms like the ones the US Army associated with communism.
Also of interest is that one of the current writers on the list is Ross Douthat, the main conservative columnist of the New York Times. In his articles, he uses terms like materialist, oppressive, reactionary, exploitation, vanguard, ruling class, progressive and chauvinism. Surely he wouldn’t be a reformed communist - would he?
The New York Times Article Search API is a great tool, but you have to keep in mind that digitising the archive isn’t an entirely error-free process. For example, sometimes bits of information end up in the lastname field that don’t belong there (e.g. "lastname": "DURANTYMOSCOW"). While it’s possible to correct some of these issues, it’s likely that search results will in some way be incomplete.
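One way to correct entries like the one above is to strip a dateline city that has been glued onto the surname. The sketch below is only a heuristic under stated assumptions: the `DATELINES` tuple and the `clean_lastname` helper are hypothetical and are not taken from the code used for this analysis.

```python
# Hypothetical list of dateline cities that may get fused onto a surname.
DATELINES = ("MOSCOW", "LONDON", "PARIS", "WASHINGTON")


def clean_lastname(lastname):
    """Strip a trailing dateline city from a lastname value, if present."""
    name = lastname.strip().upper()
    for city in DATELINES:
        # Require something before the city, so that a writer whose surname
        # really is "LONDON" is left untouched.
        if name.endswith(city) and len(name) > len(city):
            return name[: -len(city)]
    return name


print(clean_lastname("DURANTYMOSCOW"))
```

A rule like this will miss cities that are not on the list and could wrongly trim a genuine surname that happens to end in a city name, which is one reason the results are likely to stay incomplete.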
To get a manageable dataset, I looked up all articles containing any combination of two terms from the manual. I then calculated a score for each author by simply counting the number of unique terms they have used.
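The two steps can be sketched roughly as follows. This is a minimal illustration, assuming the articles are already available as (author, text) pairs. The function names are made up for the example, and plain substring matching is a simplification of whatever matching the API itself performs.

```python
from collections import defaultdict
from itertools import combinations

# The twenty terms listed in the manual.
TERMS = [
    "integrative thinking", "vanguard", "comrade", "hootenanny", "chauvinism",
    "book-burning", "syncretistic faith", "bourgeois-nationalism", "jingoism",
    "colonialism", "hooliganism", "ruling class", "progressive", "demagogy",
    "dialectical", "witch-hunt", "reactionary", "exploitation", "oppressive",
    "materialist",
]


def term_pairs(terms):
    """Return every combination of two distinct terms, one per search query."""
    return list(combinations(terms, 2))


def unique_term_scores(articles):
    """Count the number of unique terms used by each author.

    `articles` is an iterable of (author, text) tuples. Matching is a
    case-insensitive substring test, so "materialist" also matches
    "materialistic". Authors who use none of the terms are left out.
    """
    used = defaultdict(set)
    for author, text in articles:
        lowered = text.lower()
        for term in TERMS:
            if term in lowered:
                used[author].add(term)
    return {author: len(terms) for author, terms in used.items()}


print(len(term_pairs(TERMS)))  # number of two-term queries
```

Because the score counts unique terms, an author who uses «vanguard» in a hundred articles still gets only one point for it.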
An alternative would have been to correct for the total number of articles per author in the NYT archive. It took me a while to figure out how to search by author using the NYT API. It turns out you can search for terms appearing in the byline using
?fq=byline:("firstname middlename lastname") - even though this option isn’t mentioned in the documentation. I’m not entirely sure whether such a search returns articles where the byline/original field is empty.
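As a rough sketch, a byline query of this kind could be assembled like this. The endpoint URL and the `api-key` parameter are assumptions based on version 2 of the Article Search API and may differ from the version used for this analysis. `byline_query_url` is a hypothetical helper, and the block only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Assumed endpoint (Article Search API v2); adjust if you use another version.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def byline_query_url(full_name, api_key):
    """Build a search URL that filters on the byline via the fq parameter."""
    params = {
        "fq": f'byline:("{full_name}")',  # filter query on the byline field
        "api-key": api_key,               # your own API key
    }
    # urlencode percent-encodes the quotes, colon and parentheses.
    return f"{BASE_URL}?{urlencode(params)}"


print(byline_query_url("Ross Douthat", "YOUR_KEY"))
```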
As you might expect, there’s a correlation between the number of articles per author and the number of unique terms this author has used.
It would be possible to calculate a relative score, for example the number of unique terms used per 1,000 articles, but this may have unintended consequences. To take an extreme example: an author who has written a single article that happened to contain three terms would get a score of 3,000 using this method, whereas an author who has written thousands of articles and consistently uses a broad range of terms, but not at a rate of three per article, would get a (considerably) lower score.
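The arithmetic behind that example, as a small sketch. `terms_per_thousand` is a hypothetical helper, and the second author's figures (15 terms across 5,000 articles) are invented purely for illustration.

```python
def terms_per_thousand(unique_terms, total_articles):
    """Relative score: unique terms used per 1,000 articles written."""
    if total_articles <= 0:
        raise ValueError("total_articles must be positive")
    return 1000 * unique_terms / total_articles


# One article that happens to contain three terms.
print(terms_per_thousand(3, 1))      # 3000.0

# Made-up prolific author: 15 unique terms across 5,000 articles.
print(terms_per_thousand(15, 5000))  # 3.0
```

The prolific author uses five times as many unique terms, yet ends up with a score a thousand times lower.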
I decided to stick with the absolute number of unique terms per author. This has the disadvantage that authors who have written few articles are unlikely to show up in the analysis, but I’m not sure that this problem can be adequately solved by calculating a relative score.
The Python and R code used to collect and analyse the data is available on GitHub.