Collecting data on millions of Facebook users to analyse their psychological traits

The Guardian has revealed how British academics have collected information about millions of Facebook users and used the data to score them on openness, conscientiousness, extraversion, agreeableness and neuroticism. The academics were paid by funders of the campaign of US presidential candidate Ted «Carpet Bomb» Cruz.

The fact that information from public Facebook profiles can be used to create psychological profiles is intriguing but not really new. Researchers have claimed they can assess someone’s personality reasonably well by analysing what they like on Facebook or by analysing personal information, activities and preferences, language features and internal Facebook statistics.

What was new to me (but apparently not to everyone) is how the academics connected to the Cruz campaign went about collecting people’s Facebook data. They used Amazon’s Mechanical Turk platform to recruit people to fill out a questionnaire that would give the researchers access to that person’s Facebook profile. Not only would they download data about the participants themselves, but also about their Facebook friends - even though those friends were unaware of this and hadn’t given permission. Participants were paid about $1 each for access to their Facebook network.

According to the Guardian, Facebook users had on average 340 friends in 2014. Of course, there’s considerable overlap between people’s networks, so it can be assumed that the average participant would yield far fewer than 340 new profiles. Even so, this would seem to be a pretty efficient - if sneaky - way to collect data on Facebook users.
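
To get a feel for how much overlap eats into the yield, here is a rough back-of-the-envelope sketch. It assumes, unrealistically, that every participant’s 340 friends are drawn uniformly at random from a fixed pool of users; the pool size and the number of participants are made-up inputs, not figures from the Guardian. Real friend networks are clustered, so the true overlap is probably different from what this toy model suggests.

```python
def expected_distinct_profiles(participants, friends_per_participant, population):
    """Expected number of distinct friend profiles under a uniform-random model.

    Assumption (mine, not from the article): each user in the population
    appears in any one participant's friend list with probability
    friends_per_participant / population, independently of other lists.
    """
    if not 0 < friends_per_participant <= population:
        raise ValueError("friends_per_participant must be between 1 and population")
    p_in_one_list = friends_per_participant / population
    # Probability that a given user shows up in none of the friend lists.
    p_never_seen = (1 - p_in_one_list) ** participants
    return population * (1 - p_never_seen)


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 paid participants, a pool of 1 million users.
    total = expected_distinct_profiles(1_000, 340, 1_000_000)
    print(f"distinct profiles: {total:,.0f}")
    print(f"new profiles per participant: {total / 1_000:.0f}")
```

The per-participant figure drops further below 340 as the number of participants grows relative to the pool, which is the point made above.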

The Guardian doesn’t discuss whether this method would still work today, but I doubt it would. Out of concern for the privacy of its users (sure!), Facebook cut off access to users’ friends’ data when it updated its API earlier this year.

Can they track you by your smartphone battery status?

Who knew. Apparently, websites can collect pretty detailed information about the battery of the laptop or smartphone you’re using. They can see if you’re currently charging the battery. If you are, they can see how long it’ll take before it’s fully charged. If you aren’t, they can see how long it’ll last.

I read about this in the Guardian, which has an article about a study that apparently found that the detailed information obtained through the HTML5 battery status API can in some cases be used to identify users, at least over a short period, even if they use a VPN or Chrome’s private browsing mode.
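
Why would a battery reading identify anyone? The API exposes four values (level, charging, chargingTime and dischargingTime), and the combination can be fairly distinctive at any given moment. The sketch below is my own toy illustration of that idea, not the method from the study: it links two browsing sessions when they report exactly the same reading within a short time window. The class names, the 30-second window and the matching rule are all assumptions.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class BatteryReading:
    level: float             # 0.0 (empty) to 1.0 (full)
    charging: bool
    charging_time: float     # seconds until full; inf when not charging
    discharging_time: float  # seconds until empty; inf when charging


@dataclass(frozen=True)
class Visit:
    session: str      # hypothetical label, e.g. a cookie or session id
    timestamp: float  # seconds since some common reference point
    reading: BatteryReading


def link_sessions(first_site, second_site, max_gap=30.0):
    """Pair up sessions that report identical battery readings close in time.

    Assumption: an exact match within max_gap seconds is treated as
    "probably the same device". A real tracker would need something
    more careful than this.
    """
    links = []
    for a, b in product(first_site, second_site):
        close_in_time = abs(a.timestamp - b.timestamp) <= max_gap
        if close_in_time and a.reading == b.reading:
            links.append((a.session, b.session))
    return links
```

Because the readings change as the battery drains or charges, a match like this only holds for a short while, which fits the «at least over a short period» caveat.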

However, the API doesn’t work in all browsers. In fact, I googled around to find out how it works and it turns out you need different code to get it working in Firefox than in Chrome. And while I got the code working on my MacBook, it didn’t work on my iPhone - not even with Chrome.

So how about your device? You can check below if the code works with your combination of browser and device. Let me know!

[Interactive widget: your battery status]

Apparently, it’s still possible to fool Google

Researchers have found that men are almost six times more likely than women to be shown ads on news websites for a career coaching service for $200k+ executive positions. The findings suggest some of the algorithms involved in tracking internet users have discriminatory outcomes. They might lead to «deeper investigations by either the companies themselves or by regulatory bodies», the authors add (via WP).

The research method is as interesting as the findings. The researchers created AdFisher, basically a smart web scraper built with Python. AdFisher can create large numbers of «agents», have them visit certain websites or alter their profile via the Google Ad Settings, and then see what ads they get shown on websites like the Times of India or the Guardian. Further, it will organise these activities in such a way that experimental and control conditions can be compared, and it will even analyse the results, using machine learning to figure out what may have triggered differences in what ads are shown.
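
To illustrate what comparing experimental and control conditions can look like, here is a minimal sketch of a permutation test. This is not AdFisher’s code: I’ve swapped in a plain difference in means as the test statistic, and the function name and inputs (say, the number of times each agent saw a particular ad) are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(experimental, control, rounds=10_000, seed=0):
    """Estimate how often a random relabelling of agents produces a
    difference in means at least as large as the observed one."""
    rng = random.Random(seed)
    observed = abs(mean(experimental) - mean(control))
    pooled = list(experimental) + list(control)
    n = len(experimental)
    hits = 0
    for _ in range(rounds):
        # Shuffle the group labels by shuffling the pooled measurements.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n]) - mean(pooled[n:]))
        if diff >= observed:
            hits += 1
    # The +1 counts the observed labelling itself, so p is never zero.
    return (hits + 1) / (rounds + 1)
```

A small p-value suggests that the difference between the two groups of agents is unlikely to be down to chance alone; it doesn’t say what caused it.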

Somehow this reminded me of the patent Apple (!) obtained for a cloning service to fool the companies that are tracking you. The service would mimic some of your normal online behaviour, but also do other stuff, such as faking an interest in basket weaving. This way it would contaminate the profile these companies keep of you, perhaps to the point of making it useless.

So would you be able to get away with that? If you open a bunch of browser windows with Google searches, Google will ask you to fill out a captcha to make sure you’re human («Our systems have detected unusual traffic from your computer network. This page checks to see if it’s really you sending the requests, and not a robot»). This is a very simple example, but given the fast-developing ability to analyse patterns in online behaviour, you’d expect that companies like Google and Facebook would have become eerily accurate at identifying (real) internet users and telling them from bots.

Against that background, it’s somehow reassuring that it’s apparently still possible to fool Google by creating a fake profile.

P.s. I’ve never been shown ads for a career coaching service for $200k+ executive positions, but if they do turn up I’ll just tell Google I’m a woman.

Big Brother: state or capitalist

George Orwell’s Nineteen Eighty-Four describes a future characterized by total surveillance (with telescreens observing people in their own homes, even monitoring their heartbeat and recognizing their facial expression). This surveillance is carried out by the state and its helpers. Corporations play no role in it.

In fact, corporations and capitalism are a thing of the past in Nineteen Eighty-Four, for private property has been abolished. A children’s book explains that capitalists were rich, ugly men wearing top hats. The Party constantly emphasizes how terrible conditions were before the Revolution and how much better they are today. But the main character, Winston Smith, can’t help but wonder if things had been really that bad in the past and if capitalists had really been such terrible creatures.

The suggestion is clear: the state is using capitalists as a scapegoat to mask its own failings (in fact, if I were a member of today’s whining one percent, I'd claim that Orwell had predicted the current «rising tide of hatred of the successful one percent»).

Today, thirty years after 1984, private property hasn’t been abolished, but we are approaching a level of surveillance pretty close to what Orwell described. When we try to explain what’s going on, we frequently use the term Big Brother. But when we do, are we referring to the state, as Orwell did, or do we have capitalists in mind?

To explore this matter, I looked up how often newspaper articles mention Big Brother in combination with either the names of government agencies, or the names Google and Facebook (of course I should have included Apple, notwithstanding their smart privacy patent, but I left them out for practical reasons explained below). The results are shown in the graph below. For the non-Dutch: NRC is a Dutch newspaper and AIVD is the Dutch intelligence service.

It appears that Google and Facebook turn up in combination with Big Brother far more often than government agencies like the CIA, MI5 or AIVD. However, as the red bars show, this has changed since the revelations of Edward Snowden. Since May last year, the NSA has been mentioned in combination with Big Brother more often than Google or Facebook (in the Guardian, the same applies to the GCHQ).

So Orwell didn’t foresee the role of corporations in mass surveillance, and we used to have a blind spot for the role of the state - but Snowden seems to have fixed that.


I used the Guardian and New York Times APIs to look up how often the names of selected state agencies and corporations have appeared in combination with Big Brother in articles over the past ten years. I removed the results from the Guardian media section to get rid of most references to the Big Brother TV show. I wanted to include Apple, but unfortunately the newspaper APIs don’t distinguish between apple and Apple. I thought searching for iPhone might be a practical solution, but the Guardian results included articles containing ‘I phone’. The NRC doesn’t have an API, so I looked up the terms manually; the timeline to the right of the search results makes it quite easy to count the number of post-Snowden occurrences. In all cases, the method of searching the newspaper archives is imperfect in that it yields some unwanted results (e.g. articles mentioning somebody’s big brother that have nothing to do with Big Brother).
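
For those who want to try something similar, here is a rough sketch of how such a lookup could be put together for the Guardian. It is a reconstruction, not the exact queries I ran. The endpoint and the parameter names (q, section, from-date, to-date, page-size, api-key) reflect my understanding of the Guardian content API, and the location of the hit count in the response is likewise an assumption, so check the documentation before relying on them. The function names are made up.

```python
from urllib.parse import urlencode

# Assumed endpoint of the Guardian content API.
GUARDIAN_SEARCH = "https://content.guardianapis.com/search"


def build_query_url(name, api_key, from_date=None, to_date=None):
    """Build a search URL for articles mentioning Big Brother and `name`."""
    params = {
        "q": f'"Big Brother" AND {name}',
        "section": "-media",  # assumed syntax for excluding the media section
        "page-size": 1,       # only the total count matters here
        "api-key": api_key,
    }
    if from_date:
        params["from-date"] = from_date  # e.g. a YYYY-MM-DD string
    if to_date:
        params["to-date"] = to_date
    return f"{GUARDIAN_SEARCH}?{urlencode(params)}"


def total_hits(payload):
    """Read the hit count from a decoded JSON response (assumed layout)."""
    return payload["response"]["total"]
```

Fetching each URL and passing the decoded JSON to total_hits, once per name and once per period, would give the counts behind a graph like the one above. Note that this does nothing about the apple/Apple problem.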

Protect your privacy with an online doppelgänger

Apple has obtained a patent for a rather intriguing idea: protect your privacy by spreading personal data that are partly correct, but partly incorrect.

The idea is to have a cloning service create a doppelgänger with, for example, your birth date and hair colour, but with other interests - say basket weaving. This service would make search queries and click on results, click on ads, fill out surveys, chat, send emails and place orders, all in your name. A smart cloning service could fool companies like Google and Facebook and contaminate the profiles they keep of you to the point of making them useless.

The inventor, Stephen Carter, explains in the patent filing why we need such a doppelgänger generator:

Users are growing uncomfortable with the amount of information marketers possess today about them and many feel it is an invasion of their privacy even if the marketing is currently considered to be lawful […] The electronic age has given rise to what is now known as thousands of ‘Little Brothers’, who perform internet surveillance by collecting information to form electronic profiles about a user not through human eyes or through the lens of a camera but through data collection.

But wait - isn’t that a description of what Apple does? It has already been speculated that Apple hasn’t acquired the patent to launch a product to frustrate trackers, but to prevent others from launching such a product. Or perhaps Apple wants to sabotage the business model of Google and Facebook, while continuing to track people through their iPhones. In any case, Apple seems to think it’s possible that Carter’s idea might work.

Meanwhile, tech site the Register wonders about the practical aspects of the invention:

All we know for sure is that it’s going to be quite weird when basket-weaving kits that your anti-surveillance cloneware has ordered on eBay start arriving at your house.

Via Webwereld