Imagine that there are, as is often the case in the literature on statistical re-identification, two databases: an un-identified database and an identified database.

Another technology is cloud computing. Cloud computing makes it possible to run many, many face comparisons cheaply and efficiently in just a few seconds, which would not otherwise be possible with normal computers.

5. Ubiquitous computing

The idea of ubiquitous computing is that I can just take my smartphone and connect it to the Internet, and although my smartphone does not have the processing power to do 500 million face comparisons in seconds, something up there in the cloud can, and I just need to connect to it to run face recognition in real time in the street.

Identified facial images could come from LinkedIn; they could come from organizations and governmental databases, and so forth.

So this is what we are talking about: combining all these technologies, and in particular face recognition and publicly available online social network data, for the purpose of large-scale, automated, peer-based individual re-identification both online and offline; and individual informational inference, the inference of additional information about these individuals, potentially sensitive data.

In Latanya Sweeney’s example, the identified database was the voter registration list for Massachusetts voters. The un-identified database was a sensitive database of medical discharge data – obviously un-identified, because hospitals wanted to share that information but they didn’t want to share it with the names of the people suffering from the different diseases. Latanya showed that you could connect the two.
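To make the idea of connecting the two databases concrete, here is a minimal sketch of that kind of linkage. It assumes the two databases share a few attributes (ZIP code, birth date and sex are used here as the shared fields); the function name `link_records`, the field names and all the records are made up for illustration and are not taken from the actual study.

```python
from collections import defaultdict


def link_records(identified, unidentified, keys):
    """Pair each un-identified record with a name when the shared
    attributes match exactly one person in the identified database."""
    index = defaultdict(list)
    for person in identified:
        index[tuple(person[k] for k in keys)].append(person["name"])

    links = []
    for record in unidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # only an unambiguous match re-identifies someone
            links.append((names[0], record))
    return links


# Toy data, entirely invented for this example.
voters = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1962-05-17", "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_date": "1962-05-17", "sex": "M"},
]
discharges = [
    {"zip": "02138", "birth_date": "1950-01-01", "sex": "F", "diagnosis": "condition A"},
    {"zip": "02139", "birth_date": "1962-05-17", "sex": "M", "diagnosis": "condition B"},
]

QUASI_IDENTIFIERS = ("zip", "birth_date", "sex")  # assumed shared fields
matches = link_records(voters, discharges, QUASI_IDENTIFIERS)
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this toy run only the first discharge record is linked to a name; the second one matches two voters, so it stays ambiguous. The more unusual the combination of shared attributes, the more records end up with exactly one candidate.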

So in our story the un-identified database could be images that you find on maybe Match – a dating site; or maybe AdultFriendFinder – a dating ‘plus’ website; or Prosper – a financial site where people look for micro funding, micro loans. People there often use photos, because some researchers showed that profiles with photos are more likely to get funding, but they don’t use their names, because they often reveal sensitive information about themselves such as their credit scores. Of course, un-identified faces are also yours and mine when we walk in the street: to a stranger we are an anonymous face.

Then there are of course the identified databases, and I would argue that for most of you in this room, for most of us, there are already identified facial images somewhere up on the Internet. They may come from Facebook, if you have joined Facebook and put up a photo of yourself and your real name.

Face recognition finds a match between two images of possibly the same person, and because of this we can use the identified database to give a name to a record in the un-identified database. But not only that: the story we are telling is that once you have a name, you can use the personal information that you find online to look for more information, such as maybe the social security number; or maybe sexual orientation, since a study that came out a couple of years ago showed how you could predict the sexual orientation of Facebook users based on the orientation of their friends; or maybe credit scores. Because of this, if you are able to infer information for the identified database, and if you are able to find the match through face recognition to the un-identified database, you close the circle: you end up connecting the sensitive data to the person who was, up till then, anonymous.
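As a rough sketch of how this circle closes, assume that some face recognition system has already turned every photo into a list of numbers, so that two photos of the same person give similar lists. Everything below is hypothetical: the vectors, the names, the 0.95 threshold and the helper functions `best_match` and `close_the_circle` are invented for illustration, and cosine similarity is just one common way to compare such vectors, not necessarily what any particular recognizer uses.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two face vectors: close to 1.0 means very alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(probe, identified_faces, threshold):
    """Return the name of the most similar identified face,
    or None when no face reaches the threshold."""
    best_name, best_score = None, threshold
    for name, vector in identified_faces.items():
        score = cosine_similarity(probe, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def close_the_circle(anonymous_profiles, identified_faces, inferred, threshold=0.95):
    """Attach a name, and whatever was inferred about that name,
    to each anonymous profile whose face finds a match."""
    linked = []
    for profile in anonymous_profiles:
        name = best_match(profile["face"], identified_faces, threshold)
        if name is not None:
            linked.append({"profile": profile["profile"], "name": name,
                           "inferred": inferred.get(name)})
    return linked


# Invented data: identified faces (e.g. from a social network profile),
# information inferred from the name, and anonymous profile photos.
identified_faces = {"Alice Example": [0.9, 0.1, 0.3], "Bob Example": [0.1, 0.8, 0.5]}
inferred = {"Alice Example": "sensitive attribute A", "Bob Example": "sensitive attribute B"}
anonymous_profiles = [
    {"profile": "dating-001", "face": [0.88, 0.12, 0.31]},
    {"profile": "dating-002", "face": [-0.5, 0.2, -0.7]},
]

linked = close_the_circle(anonymous_profiles, identified_faces, inferred)
print(linked)
```

The threshold is the design choice that matters here: set it too low and strangers get confused with one another, set it too high and real matches are missed.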

So this is the story, in a nutshell. The story is that your face is truly the veritable link between your offline persona and your online persona or personas, your many online identities. Eric Schmidt said that maybe when we turn eighteen we should be allowed to change our names to recreate our reputation from scratch. Well, the problem is that it is much harder to change your face. Your face is a constant which connects these different personas, and as I mentioned, for most of us there are already identified facial images online.