Artificial intelligence reveals more personal information in your uploaded photos than you might expect.
It's no secret that platforms like Google and Facebook collect massive amounts of user data for their marketing services. However, unless you're a tech geek, you may not be aware that these platforms are not only collecting data on what you say or do; they are also extracting information directly from the photos and videos you upload, using image processing and analysis.
Unfortunately, the image processing techniques these platforms use are trade secrets, and the platforms don't tell us what personal information they collect. However, you can get an idea of what they are capable of gathering with Artificial Intelligence (AI) services like Amazon Rekognition. Below are some examples from images I processed using the Amazon Rekognition Console to illustrate what information platforms might be collecting from your photos or videos.
Platforms use Text Detection to gather every word in a photo, from street signs to the slogan on your shirt, to help with marketing. They will also run the detected words through another AI to determine their meaning. Does the text have a positive or negative connotation? Are you supporting a particular cause?
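To make this concrete, Amazon Rekognition's DetectText API returns a list of text detections, each typed as a full "LINE" or an individual "WORD" with a confidence score. Here is a minimal sketch of pulling the readable lines out of such a response; the sample response below is invented for illustration (a real one would come from calling the API, e.g. via `boto3`):

```python
def extract_lines(response, min_confidence=90.0):
    """Return detected text lines above a confidence threshold.

    `response` mirrors the shape of Amazon Rekognition's DetectText
    output: a "TextDetections" list whose items are typed as either
    "LINE" (a full line of text) or "WORD" (a single word).
    """
    return [
        d["DetectedText"]
        for d in response.get("TextDetections", [])
        if d["Type"] == "LINE" and d["Confidence"] >= min_confidence
    ]


# Hypothetical sample response, e.g. from a photo of a protest sign.
sample = {
    "TextDetections": [
        {"DetectedText": "SAVE THE BEES", "Type": "LINE", "Confidence": 99.1},
        {"DetectedText": "SAVE", "Type": "WORD", "Confidence": 99.3},
        {"DetectedText": "THE", "Type": "WORD", "Confidence": 99.0},
        {"DetectedText": "BEES", "Type": "WORD", "Confidence": 99.2},
        {"DetectedText": "blurry text", "Type": "LINE", "Confidence": 41.7},
    ]
}

# Only the high-confidence line survives the filter.
print(extract_lines(sample))
```

Once the platform has the text as plain strings like this, feeding it into a sentiment or topic classifier is a standard next step.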
Thanks to TV shows like NCIS and CSI, we're used to the idea of face recognition detecting and identifying a face in an image or video. With face recognition, platforms track the people you associate with frequently in your photos. Face recognition can also go beyond identifying a person and analyze the face itself. With some accuracy, platforms can determine your sex, age, and mood, and whether you are smiling, wearing glasses, or have a beard.
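Amazon Rekognition exposes exactly this kind of analysis through its DetectFaces API, which returns a "FaceDetails" entry per face with attributes such as AgeRange, Gender, Smile, Eyeglasses, Beard, and Emotions. A minimal sketch of condensing one such entry into a flat summary (the sample values are made up, not real API output):

```python
def summarize_face(face):
    """Condense one Rekognition-style FaceDetail into a flat summary.

    Field names mirror Amazon Rekognition's DetectFaces output; the
    highest-confidence entry in "Emotions" is taken as the mood.
    """
    top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
    return {
        "age": f'{face["AgeRange"]["Low"]}-{face["AgeRange"]["High"]}',
        "gender": face["Gender"]["Value"],
        "smiling": face["Smile"]["Value"],
        "glasses": face["Eyeglasses"]["Value"],
        "beard": face["Beard"]["Value"],
        "mood": top_emotion["Type"],
    }


# Hypothetical FaceDetail for one face in an uploaded photo.
sample_face = {
    "AgeRange": {"Low": 26, "High": 38},
    "Gender": {"Value": "Male", "Confidence": 98.2},
    "Smile": {"Value": True, "Confidence": 95.0},
    "Eyeglasses": {"Value": False, "Confidence": 97.4},
    "Beard": {"Value": True, "Confidence": 91.8},
    "Emotions": [
        {"Type": "HAPPY", "Confidence": 88.6},
        {"Type": "CALM", "Confidence": 9.1},
    ],
}

print(summarize_face(sample_face))
```

Note that every one of these attributes comes with its own confidence score in the real response, which is why "with some accuracy" is the right way to describe these guesses.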
Do you remember the #tenyearchallenge on Instagram? While no one knows who started the viral sensation, some suspect that Facebook started it to train an AI to recognize the same person across a ten-year age gap. If so, Facebook's AI may now be able to track what people have been doing on its platform for the past decade.
Scenery and Object Recognition is the idea that a computer can describe what it sees. Platforms train computers to recognize enough specific objects to have a good idea of what is in a photo and what the surrounding scenery looks like.
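In Amazon Rekognition this is the DetectLabels API, which returns a list of labels, each with a name, a confidence score, and parent categories (e.g. "Dog" has the parent "Animal"). A rough sketch of turning such a response into a scene description, using an invented sample response:

```python
def describe_scene(response, min_confidence=80.0):
    """Group Rekognition-style labels by their parent category.

    Mirrors the shape of Amazon Rekognition's DetectLabels output:
    labels without parents are treated as top-level scene labels.
    """
    scene = {}
    for label in response.get("Labels", []):
        if label["Confidence"] < min_confidence:
            continue
        parent = label["Parents"][0]["Name"] if label["Parents"] else "Scene"
        scene.setdefault(parent, []).append(label["Name"])
    return scene


# Hypothetical response for a photo of a dog on a beach.
sample = {
    "Labels": [
        {"Name": "Beach", "Confidence": 96.3, "Parents": []},
        {"Name": "Dog", "Confidence": 93.8, "Parents": [{"Name": "Animal"}]},
        {"Name": "Surfboard", "Confidence": 88.1,
         "Parents": [{"Name": "Sports Equipment"}]},
        {"Name": "Bicycle", "Confidence": 42.0,
         "Parents": [{"Name": "Vehicle"}]},
    ]
}

# The low-confidence "Bicycle" guess is dropped by the threshold.
print(describe_scene(sample))
```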
Training computers to identify things usually requires processing millions of photos of the same type of thing. This training is a daunting task, so platforms find ways to have the public, or their own users, help them. Have you completed a CAPTCHA that asks you to select the pictures containing a certain object? That's you helping the platform train its AI to identify that object. Have you noticed that Google's CAPTCHA features a lot of things you would encounter while driving (e.g., buses, crosswalks, traffic signs, or traffic lights)? These kinds of CAPTCHA tests have some people thinking that Google is training an AI for a self-driving car.
To get an idea of how platforms use this information, we don't need to look any further than Facebook. In 2019, an image server outage at Facebook revealed what personal information its AI had found in users' images. For a brief moment, we could see how Facebook tags photos with the personal information it collects.
Platforms detect explicit, suggestive, violent, or visually disturbing content in your photos and videos. Identifying unsafe or shocking content helps platforms moderate content or filter search results that might not be safe for all audiences. By combining nudity or sexual content detection with face analysis, platforms can also identify whether "unsafe" content involves a minor.
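Rekognition's version of this is the DetectModerationLabels API, which returns a list of moderation labels in a two-level hierarchy: broad categories (empty "ParentName") with more specific labels beneath them. A minimal sketch of flagging an image from such a response, again with an invented sample:

```python
def flag_unsafe(response, min_confidence=75.0):
    """Return (is_unsafe, categories) from a Rekognition-style
    DetectModerationLabels response.

    Top-level categories have an empty "ParentName"; specific labels
    are rolled up into their parent category.
    """
    categories = {
        m["ParentName"] or m["Name"]
        for m in response.get("ModerationLabels", [])
        if m["Confidence"] >= min_confidence
    }
    return bool(categories), sorted(categories)


# Hypothetical response for an image with violent content.
sample = {
    "ModerationLabels": [
        {"Name": "Graphic Violence", "Confidence": 91.2,
         "ParentName": "Violence"},
        {"Name": "Violence", "Confidence": 91.2, "ParentName": ""},
        {"Name": "Suggestive", "Confidence": 33.0, "ParentName": ""},
    ]
}

# The weak "Suggestive" guess falls below the threshold.
print(flag_unsafe(sample))
```

In practice the confidence threshold is a policy decision: a stricter filter catches more unsafe content at the cost of more false positives.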
In my opinion, there are four main reasons platforms use image processing on the photos you upload.
Collecting the data and using it for advertisers, or selling it to third parties, is probably the most disputed reason, both socially and legally. Some people are okay with platforms collecting data to serve them ads, while others are not. Some platforms have lost lawsuits because they collected data, especially facial recognition data, that users did not consent to have collected.
Data collection can go beyond targeting you with advertisements and can be used by the government or law enforcement in active investigations. Take Macy, who was falsely arrested for murder after police obtained a geofence warrant and found that his phone had been in the area of the murder. Six days later, he was released from jail, but not before losing his job and reputation due to the publicized, high-profile arrest. It is, unfortunately, too easy for law enforcement to get geofence warrants. Often, a gag order accompanies the warrant, preventing platforms from notifying their users. So there is no way of knowing if your data is part of an investigation or if law enforcement has it.
In my opinion, we should start demanding more data transparency from these platforms. We don't need to know how the platforms collect the data, but we should be aware of what information they gather and how they use it. It should not take a bug or an outage for us to glimpse the data associated with the images we upload. Data transparency would allow users to make informed decisions about the types of photos they upload to a platform, or whether they are willing to use it at all.
Photoprivy is dedicated to data transparency and created the Your Data page to inform you of what we collect and how we use the data.
As a side note, if you're interested in some quick highlights on machine learning, here are two great videos by CGP Grey that I love.