It's also worth noting that IBM's original intentions may have been rooted in preventing AI from being biased against certain groups: when it announced the collection in January, the company explained that it needed such a large dataset to help train for "fairness" as well as accuracy.
NBC reports that IBM says it will remove images from the dataset at the request of the photographer or the photographed person - but the company hasn't made the dataset public, so there's no way to know for sure whether you're actually in it.
The photos were not originally collected by IBM; former Flickr owner Yahoo had put together the collection for research purposes.
The company extracted almost one million photos from a dataset of Flickr images originally compiled by Yahoo.
Using Creative Commons-licensed pics to train image-recognition tech is, thus far, legit, and Big Blue doesn't appear to have breached any rules. According to NBC, which was able to view the collection, the photos have all been annotated with various estimates such as age and gender, as well as physical details - skin tone, size and shape of facial features, and pose.
But while IBM was using perfectly fine Creative Commons images, the company hadn't actually informed the people whose faces appear in the nearly one million photos how those images were being used. Many Flickr images are published under Creative Commons licenses, and academic researchers have limited responsibilities in how they source images due to the non-commercial nature of their work.
"None of the people I photographed had any idea their images were being used in this way", photographer Greg Peverill-Conti told the news outlet. "It seems a little sketchy that IBM can use these pictures without saying anything to anybody". IBM requires photographers to email links to photos they want removed, but because the company has not publicly shared the list of Flickr users and photos included in the dataset, there is no easy way of finding out whose photos are in it in the first place. That's a shitty way to handle user data for research.
Mashable has reached out to IBM for comment.