Artificial Intelligence is a fast-emerging, much-hyped technology that, if you believe what you read, is equal parts magic dust that will fix all our problems and poison that will take all our jobs away in the next few months. Much of the latter stems from the hype around news articles, researchers, and tech companies claiming “Human Parity” in AI over the last few years. So, is that it? Do we all hang up our proverbial coats and slink softly into the sunset, tails between our legs, while our creations become our masters?
This perception comes from the fear we feel when we perceive a threat. But there are two very clear reasons not to panic.
We’ve seen this movie before
The first is rather pedestrian: this is not the first technological advance that threatens to shake up jobs; it has happened many, many times throughout human history.
The printing press, the jet airplane, the Personal Computer… A classic example raised by the McKinsey Global Institute is the move from predominantly horse-powered transport in the US to the automobile and the internal combustion engine. Without doubt, at the time, horse trainers, stablers, feeders, and vets felt as threatened by the emergence of this new super-technology as we feel now about AI. But history showed in this instance (as in every technological leap before or since) that two effects come into play. Firstly, there is an almost like-for-like jobs equalization over the medium term: as the number of horseshoe manufacturers dwindles, the number of car-tire manufacturers increases. According to McKinsey, in the United States between 1910 and 1950, approximately 623,000 jobs were lost in the horse-transport industry, but approximately 7.5 million net new jobs were gained in the automobile industry.
Secondly, the offshoot industries enabled by the new technology created many times more jobs – ten times more, according to the McKinsey report. By 1950, 11% of the civilian working population had a job in an industry that directly manufactured, enabled, or utilized automobiles.
This pattern has played out many, many times in history – each time a potential “human parity” technology is invented and becomes prevalent. The Personal Computer in the 70s and 80s threatened writers, secretaries, accountants, and more – but McKinsey estimates that it has created 15.8 million net new jobs in the US since 1980, even accounting for displaced and lost jobs.
And no one can argue that the personal computer revolution hasn’t opened up many, many more industries than we could have dreamed of in the 80s. I mean, you’re probably reading this from a tweet.
Human Parity or Human Porosity?
So, is AI special? Why is “Human Parity” such a big thing? It stems from that aspect of AI that makes it so extraordinary: it is (or appears to be) good at things that only humans are good at. Those “fuzzy” tasks that traditional computer algorithms fail at. A classic example is writing a caption for a picture: “Two men, standing in front of a train in the sunlight”. When a machine does this, we are bamboozled and immediately extrapolate from this one, narrow task to Generalised AI that can Take Over The World.
But as soon as you scratch under the surface, it becomes apparent that not only is this Human Parity extremely narrow, but it is also very, very hard to define.
One of the most controversial aspects of AI is Face Identification (identifying who someone is from a picture – commonly referred to as Face Recognition by humans, or as 1:N Lookup by machines discussing when to take over the world).
Modern algorithms can produce error rates of just 0.08% on Face Identification – this appears to be an extraordinary achievement, and certainly better than my efforts when running into someone I met a few years ago. But there are some issues with classifying this achievement.
Comparing to humans: labelers make mistakes
The first issue we encounter is that the “ground truth” (the test images) was created by humans – and humans make mistakes. The tests are run against labeled databases of images of faces; having been produced by humans, those labels carry a built-in error rate. NIST, in their report, attempt to quantify the error introduced by mislabeling, and it comes close to the error rate of some of the best-performing algorithms. In other words, if you remove the labeling error, the measured error rate occasionally drops close to zero.
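The arithmetic behind this point can be made explicit. The numbers below are purely illustrative (only the 0.08% figure comes from the text; the labeling-error rate is a made-up placeholder), but they show why ground-truth mistakes put a floor under any measured error rate:

```python
# Illustrative arithmetic only: if some of an algorithm's "errors" are in
# fact mislabeled ground truth, the measured error overstates the true one.
measured_error = 0.0008   # 0.08%, the reported error rate
label_error    = 0.0005   # hypothetical labeling-error rate in the test set

# A rough estimate of the algorithm's true error rate is the difference,
# floored at zero.
true_error_estimate = max(measured_error - label_error, 0.0)
print(f"estimated true error: {true_error_estimate:.2%}")
```

Once the labeling-error rate approaches the measured error rate, the estimate of the algorithm’s own error approaches zero – which is exactly the “drops close to zero” effect described above.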
Comparing to humans: what about False Acceptance and False Rejection rates?
The next issue we encounter has no human parallel: you can (and must) set parameters on the algorithm to get the behavior you want. Here is an example: when you ask most AI algorithms to recognize a face, they will always return results (actually a list of candidates) – which means you will always get a false positive when showing them a face that doesn’t exist in the database. But each result comes with a “confidence” score – 85%, for example. To get around this, we choose the confidence level we’re happy with: 85% is not an identification, 99% is. The higher the threshold, the fewer false positives we get (people identified who aren’t in the database, or the wrong person), but the more false negatives (cases where the algorithm did find the right person, but with a confidence too low for us to accept).
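The trade-off can be sketched in a few lines of code. Everything here is hypothetical – the confidence scores and the 0.85 / 0.99 thresholds are invented to mirror the example in the text, not taken from any real system:

```python
# Each tuple: (confidence the algorithm returned, whether the match was
# actually correct). Synthetic values for illustration only.
results = [
    (0.99, True), (0.97, True), (0.92, False),
    (0.88, True), (0.86, False), (0.80, False),
]

def count_errors(results, threshold):
    """Count false positives (wrong match accepted) and false negatives
    (right match rejected) at a given confidence threshold."""
    false_pos = sum(1 for conf, correct in results
                    if conf >= threshold and not correct)
    false_neg = sum(1 for conf, correct in results
                    if conf < threshold and correct)
    return false_pos, false_neg

for threshold in (0.85, 0.99):
    fp, fn = count_errors(results, threshold)
    print(f"threshold {threshold}: {fp} false positives, {fn} false negatives")
```

On this toy data, raising the threshold from 0.85 to 0.99 trades two false positives for two false negatives – the error doesn’t disappear, it just moves from one column to the other.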
We can compare Face Recognition algorithms by setting the False Acceptance rate to a number, say 0.001, and then comparing the False Rejection Rate.
But what does this mean when we try to compare to human perception? We can’t evaluate or compare algorithms without this trick, yet there is no human equivalent to compare it to.
Comparing to Humans: Database size
Further issues arise when we consider the number of faces being compared against. Identifying a person from a set of five people is considerably easier than identifying a person from a set of a million (for humans and machines alike). But what does it mean if humans are better at identification across small databases? According to a study published in the Proceedings of the Royal Society B, the average person can recognize around 5,000 faces. This seems like a lot, but the NIST FRVT is performed against a database of 12 million faces. So if humans are more accurate on small sets, but machines can search far larger ones, which is “better”?
Comparing to Humans: Use Case
This critical question leads to the most important observation: how does performance vary across situations? The FRVT shows significant degradation in Face Identification performance when you move from high-quality studio photographs to low-res, low-quality webcam images or images “in the wild” of people in real-life situations. But so do humans. Although the human visual system is very good at filtering out noise, bad lighting, and other imperfections – and at stitching together a continuous, video-like stream of views to average them out – much of this is a trick of attention, discarding what doesn’t matter, rather than genuinely improving the raw result.
This implies that whether machines or human beings are better at something like face recognition depends as much on where, how, and why you apply the technique as it does on raw accuracy. A human being searching through 12 million images in a database would certainly never succeed, but humans watching sports fans streaming through the gates at a stadium, looking for a specific person they have known their whole life, would significantly outperform AI algorithms.
So where does that lead us with Human-Parity AI?
The inescapable conclusion is three-fold. Firstly, AI is a new technology and, as we have learned from many instances in history, it will be feared, embraced, rejected, and finally normalized (along the hype-cycle curve), and it will end up creating many more jobs (and entire industries) than are lost in the short term.
Secondly, although in some very specific circumstances – and by scoping the test to show a very specific result – AI algorithms can outperform humans at some tasks, in real-world scenarios the story is more pedestrian and akin to the rise of personal computing in the 80s: some tasks will be made easier, provided we choose the use-cases well, and some will not.
Lastly, we are very, very far away from the nightmare scenario of Terminator and other movies depicting Generalised AI becoming as good as humans across a wide, diverse range of tasks – if we ever get there at all.
So are machines at Human Parity? Perhaps. I would argue that the question is better framed as: “Are we even able to compare parity between humans and machines, in even the narrowest of tasks?” Again, my answer would be “perhaps” – although this time with a more dubious tone of voice.