Facial Recognition: You Cannot Hide From The Machine


As humans, we normally recognise each other by our faces. Our brains have a core ability to quickly make sense of a face and abstract out its key features. Someone’s height, their voice, their clothes, their gait, and even their smell might also help us identify a person, but at the core of our recognition of other people is their face. Their face is their brand, and our brain is able to store abstract representations of faces and keep these for instant recall. A major challenge with facial recognition is that a face may be viewed at different angles and under different lighting, and our brains must thus remap it onto a simpler model of the face. To store many different images of someone’s face would be too costly in terms of storage, and matching against the raw pixels of an image would not have a great success rate. Thus we extract the key features of a face and then match against those.

While facial recognition has been around for many years, it is only recently that it has become mainstream, such as in mobile device authentication and in image recognition within video and image analysis. Our faces, though, will change over time, but some features, such as the length of our nose, will vary less than others, such as our hairstyle. The challenge for a cognitive engine is to still recognise a face, even with a different hairstyle, or with or without a beard.
The human brain is thus able to abstract faces from key features, even when many of the features differ from the remembered version of the face. We can still recognise a friend instantly, even if they have grown a beard and cut their hair. This, though, is a challenge for a computer, as it needs to normalise faces to its stored model, and then determine the errors between the current face and the face it has stored. For automated login systems, such as for smartphones, the time that the device has to make a decision is often less than a second; otherwise, the user will lose trust in the recognition method.

The key elements of biometrics are:

  • Universality. This defines that the method will be available for most users.
  • Uniqueness. This defines that the method must be unique across the population.
  • Permanence. This defines that the method does not vary over time.
  • Collectability. This defines the ease with which the information can be gathered.
  • Performance. This relates to how accurate the method is for a range of situations.
  • Acceptability. This defines how acceptable the method is to a user.
  • Circumvention. This defines how easy it is to trick the system.

Behavioural biometrics

Within any biometric system, we identify success rates with true-positives (the guess was correct) and false-positives (the guess identified the wrong person). This then defines the error rate, and the higher the error rate, the lower that user confidence is likely to be. Fingerprint recognition often achieves error rates of better than 3%, and uses patterns within the finger (such as arch, loop, and whorl) and minutia features (such as ridge ending, bifurcation, and short ridge). Face recognition in smartphones now achieves rates which can match fingerprints, as it uses many features of the face, and can learn these over time.

What is in a face?

The key features of a face that we can gather:

  • Traditional features. This includes the size and shape of eyes, nose, cheekbones, and jaw.
  • Three-dimensional mapping. This includes a 3D mapping of the face in order to determine its general shape.
  • Skin texture. This defines the distinctive elements of the skin, such as the lines, patterns, and spots that are apparent. This type of analysis is often used in age determination, as age lines change our face over time.

In terms of an evaluation, we can define a target user, imposters, and legitimate users. A biometric system which allows an imposter to impersonate a legitimate user is obviously a security risk, while a biometric system which identifies a target user as an imposter is likely to reduce user trust. The main evaluation methods are then:

  • False acceptance rate (FAR). This defines the percentage of times that the system accepts an imposter as the targeted user.
  • False rejection rate (FRR). This defines the percentage of times that it identifies a target user as an imposter.
  • Receiver operating characteristic (ROC). This illustrates the trade-off between the FAR and the FRR.
  • Equal error rate (EER). This defines the point where the percentage of false acceptances is equal to the percentage of false rejections. Within a ROC curve we can find this at the point at which they cross over, and we aim for as low a value of EER as possible.
  • Authentication accuracy. This defines the percentage of times that the system identifies a user correctly, across both imposters and legitimate users.
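As a quick sketch of these metrics, we can compute the FAR, FRR and EER from a set of matching scores. The scores below are hypothetical, purely for illustration: genuine scores come from the target user, imposter scores from everyone else, and the EER sits where the two error rates cross.

```python
import numpy as np

def far_frr(genuine, imposter, threshold):
    # FAR: fraction of imposter scores wrongly accepted;
    # FRR: fraction of genuine scores wrongly rejected
    far = np.mean(np.array(imposter) >= threshold)
    frr = np.mean(np.array(genuine) < threshold)
    return far, frr

# Hypothetical matching scores (higher = more confident match)
genuine = [0.9, 0.8, 0.85, 0.7, 0.95, 0.6]
imposter = [0.2, 0.4, 0.35, 0.5, 0.1, 0.65]

# Sweep the decision threshold and find the point where FAR and FRR
# are closest -- the equal error rate (EER)
thresholds = np.linspace(0, 1, 101)
rates = [far_frr(genuine, imposter, t) for t in thresholds]
eer_idx = int(np.argmin([abs(far - frr) for far, frr in rates]))
print("EER threshold:", thresholds[eer_idx], "FAR/FRR:", rates[eer_idx])
```

Raising the threshold trades false acceptances for false rejections, which is exactly the trade-off a ROC curve visualises.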

Analysing a face with a cognitive engine

In Figure 1 we see that a face can be split into key distinguishing features which identify the brows, eyes, nose, and mouth. To simplify things, we can represent the brows with a start and an end point; the eyes with a start and an end point, plus points around the turning points of the top and bottom of each eye; the nose by its top and bottom, with points in between; and the mouth with a start and an end point, and two points in the middle. In the end, we can then make measurements around the facial landmarks, and also between them.

Figure 1: Face mapping
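As a rough sketch of such measurements, we can take a set of hypothetical landmark coordinates and compute distances between them, normalising by the inter-ocular distance so that the measurements do not depend on the scale of the image (the coordinates below are invented for illustration):

```python
import math

# Hypothetical (x, y) landmark coordinates in pixels
landmarks = {
    'eye_left':    (120, 150),
    'eye_right':   (200, 150),
    'nose_tip':    (160, 200),
    'mouth_left':  (130, 240),
    'mouth_right': (190, 240),
}

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Normalise all measurements by the inter-ocular distance, so the
# feature vector is invariant to the size of the face in the image
interocular = dist(landmarks['eye_left'], landmarks['eye_right'])
features = {
    'mouth_width': dist(landmarks['mouth_left'], landmarks['mouth_right']) / interocular,
    'eye_to_nose': dist(landmarks['eye_left'], landmarks['nose_tip']) / interocular,
}
print(features)
```

A vector of such ratios is what we would then match against the stored model of a face.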

We can use the Microsoft Cognitive Engine and run the code given in the Appendix. This will return the landmarks and try to make sense of the face. If we take an image of Barack Obama, we see that it has detected the face and the key facial landmarks within a bounding box:

Figure 2: Face landmark mapping

The result comes back as a JSON object which defines the key face landmarks. It identifies the face within a face rectangle, with markers for the brows, eyes, nose, and lips (Figure 3). We can also see other returned values which identify other key features of the face, including hair colour, emotion, whether the person is wearing glasses, the pitch, roll and yaw of the head, and even an estimate of the age and gender of the person.

Figure 3: Returned values for Microsoft Cognitive Engine
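To illustrate the shape of the returned object, here is a sketch of parsing a trimmed-down response. The field names (faceRectangle, faceLandmarks, faceAttributes) follow the Face API's documented response format, but the values below are invented for illustration:

```python
import json

# A trimmed, hypothetical response in the shape returned by the
# Microsoft Face API detect call (field names as documented; values invented)
response = '''
[{
  "faceRectangle": {"top": 50, "left": 60, "width": 180, "height": 180},
  "faceLandmarks": {
     "pupilLeft":  {"x": 110.2, "y": 115.8},
     "pupilRight": {"x": 180.6, "y": 114.1},
     "noseTip":    {"x": 145.3, "y": 160.4}
  },
  "faceAttributes": {
     "age": 54.0,
     "gender": "male",
     "smile": 0.371,
     "headPose": {"pitch": 0.0, "roll": -2.1, "yaw": 5.3}
  }
}]
'''

faces = json.loads(response)
for face in faces:
    rect = face['faceRectangle']
    print('Face at', (rect['left'], rect['top']), rect['width'], 'x', rect['height'])
    for name, point in face['faceLandmarks'].items():
        print(' ', name, (point['x'], point['y']))
    print('  Estimated age:', face['faceAttributes']['age'])
```

The full response holds many more landmarks and attributes than shown here, but they are all parsed in the same way.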

Detecting faces

So let’s take an example of detecting faces. In order to train the classifier, the algorithm requires a good deal of positive images (with faces) and negative images (without faces). The system then extracts features from these using the grayscale version of the image. Each feature becomes a single value, determined by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle:

Figure 4: Edge and line features in the face
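As a sketch of how such a feature is computed, the code below builds an integral image (the trick the Viola–Jones method uses, so that any rectangle sum costs only four lookups) and evaluates a two-rectangle edge feature on a toy image; the pixel values are invented for illustration:

```python
import numpy as np

def integral_image(gray):
    # Cumulative sum along both axes: each cell holds the sum of all
    # pixels above and to the left of it
    return gray.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, x, y, w, h):
    # Sum of the rectangle at (x, y) of size w x h, from four lookups
    total = ii[y + h - 1, x + w - 1]
    if x > 0:
        total -= ii[y + h - 1, x - 1]
    if y > 0:
        total -= ii[y - 1, x + w - 1]
    if x > 0 and y > 0:
        total += ii[y - 1, x - 1]
    return total

# Toy 4x4 "image": bright on the left, dark on the right -- an edge
gray = np.array([
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [200, 200, 10, 10],
])
ii = integral_image(gray)

# Two-rectangle edge feature: white (left half) minus black (right half)
feature = rect_sum(ii, 0, 0, 2, 4) - rect_sum(ii, 2, 0, 2, 4)
print(feature)
```

A large value here means a strong bright-to-dark edge, which is exactly the kind of pattern the cascade looks for.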

We thus classify into: edge features; line features; and four-square features. David Cameron’s hairline looks very much like an edge feature, while his mouth is more like a line feature. To find features we analyse the image for changes, such as from dark to light. The eyes are often darker than the bridge of the nose, so we can look for a dark -> light -> dark pattern:

Figure 5: Detection of eye region

and so when we run the classifier, we get:

Figure 6: Face and eye detection

where you can see it has detected David Cameron’s face and eyes, but it seems to think that he has an eye in his mouth (and ear). If we apply it to a cat’s face, the detection is poor, as it will try to detect human faces:

Figure 7: Face classifier on a cat image

but there are a whole lot of filters which can be used, including a cat face detector. So when we use this, the cat’s face is detected:

Figure 8: Face classifier with a cat classifier

And finally we can try and detect a smile. For this we add the XML training data, and then try and detect an aspect ratio which represents a mouth (long and thin) and with a minimum size:

smile = smile_cascade.detectMultiScale(roi_gray, scaleFactor=1.7, minNeighbors=22, minSize=(25, 25))

The following outlines the Python code used:

import sys
import cv2

# Image file to analyse; can be overridden from the command line
imfile = 'F52361.jpg'
if len(sys.argv) > 1:
    imfile = sys.argv[1]

# Load the pre-trained Haar cascades for faces and eyes
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

img = cv2.imread(imfile)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

for (x, y, w, h) in faces:
    # Mark the face, then search for eyes within the face region only
    cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

cv2.imwrite('out.jpg', img)

and the smile part is:

face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
smile_cascade = cv2.CascadeClassifier('smile.xml')

img = cv2.imread(imfile)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.3, 5)

for (x, y, w, h) in faces:
    roi_gray = gray[y:y + h, x:x + w]
    roi_color = img[y:y + h, x:x + w]
    smile = smile_cascade.detectMultiScale(
        roi_gray,
        scaleFactor=1.7,
        minNeighbors=22,
        minSize=(25, 25))
    for (ex, ey, ew, eh) in smile:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 0, 255), 2)

Demo: https://asecuritysite.com/comms/face

Pitch, yaw and roll

For example, if the width of the mouth is measured at W, and the face is at a 45-degree angle to the camera, then the actual width of the mouth is W/cos(45°), as illustrated in Figure 9. In three dimensions, though, if we measure the angle of pitch as θ_y, yaw as θ_x and roll as θ_z, the conversion is then:

Width = √[(W/cos(θ_x))² + (W/cos(θ_y))² + (W/cos(θ_z))²]

Figure 9: Normalising to a standard model
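A minimal sketch of this correction, implementing the formulas above (the single-angle case from the example, and the three-angle expression taken as given in the text):

```python
import math

def corrected_width(w, angle_deg):
    # A width measured at an angle to the camera is foreshortened by
    # cos(angle); dividing by it recovers the actual width
    return w / math.cos(math.radians(angle_deg))

# The mouth measured at 10 pixels, with the face at 45 degrees
print(corrected_width(10, 45))  # about 14.14

def corrected_width_3d(w, pitch_deg, yaw_deg, roll_deg):
    # The combined three-angle expression, as given in the text
    return math.sqrt(
        (w / math.cos(math.radians(yaw_deg))) ** 2 +
        (w / math.cos(math.radians(pitch_deg))) ** 2 +
        (w / math.cos(math.radians(roll_deg))) ** 2)
```

With all measurements remapped this way, faces at different head poses can be compared against the same stored model.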

Emotion, hair and other features

Link: https://asecuritysite.com/comms/faceapi

Figure 10: Additional features from face recognition

In this case, we see that he had hints of a moustache (0.1), a beard (0.1) and sideburns (0.1), but no glasses or make-up have been detected. For hair, we see confidence metrics on baldness and hair colour (black, gray, brown, blond, red or other). But it is in the deeper analysis of age, gender and emotion that we see a deeper degree of machine learning in the analysis. In this case, the cognitive engine has determined his gender correctly (male) and has a good guess at his age at the time of the photograph (54). It has then detected his emotions within the photograph. These include an attribute for the smile (0.371), and for the emotions of anger, contempt, disgust, fear, happiness, neutral, sadness and surprise. Overall these are defined by the Facial Action Coding System (FACS) [1], which includes 44 Action Units (AUs) to describe facial changes. For a smile we have a Cheek Raiser (AU6) and a Lip Corner Puller (AU12).

Figure 11: Cheek Raiser and Lip Corner Puller methods for smile detection [1]
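As a toy illustration of AU coding, a smile can be flagged when both of these action units are present (representing the AUs as a simple set of integers is an assumption made purely for illustration):

```python
# FACS action units relevant to a smile (Ekman and Friesen [1])
AU_NAMES = {6: 'Cheek Raiser', 12: 'Lip Corner Puller'}

def is_smile(active_aus):
    # A genuine (Duchenne) smile activates both AU6 and AU12
    return 6 in active_aus and 12 in active_aus

print(is_smile({6, 12}))  # True
print(is_smile({12}))     # False: lip corners alone are not enough
```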

In many applications of AI, the detection of a smile can be important. This might relate to the detection of patient happiness, or to a positive effect while watching a movie. Normally smile detection is treated as a binary classification, where someone has a smile or not. In many cases, SVM and AdaBoost work well in binary classification problems, and have been used extensively within smile detection, but can sometimes struggle to produce a result in real-time. Overall, the Extreme Learning Machine (ELM) has produced good results with a low computational cost.

There are two main methods: a geometric-based approach and an appearance-based approach. With the geometric approach we look at deformations within the facial geometry, such as:

  • The percentage of teeth shown as related to the size of the mouth.
  • The normalized, average minor/major axis diameter of eyes.
  • The upper/lower lip curvature.
  • A check on whether there is a cheek fold. This is an effect where the line around the nose connects with the cheek.
  • A check on whether the forehead has wrinkles, as this can identify anger, and where the brows will be lowered.
  • The vertical distance from inner tear duct to eyebrow.

For appearance-based approaches, we look at the dynamics of the facial features, and this thus involves a time-based method.
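As a sketch of one geometric feature, we can estimate the lip curvature by fitting a quadratic through a few mouth landmark points. The landmark coordinates below are invented, and the sign convention follows image coordinates, where y grows downwards:

```python
import numpy as np

def lip_curvature(points):
    """Fit y = a*x^2 + b*x + c through lip landmarks and return 'a'.

    In image coordinates y grows downwards, so a smiling lower lip
    (corners higher than the centre) gives a negative coefficient.
    """
    x = np.array([p[0] for p in points], dtype=float)
    y = np.array([p[1] for p in points], dtype=float)
    a, _, _ = np.polyfit(x, y, 2)
    return a

# Hypothetical lower-lip landmark points (x, y) in pixels
smiling = [(100, 200), (130, 215), (160, 200)]
neutral = [(100, 205), (130, 205), (160, 205)]

print(lip_curvature(smiling))  # negative: a smile-like curve
print(lip_curvature(neutral))  # close to zero: a flat mouth
```

A feature such as this, combined with the others in the list above, would then feed a binary classifier such as an SVM.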

Weaknesses of facial recognition

In many applications, though, machine learning is used to perform an initial filter of “faces of interest”, which can then be checked by a human operator. A world where computers could recognise faces with a high success rate, and then match them to other data, might be a world which would be unacceptable for many people.

While deep learning methods have been successful in improving face recognition, there is a worry that many of the faces that are being used for the classification are typically from white people. Success rates can thus be poorer for those from other ethnic backgrounds.

A core worry in many deep learning methods for facial recognition is that it is difficult to know how the algorithm has come to its findings, especially when it is fed with bad data. In the UK, it was found, for example, that pictures with sand dunes in them were classified as obscene images. The problem can be overcome using guided back-propagation, where we can understand the key features used in classifying faces. Adversarial methods can also be used against the machine learning method: if the adversary knows how the algorithm classifies, they could mock up images which target key classifiers for a positive identification.

Facial recognition for good and bad?


[1] P. Ekman and W. Friesen, Facial Action Coding System: A Technique for the Measurement of Facial Movement. Palo Alto, CA: Consulting Psychologists Press, 1978.

Professor of Cryptography. Serial innovator. Believer in fairness, justice & freedom. EU Citizen. Auld Reekie native. Old World Breaker. New World Creator.
