Why is FRT unreliable and discriminatory?

Illustration by Alina Najlis - © 2025 INCLO

Probe image

The main input to the facial recognition systems used by police forces is people’s facial images, which serve as probe images that police will attempt to identify. These images vary widely in angle, framing, quality and pixelation, and they reach the system in very different ways: live systems are fed multiple faces as they are captured in real time with almost no human intervention, while retrospective systems may, but do not always, use carefully human-edited probe images. Each of these scenarios carries its own potential for harm.

The key point here is that, in reality, facial imagery comes from many different sources and involves, at the very least, people holding many different poses, people being captured in very different lighting conditions, and faces being obscured to varying degrees by the position of the camera, clothing, sunglasses, etc. The following factors can affect the probe images, as illustrated in the sketch that follows the list:

  1. Pose: The position of the face in the probe image affects the attempted matching process.
  2. Illumination: The ambient lighting may affect the texture patterns detected in a face, including the possibility of highlighting or playing down certain traits or altering the shape of the face.
  3. Expression: Although it mainly affects facial emotion recognition, the configuration of the large collection of muscles in our faces may also affect identification and verification.
  4. Occlusions: The use of accessories or other ways to cover part of the face, such as clothing, masks or hands, can result in part of the facial information not being available for facial recognition purposes. This may also affect the information that can be captured by altering, for example, the symmetry of a face.
  5. Imprecisely localized faces: Whereas the factors above relate to the environment in which the person is captured, this one concerns flaws in the system itself when detecting faces or delineating facial features in a picture or video, which make it difficult to determine where a face or a feature starts or ends.
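The practical effect of these factors can be shown with a minimal sketch. Everything in it is a hypothetical assumption introduced for illustration only: the ProbeImage fields, the quality_check function and its thresholds do not come from any real system. The point is simply that a probe captured at a sharp angle, in poor light or with a partly covered face gives the system much less to work with.

```python
from dataclasses import dataclass

@dataclass
class ProbeImage:
    # Hypothetical per-image estimates that a real pipeline would compute
    # from the pixels; here they are plain numbers for illustration.
    yaw_degrees: float        # head rotation away from the camera
    mean_brightness: float    # 0.0 (black) to 1.0 (overexposed)
    occluded_fraction: float  # share of the face hidden by masks, hands, etc.

def quality_check(probe: ProbeImage) -> list[str]:
    """Return the reasons (if any) why this probe is unreliable for matching.

    The thresholds below are illustrative assumptions, not values from any real system.
    """
    problems = []
    if abs(probe.yaw_degrees) > 30:
        problems.append("pose: face turned too far from the camera")
    if not 0.2 <= probe.mean_brightness <= 0.8:
        problems.append("illumination: image too dark or too bright")
    if probe.occluded_fraction > 0.25:
        problems.append("occlusion: too much of the face is covered")
    return problems

# A CCTV-style capture: sharp angle, dim lighting, partly covered face.
cctv_probe = ProbeImage(yaw_degrees=45.0, mean_brightness=0.15, occluded_fraction=0.4)
print(quality_check(cctv_probe))  # all three factors are flagged
```

Frames like the cctv_probe example are precisely the kind of input a live system receives in real time, with almost no opportunity for human review before a match is attempted.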

Reference database or watchlist

The unidentified probe image is compared against a set of identified images (reference databases or watchlists) in an attempt to find a “match”. A significant factor in the system’s performance will therefore be the quality of the images in these sets and the accuracy of the information contained in the database against which probe images are compared. This means that a match depends not only on the quality of the probe image, but also on the accuracy of the details contained in the reference database or watchlist as to the targets’ identities. For example, outdated lists of fugitives in a reference database may lead to wrongful arrests.
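In broad terms, this comparison is a one-to-many search: the probe image is reduced to a numerical template and scored against every template in the watchlist, and anything above a threshold is reported as a candidate match. The sketch below is illustrative only, with invented template vectors, names, statuses and an assumed 0.6 threshold. It shows why the result is only as good as the watchlist itself: even when the similarity score is computed correctly, outdated metadata attached to an entry still produces a wrongful identification.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity between two face templates, in the range [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical watchlist: template vectors plus the metadata police rely on.
watchlist = [
    {"name": "Person A", "status": "warrant withdrawn in 2019", "template": np.array([0.9, 0.1, 0.3])},
    {"name": "Person B", "status": "active warrant",            "template": np.array([0.2, 0.8, 0.5])},
]

probe_template = np.array([0.88, 0.15, 0.32])  # template extracted from the probe image
THRESHOLD = 0.6  # illustrative value; real systems tune this trade-off

for entry in watchlist:
    score = cosine_similarity(probe_template, entry["template"])
    if score >= THRESHOLD:
        # The "match" is reported together with whatever the database says,
        # accurate or not. Here the top candidate's warrant no longer exists.
        print(f"Candidate: {entry['name']} ({entry['status']}), score={score:.2f}")
```

In this sketch the system correctly finds the most similar face, yet the alert it produces rests on a warrant that was withdrawn years earlier, which is exactly how an outdated fugitive list can lead to a wrongful arrest.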

Training dataset

Beyond the probe images and the database images, there is a third set of images that needs to be considered and is accounted for in these principles: the training dataset. This is the set of images fed into the system during its training phase, before the system’s deployment and use by the police. The training phase aims to teach the system to recognize patterns (in this case, facial features) on the basis of thousands of images of people’s faces. It also aims to increase the system’s robustness to the challenging conditions mentioned above that can affect its reliability, such as images with partially covered or obscured faces.

Understanding the training process and the datasets involved in it is key to understanding the source of some of the harms associated with FRT systems. The very nature of training datasets may be the source of the bias reflected by FRT systems through the following causes (a brief illustration follows the list):

  • Capture bias: related to the origin of the pictures. It is affected by both the device used and the collector’s preferences, and linked to the reliability factors mentioned above, such as lighting or the position of the face.
  • Category or label bias: related to ambiguity and vagueness in our visual semantics, arising both from similar images being placed in different categories and from diverse images falling into the same category.
  • Negative bias: related to the part of the visual information left out of the dataset due to a focus on particular features.
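One way these biases become visible is by auditing the composition of a training dataset before it is used. The sketch below is a minimal illustration under invented assumptions: the group labels, lighting labels and image counts are made up for the example. It simply counts how images are distributed across demographic groups and capture conditions, the kind of imbalance that produces capture bias and, ultimately, unequal error rates.

```python
from collections import Counter

# Invented metadata for a hypothetical training dataset; a real audit would
# read this from the dataset's annotation files.
training_images = (
    [{"group": "lighter-skinned men",   "lighting": "studio"}]  * 700
    + [{"group": "lighter-skinned women", "lighting": "studio"}]  * 180
    + [{"group": "darker-skinned men",    "lighting": "outdoor"}] * 80
    + [{"group": "darker-skinned women",  "lighting": "outdoor"}] * 40
)

total = len(training_images)
by_group = Counter(img["group"] for img in training_images)
by_lighting = Counter(img["lighting"] for img in training_images)

for group, count in by_group.most_common():
    print(f"{group}: {count / total:.0%} of the dataset")
for lighting, count in by_lighting.most_common():
    print(f"{lighting} lighting: {count / total:.0%}")
# A system trained on data like this has seen far fewer examples of some
# groups, and of non-studio capture conditions, and will tend to perform
# worse on exactly those faces and settings.
```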

Functioning in real life

Facial recognition systems operate in real-life scenarios, not laboratory conditions. Flaws in the datasets, reference databases/watchlists and probe images result in limited accuracy, which can lead to serious fundamental rights violations. Accuracy figures put forward by vendors of these systems and by supporters of FRT are also very often based on pristine laboratory conditions: scenarios in which high-quality, well-lit, carefully captured images, such as those taken for visa applications and mugshots, are compared with similarly clean, high-quality and controlled images. However, advocates fail to mention the following shortcomings:

  • Limited technical literacy of police forces as final users:

As with other digital technologies, and even more so with AI-based tools, a careful understanding of how a system works, including its failings and limitations, is paramount to mitigating risks and harms. When a deployment is being considered, the training of the final users and the procedures they must follow are therefore key elements of the system.

  • Distorted performance metrics and figures:

As contested tools, these systems will be accompanied by metrics and figures put forward by police forces or vendors to try to legitimize their use. These figures and metrics should be robustly interrogated and questioned by the public, media and civil society, taking into consideration at least two issues: 1) some metrics and figures may be the result of ideal or laboratory-scenario testing as opposed to real-life settings, and 2) some metrics and figures may relate to a specific use case of FRT which may not be relevant to, or reflective of, the proposed use case under consideration by a police force or a state passing legislation for FRT use.
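A worked example of the first issue makes the point. All numbers below are hypothetical and chosen only to show the arithmetic, and the headline "99.9% accurate" claim is read here as a 0.1% false positive rate per face scanned, which is only one possible reading of such a figure. Even so, once a live system scans a large crowd against a small watchlist, most of its alerts can concern innocent people.

```python
# Hypothetical numbers, chosen only to illustrate the base-rate arithmetic.
faces_scanned = 50_000       # people passing a live camera in a day
on_watchlist = 20            # of whom are actually on the watchlist
false_positive_rate = 0.001  # the vendor's "99.9% accurate" figure, read per face scanned
true_positive_rate = 0.90    # share of watchlisted people correctly flagged

false_alerts = (faces_scanned - on_watchlist) * false_positive_rate
true_alerts = on_watchlist * true_positive_rate
precision = true_alerts / (true_alerts + false_alerts)

print(f"False alerts: {false_alerts:.0f}")    # ~50 innocent people flagged
print(f"True alerts:  {true_alerts:.0f}")     # ~18 correct identifications
print(f"Share of alerts that are correct: {precision:.0%}")  # ~26%
```

In other words, under these assumptions the same system can be marketed as "99.9% accurate" while roughly three out of every four people it flags in a real crowd are not on the watchlist at all.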

In the interests of democracy and transparency, the citizens and residents of a state must know what tools are used to monitor them. For a hugely controversial technological tool such as FRT, there must be robust public scrutiny of how the tool’s accuracy figures are measured and of the conditions under which the tool is tested.

  • Multiplicity of actors leading to a chain of cumulative flaws:

Many different actors are involved in the development and deployment of facial recognition systems, at different points in time and with diverse interests that may affect the system’s final performance. Some of those actors, such as academics developing an algorithm, may be unaware of its future use.

Some other actors involved in these systems may not have the public interest as a priority. For example, data brokers (whose business is collecting personal data to sell it, or its use, to third parties) could provide training datasets, a tech provider could implement an algorithm in a commercial tool or a contractor could offer an “integral solution” (that includes parts acquired from third parties).

Finally, the policy makers who may play a role in the design of a system, the data controllers responsible for the image databases used and the police authorities who are the final users of the systems are among the other human actors involved in the development and deployment of facial recognition systems. They, too, may lack the technical training to understand the implications of the systems they are trying to regulate, manage or use.

Having a deep understanding of the diverse actors involved in the development and deployment of facial recognition systems, from their creation to their use, is of significant value for understanding why the use of FRT by police is so problematic for the protection of fundamental rights. Such knowledge is also valuable for better understanding where and why issues regarding transparency and accuracy arise.