What is voice biometrics?

Dive into the world of voice biometrics with this essential guide and understand what lies behind a technology that has advanced by leaps and bounds in recent years.

Thanks to advances in AI and natural language processing, voice biometrics has become a valuable tool for a wide range of use cases.

 

Biometrics makes it possible to verify the identity of a user by analysing one or more physical features, such as the face, voice, fingerprint, iris or vein pattern, or behavioural traits, such as signature, gait and interaction with mobile applications.

Each of these features, also known as characteristics, has its own peculiarities, and the choice among them must take into account the security and functionality of the system in which they are integrated.

But what is voice biometrics?

Voice biometrics is a technology that identifies and authenticates users by their voice. It relies on the premise that the human voice is unique and that each person has a distinctive frequency pattern and set of vocal features.

The principle is simple: a voice sample is recorded and then analysed to extract a set of features that identify the person. Rhythm, pitch, frequency, and timbre are some of the characteristics used in voice analysis.

Voice biometrics systems use machine learning algorithms and databases of voice samples. These algorithms analyse the features of an unknown voice and compare them with those of known voices. If the match is close enough, the system can confidently determine who is speaking.

Proper training of the models with large volumes of voice samples during registration is critical for obtaining a high-quality voice biometric template.

Voice biometrics is sometimes questioned because a voice can change with illness, fatigue or the passing of time. Nevertheless, deep learning techniques and careful engineering can mitigate these challenges and identify a user’s voice across different use cases.

The most common cases are identity verification in a security system and user authentication in call centres, such as the success story of Santalucia. 

Terminology

Logical access: an indirect attack method that targets vulnerabilities in the code or hardware of a biometric system. (Source: Antispoofing.org)

Multimodal biometrics: the use of multiple biometric indicators by personal identification systems to identify individuals. (Source: IEEE Xplore)

Identification (1:N): search against a biometric enrolment database to find and return the biometric reference attributable to a single individual. (Source: Biometrics Institute)

Physical access: a direct attack method in which an attacker interacts directly with the sensor of a biometric system (here, the microphone). (Source: Antispoofing.org)

Signal processing: involves converting or transforming data in a way that allows us to see things in it that are not possible via direct observation. (Source: DEWEsoft)

SNR: the signal-to-noise ratio, a measure used in science and engineering that compares the level of a desired signal to the level of background noise. SNR is defined as the ratio of signal power to noise power. (Source: Wikipedia)

VAD: voice activity detection, an audio signal processing algorithm that automatically finds the regions where speech content is present. (Source: Mobbeel)

Verification (1:1): confirm an identity claim through biometric comparisons. (Source: Biometrics Institute)

What is voice recognition?

Voice recognition is a technology that enables a computer or electronic device to recognise what a person says. It has several uses, the most important being text input on mobile devices, automatic transcription of talks and meetings, control of smart home devices and access to information services by phone. It is also used in virtual assistants, allowing users to interact with a system through voice commands.

Voice recognition and voice biometrics both rely on measuring and analysing features of the human voice. Although they use similar technologies, they are not the same and do not share the same goal. In a nutshell, voice biometrics distinguishes one person’s voice from another, focusing on verifying that individual’s identity. In contrast, voice recognition eases communication between users and digital devices.

Differences between voice biometrics and voice recognition

Voice biometrics uses unique characteristics of a person’s voice to identify them reliably. It is used to authenticate a user in a system and to protect individual privacy by ensuring that only they can access sensitive information or carry out specific actions.

By contrast, voice recognition turns speech into text that can be processed by a computer or digital device. AI assistants like Google Assistant, Alexa, or Siri use voice recognition. These systems allow users to perform actions and obtain information using voice commands.

In short, voice biometrics identifies people, while voice recognition processes speech and turns it into text that a computer can understand.

As mentioned, voice biometrics systems are used in security environments. Their main applications are verifying users’ identity when logging in, approving financial transactions and accessing call centre services. Voice recognition, by contrast, is used mainly in personal assistance apps such as digital assistants, and in the health industry to speed up the recording of medical information.

Download the full voice biometrics guide in PDF

Advantages of voice biometrics

Biometrics technologies based on voice analysis have plenty of benefits and are widely used in many industries. Here are the main reasons for their usage: 

  • Strength: the speech signal is quasi-stationary when analysed over short time intervals. This makes it possible to extract features with high discriminative power to differentiate among individuals.
  • Low intrusiveness: unlike other authentication modalities, which require the user’s active collaboration or specific hardware, voice biometrics only needs audio captured through a microphone.
  • Availability: recording a voice clip is considered a standard identification method. People are used to it and do not find it unusual. Furthermore, biometric systems that analyse the human voice are easily integrated into call centres, voice assistants, and mobile devices.
  • Secure voice authentication: voice is a unique biometric feature that can be used to authenticate a person’s identity securely.
  • Easy to use: the user only needs to speak into a microphone for the system to recognise their voice.
  • Accessibility: it is beneficial for people with disabilities who have difficulty writing or using touch devices.
  • More convenience: it enables users to perform tasks without typing or using touch devices.
  • High efficiency: it allows users to perform tasks faster and more efficiently, as there is no need to type.
  • Accuracy: voice biometrics can be more accurate than typing or touch input, which helps prevent errors and increase productivity.

A brief history of voice biometrics

Voice biometrics technology has significantly evolved in the last decades. Here are some of the most important milestones of its evolution: 

  • In the 1970s, researchers started investigating the possibility of using voice as an authentication method. Nevertheless, technological limitations held back the development of practical solutions.
  • In the 1980s, the first voice recognition systems appeared. These systems used signal-processing techniques to identify users’ voice patterns. However, they were very rudimentary and had a high error rate.
  • In the 1990s, voice recognition systems became more accurate thanks to advances in signal-processing techniques. Furthermore, machine learning techniques started to be used to train voice algorithms.
  • In the 2000s, expert companies developed voice biometrics systems that used numerous voice features, such as intonation, rhythm and speed, improving recognition accuracy. They also began to use emotion analysis techniques to detect fraud.
  • In the 2010s, companies developed voice biometrics systems that ran in real-time, making them ideal for mobile and online security applications. 

Voice biometrics technology continues to evolve, and companies are exploring new techniques to improve accuracy and efficacy. For instance, researchers are testing systems that detect illnesses through voice, such as Parkinson’s disease and depression. Furthermore, voice biometrics technology is expected to integrate with other authentication modalities. Integration with other biometrics, such as fingerprint or facial recognition, will make it possible to provide highly secure multimodal authentication solutions.

Voice biometrics use cases

The human voice is part of our daily lives. Thanks to its ease of use, it is a natural mechanism for accessing services, apps and devices. Some examples of its use are:

  • Identity verification: many security systems, such as login systems on mobile devices or online services, use voice technology to verify a user’s identity. These systems compare the user’s voice with the stored voiceprint and determine whether the person requesting access is the same person who provided the reference sample.
  • Travel safety: some airports and mobility companies use voice biometrics to verify the identity of passengers and employees. It ensures that only authorised people get into planes, trains or buses.
  • Emotion and mood analysis: this kind of technology can assess an individual’s emotional state. As a result, it can determine whether a person is exhausted, upset, or full of joy.
  • Personal support: virtual assistants often use voice biometrics to recognise users and adapt the user experience. For example, a digital assistant can use voice technology to gauge the user’s mood and adapt its answers and actions accordingly.

MobbID provides an immersive call experience for Santalucia’s customers

Voice biometrics is frictionless and highly secure and improves the user experience by simplifying the interaction processes between customers and agents in call centres.

The number of policyholders engaged with their insurance company is at most 19%, according to the STIGA Customer Experience Rate.

Santalucía, a Spanish insurance company, exceeded this rate by incorporating voice biometrics into its call centre processes. 

The company found that one of the channels most used by its customers was the phone. Nevertheless, the customer service process needed to be optimised and fully adjusted to the needs of policyholders. 

To optimise this, the company implemented Mobbeel’s voice biometrics in its call centre service. Customers can now identify themselves by voice when they call, by saying a fixed Spanish phrase previously registered in the system: “En Santalucia, mi voz es mi contraseña” (“At Santalucia, my voice is my password”).

When customers call, they say this sentence and the call is automatically passed on to an agent, who does not have to identify them. Our solution frees agents from spending time identifying customers and improves the customer experience by removing the frustration associated with cumbersome identification processes.

As a result of implementing Mobbeel’s voice biometrics technology, Santalucia achieves a 39% satisfaction rate, becoming a leading company in customer satisfaction.

Operation of a voice biometrics system

A complete voice biometrics system must be able to verify a person’s identity through their voice and detect any attack attempts that could compromise its operation.

ISO/IEC 30107-1 defines an architecture, adopted by Mobbeel, that provides the highest level of trustworthiness when verifying user identity.

[Figure: Operation of a biometric system]

There are essential elements for any voice biometrics system to work:

  • A microphone: to capture the human voice. 
  • A signal processor: to convert the audio signal into a digital form that a computer can process. The most critical step is voice modelling, which produces the feature vector that makes up the voiceprint.
  • Voiceprint databases: to compare the unknown voice with the known voiceprints stored in them.
  • Decision-making and user interface: to let the user interact with the system, report the result of the decision about the user’s identity and carry out actions once the user has been identified.

When a user speaks into the microphone of a voice biometrics system, the signal processor captures the audio signal and converts it into a digital form. After that, the signal-processing algorithm compares the audio with the voice models stored in the database. If the unknown and known voiceprints match closely enough, the system can confidently determine who is speaking.

Capture system

The first step in the biometric identity verification process is to capture the physical or behavioural attribute to be analysed. In voice biometrics, this means recording an audio clip of a sentence spoken by the user.

The system developed by Mobbeel is highly versatile: it can identify the speaker regardless of the language they speak and the type of phrase, whether fixed or free.

This supports implementations in which the user is asked to pronounce a fixed text, as well as implementations that identify the user automatically from their natural speech.

Its design is also suitable for multi-channel use, as it supports both telephone-channel audio (fixed, mobile and IP networks) and high-quality audio. Results depend on the channel used, since some channels, such as conventional telephone networks, apply filters that remove information from the signal and so reduce accuracy.

As part of the capture module, the technology includes algorithms that assess the quality of the input audio and determine whether it meets the minimum conditions for performing biometric operations. These checks analyse the audio quality (SNR, or signal-to-noise ratio) and the minimum voice content, using voice activity detection (VAD).
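The two capture-stage checks can be illustrated with a minimal sketch. It assumes a very simple energy-based VAD, and the function names and thresholds (`energy_threshold`, `min_snr_db`, `min_speech_seconds`) are hypothetical values chosen for illustration; it is not Mobbeel’s implementation, and production systems typically rely on more robust detectors.

```python
import math

def frame_energies(samples, frame_len):
    """Mean power of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def quality_check(samples, sample_rate, frame_ms=20, energy_threshold=1e-3,
                  min_snr_db=10.0, min_speech_seconds=0.5):
    """Hypothetical quality gate: energy-based VAD plus a rough SNR estimate.

    All thresholds are illustrative assumptions, not values from the article.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    energies = frame_energies(samples, frame_len)
    # VAD decision per frame: frames above the energy threshold count as speech.
    speech = [e for e in energies if e >= energy_threshold]
    noise = [e for e in energies if e < energy_threshold]
    speech_seconds = len(speech) * frame_ms / 1000
    if not speech:
        return {"snr_db": None, "speech_seconds": 0.0, "accepted": False}
    # Speech frames also contain noise, so this is only an approximation.
    signal_power = sum(speech) / len(speech)
    # Guard against division by zero when no noise frames were found.
    noise_power = max(sum(noise) / len(noise), 1e-12) if noise else 1e-12
    snr_db = 10 * math.log10(signal_power / noise_power)
    accepted = snr_db >= min_snr_db and speech_seconds >= min_speech_seconds
    return {"snr_db": snr_db, "speech_seconds": speech_seconds, "accepted": accepted}
```

A recording would only be passed on to the biometric engine when `accepted` is true; otherwise the user could be asked to repeat the phrase in a quieter environment.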

Detection of presentation attack

According to ISO/IEC 30107, a presentation attack is any attempt to interfere with the correct operation of the biometric system. Based on Mobbeel’s experience, a presentation attack can take the following forms:

  • Playing a voice recording of the individual to be impersonated. This kind of attack is known as a physical access attack (PA).
  • Playing audio generated with a synthetic voice. This type of attack is known as a logical access attack (LA).

According to the standard, each of the items above is a presentation attack instrument (PAI). The standard also defines another module, the presentation attack detection (PAD) system. This module includes measures that analyse the input audio for signs of PA or LA attacks without requiring the user’s active collaboration.

[Figure: Presentation attack detection]

Signal processing

Once the voice signal has been captured and checked for presentation attacks, the identity verification process can proceed.

In the signal-processing stage, all operations focus on converting the input information into data that can be used for subsequent comparison.

The core element of this stage is vocal modelling.  

What is vocal modelling?

Vocal modelling transforms a voice recording into a mathematical representation known as a characteristic vector. These vectors encode the most significant voice features, so comparing them makes it possible to perform identity verification operations.

Vector generation process

The vocal characteristics vector generation process starts with audio signal preprocessing. These signals have information about acoustic waves generated by an individual’s vocal tract. Preprocessing algorithms divide the voice signal into small regions over which frequency domain values are calculated.
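As a rough illustration of this preprocessing step, the sketch below splits a signal into short overlapping frames and computes a magnitude spectrum for each one. The 25 ms frame, 10 ms hop and Hamming window are common choices assumed here, not details given in the guide, and the naive DFT is used only to keep the example dependency-free; real systems would use an FFT library and usually further transformations.

```python
import cmath
import math

def split_into_frames(samples, frame_len, hop_len):
    """Cut the signal into overlapping fixed-length frames."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop_len)
    ]

def magnitude_spectrum(frame):
    """Apply a Hamming window and return |DFT| for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Frequency-domain values for each short region of the signal."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    return [
        magnitude_spectrum(frame)
        for frame in split_into_frames(samples, frame_len, hop_len)
    ]
```

Each row of the result describes one short region of the recording, and bin `k` corresponds to the frequency `k * sample_rate / frame_len`. A matrix of this kind is the sort of input from which learning algorithms can then detect patterns.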

Once the audio signal has been transformed into a set of input features, machine learning algorithms detect patterns in them.

Deep learning architectures refine the vector obtained in the preprocessing stage and achieve robust representations of each user’s voice biometrics. The vector is irreversible; in other words, it is not possible to recover the original audio from it.

[Figure: Vector generation process]

Comparison

It is possible to perform different biometric operations to determine the user’s identity from the characteristic vector obtained in the previous stage. Each biometric function seeks to answer a specific question; therefore, the choice of one will depend on the particular application.

Verification: is it me?

Given an input audio and an identity claim, the system must determine whether the individual is who they claim to be by analysing the resemblance between the input audio and the characteristic vector stored in the system for that identity. This operation, known as 1:1 verification, requires a prior registration process.
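A 1:1 check of this kind can be sketched in a few lines. The example assumes that characteristic vectors are plain lists of floats and that resemblance is measured with cosine similarity; the `verify` function, the `enrolled` dictionary and the 0.7 threshold are hypothetical and only illustrate the idea.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(claimed_id, probe_vector, enrolled, threshold=0.7):
    """1:1 verification: compare the probe only with the claimed identity's template."""
    template = enrolled.get(claimed_id)
    if template is None:
        return False, 0.0  # no prior registration for this identity
    score = cosine_similarity(probe_vector, template)
    return score >= threshold, score

# Toy three-dimensional vectors; real voiceprints have many more dimensions.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
accepted, score = verify("alice", [0.9, 0.1, 0.0], enrolled)
```

Only one comparison is made per attempt, which is why verification requires the user to state who they claim to be.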

 

[Figure: 1:1 verification]

Identification: who am I?

Given an audio input, the system must determine the user’s identity by comparing the voiceprint with a database of voice samples from previously registered individuals. This type of identification is known as 1:N identification.

The system’s accuracy can change depending on the input voice sample quality and the database samples’ quantity and quality. Furthermore, external factors such as environmental noise or mispronunciation of words can affect the system’s performance.
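The 1:N search can be sketched as a loop over every registered voiceprint. As before, cosine similarity, the `identify` function, the `enrolled` dictionary and the 0.7 threshold are assumptions made for illustration; a large deployment would normally use an indexed or approximate search rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def identify(probe_vector, enrolled, threshold=0.7):
    """1:N identification: score the probe against every enrolled template."""
    best_id, best_score = None, -1.0
    for user_id, template in enrolled.items():
        score = cosine_similarity(probe_vector, template)
        if score > best_score:
            best_id, best_score = user_id, score
    if best_score < threshold:
        return None, best_score  # nobody in the database is similar enough
    return best_id, best_score

# Toy three-dimensional vectors; real voiceprints have many more dimensions.
enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
who, score = identify([0.1, 0.95, 0.0], enrolled)
```

Returning `None` when the best score falls below the threshold lets the system reject speakers who were never registered instead of forcing a match.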

[Figure: 1:N identification]

Matching: are they the same person?

This operation determines the similarity between two given input audios.

[Figure: Matching process]

Decision making

The workflow of a biometric system ends with the decision about the user’s identity. The process is the same in any of the three operations described above: the technology calculates the distance between characteristic vectors and returns a numerical value interpreted as the degree of resemblance between them. This value is critical for the decision. In most cases, the result must be expressed in binary terms: it is or is not the same person. To do so, it is necessary to establish a decision threshold, in other words, a numerical value that determines whether two voice audios are deemed to belong to the same individual.

 

Choosing the most suitable decision threshold is a key task, since it affects how the system behaves. Values that are too strict may cause many legitimate users to be rejected (usability or accessibility issues), while values that are too lax may let too many impostors through (security issues). The final choice should primarily consider the security requirements of the application in which the biometrics module is integrated.
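The trade-off can be made concrete with the two error rates commonly used to describe it: the false rejection rate (FRR, legitimate users rejected) and the false acceptance rate (FAR, impostors accepted). The sketch below assumes similarity scores where higher means more alike; the scores themselves are invented for illustration only.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FRR, FAR) when scores greater than or equal to the threshold are accepted."""
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far

# Made-up similarity scores, not measurements from any real system.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # same-speaker comparisons
impostor = [0.20, 0.35, 0.58, 0.72, 0.41]  # different-speaker comparisons

for t in (0.5, 0.7, 0.9):
    frr, far = error_rates(genuine, impostor, t)
    print(f"threshold={t:.1f}  FRR={frr:.2f}  FAR={far:.2f}")
```

With these sample scores, raising the threshold from 0.5 to 0.9 drives FAR down to zero while FRR climbs, which is the strict-versus-lax tension described above.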

How to improve the operation of biometrics, avoiding fraud and presentation attacks?

Biometrics systems can avoid presentation attacks and attempts of identity theft by implementing the following measures: 

  • Having high-quality voice samples: to minimise breaches and identity fraud, it is vital to use a high-quality voice sample when registering a person’s identity. This means that the sample has to be clear and free of noise and distortion. As mentioned, the capture stage of our technology includes algorithms that assess audio quality and filter out recordings that do not meet the minimum quality requirements.
  • Updating voice samples regularly: changes in a person’s voice need to be captured. For instance, if a person suffers from an illness or their voice has changed due to stress or ageing, the samples must be updated to guarantee that the system can still recognise the voice accurately.
  • Using multiple authentication factors: in processes that require a high level of security, additional authentication factors can be used to verify a user’s identity, as in two-factor authentication (2FA).

By considering these measures, it is possible to minimise breaches and risks of voice deepfakes, presentation attacks and voice impersonation.

Biases of voice biometrics technology

Like any other technology, voice biometrics can have biases. Some of the most common that may affect the accuracy of this type of biometrics are:

  • Databases: the most important bias comes from the database with which our model has been trained. For instance, we can find unbalanced databases regarding gender, language, or race. Therefore, it is essential to train artificial intelligence models with audio databases that are as heterogeneous as possible.
  • Accent: the technology may be less accurate for people who speak with an accent, as the system may have difficulty recognising words or phrases that do not correspond to the expected language or accent.
  • Pronunciation: the system may be less accurate for people who have pronunciation problems or speak unclearly.
  • Lack of data: if the system does not have enough voice samples from a given person or group, it may have difficulties in accurately recognising their voices.

It should be noted that different biases may appear depending on the provider and how they have developed their models.

At Mobbeel, we constantly work to minimise biases and improve our technology accuracy.

Ethical use of voice biometrics technology

It is vital to ensure that this technology is used responsibly, respecting the rights and privacy of individuals, so the following considerations should be taken into account when implementing a voice biometrics system:

  • Transparency: it is essential to be transparent about how the technology is used for voice authentication and how voice data is collected and used. Individuals need to be aware of how their biometrics are being used and should be able to opt out at any moment. In that case, the system should offer an alternative authentication method.
  • Privacy: people’s privacy must be respected at all times. Voice audio files or biometric templates must be protected and should not be shared without the explicit consent of the individuals. We strictly comply with the GDPR.
  • Fairness: it is vital to ensure that technology does not exclude or discriminate against certain groups of people by avoiding bias as far as possible. For instance, if the technology is less accurate for people with an unusual accent or voice or people of a particular gender, the system should be improved to increase accuracy for these groups.
  • Responsibility: developers and users must be responsible for how the technology is used and its potential consequences, taking steps to minimise any potential harm or damage.

Therefore, these ethical considerations must be accounted for to ensure that technology is used responsibly and in a way that respects people’s rights and privacy.

How will biometrics technology evolve, and what will be its main applications in a few years?

Voice technology is expected to keep evolving and improving in the coming years. Some of these expected future trends and applications are:

  • Improvements in accuracy and adaptability: accuracy is expected to improve thanks to more advanced deep learning models and the increased data available for model training. In turn, trends indicate that the technology will be able to better adapt to individual differences in voice and usage environment.
  • Integration with other authentication systems: voice biometrics will be integrated with other authentication systems, such as facial and fingerprint recognition, to offer more secure and robust multibiometric authentication solutions.
  • Increased market adoption: its use will increase as it becomes a more accurate technology and more applications are developed. That could include voice biometrics use in financial, government, and healthcare services, to name but a few.
  • Focus on emotion and disease detection: this type of biometrics will find new applications related to emotions and disease detection, such as depression, anxiety, and Parkinson’s disease, enabling early detection of these conditions and more effective treatment.
  • Increased attention to privacy and security: it is expected that new security measures will be developed to protect users’ biometric data and that regulatory frameworks will be put in place to ensure the responsible use of this technology.

Regulations concerning the use of voice technologies

There are different regulations governing the use of biometric technologies in each country, addressing data protection, privacy, and security:

  • Personal data protection regulation: many countries have laws that regulate the collection, storage and use of personal data, including voice data. These laws establish the rights of individuals regarding their data and set out the obligations of companies and governments when collecting and using this data.
  • Privacy protection regulation: many countries have specific rules that protect the privacy of individuals and regulate how voice data can be used and shared. These regulations include consent requirements for the collection and use of voice data and conditions to protect the security and confidentiality of voice data.
  • Security regulations: some rules establish requirements to protect voice data and reduce the risk of presentation attacks and identity theft. These requirements can include security measures such as passwords and two-factor authentication to protect access to voice data.

Regulations vary by country or jurisdiction, so you should check them to make sure you comply with the rules of each market.

 

Download the voice biometrics guide in PDF

Download our voice biometrics guide and learn all about this technology, its advantages, applications, and how it will evolve in the coming years.

What will you find out in the voice biometrics guide?

  • Learn in depth what voice biometrics is.
  • Discover the difference between voice biometrics and voice recognition.
  • Dive into its benefits and commercial uses.
  • Understand the biases and ethics behind the technology.
  • Delve into the technical operation of this kind of biometrics.
  • Learn about the regulations affecting biometric processes.