Voice Biometrics for Secure Authentication

What is voice biometric authentication?

Biometric recognition is the technology of identifying a person by their physical and/or behavioural characteristics. Security is a top priority in today’s digital world, and biometric authentication is one of the fastest-growing technologies in the field of secure authentication. For contactless access control, voice- and speech-based authentication are among the most useful biometric systems. ATM skimming, card data theft, online data tampering, and other intrusions are now common incidents, and biometric authentication can strengthen security in these application areas. It is a step towards secure and convenient user authentication on smartphones, in financial transactions, and in many other payment applications. Figure 1 presents the architecture of the biometric system.

Among biometric methods, voice is regarded as one of the most secure and widely accepted because of its ease of use, ease of implementation, and low cost.

Different types of voice-based authentication:

Speech recognition is the conversion of spoken words into text, i.e. “what is being said”. Speaker recognition is the task of recognizing who is speaking on the basis of his or her speech.

Speaker recognition has two sub-fields: speaker verification and speaker identification. Speaker identification determines which speaker in a group of reference speakers produced an unlabelled speech sample, i.e. “who is speaking”. Speaker verification accepts or rejects the identity claimed by a speaker, so it is a decision problem with two options. Both identification and verification can be text-dependent, where the same fixed phrase is used in training and testing, or text-independent, where the phrases used in training and testing may differ.
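
The difference between the two tasks can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular system: it assumes each utterance has already been reduced to a fixed-length feature vector, and the speaker names, vectors, cosine-similarity score, and 0.7 threshold are all hypothetical values chosen for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(test_vec, enrolled):
    """Speaker identification: pick the best match among N reference speakers."""
    return max(enrolled, key=lambda name: cosine_similarity(test_vec, enrolled[name]))

def verify(test_vec, claimed, enrolled, threshold=0.7):
    """Speaker verification: accept or reject a single claimed identity."""
    return cosine_similarity(test_vec, enrolled[claimed]) >= threshold

# Hypothetical enrolled voice templates (assumed, not real data)
enrolled = {
    "alice": [0.9, 0.1, 0.2],
    "bob": [0.1, 0.8, 0.5],
}
test_vec = [0.85, 0.15, 0.25]

who = identify(test_vec, enrolled)                   # answers "who is speaking?"
accepted_alice = verify(test_vec, "alice", enrolled)  # claim "I am alice"
accepted_bob = verify(test_vec, "bob", enrolled)      # claim "I am bob"
```

Identification always returns one of the N enrolled speakers, while verification compares the sample against one claimed identity only and returns a yes/no decision.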

Speaker verification for voice biometric-based authentication:

Figure 2 shows the different steps of speaker verification. Text-independent speaker verification is more convenient than text-dependent verification since the user can speak freely to the system. To perform well, however, it requires longer training and test utterances. Text-dependent speaker verification provides a better success rate for comparatively short utterances. Like other biometrics, speaker verification uniquely identifies individuals by their physiological characteristics or personal behavioural traits. To recognize a speaker, the system must extract unique speaker-specific features from the voice signal.

Speaker verification as a hypothesis test:

The theory behind speaker verification is outlined below.

Let Y be the test utterance, let the system have N enrolled speakers, and let S be the claimed speaker.

The goal of speaker verification is to decide whether Y was spoken by S.

Because Y is assumed to contain speech from a single speaker, the task is single-speaker verification.

It can be framed as a basic hypothesis test between the null hypothesis (H0), that Y was spoken by S, and the alternative hypothesis (H1), that Y was not spoken by S.
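
A common way to decide between the two hypotheses is a likelihood-ratio test. It is given here as an assumption about the intended formulation, not as the author's exact equation. H0 is taken to mean that Y was spoken by S and H1 that it was not; $p(Y \mid H_0)$ and $p(Y \mid H_1)$ are the likelihoods of Y under each hypothesis, and $\theta$ is a decision threshold chosen by the system designer:

```latex
\Lambda(Y) = \frac{p(Y \mid H_0)}{p(Y \mid H_1)}
\quad
\begin{cases}
\geq \theta & \text{accept } H_0 \text{ (the claim is accepted)} \\
< \theta & \text{reject } H_0 \text{ (the claim is rejected)}
\end{cases}
```

Raising $\theta$ makes the system stricter, so fewer impostors are accepted but more genuine users are rejected. Lowering it has the opposite effect.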


Application of speaker verification system:

Speaker verification can be used to authenticate access to almost any system, especially for remote access, smartphones, and payment gateways. Text-dependent speaker verification can therefore be an important biometric authentication method for enhancing the security of an application.

  • Biometric login: Biometric login systems are now widely used, so a voice-based biometric system can be used for person authentication, data protection, and other applications.
  • Payment gateways: It can be used in smartphone payment apps such as PhonePe and Google Pay.
  • Social media: It can also be used in social media applications on smartphones.


Dr. Saswati Debnath
Assistant Professor
CSE Dept.
Alliance College of Engineering and Design