Comparative Study on Artificial Intelligence Based Real Time Voice Augmentation Applications
Keywords:
Audio processing, Machine learning, Raspberry Pi, Voice augmentation
Abstract
There is great demand for real-time access to information that is dynamic, personalised and adaptive. Interacting through voice-augmented systems yields efficient and faithful access to large amounts of data, and the information obtained becomes more meaningful and understandable because it is customised to the spatial and temporal context of the user’s need. By providing alternative methods of interaction, such systems help users with real-time physical, digital and virtual interactions. The objective of this project is to demonstrate voice augmentation functions and to show how the quality of such services can be improved with the help of artificial intelligence, deep learning, neural networks and machine learning models, illustrating their combined potential for developing modern, user-friendly and hassle-free applications. The range of applications is wide and includes vending machines, billing counters and home assistants. To implement a project of this scale, it is important to refer to established prior work that plays a vital role today. In this report, we have analysed a wide range of topics: audio processing, speech-to-text and text-to-speech conversion, audio conversion, various machine learning algorithms and models, implementation using the Internet of Things, and direct applications in their respective fields. The following sections discuss the research journals referred to, along with the key takeaways from each.
The need to acquire knowledge and information is increasing day by day. A few decades ago, the one basic method of acquiring them was through books, scriptures and manuscripts.
Now that technology lets us access almost any data within a fraction of a second, the need to access it has also increased. All the information we want is within our reach, but the methods through which we reach it, such as typing, clicking and touch screens, are still primitive. Giving inputs, processing them and obtaining the desired output remains considerably slow, even though computing and technology are otherwise highly advanced. For over five decades, the modes of input to and output from the machine have remained largely unchanged. One must be a good programmer to specify one's needs in a high-level language, which is then translated into a low-level language by a compiler. In the real world, information and knowledge flow more readily through conversations and discussions. So, if we can bridge the communication gap between humans and computers, so that information can be accessed through ordinary human conversation, both the amount of information gained and the demand for it will increase rapidly. In this project we aim not only to access and gain information, but also to explore the possibilities of acting on that information and carrying out tasks through ordinary conversation. In other words, the main milestone of this project is to carry out a normal conversation with the computer, in which it not only responds with an appropriate reply but also acts and performs functions in response to voice commands.