Transcribed spoken user queries with an automatic speech recognition (ASR) model, Whisper, then generated relevant answers using large language models. The generated response was then passed to a text-to-speech (TTS) model for speech synthesis.
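
The three stages chain together roughly as in the minimal sketch below. It is an illustration under stated assumptions, not the project's actual code: the `VoiceAssistant` class, the stage type aliases, the `fake_*` stand-in functions, and the empty-transcript fallback message are all hypothetical. In a real system the three callables would wrap Whisper, an LLM, and a TTS model respectively.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
Transcriber = Callable[[bytes], str]   # audio in, text out (e.g. a Whisper wrapper)
Responder = Callable[[str], str]       # query text in, answer text out (e.g. an LLM wrapper)
Synthesizer = Callable[[str], bytes]   # answer text in, audio out (e.g. a TTS wrapper)


@dataclass
class VoiceAssistant:
    """Chains ASR -> LLM -> TTS; each stage is injected as a plain callable."""
    transcribe: Transcriber
    respond: Responder
    synthesize: Synthesizer

    def handle(self, audio: bytes) -> bytes:
        # Stage 1: speech to text.
        query = self.transcribe(audio).strip()
        if not query:
            # Assumed fallback: skip the LLM when nothing was recognized.
            return self.synthesize("Sorry, I did not catch that.")
        # Stage 2: generate an answer for the transcribed query.
        answer = self.respond(query)
        # Stage 3: text to speech.
        return self.synthesize(answer)


# Stand-in stages so the sketch runs without any model installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(query: str) -> str:
    return f"You asked: {query}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


assistant = VoiceAssistant(fake_transcribe, fake_respond, fake_synthesize)
reply_audio = assistant.handle(b"what is ASR?")
```

Passing the stages in as callables keeps each model swappable and lets the pipeline logic be tested without loading any model weights.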