RECORDED SPEECH-TO-TEXT SUMMARIZER USING NLP

Pallavi Ghimire
2020
BSc.CSIT
Semester 7

Transcribing speech is expected to become a crucial capability in the coming IT era. Whether for presentations, broadcast news, or class lectures, the need for transcription is rising. Although speech is the most natural form of communication, it is not easy to process. If recordings are left as mere audio signals, no deeper understanding of the recorded data is gained. Audio data can be turned into far more meaningful information through summarization. Several methods of automatic summarization are currently being researched, and they fall into two broad divisions: extractive and abstractive summarization. Abstractive summarization is still under study and does not yield good results on complex datasets. For proper handling of data and effective extractive summarization of the input, the Recorded Speech-to-Text Summarizer using NLP is proposed. The system uses the TextRank algorithm, an extension of the PageRank algorithm, to generate summaries of the processed input. The outputs generated by the system were compared with two categories of summaries: the first were devised by hand-picking lines from the input, and the second were generated by a basic NLP processor whose main criterion for grading sentences was the frequency distribution of keywords. For this evaluation, a group of participants was recruited, and several files were fed to the system as input. The comparison suggested that the system works as intended.
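The extractive step described above can be illustrated with a minimal TextRank sketch. The abstract does not specify the project's implementation, so everything here is an assumption made for illustration: the function names (`split_sentences`, `similarity`, `textrank_summary`), the naive sentence splitter, the word-overlap similarity (a set-based variant of the measure commonly used with TextRank), the damping factor of 0.85 borrowed from PageRank convention, and the fixed iteration count. The speech-to-text stage is omitted; the input is assumed to be an already transcribed string.

```python
import math
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    # Shared distinct words, normalised by the log of each sentence's
    # distinct-word count so long sentences are not favoured.
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0  # avoids a zero denominator (log(1) == 0)
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def textrank_summary(text, top_n=2, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []

    # Weighted, undirected sentence graph stored as an adjacency matrix.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # PageRank-style power iteration over the weighted graph.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    # Keep the top_n highest-scoring sentences, restored to document order.
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    transcript = (
        "The cat sat on the mat. The cat sat on the sofa. "
        "Bananas are yellow fruit."
    )
    for line in textrank_summary(transcript, top_n=2):
        print(line)
```

In this sketch a sentence ranks highly when it shares vocabulary with other highly ranked sentences, which is the sense in which TextRank extends PageRank from web links to sentence similarity. A frequency-distribution baseline such as the one used for comparison would instead score each sentence independently of the others.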

automatic summarization
Natural Language Processing
speech-to-text summarization
extractive summarization
textrank
pagerank
tf*idf transformation
