SEMANTIC QUESTION PAIR MATCHING WITH DEEP LEARNING

Anuska Shrestha
2020
BSc.CSIT
Semester 7

Q&A forums such as Quora, Stack Overflow, and Reddit are highly susceptible to duplicate questions. Two questions that ask the same thing can differ greatly in vocabulary and syntactic structure, which makes identifying their semantic equivalence challenging. In this report, we explore machine learning and natural language processing methods for determining semantic equivalence between pairs of questions, using a dataset of more than 400,000 question pairs released by Quora. The approach combines the Levenshtein distance between the two questions with sentence-vector encodings built from Word2Vec models, over which we experiment with a variety of distance metrics to predict semantic equivalence. The experimental results show that an artificial neural network with word embeddings achieves an F1-score of 0.6529 and an accuracy of 0.7236 on the test set.
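
As a rough illustration of the two feature families named in the abstract, the sketch below computes a Levenshtein distance between two questions and a cosine similarity between averaged word vectors. It is not the report's implementation: the function names, the averaging scheme, and the tiny 3-dimensional `toy_embeddings` table are assumptions made for the example; a real pipeline would load trained Word2Vec vectors instead.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def sentence_vector(sentence, embeddings, dim):
    """Average the vectors of known words; zero vector if none are known."""
    words = [w for w in sentence.lower().split() if w in embeddings]
    if not words:
        return [0.0] * dim
    return [sum(embeddings[w][k] for w in words) / len(words)
            for k in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, for illustration only.
toy_embeddings = {
    "how": [0.1, 0.3, 0.0],
    "learn": [0.9, 0.1, 0.2],
    "study": [0.8, 0.2, 0.3],
    "python": [0.2, 0.9, 0.7],
}


def pair_features(q1, q2, embeddings=toy_embeddings, dim=3):
    """Return [edit distance, embedding cosine similarity] for a question pair."""
    v1 = sentence_vector(q1, embeddings, dim)
    v2 = sentence_vector(q2, embeddings, dim)
    return [levenshtein(q1.lower(), q2.lower()), cosine_similarity(v1, v2)]


if __name__ == "__main__":
    print(pair_features("how learn python", "how study python"))
```

Features of this kind could then be passed to a classifier such as the neural network mentioned above. The edit distance captures surface overlap, while the embedding similarity is intended to capture pairs that share meaning but not wording.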

semantic analysis
duplicate questions
Natural Language Processing
machine learning
word embeddings
