INFORMATION RETRIEVAL FROM CORPUS OF DOCUMENTS USING LATENT SEMANTIC INDEXING

Sushil Awale
2019
BSc.CSIT
Semester 6

The vocabulary mismatch problem is common when natural language is used for information retrieval. Systems that match a user's query to documents solely on shared vocabulary suffer from synonymy (different words for the same concept) and polysemy (one word with several meanings). Research shows that, on average, 80% of the time different people (even experts in the same field) will name the same thing differently. In such a scenario, information retrieval must move beyond keyword matching and find relevant documents based on the concepts they contain. Latent Semantic Indexing is one such method: it identifies patterns in the relationships between the terms and the concepts contained in an unstructured collection of text. This project implemented Latent Semantic Indexing with TFIDF weighting to retrieve relevant documents for a search query on a corpus of machine learning research papers from ArXiv.
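
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the project's actual code. It assumes a five-sentence toy corpus in place of the ArXiv papers, whitespace tokenization, a simple log(N/df) + 1 IDF weighting, and NumPy's SVD truncated to k = 2 concepts. The function names (tokenize, build_tfidf, lsi_fit, lsi_query) are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter

import numpy as np

# Toy corpus (an assumption; it stands in for the ArXiv papers).
docs = [
    "neural network training with gradient descent",
    "deep neural network for image classification",
    "gradient descent optimization for deep learning",
    "support vector machine for text classification",
    "kernel methods and support vector machine theory",
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def build_tfidf(docs):
    """Return a term-by-document TFIDF matrix, the term index, and IDF weights."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenized for t in set(doc))
    # The +1 keeps terms that occur in every document at a non-zero weight.
    idf = np.array([math.log(n_docs / df[t]) + 1.0 for t in vocab])
    A = np.zeros((len(vocab), n_docs))
    for j, doc in enumerate(tokenized):
        for term, count in Counter(doc).items():
            A[index[term], j] = count
    A *= idf[:, None]
    return A, index, idf


def lsi_fit(A, k):
    """Truncated SVD: keep the k strongest latent concepts."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    # Row j is document j expressed in the k-dimensional concept space.
    doc_vecs = (np.diag(s[:k]) @ Vt[:k]).T
    return U_k, doc_vecs


def lsi_query(query, index, idf, U_k, doc_vecs):
    """Project the query into concept space and rank documents by cosine similarity."""
    q = np.zeros(len(index))
    for term, count in Counter(tokenize(query)).items():
        if term in index:  # terms outside the vocabulary are ignored
            q[index[term]] = count * idf[index[term]]
    q_k = U_k.T @ q
    denom = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_k) + 1e-12
    sims = (doc_vecs @ q_k) / denom
    return np.argsort(-sims), sims


A, index, idf = build_tfidf(docs)
U_k, doc_vecs = lsi_fit(A, k=2)
ranking, sims = lsi_query("support vector machine", index, idf, U_k, doc_vecs)
for doc_id in ranking:
    print(f"{sims[doc_id]:.3f}  {docs[doc_id]}")
```

Because documents and queries are compared in the reduced concept space rather than by exact term overlap, a document can score well even when it shares no word with the query. The number of concepts k is a tuning choice; k = 2 is used here only because the toy corpus is so small.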

Latent Semantic Indexing
Music Information Retrieval
TFIDF
