Language modeling is an essential task in natural language processing, with wide application in downstream tasks such as speech recognition, machine translation, and spelling correction. Language model architectures that represent the vocabulary with word-level vectors do not capture sub-word information (i.e. morphemes) and perform poorly on morphologically rich languages such as Nepali. In this project, I apply convolution over word representations formed by concatenating character vectors to produce feature vectors. These feature vectors capture the sub-word information of the vocabulary and are passed through a Highway network into an LSTM layer to learn a probability distribution over the vocabulary. The language model built in this project achieved a perplexity score of 378.81, i.e., at each prediction step the model is, on average, as uncertain as if it were choosing uniformly among roughly 379 words.
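
To make the described pipeline concrete, the following is a minimal PyTorch sketch of the architecture (character embeddings → convolution with max-over-time pooling → one Highway layer → LSTM → softmax projection). All hyperparameter values and names here are illustrative assumptions, not the exact configuration used in this project.

```python
import torch
import torch.nn as nn

class CharAwareLM(nn.Module):
    """Char-CNN -> Highway -> LSTM language model (sketch with assumed sizes)."""

    def __init__(self, char_vocab=100, word_vocab=10000,
                 char_dim=15, num_filters=100, kernel_size=3, hidden=300):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim, padding_idx=0)
        # Convolve over the character positions of each word.
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size)
        # Highway layer: y = t * relu(W_h x + b_h) + (1 - t) * x
        self.transform = nn.Linear(num_filters, num_filters)
        self.gate = nn.Linear(num_filters, num_filters)
        self.lstm = nn.LSTM(num_filters, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, word_vocab)

    def forward(self, chars):
        # chars: (batch, seq_len, max_word_len) character indices
        b, s, w = chars.shape
        x = self.char_emb(chars.view(b * s, w))      # (b*s, w, char_dim)
        x = x.transpose(1, 2)                        # (b*s, char_dim, w)
        x = torch.relu(self.conv(x))                 # (b*s, filters, w-k+1)
        x = x.max(dim=2).values                      # max-over-time pooling
        t = torch.sigmoid(self.gate(x))              # highway gate
        x = t * torch.relu(self.transform(x)) + (1 - t) * x
        x, _ = self.lstm(x.view(b, s, -1))           # (b, s, hidden)
        return self.proj(x)                          # logits over word vocab

# Example: a batch of 2 sequences, 8 words each, up to 12 characters per word.
chars = torch.randint(1, 100, (2, 8, 12))
logits = CharAwareLM()(chars)                        # shape (2, 8, 10000)
```

Training such a model with cross-entropy loss on next-word prediction yields perplexity as the exponential of the average per-word loss, which is the score reported above.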