Object Detection and Tracking detects the people present in a video frame and tracks them in subsequent frames. The project has two major parts: detection and tracking. Human shapes in a video frame were detected by computing Histogram of Oriented Gradients (HOG) features and classifying them with a linear Support Vector Machine (SVM). Testing was done on the MIT and INRIA pedestrian datasets. The MIT dataset had 509 training and 200 test images of pedestrians in city scenes, and the INRIA dataset had 1805 photos at 64x128 resolution cropped from a large set of personal photos. For each image, the HOG features were calculated and classification was performed. The detection accuracy of the application was 85% on the MIT and INRIA person detection datasets. Each detected person was then tracked using a Kalman filter.
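To illustrate the tracking step, the following is a minimal sketch of a Kalman filter for a single coordinate. The abstract does not state the motion model, so the constant-velocity state (position, velocity), the noise variances and the class name `KalmanFilter1D` are all assumptions made for illustration; a tracker for bounding boxes would typically run the same idea over both image axes.

```python
class KalmanFilter1D:
    """Constant-velocity Kalman filter for one coordinate (assumed model)."""

    def __init__(self, dt=1.0, process_var=0.01, measurement_var=1.0):
        self.dt = dt
        self.q = process_var        # process noise added to the diagonal of P
        self.r = measurement_var    # measurement noise variance
        self.position = 0.0
        self.velocity = 0.0
        # Covariance P = [[p00, p01], [p10, p11]], large initial uncertainty.
        self.p00, self.p01, self.p10, self.p11 = 100.0, 0.0, 0.0, 100.0

    def predict(self):
        """Project state and covariance one time step ahead: x = F x, P = F P F^T + Q."""
        dt = self.dt
        self.position += dt * self.velocity
        a = self.p00 + dt * self.p10
        b = self.p01 + dt * self.p11
        self.p00 = a + dt * b + self.q
        self.p01 = b
        self.p10 = self.p10 + dt * self.p11
        self.p11 = self.p11 + self.q
        return self.position

    def update(self, z):
        """Correct the prediction with a measured position z (H = [1, 0])."""
        innovation = z - self.position
        s = self.p00 + self.r                    # innovation variance
        k0, k1 = self.p00 / s, self.p10 / s      # Kalman gain
        self.position += k0 * innovation
        self.velocity += k1 * innovation
        p00, p01 = self.p00, self.p01
        self.p00 = (1 - k0) * p00
        self.p01 = (1 - k0) * p01
        self.p10 = self.p10 - k1 * p00
        self.p11 = self.p11 - k1 * p01
        return self.position
```

In use, `predict()` would be called once per frame and `update(z)` whenever the detector reports a position, so the track can coast through frames where detection fails.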
Braille Character Recognition (BCR) is a method to locate and recognize a Braille document stored as an image and convert the Braille in the image into its equivalent natural language representation. BCR converts the pixel representation of an image into its equivalent character representation. The literature review suggests that extracting information from Braille paper requires accurate preprocessing, after which the processed image is fed to an artificial neural network for recognition. The system was tested with a variety of Braille documents written using English Braille standards. Digital image processing stages such as grayscale conversion, binary conversion, filtering and morphological operations were applied in the preprocessing stage, which enhanced the quality of the Braille dots. Image segmentation, cropping and resizing were then applied to the Braille document to improve matching. The proposed method extracts the Braille dots from the Braille picture, maps the image pixels into a network trained on the Braille training set using the back propagation algorithm, and then transforms the recognized characters into English text. With 200 hidden nodes and a learning rate of 0.3, the implemented algorithm achieved an average recognition accuracy of 89.62% across several test cases.
Topic modeling is a technique for unsupervised analysis of large document collections; a topic model extracts the hidden topics from such a collection. Topic Modeling Using Latent Dirichlet Allocation is a document topic modeling system that extends Latent Dirichlet Allocation. The model is applied to a collection of 18846 documents from the 20 Newsgroups dataset (Qwone.com, n.d.), although it was tested on only a few newsgroup documents. The model produces clusters of latent topics, which are visualized in a pie chart along with the frequency of the words appearing in the documents. The output is evaluated using term coherence, where a higher value indicates a better model whose topics are more interpretable to humans. A good model with 50 iterations scored -12.9647523036 for topic 2, while a bad model with 1 iteration scored -14.9906055583 for topic 2.
This document discusses text search on audio together with sentiment analysis of the audio transcript. The audio is first sampled at 44,100 Hz and transcribed using the IBM Watson API. Audio timestamps are mapped to the text, which enables text search over the audio. Sentiment analysis of the transcript is performed using the Random Forest algorithm. The model was trained on 514,999 classified tweets, each labeled 1 for positive sentiment or 0 for negative sentiment. An accuracy of 69% was obtained when training on 70% of the data and testing on the remaining 30%.
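The timestamp-to-text mapping can be sketched as follows. This is only an illustration: the input layout, a list of `(word, start_seconds, end_seconds)` tuples, and the function name `find_phrase` are assumptions, not the project's actual data structures or the exact shape of the transcription service's response.

```python
def find_phrase(words, phrase):
    """Return (start, end) times of every occurrence of `phrase`.

    `words` is an assumed transcript layout: a list of
    (word, start_seconds, end_seconds) tuples in spoken order.
    """
    target = phrase.lower().split()
    if not target:
        return []
    tokens = [word.lower() for word, _, _ in words]
    hits = []
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            start = words[i][1]                      # start of first word
            end = words[i + len(target) - 1][2]      # end of last word
            hits.append((start, end))
    return hits
```

Each returned pair can be used to seek the audio player directly to the place where the phrase was spoken.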
Large volumes of bank cheques are used all over the world for financial transactions, especially in developing countries like Nepal. Paper cheques are still used for non-cash transactions even after the introduction of electronic systems such as debit and credit cards. These cheques are processed manually, which requires extra manpower, time and cost. Our aim is to automate this labor-intensive process in a way that is both time and cost effective. In this paper we propose a mechanism for recognizing cheque fields such as Payee Name, Amount, Date and Account Number. We propose a two-stage model in which the first phase extracts and segments the required objects from the cheque and the second phase recognizes the extracted characters. These characters provide the information needed to process the cheque, so their extraction and recognition is necessary for processing cheques without human intervention. The first phase includes all image processing activities. First, the cheque images are preprocessed: the foreground is separated from the background, and the interpretability of the information in the image is improved to provide ‘better’ input for the later image processing techniques. The enhanced image is then segmented using Connected Component Labeling. The segmented letters and digits are fed into a neural network trained with back-propagation and gradient descent, which reached 99% accuracy with a learning rate of 0.1. Finally, the system displays its predicted result, which was 75% accurate.
The purpose of the proposed system is to detect and recognize a person. Face recognition is done by comparing the characteristics of a face to those of known individuals. Many techniques are used for face detection and recognition. Principal Component Analysis (PCA) is a technique used in image recognition that transforms face images into a small set of characteristic feature images, called eigenfaces. This project implements a real-time face detector and face recognizer using PCA with eigenfaces. It uses EmguCV, a cross-platform .NET wrapper for the OpenCV image processing library, together with C# and the .NET library to capture and process images from a capture device in real time. An accuracy of 73.33% was obtained for face recognition.
Associative Rule Generation is a system that recommends food items on the basis of how often items occur together. Recommendations are derived from the pattern of past food orders. It is an application of association rule mining, carried out using support and confidence. The Apriori algorithm is used for rule generation, and the recommendations change as the support and confidence values change.
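The support and confidence measures can be made concrete with a short sketch. It covers only item pairs, which corresponds to the second level of Apriori rather than the full level-wise algorithm, and the order data and function names are hypothetical.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and C) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base


def pair_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        s = support(transactions, {a, b})
        if s < min_support:
            continue  # infrequent pairs are pruned, as in Apriori
        for x, y in ((a, b), (b, a)):
            c = confidence(transactions, {x}, {y})
            if c >= min_confidence:
                rules.append((x, y, s, c))
    return rules


# Hypothetical order history used only for illustration.
ORDERS = [
    {"momo", "tea"},
    {"momo", "tea", "chowmein"},
    {"momo", "chowmein"},
    {"tea", "samosa"},
]
```

Raising `min_support` or `min_confidence` shrinks the rule list, which is why the recommendations vary with those two thresholds.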
CAPTCHA is a type of challenge-response test used to distinguish human users from computer ones by generating challenges that are easy for humans to solve but difficult for computers. One of the most common CAPTCHAs today is text-based, where a short word is placed in a jumbled image. This project aims to break simple text-based CAPTCHAs automatically by applying image preprocessing, filtering and image segmentation, followed by a neural network trained with the back propagation algorithm for recognition. Out of 320 CAPTCHAs, 311 were successfully decoded, which means 97.187% accuracy was achieved with a simple multilayer perceptron network. This result suggests that simple text-based CAPTCHAs are easily breakable, so CAPTCHAs that are more difficult to segment and decode should be used instead.
Automatic Word Search Puzzle Solver is an automated system for solving word search problems using linear search, binary search and a trie. To generate the solution for a given puzzle, image processing techniques, namely grayscale conversion, thresholding, dilation and contour detection, were used to segment the puzzle image horizontally and vertically into individual character images. The segmented characters were then fed into a trained neural network. Neural network training was conducted using 26416 images for the 26 alphabet letters (A-Z). The recognized data was then modeled as an equivalent puzzle matrix. The train and test datasets were separated in a ratio of 8000:26416, and the final accuracy obtained was 87.7625% (for the neural network alone). For generating the solution by searching dictionary words, dictionary data was collected in a text file along with word descriptions. This data was preprocessed into a final word list of 36739 words. The number of possible word combinations in the puzzle matrix in the horizontal, vertical and diagonal directions was calculated, giving a total of 10360 combinations for a 15*15 letter matrix. These combinations were tested against the collected dictionary data using linear search, binary search and a trie. After evaluating the three algorithms, the time complexity for the 15*15 letter grid puzzle in ascending order was: trie, binary search, linear search. Hence, the trie implementation was found to be the most efficient in terms of time complexity for generating the puzzle solution.
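A trie lets every dictionary word that begins at a given grid cell be found in a single pass over the letters, instead of looking up each candidate substring separately. The sketch below shows this for the horizontal direction only; the class and function names are illustrative assumptions and do not come from the project.

```python
class TrieNode:
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def words_starting_at(self, letters):
        """Yield every dictionary word that is a prefix of `letters`."""
        node = self.root
        for i, ch in enumerate(letters):
            node = node.children.get(ch)
            if node is None:
                return  # no dictionary word continues with this letter
            if node.is_word:
                yield letters[:i + 1]


def find_horizontal(grid, trie):
    """Return (word, row, column) for each left-to-right match in the grid."""
    found = []
    for r, row in enumerate(grid):
        for c in range(len(row)):
            for word in trie.words_starting_at(row[c:]):
                found.append((word, r, c))
    return found
```

Vertical and diagonal directions can reuse `words_starting_at` by passing the letters read along that direction.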
This paper presents a line-up prediction system for a MOBA game (DOTA 2), using KNN algorithm. This system implements K Nearest Neighbors algorithm to find out the nearest neighbor for any given query data. A polynomial weight function is used to more elaborately find nearest neighbor for any given query. This paper also provides insights to the process of data collection and feature selection before the implementation of the algorithm.
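A distance-weighted nearest-neighbour vote can be sketched as below. The abstract does not give the form of its polynomial weight function, so the weight `1 / (1 + d) ** power` used here is an assumption chosen for illustration, as are the function name and the feature encoding.

```python
from collections import defaultdict
from math import dist


def weighted_knn(train, query, k=3, power=2):
    """Classify `query` by a distance-weighted vote of its k nearest points.

    `train` is a list of (feature_tuple, label) pairs. The weight
    1 / (1 + d) ** power is a hypothetical polynomial weighting.
    """
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbours:
        votes[label] += 1.0 / (1.0 + dist(features, query)) ** power
    return max(votes, key=votes.get)
```

With this weighting a single very close neighbour can outvote two distant ones, which a plain majority vote would not allow.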
Graviton Game is a 2D platformer game made in Unity 3D engine using C# scripting. It is a single player platformer game where the player can freely use gravitation to change their orientation and environment. Basically, the player can guide an avatar or player character to jump between suspended platforms and/or over obstacles while changing the gravitational orientation to advance the game. Other acrobatic maneuvers may factor into the gameplay as well, such as swinging, bouncing or hovering. The enemy of the game is a user implemented AI meaning the enemy behaviors and movements are maintained by the algorithm written by the developer. It was based on mechanics of games such as Super Mario Bros., Limbo (the last level). Making a game of platform genre that has engaging premise was the objective of the project. The game has been described thoroughly in this document including the major classes. The functional and non-functional requirements are given. The system design is explained. Class diagram and state diagram give a general picture of the working mechanics of the game. Corrective maintenance of the game will be carried out.
In this project a Java-based application was developed that extracts and solves a Sudoku puzzle in real time from an input image of the puzzle. If the input image has reasonable clarity, the puzzle is solved and the computed values are displayed in the form of a solved Sudoku puzzle. The puzzles are solved using a form of backtracking, a systematic search technique commonly applied to NP-complete problems. The Sudoku puzzle is recognized from the image using segmentation, image processing and Optical Character Recognition. Artificial Neural Networks are used for Optical Character Recognition, trained using backpropagation and the gradient descent algorithm. The project successfully solves all puzzles if the input image is clear and has perfect grids. The approach can be applied to Sudoku puzzles of any size, but the application is designed to solve only 9*9 puzzles because NP-complete problems suffer from combinatorial explosion as the puzzle size increases.
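The backtracking idea can be sketched in a few lines. The project itself is Java-based; the version below is written in Python purely as an illustration, and the grid representation (a 9x9 list of lists with 0 for empty cells) is an assumption.

```python
def is_valid(grid, r, c, d):
    """Check that digit d does not clash with row r, column c or their 3x3 box."""
    if any(grid[r][j] == d for j in range(9)):
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))


def solve(grid):
    """Fill `grid` in place; return True if a complete solution was found."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if is_valid(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0  # undo the guess and try the next digit
                return False  # no digit fits here, so backtrack
    return True  # no empty cell left
```

The search tries digits cell by cell and undoes a choice as soon as it leads to a dead end, which is also why the running time grows quickly for larger grids.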
Place Recommendation System is a user-based recommendation system that recommends travel destinations to users according to their past history and their similarity to other users. Users can rate a particular place and are recommended other places on the assumption that they may be interested in similar places. Similar users are found by calculating Pearson's correlation coefficient, and its value is used to map similar users. Places are recommended on the basis of the interests of similar users.
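A user-based recommender built on Pearson's correlation coefficient can be sketched as follows. The rating layout (a dictionary of place to score for each user), the similarity-weighted averaging and the function names are assumptions for illustration; the abstract does not specify how similar users' ratings are combined.

```python
from math import sqrt


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the places both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[p] for p in common]
    ys = [ratings_b[p] for p in common]
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def recommend(target, all_ratings, top_n=3):
    """Rank places the target has not rated by similarity-weighted ratings."""
    scores, weights = {}, {}
    for user, ratings in all_ratings.items():
        if user == target:
            continue
        sim = pearson(all_ratings[target], ratings)
        if sim <= 0:
            continue  # ignore dissimilar users
        for place, rating in ratings.items():
            if place in all_ratings[target]:
                continue
            scores[place] = scores.get(place, 0.0) + sim * rating
            weights[place] = weights.get(place, 0.0) + sim
    ranked = sorted(((scores[p] / weights[p], p) for p in scores), reverse=True)
    return [place for _, place in ranked[:top_n]]
```

The same two functions apply unchanged to any rated item type, since only the dictionary keys differ.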
Compression is useful because it reduces resource usage, such as data storage space or transmission capacity. There are many compression methods. In this paper we discuss only lossless compression techniques, which are the ones related to our work, and not lossy techniques. Several basic lossless data compression methods are reviewed, including Shannon-Fano coding, Huffman coding, Run Length Encoding and arithmetic coding. LZW, which is implemented in this paper, is a lossless dictionary-based algorithm adopted as a standard by the Consultative Committee on International Telegraphy and Telephony. The dictionary initially holds a code for each character based on its ASCII code. With the LZW implementation, text data can be compressed effectively. The results show that storage space is reduced by about 40% to 50%, depending on the redundancy present in the data.
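A minimal sketch of LZW compression and decompression is given below. It assumes the input contains only characters with code points below 256, so that the initial dictionary can be seeded from single-character codes as the abstract describes; output codes are kept as a list of integers rather than packed into bits.

```python
def lzw_compress(text):
    """Return the list of integer codes for `text` (code points < 256 assumed)."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate              # keep extending the match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new sequence
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes):
    """Rebuild the original text from the codes produced by lzw_compress."""
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    result = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            entry = previous + previous[0]   # code defined by the current step
        else:
            raise ValueError(f"invalid LZW code: {code}")
        result.append(entry)
        dictionary[next_code] = previous + entry[0]
        next_code += 1
        previous = entry
    return "".join(result)
```

The decoder rebuilds the same dictionary as the encoder, so no dictionary needs to be stored with the compressed data. Repetitive input yields fewer codes than characters, which is the source of the space saving.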
Vehicle Recommendation System is a user-based recommendation system that recommends automobiles to the users according to their past history and similarity mapping with other users. Users can rate a particular vehicle and are recommended other vehicles based on the assumption that they may be interested in those vehicles. Similar users are found by calculating Pearson’s correlation coefficient and its value is used to map similar users. The vehicles are recommended on the basis of interest of similar users.
There are around 180 different currencies in use around the world. The Currency Recognition and Conversion system is implemented to automatically recognize the monetary value of a currency note and so reduce human effort. Automatic currency note recognition invariably depends on the characteristics of a particular country's notes, and the extraction of features directly affects recognition ability. In this project, the pixel values of the 4 corners of the note are used as features for a robust representation of the currency image. Altogether, 2500 features are obtained and fed to the neural network for training. The Red, Green and Blue (RGB) values and the size of the note are additional features, bringing the total feature count to 2505; however, adding those features gives lower accuracy than the 2500-feature set. The MLPClassifier class is used, which implements a Multi-Layer Perceptron (MLP) trained with three-layer feed-forward backpropagation. Notes are classified into seven denomination classes: Rs. 5, Rs. 10, Rs. 20, Rs. 50, Rs. 100, Rs. 500 and Rs. 1000. The system obtains an accuracy of 94% with 2500 input nodes and 88% with 2505 input nodes, which include the RGB values and the size.
With the advancement of the Internet and technology, the number of users connected to the Internet through some kind of electronic device has grown exponentially. The Internet has also been used for various illicit activities, which has led to growth in crimes such as phishing, dissemination of malware, financial fraud and spam-advertised e-commerce. Because the URL is the entry point for all of these activities, it can be analyzed to classify URLs as benign or malicious. In the past, URL classification has been carried out using lexical features, network traffic, hosting information, the content behind the URL and other strategies. Except for lexical features, these techniques require time-intensive lookups, which introduce delay into the classification process and make them unsuitable for real-time systems. This paper describes the classification of URLs based on their lexical features, which is a lightweight approach to detecting malicious URLs. An analysis is carried out to measure the effectiveness of a system that detects malicious URLs by lexical analysis and can therefore be used in a real-time system. URL classification was carried out using a neural network on 57570 URLs. Because feature extraction is the major task that determines the efficiency of the system, different approaches were used to extract features, and the results obtained with the different feature extraction approaches were compared. An accuracy of 92.7% was obtained on test data that was a part of the total dataset.
Optical Character Recognition (OCR) is the process of text extraction from images of handwritten text. A mathematical equation is a statement of equality containing one or more variables. Variables are alphanumeric characters, and an equation also contains mathematical symbols such as + (plus), - (minus), * (multiply) and / (divide). OCR of Mathematical Equation using Multi-Layer Perceptron Neural Network is a system that locates and recognizes a mathematical equation written on paper. The classifier is a Multi-Layer Perceptron neural network with 784 (28*28) input nodes, 500 hidden nodes and 41 output nodes. First, the neural network is trained with 2000 samples of each alphanumeric character and mathematical symbol. The images of handwritten equations are then preprocessed and fed to the neural network for recognition. The neural network gives 96.25% accuracy in recognizing alphanumeric characters.
Recognizing text in natural images is a foundation for solving other problems in computer vision, such as content-based image filtering. Even though character recognition in documents has reached near-perfect results, there is still no good system for recognizing text in scene images. Text Recognition in Natural Images Using Convolutional Neural Network is a system that detects and recognizes text in natural images. It uses MSER (Maximally Stable Extremal Regions) to detect text in the image and a Convolutional Neural Network to recognize it. An accuracy of 86.96% on seen images and 84.05% on unseen test images was obtained in this project.
First-Help is an offline Android application that provides first-aid information for various injuries. At present, many people are unaware of the consequences of minor to major injuries and of how to treat them before reaching a hospital. First-Help could be one solution to this problem. It helps people learn first-aid information for various injuries so that severe damage to health and body can be prevented when accidents or injuries occur. A notification feature educates the user daily with first-aid information stored in the application. First-Help also provides an Emergency SMS System for emergencies or critical injuries when the user, or someone near the user, needs immediate help. In addition, the application can calculate BMI, FFMI, calories (maintenance, gain and loss) and body fat percentage.
Electronic mail has become a convenient and inexpensive way to communicate regardless of distance. However, the increasing volume of unsolicited email is reducing productivity dramatically, so reliable anti-spam filters are needed to separate such messages from legitimate ones. This project uses Naïve Bayesian classification, which has been suggested as an effective engine for picking out spam mail and separating it from ham. The algorithm was trained using the Enron dataset, a well-known spam/legitimate email dataset. I have developed the filter as a web extension that takes the emails a user uploads or receives and returns a prediction of whether each mail is spam or ham, along with the predicted probability that it is spam, based on the training dataset. Experimental results were collected using the Enron dataset, consisting of a total of 52076 mails including both spam and ham. The accuracy obtained using the Naïve Bayes classifier is 98.32%.
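The classification step can be sketched with a small multinomial Naïve Bayes filter. Whitespace tokenization, Laplace smoothing with `alpha=1.0` and the class name are assumptions for illustration and are not taken from the project; the sketch also assumes at least one training message per class.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over lower-cased, whitespace-split words."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _log_score(self, words, label):
        # log P(label) + sum of log P(word | label) with smoothing
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for word in words:
            score += math.log((self.word_counts[label][word] + self.alpha) / denom)
        return score

    def spam_probability(self, text):
        """Return the posterior probability that `text` is spam."""
        words = [w for w in text.lower().split() if w in self.vocab]
        spam = self._log_score(words, "spam")
        ham = self._log_score(words, "ham")
        top = max(spam, ham)  # subtract the max to avoid underflow
        e_spam, e_ham = math.exp(spam - top), math.exp(ham - top)
        return e_spam / (e_spam + e_ham)
```

Working in log space keeps the product of many small word probabilities from underflowing on long emails.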
A prediction is a statement about an uncertain future event made by analyzing past data. This project focuses on predicting whether a flight will be delayed using Logistic Regression. Because appropriate flight data for Nepalese airlines is not easily accessible, historical flight data from the United States of America is used as the dataset. The input parameters chosen were Origin, Dest (destination), Unique Carrier, Day of Week, Dep hour (departure hour) and Arr Delay. All categorical variables were converted into dummy variables. The data was then split into training and testing sets and fed to the logistic model. The accuracy of the system was found to be 61 percent. As a separate module, the project also provides users with real-time data scraped from the TIA website and notifies them when the status of a flight they follow changes.
Handwriting recognition is the process of text extraction from images of handwritten text. A Devanagari numeral string is a collection of Devanagari digits forming a number. In this project, handwritten Devanagari numeral string recognition is done using a Multilayer Perceptron (MLP) neural network. Handwriting Recognizer is a system that locates and recognizes Devanagari numeral strings written on white paper. The classifier used to recognize each Devanagari digit is an MLP neural network with 1024 (32*32) input nodes, 300 hidden nodes and 10 output nodes. First, the neural network is trained with 1700 samples of each digit (0-9). The images of handwritten text are then preprocessed and fed to the neural network for recognition. The result is 96.5% accuracy with a learning rate of 0.3.
Security has become an important issue, and encryption plays an important role in information security systems. Image encryption and decryption is the process of encrypting and decrypting an image. There are two main approaches: symmetric and asymmetric cryptography. We chose the AES algorithm for symmetric encryption and the RSA algorithm for asymmetric encryption. AES256 is implemented with a block size and key length of 256 bits each, while RSA is implemented using a complex prime number supplied by the user, encrypting the image with the public key and decrypting it with the private key. In this paper we implemented both techniques and compared their performance based on simulated encryption and decryption time, number of pixel change rate (NPCR) and pixel correlation analysis. Experimental results are given to analyze the effectiveness of each algorithm. Based on the experiments performed, we also implemented a hybrid of AES and RSA.
With an ever-increasing size of text present on the Internet, automatic summary generation remains an important problem for natural language processing. There is an increasing demand for automatic methods for text summarization and it is of great importance to solve information overload. To solve the problem a neural network has been created which generates extractive text summary. This paper describes the mechanism implemented to generate Extractive Text Summary from Single Document using Multi-Layer Perceptron. Summarization has been carried out by extracting features from each sentence, training the neural network with error back propagation technique with single hidden layer. The trained model was then tested with manually generated summary and accuracy of 85% was obtained. Finally, the trained model has been implemented to generate Extractive Summary.
Path finding and agent movement are considered to be the core of AI Movement System. Automatic Maze Solver is based on the concept of such core AI Movement System components. Automatic Maze Solver aims to obtain maze path(s) from a maze image uploaded by the user, which can be any viewable natural maze or prior produced or pressed maze. In this project, the maze image uploaded by the user is digitized by the system and preprocessed for proper maze interpretation. Median filtering, Niblack thresholding and Zhang-Suen thinning are used during the preprocessing phase. The preprocessed maze is then acted upon by mixed module implementation for solving it. Mixed module implementation in the context of the project includes Dead End Filling with Image Processing, to find all possible paths in the maze, followed by Graph Theory based approach, A*, to find the shortest path. The implementation has been tested and confirmed to work with Rectangular, Circular and Hexagonal mazes; however, since the implementation approach is generic, it works fine with other maze shapes as well provided they have only two openings (start and end) at the edge of the maze. The overall system performance has been tested using 10 different mazes and the maximum total time obtained was 34.151 seconds.
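The shortest-path stage can be sketched with A* on a grid. This is an illustration under stated assumptions: the maze is taken to be already reduced to a 2D list where 0 is an open cell and 1 is a wall, movement is 4-connected with unit cost, and the Manhattan distance is used as the heuristic. The dead-end filling stage and the image preprocessing are not shown.

```python
import heapq


def a_star(grid, start, goal):
    """Return the shortest path from start to goal as a list of (row, col)."""
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    priority = new_cost + heuristic((nr, nc))
                    heapq.heappush(open_heap, (priority, new_cost, (nr, nc)))
    return None  # goal is unreachable
```

For circular or hexagonal mazes the same search applies once the thinned image is converted to a graph; only the neighbour generation changes.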
Audio fingerprinting techniques aim to perform content-based audio identification successfully even when the audio signals are slightly distorted. Different applications have delivered different working implementations of audio fingerprinting for music and song identification. This document demonstrates how audio fingerprinting can be used to store the fingerprints of theme music and songs of various genres, and how the application identifies a piece of music from a short audio clip containing a sample of it. Avery Wang's Shazam algorithm, whose fingerprints have been shown to be robust against noise, is implemented to create the audio fingerprints.
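The hashing and matching idea behind this style of fingerprinting can be sketched as follows. The sketch starts from spectrogram peaks that are assumed to be already extracted as `(time, frequency)` pairs; peak picking is not shown, and the parameters `fan_out` and `max_dt` as well as all function names are hypothetical.

```python
from collections import Counter, defaultdict


def fingerprint(peaks, fan_out=3, max_dt=10):
    """Pair each peak with a few later peaks; return ((f1, f2, dt), t1) hashes."""
    peaks = sorted(peaks)
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break
            if dt <= 0:
                continue
            hashes.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes


def build_index(songs):
    """Map each hash to the (song name, time) positions where it occurs."""
    index = defaultdict(list)
    for name, peaks in songs.items():
        for h, t in fingerprint(peaks):
            index[h].append((name, t))
    return index


def identify(index, clip_peaks):
    """Return the song whose hashes line up at one consistent time offset."""
    votes = Counter()
    for h, t_clip in fingerprint(clip_peaks):
        for name, t_song in index.get(h, ()):
            votes[(name, t_song - t_clip)] += 1
    if not votes:
        return None
    (name, _offset), _count = votes.most_common(1)[0]
    return name
```

Because each hash depends only on frequency pairs and their time difference, a clip taken from the middle of a song produces the same hashes as the stored track, shifted by a constant offset.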
Parts-of-Speech Tagger for Nepali Text using SVM is an application that assigns parts of speech such as noun, pronoun, verb, adverb and other lexical tags to each word in Nepali text based on both its definition and its context. The tagger is built using the Support Vector Machine learning framework and is trained with 80,000 lemmatized words from the Nepali National Monolingual Written Corpus. Average accuracies of 88% and 72% were obtained for lemmatized text and unprocessed raw text respectively.
Tag recommendation is one of the challenging problems in text data mining. Tagging lets users explore related content and is very useful on software information sites such as Stack Overflow, the Stack Exchange sites, Quora and internet forums. In this paper, I developed another software information site named ASK-ME, with an improved automatic tag recommender based on historical tag assignments to software objects, using a combination of Bayesian inference and frequentist inference collectively known as ENTAGREC. On the Stack Overflow dataset, it achieves recall scores of 0.805 for 5-fold cross validation and 0.868 for 10-fold cross validation.
Reinforcement learning is essential for applications where there is no single correct way to solve a problem. We show in this project that reinforcement learning can be used to learn how to play Flappy Bird effectively even when the input is high dimensional. The agent, which is the bird in this case, learns about the world around it on its own and uses the input to come up with an optimal strategy for navigating the environment. We use a convolutional neural network with Q-learning for this purpose, and we also discuss potential challenges and improvements.
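The Q-learning rule at the core of this approach can be shown in tabular form. This is a deliberate simplification: the project approximates the Q-function with a convolutional neural network over image input, whereas the sketch below stores Q-values in a dictionary, and the state names, action names and hyperparameter values are hypothetical.

```python
import random
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one Q-learning step:
    Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def epsilon_greedy(q, state, actions, epsilon, rng=random):
    """Explore with probability epsilon, otherwise pick the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


# Q-table that returns 0.0 for state-action pairs not seen yet.
q_table = defaultdict(float)
```

In the deep variant, the table lookup is replaced by a network forward pass and the update becomes a gradient step toward the same target value.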