Easy Web is a web development platform. With the technological advancement witnessed today, it is of paramount importance that web development be made easier. The core motive of the platform is to make web development feasible for individuals with little or no programming skill. Traditionally, developing any kind of website requires a sound knowledge of the different components involved. We have developed this platform to make such knowledge optional: people with only a basic understanding of the components can design and develop a website using it. Easy Web is a user-targeted web development platform that uses Bootstrap for the structure and layout of the web page, along with jQuery for DOM manipulation. Users are presented with an editor onto which they can drag and drop basic web components such as containers, buttons, and navigation bars, to name a few. They can also change all the basic CSS of the selected web objects using the options presented alongside the editor. Furthermore, a live code editor is provided to facilitate individuals with some programming knowledge who want to do the coding themselves. Users can also check the responsive behavior of the web page. Essentially, Easy Web functions like any photo editing software, except that it is used to create websites. The analysis carried out for this project can be used to elicit the required components and features for a website, as well as to shape its look and feel. The findings made during this project can be useful for those trying to build a platform that deals with the dynamic aspects of web development.
Communication is more effective when speech is accompanied by the corresponding lip movement. Lip reading is helpful for hearing-impaired people, as well as for people with normal hearing in adverse acoustic conditions. Speech-driven face animation can therefore enhance audio-based spoken communication. In this project, we have built a system that takes raw audio as input and produces a face animation with the corresponding lip movements. To obtain the face animation, the audio is broken into its constituent phonemes, and each phoneme is mapped to a corresponding facial expression. For phoneme classification, a language model was built, and the phonemes it generated were mapped to facial expressions from the Preston Blair series. Building the phoneme classifier requires a text file containing text broken into its phonetic sequences. Testing showed an accuracy of 70% for phoneme classification, which is good enough for this application. The application can be useful in Internet telephony, where only audio is available. Furthermore, it can be useful for hearing-impaired people, who rely on lip movements for better communication.
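The phoneme-to-expression mapping stage can be illustrated with a minimal sketch. The phoneme symbols and Preston Blair shape names below are illustrative assumptions, not the project's actual label set:

```python
# Minimal sketch of the phoneme-to-mouth-shape mapping stage (illustrative).
# The phoneme symbols and shape names are assumptions, not the project's set.
PRESTON_BLAIR = {
    "AI":  ["AA", "AE", "AH", "AY"],   # open mouth
    "E":   ["EH", "IY"],               # spread lips
    "O":   ["AO", "OW"],               # rounded lips
    "U":   ["UW", "UH"],               # small rounded lips
    "MBP": ["M", "B", "P"],            # closed lips
    "FV":  ["F", "V"],                 # lower lip against teeth
    "L":   ["L", "T", "D"],            # tongue up
    "etc": ["S", "Z", "K", "G"],       # relaxed / neutral
}

# Invert to a phoneme -> shape lookup table.
PHONEME_TO_SHAPE = {p: shape for shape, ps in PRESTON_BLAIR.items() for p in ps}

def phonemes_to_keyframes(phonemes):
    """Map a classified phoneme sequence to a sequence of mouth shapes."""
    return [PHONEME_TO_SHAPE.get(p, "etc") for p in phonemes]

print(phonemes_to_keyframes(["HH", "EH", "L", "OW"]))  # ['etc', 'E', 'L', 'O']
```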
As the game development industry has grown, so has the complexity of games. To enter this field, a designer must have a core understanding of games and the various techniques involved. Air Hockey: An Android-Based Game is a simple 2D arcade game made in Unity3D. It is a local multiplayer game in which players try to score on a frictionless board. Each player is given a handle with which he or she tries to divert the puck into the opponent's goal. A single-player mode is also available, in which the player tries to outmatch the machine. This is a fast-paced game suitable as a quick-break pastime. The project is an attempt to become familiar with the techniques involved in game development, and its main aim is to provide a fast-paced game alternative to end users. The product has gone through system testing and a user experience test, in which it was rated 6.8 out of a possible 10.
There is an undeniable communication problem between the Deaf community and the hearing majority. Innovations in automatic sign language recognition try to tear down this communication barrier. My contribution to this domain, Nepali Sign Language Recognition, is an automated system for recognizing the gestures of Nepali Sign Language using 2D Convolutional Neural Networks (CNNs). To recognize a given gesture, image processing techniques, namely grayscale conversion, thresholding, and edge and contour detection, were used to create shape files from individual hand gesture images. These segmented characters were then fed into the trained CNN, which comprises four kinds of layers: convolution layers, pooling/subsampling layers, nonlinear layers, and fully connected layers. The ReLU activation function was applied to the output to introduce nonlinearities into the model. The neural network was trained on 1,200 images for each of the 37 letters and 10 numerals of Nepali, i.e. a total of 56,400 images. Furthermore, the images of every character were flipped along the vertical axis and added to the training set. Finally, a set of 1,200 blank images was also added to the training data, which increased the final accuracy from 82.4450% to 92.4568%.
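A minimal sketch of the preprocessing pipeline described above (grayscale conversion, thresholding, and contour detection with OpenCV) might look as follows; the file path, Canny thresholds, and the 64x64 CNN input size are assumptions:

```python
# Hedged sketch: grayscale -> threshold -> edges/contours -> cropped,
# normalized CNN input (OpenCV 4 API assumed).
import cv2
import numpy as np

img = cv2.imread("gesture.jpg")                       # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)          # grayscale conversion
_, thresh = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)  # thresholding
edges = cv2.Canny(gray, 100, 200)                     # edge detection

# Contour detection: keep the largest contour, assumed to be the hand.
contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
hand = max(contours, key=cv2.contourArea)

# Crop to the hand's bounding box and resize to the CNN input size
# (64x64 is an assumed size, not necessarily the project's).
x, y, w, h = cv2.boundingRect(hand)
shape_img = cv2.resize(thresh[y:y + h, x:x + w], (64, 64))
cnn_input = shape_img.astype(np.float32) / 255.0      # normalized for the CNN
```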
Nowadays, learning to play games is one of the popular research topics in AI. Learning to play games using game theory and search algorithms requires careful domain-specific feature definitions, which makes these approaches hard to scale. The goal of this project is to develop a more general framework that learns game-specific features and solves the problem. The game considered in this project is the popular board game Checkers. Since the Checkers environment is deterministic, there is no need for a neural network to model it. The problem was solved using a naive reinforcement learning implementation, which still required good feature definitions to set up. The variant of reinforcement learning used is Q-learning. The main aim was to create a Q-learning agent and make it learn to play Checkers by playing against a search algorithm: the agent was trained by playing multiple games against a Minimax agent. The Q-learning agent won only 29 games but managed to draw 19,500 of the 50,000 games it played against the Minimax agent.
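The core of the approach is the tabular Q-learning update. A minimal sketch, with assumed hyperparameters and an abstract state/action encoding, is:

```python
# Minimal sketch of tabular Q-learning with epsilon-greedy exploration.
# State/action encodings and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # assumed hyperparameters
Q = defaultdict(float)                    # Q[(state, action)] -> value

def choose_action(state, legal_actions):
    """Epsilon-greedy action selection over the legal moves."""
    if random.random() < EPSILON:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                   - Q[(state, action)])
```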
Maze Generator and Solver is based on graph theory and different kinds of searching and pathfinding algorithms. It aims to generate different kinds of mazes (mazes with dead ends, mazes without dead ends, and mazes with few dead ends) by implementing depth-first search with recursive backtracking for mazes with dead ends, and an additional array implementation for mazes without dead ends or with few dead ends. A generated maze can be solved by the user or automatically, using different informed (Greedy and A*) and uninformed (depth-first, breadth-first) searches. Both the generation and the solving of the maze can be visualized, and the size of the maze to be generated can be specified by the user. The implementation has been tested and confirmed to work with the mentioned search algorithms for both generating and solving mazes. The tool only generates and solves square mazes. Since it visualizes both maze generation and maze solving, the time taken for generating and solving a maze increases drastically as its size increases.
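The dead-end maze generation step (depth-first search with recursive backtracking) can be sketched as follows; the grid representation is an illustrative assumption:

```python
# Sketch of maze generation by depth-first search with recursive backtracking.
# Cells are (x, y) pairs; the returned set records which walls were removed.
import random

def generate_maze(width, height):
    visited = set()
    passages = set()

    def carve(cell):
        visited.add(cell)
        x, y = cell
        neighbours = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
        random.shuffle(neighbours)
        for nxt in neighbours:
            nx, ny = nxt
            if 0 <= nx < width and 0 <= ny < height and nxt not in visited:
                passages.add((cell, nxt))   # knock down the wall between them
                carve(nxt)                  # recurse; backtracks automatically

    carve((0, 0))
    return passages

maze = generate_maze(10, 10)   # every cell reachable: a perfect maze with dead ends
```

Removing extra walls from the resulting perfect maze is one way to obtain the few- or no-dead-end variants described above.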
In the last decade, solving Sudoku puzzles has become everyone's passion. The simplicity of the puzzle's structure and the low requirement of mathematical skill have caused people to take enormous interest in the challenge of solving it. Developers have therefore tried to find algorithms that generate a variety of puzzles for human players, so that puzzles can be both generated and solved by computer programs. In this report, I present two algorithms, called Shift and Backtracking. The purpose is to implement an efficient way to generate and solve Sudoku; these are general algorithms that can also be applied to other problems. First, we build a 9 × 9 matrix and fill it using the Shift algorithm. Then, we remove numbers at random according to the difficulty level. Finally, to solve the puzzle, we locate the blank spaces within the matrix and use the Backtracking algorithm to fill them in and obtain the correct result. The results show that the Shift algorithm generates, and the Backtracking algorithm solves, the puzzle quickly and effectively.
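A minimal sketch of both algorithms is given below. The Shift generator builds each row as a cyclic shift of a base row; the exact shift pattern shown is one valid choice, assumed for illustration, while the Backtracking solver follows the standard fill-and-undo scheme:

```python
# Shift generation: each row is a cyclic shift of a base pattern. The shift
# pattern below is one valid choice (an assumption, not the report's exact one).
def generate():
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]

# Backtracking solver: fill blanks (0) one at a time, undoing on failure.
def valid(board, r, c, n):
    """Check row, column, and 3x3 box constraints for placing n at (r, c)."""
    if n in board[r] or n in (board[i][c] for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(board[i][j] != n
               for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(board):
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for n in range(1, 10):
                    if valid(board, r, c, n):
                        board[r][c] = n
                        if solve(board):
                            return True
                        board[r][c] = 0     # undo and backtrack
                return False                # no digit fits here
    return True                            # no blanks left: solved
```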
An undeniably large number of videos are produced every day, in many different formats. Not every format is supported by every media player on every device. Furthermore, a user may want to preserve a large number of video files, whether for sentimental or research purposes, and this requires a huge amount of storage space. This project can be used to convert videos from one codec to another, compressing the files and thereby saving valuable storage space. It also lets users alter the bitrate and the frame rate (fps) at which the video is played. To transcode the provided video, an intra-predictive encoding algorithm is used, in which the current frame of the video is taken as an input parameter to predict the next frame. Deblocking filters on the chroma and luma components are used to compensate for lost frames and make the resulting video appear smooth. Three test cases with five tests each were carried out to assess the efficiency of the system: testing on the same device with constant frame rate and variable bitrate, testing on a different device with constant frame rate and bitrate, and finally comparing the system with HandBrake, a popular video transcoding application. The compression rate increased steadily up to an input bitrate of 1000 kb/s and doubled at a bitrate of 4500 kb/s.
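For illustration, the codec, bitrate, and frame rate conversion can be sketched by driving the ffmpeg command-line tool from Python. This is only an assumption for demonstration; the project implements its own intra-predictive encoding rather than wrapping ffmpeg:

```python
# Hedged sketch of a codec/bitrate/fps conversion using the ffmpeg CLI.
import subprocess

def transcode(src, dst, video_codec="libx264", bitrate="1000k", fps=30):
    """Re-encode src to dst with the given codec, target bitrate, and fps."""
    subprocess.run([
        "ffmpeg", "-i", src,
        "-c:v", video_codec,    # target video codec
        "-b:v", bitrate,        # target video bitrate
        "-r", str(fps),         # target frame rate
        dst,
    ], check=True)

transcode("input.avi", "output.mp4", bitrate="1000k", fps=24)
```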
MCQ Checker is an automated system for evaluating and marking different types of MCQs using machine learning, contour detection and filtering, template matching, and various other image processing functions. To process the images, preprocessing techniques such as resizing, thresholding, grayscale conversion, dilation, and erosion were used, along with contour detection, which segmented the image both vertically and horizontally. For the first type of MCQ, referred to as 'Bubble Sheet', two approaches were used: template matching, and contour detection and filtering. In template matching, two preprocessed images, the correct answer sheet and the student's answer sheet, were fed to the system, which found the features common to both images and counted them to obtain the total number of correct answers. In contour detection and filtering, the marked bubbles were compared with the result set to get the number of correct answers. For the second type of MCQ, referred to as 'Written Answers', machine learning was used: the preprocessed answer sheet was fed to a neural network, which read the characters in the image. These characters were compared to the correct answers, and the total number of correct answers was calculated. The accuracy of the neural network was 91.66% over 10 digits (0-9) and 5 letters (A-E). Using this system, the answers on MCQ answer sheets can be evaluated correctly and efficiently.
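A sketch of the contour detection and filtering approach for the 'Bubble Sheet' type is shown below; the bubble size limits, the fill threshold, and the file name are assumptions:

```python
# Hedged sketch of contour detection and filtering for bubble sheets.
import cv2

img = cv2.imread("bubble_sheet.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

contours, _ = cv2.findContours(thresh, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)

# Filter contours down to bubble-sized, roughly circular blobs.
bubbles = []
for c in contours:
    x, y, w, h = cv2.boundingRect(c)
    if 20 <= w <= 40 and 0.8 <= w / float(h) <= 1.2:   # assumed bubble size
        bubbles.append((x, y, w, h))

# A bubble counts as "marked" if enough of its area is filled in.
marked = [b for b in bubbles
          if cv2.countNonZero(thresh[b[1]:b[1] + b[3], b[0]:b[0] + b[2]]) >
             0.5 * b[2] * b[3]]

# The marked positions would then be compared against the answer key
# to count the number of correct answers.
```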
Speech recognition refers to the process of converting analog speech signals into text. A great deal of work has been done in this field, and much research is still in progress to increase the accuracy of such systems. The complex nature of speech, with its contextual meaning, dialects, and accents, as well as the environment, makes recognizing speech very difficult. Although extensive research has been done on other languages, automated speech recognition for Nepali is still at an early stage. With almost 29 million Nepali speakers all over the world, speech recognition can have many applications. The aim of this report is therefore to show a way to develop an application that can recognize the Nepali language. Nepali Bhasa Recognizer will make it easier for a large number of disabled people to create a text file; also, since many Nepali people are not literate, this application could be handy for creating text documents. On the dataset we created, consisting of 33 Nepali sentences with a speech corpus from 2 speakers, we obtained around 85% accuracy. This accuracy can be improved further if the speech and text corpora are enlarged.
Hand gestures are a popular and powerful form of input nowadays. Using hand gestures as an input method is an easy and convenient way for humans to interact with a system, and it requires no extra input device. This project is a desktop application developed in Python using the Flask framework, and it uses various image processing techniques. Its purpose is to use hand gestures as a means of input for media player control. Motion and contour feature detection are used to continuously monitor the webcam video for hand gestures, count the number of fingers present in a gesture, and perform the corresponding control event associated with that gesture. Moreover, Haar cascade classifiers are used for face and eye detection, so that the user does not miss parts of the video when he or she moves away from the screen. The project can be used to effectively control a media player using hand gestures and to view media content without missing any part of it. Its accuracy could be further enhanced by using machine learning to recognize hand gestures.
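The face detection part can be sketched with OpenCV's stock Haar cascades; the pause/resume hooks are assumed placeholders, since the report does not specify the media player API:

```python
# Sketch of Haar-cascade face detection used to pause playback when the
# viewer looks away. The cascade file ships with the opencv-python package.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture(0)                              # webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    if len(faces) == 0:
        pass   # viewer looked away: the player would pause here (assumed hook)
    else:
        pass   # viewer is back: the player would resume here (assumed hook)
    cv2.imshow("preview", frame)
    if cv2.waitKey(1) & 0xFF == 27:                    # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```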
This report deals with automatic image colorization. Image colorization is an ill-posed problem that usually requires user intervention to achieve high quality. Two fully automatic approaches are proposed that are able to produce realistic colorizations of an input grayscale image. The first uses Support Vector Regression (SVR) and a Markov Random Field; the second is based on a convolutional neural network (CNN), motivated by the recent success of deep learning techniques in image processing. Support vector regression is used to predict the U and V color channels for a given pixel value, whereas in the second approach a feed-forward convolutional neural network predicts the a and b channel values in the Lab color space of the input pixel, which are finally converted to RGB. Comparing both approaches on around 200 training images, when the input image contains multiple objects, the output from the CNN appears closer to the original image than the output from the SVR. The analysis is based on histogram comparison using several methods: Correlation, Chi-square, Intersection, and the Bhattacharyya distance (also known as the Hellinger distance).
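A sketch of the histogram-based evaluation, using OpenCV's implementations of the four comparison methods named above (the file paths and bin counts are assumptions):

```python
# Hedged sketch: compare a colorized result against the original image with
# the four histogram comparison methods named above.
import cv2

original = cv2.imread("original.jpg")
colorized = cv2.imread("colorized.jpg")

def hist(img):
    """3D color histogram, normalized and flattened for compareHist."""
    h = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
    return cv2.normalize(h, h).flatten()

h1, h2 = hist(original), hist(colorized)
for name, method in [("Correlation",   cv2.HISTCMP_CORREL),
                     ("Chi-square",    cv2.HISTCMP_CHISQR),
                     ("Intersection",  cv2.HISTCMP_INTERSECT),
                     ("Bhattacharyya", cv2.HISTCMP_BHATTACHARYYA)]:
    print(name, cv2.compareHist(h1, h2, method))
```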
We often encounter problems when we need to capture a picture of a large scene. Such scenes are captured as multiple separate images, which makes them difficult to view and analyze. This problem can be solved by merging all those images into a single image. Image Assembly is a web application developed in Python using the Flask framework, which aims to solve the aforementioned problem. The purpose of this project is to stitch multiple images with overlapping, corresponding features into a single image. For this process, the SURF algorithm was implemented: it first extracts the keypoints and descriptors from the given images, then searches for corresponding features among them and matches those features. The result obtained from the project is a successfully stitched, merged image of the given images. The project can be used to generate maps and panoramic images that cannot be captured by a normal camera in one shot, and it could be extended to mobile versions for better results.
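A sketch of the stitching pipeline is shown below. Note that ORB is substituted for SURF here, since SURF requires the patented opencv-contrib build; the overall extract-match-warp structure is the same, and the file names are assumptions:

```python
# Hedged sketch of keypoint extraction, matching, and homography warping.
import cv2
import numpy as np

img1 = cv2.imread("left.jpg")
img2 = cv2.imread("right.jpg")

orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(img1, None)    # keypoints + descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors and keep the strongest correspondences.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:100]

# Estimate the homography mapping img2's points into img1's frame.
src = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# Warp the second image into the first image's frame and paste the first on top.
h, w = img1.shape[:2]
panorama = cv2.warpPerspective(img2, H, (w * 2, h))
panorama[0:h, 0:w] = img1
cv2.imwrite("stitched.jpg", panorama)
```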
The purpose of this project is to create a language-independent application that can correctly identify emotions in a speaker's voice. The emotions considered are: Happy, Sad, Angry, and Fearful. This voice emotion recognition project uses the RAVDESS professional voice actor dataset as the input speech signal. The features are acoustic characteristics of the speech signal, and feature extraction is the process of extracting a small amount of data from that signal. Many feature extraction methods are available, and the Mel-Frequency Cepstral Coefficient (MFCC) is the most commonly used. In this project, MFCC coefficients along with the frequency-domain FFT of the speech signal are extracted. The features are used to build a classifier using an SVM with a linear kernel. The results show that MFCC with FFT works best to classify happiness and sadness in speech. The classifier performs reasonably well with the emotion of anger but poorly with the emotion of fear, giving an overall accuracy of 75%. These findings may be useful in understanding the use of an SVM classifier with MFCC and FFT features extracted from speech in the field of emotion recognition. They can be implemented to create smart devices that can process human emotions and respond accordingly, innovating the way humans and machines interact with each other.
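A minimal sketch of the feature extraction and classification pipeline follows; the librosa library, the feature dimensions, and the placeholder file paths are assumptions for illustration:

```python
# Hedged sketch of MFCC + FFT feature extraction and a linear-kernel SVM.
import numpy as np
import librosa
from sklearn.svm import SVC

def features(path):
    """Mean MFCCs plus a coarse FFT magnitude spectrum for one clip."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)
    fft = np.abs(np.fft.rfft(y, n=2048))[:512]
    return np.concatenate([mfcc, fft])

# Placeholder paths; the real input is the RAVDESS actor recordings.
files = {
    "happy":   ["happy_01.wav"],
    "sad":     ["sad_01.wav"],
    "angry":   ["angry_01.wav"],
    "fearful": ["fearful_01.wav"],
}
X = [features(f) for label in files for f in files[label]]
y = [label for label in files for _ in files[label]]

clf = SVC(kernel="linear")          # linear-kernel SVM, as in the report
clf.fit(X, y)
print(clf.predict([features("test_clip.wav")]))
```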
Research suggests that starting a day with bad news does not have a good impact on readers. This has introduced the need for an application that can tell a reader whether a piece of news is good or bad. This project presents and evaluates a classification approach using news articles from a major English-language newspaper published in Nepal. A system has been built for predicting the sentiment of news articles. To extract and preprocess the news articles, a web crawler and Porter's stemming algorithm were used, and machine learning, namely logistic regression and Naïve Bayes, was used for the classification itself. This machine learning approach classifies a news article by analyzing its headline. An authentic news site was chosen to apply these classifiers to. For training, 2,000 samples were used; for testing, 200 samples. The tests showed accuracies of 73% and 80% for classification using Logistic Regression and Naïve Bayes respectively. Upon learning the nature of a piece of news, a reader can go through the content simply by clicking on the link displayed alongside an emoticon that indicates that nature. Although the application is expected to have a good impact on people, this project cannot cover all news sites. It is nonetheless expected to help the public get updates in a way that does not negatively affect their health.
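A minimal sketch of the headline classification step, comparing the two models used in the report, is shown below. The toy headlines and labels are assumptions; in the actual system, crawled articles and Porter-stemmed tokens would be used:

```python
# Hedged sketch: bag-of-words headline classification with the two models
# compared in the report.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

headlines = ["Economy grows faster than expected",
             "Dozens injured in highway accident"]     # toy training data
labels = ["good", "bad"]

for model in (LogisticRegression(), MultinomialNB()):
    clf = make_pipeline(CountVectorizer(), model)      # bag-of-words features
    clf.fit(headlines, labels)
    print(type(model).__name__, clf.predict(["Floods displace hundreds"]))
```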
Image enhancement means improving the quality of an image relative to the input or initial image. Different image processing methods perform operations on an image in order to produce an enhanced version of it, for example by removing noise, sharpening, or brightening the image. Convolutional Neural Networks (CNNs) are a category of neural networks that have proven very effective in areas such as image recognition and classification. CNNs have been successful in identifying faces, objects, and traffic signs, and have therefore become one of the most important tools in machine learning. This document demonstrates how a CNN can be applied to digitized text images to produce clean, understandable output from noisy input. A convolutional neural network trained with a naive gradient descent learning algorithm is implemented, and it has proven to be robust against noise.
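A minimal sketch of such a denoising CNN, trained with plain (naive) gradient descent, might look as follows; the Keras framework and the layer sizes are assumptions, since the document does not specify them:

```python
# Hedged sketch of a small denoising CNN trained with plain SGD.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, None, 1)),             # grayscale text image
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
    tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss="mse")   # plain SGD, matching the naive gradient descent above

# noisy and clean would be matching batches of text images scaled to [0, 1]:
# model.fit(noisy, clean, epochs=10, batch_size=16)
```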
Vocal separation refers to the problem of separating the vocals from the instrumentals in a song, producing either an a cappella track that contains only vocals or an instrumental track that contains only the instruments. Over the years, this has been a problem of great interest to musicians as well as to researchers studying Music Information Retrieval (MIR). Karaoke Generator allows the user to quickly and efficiently obtain the instrumental track: the user simply uploads the chosen music file, and the application removes the vocal track and returns the track without vocals. The project was able to extract the instrumental from a song, but it was not 100% accurate, as some residual noise from the vocals remained.
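One classic vocal-removal technique is center-channel cancellation: vocals are usually mixed equally into both stereo channels, so subtracting one channel from the other suppresses them while leaving side-panned instruments audible. Whether Karaoke Generator uses this exact method is not stated, so the sketch below is illustrative only:

```python
# Illustrative sketch of centre-channel cancellation (an assumed technique,
# not necessarily the one used by Karaoke Generator).
import numpy as np
from scipy.io import wavfile

rate, stereo = wavfile.read("song.wav")          # expects a stereo WAV file
left = stereo[:, 0].astype(np.float32)
right = stereo[:, 1].astype(np.float32)

instrumental = left - right                      # centre (vocal) cancels out
instrumental /= max(1.0, np.abs(instrumental).max())   # normalize to [-1, 1]
wavfile.write("karaoke.wav", rate, instrumental.astype(np.float32))
```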
The field of Human-Computer Interaction (HCI) has seen a great amount of growth in the last couple of decades. People use mice, keyboards, trackpads, and joysticks to communicate with the computer; nowadays, wireless devices such as wireless mice and keyboards are also common. One disadvantage of these devices is that they require Bluetooth hardware attached to the computer, along with the corresponding driver software. Touch screens are a newer technology used in HCI, but they are not cheap, which limits their use. The proposed system has no such disadvantage. It goes through two phases: color calibration and mouse events. The main intention of this project is to develop an efficient HCI device with which the user can control mouse clicks and pointer movement using color detection.
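The color calibration and pointer movement phases can be sketched as follows; the HSV range for the marker color and the use of the pyautogui library are assumptions for illustration:

```python
# Hedged sketch: track a coloured marker with an HSV mask and move the
# pointer to its centroid.
import cv2
import numpy as np
import pyautogui

lower, upper = np.array([40, 70, 70]), np.array([80, 255, 255])  # assumed green marker
screen_w, screen_h = pyautogui.size()

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower, upper)             # colour calibration step
    m = cv2.moments(mask)
    if m["m00"] > 0:                                  # marker found
        cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]
        h, w = mask.shape
        pyautogui.moveTo(cx / w * screen_w, cy / h * screen_h)  # mouse event
    cv2.imshow("mask", mask)
    if cv2.waitKey(1) & 0xFF == 27:                   # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```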
Digital watermarking is a technology used for the protection of digital media such as images, audio, and video. A watermark is secret information embedded in digital media. This project deals with the digital watermarking of photographs of the mark sheets issued by educational institutes. Duplication and copyright infringement are a major problem these days, and unauthorized use of digital media is one of the main concerns of media creators. This project deals only with watermarking images, which helps in authenticating the photographs. An image goes through an encoding process that digitally watermarks it without visibly changing the original. The original image, which is in the spatial domain, is converted into the frequency domain using the FFT; its frequency representation is manipulated using the watermark image, and then the IFFT is used to produce the new image with the watermark embedded. Similarly, to extract the watermark, both the original and the embedded images are converted to the frequency domain using the FFT and compared. The watermark was successfully embedded in the original image and successfully extracted from the embedded image. This project can be used to authenticate the images that different people and organizations publish on the Internet or provide to authorized persons.
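A minimal sketch of the embed/extract scheme described above, using NumPy's FFT; the embedding strength alpha and the file names are assumptions:

```python
# Hedged sketch: embed a watermark in the frequency domain and recover it
# by comparing the spectra of the original and embedded images.
import numpy as np
import cv2

alpha = 0.1                                       # assumed embedding strength

img = cv2.imread("marksheet.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
wm = cv2.imread("watermark.png", cv2.IMREAD_GRAYSCALE).astype(np.float64)
wm = cv2.resize(wm, (img.shape[1], img.shape[0]))

# Embed: manipulate the original image's spectrum with the watermark's.
F = np.fft.fft2(img)
Fw = F + alpha * np.fft.fft2(wm)
embedded = np.real(np.fft.ifft2(Fw))              # IFFT back to spatial domain

# Extract: FFT both images, subtract, and invert to recover the watermark.
recovered = np.real(np.fft.ifft2((np.fft.fft2(embedded) - F) / alpha))

cv2.imwrite("embedded.png", embedded.clip(0, 255).astype(np.uint8))
cv2.imwrite("recovered.png", recovered.clip(0, 255).astype(np.uint8))
```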