Movie Recommendation System Using Content Based Filtering - Jetir

2022 JETIR April 2022, Volume 9, Issue 4 www.jetir.org (ISSN-2349-5162)

MOVIE RECOMMENDATION SYSTEM USING CONTENT BASED FILTERING

A Kiran Kumar, Gourab Pal Chowdhury, Sruti Bihani, Rimpi Datta
Electronics and Communication Engineering, Narula Institute of Technology, Kolkata, India

Abstract: A recommendation system provides suggestions to the user for resources such as movies, books and songs. A movie recommendation system plays an important role in our social life because it enhances entertainment. It suggests a set of movies to a user based on their interests, filtering or predicting movies according to the attributes of the movies the user has previously watched. This paper presents a content-based recommendation system and the techniques used to build it.

Index Terms: Recommendation system, cosine similarity, TMDB dataset, content based, Streamlit, Heroku.

I. INTRODUCTION

Recommender systems are designed to recommend things to the user based on many different factors. These systems predict the products that users are most likely to purchase or be interested in. We have all used services like Netflix, Amazon and YouTube. These services use sophisticated systems to recommend the best items to their users and improve their experience. Recommendation systems are becoming increasingly important in today's extremely busy world. People are always short on time, with myriad tasks to accomplish in a limited 24 hours. Recommendation systems therefore help them make the right choices without having to expend their cognitive resources.

3.2 Purpose and use-cases

The purpose of a recommender system is to suggest relevant items to users.
[1] To achieve this, there are two major categories of methods: (1) collaborative filtering methods and (2) content-based methods. A recommender takes a number of factors into account to create personalized lists of useful and interesting content for each individual user. Recommendation systems are artificial-intelligence-based algorithms that skim through all possible options and create a customized list of items that are interesting and relevant to an individual. These results are based on the user's profile, their search and browsing history, what other people with similar traits or demographics are watching, and how likely the user is to watch those movies. This is achieved through predictive modeling and heuristics applied to the available data.

While browsing the web or visiting your favorite site, you have probably noticed messages or suggestions such as "Products you may like", "People are also searching for" and "Customers who bought this also bought". All of these are examples of the personalization that brands offer their valuable customers. How do they do it? With a recommendation system.

JETIR2204215 Journal of Emerging Technologies and Innovative Research (JETIR) www.jetir.org c93

Recommendation systems can be classified into two main types:

(1) Content-based filtering: a content-based filtering system needs to know the content of both the user and the item. It usually constructs a user profile and an item profile over a shared attribute space and then compares them. For example, a movie can be represented by the stars in it and its genres, using a binary encoding for instance. A user profile can be built the same way, based on which movie stars, genres and so on the user likes.

(2) Collaborative filtering: collaborative algorithms use "user behavior" to recommend items. They exploit the behavior of other users and items in terms of transaction history, ratings, selections and purchase information. Other users' behavior and preferences over the items are used to recommend items to new users. In this case, the features of the items are not known.

3.3 Content based filtering

The content-based approach uses additional information about users and/or items. [2] This filtering method uses item features to recommend other items similar to what the user likes, based on their previous actions or explicit feedback. In a movie recommender system, for example, this additional information can be the age, sex, job or other personal information of users. It can also be the category, main actors, duration or other characteristics of the movies (the items). The main idea of content-based methods is to build a model, based on the available "features", that explains the observed user-item interactions. Still considering users and movies, the model can also be built so that it gives insight into why these interactions happen. Such a model makes new predictions for a user easy: we look at the user's profile and use its information to determine relevant movies to suggest.
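The profile comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. The movie titles, the four-genre vocabulary and the rule "user profile = mean of the liked movies' vectors" are all assumptions chosen for clarity. Cosine similarity is used for the comparison because it is the measure this paper adopts.

```python
import numpy as np

# Hypothetical toy catalogue: binary item profiles over an assumed genre space.
GENRES = ["action", "comedy", "fantasy", "drama"]
ITEM_PROFILES = {
    "Movie A": np.array([1, 0, 1, 0]),  # action + fantasy
    "Movie B": np.array([0, 1, 0, 0]),  # comedy
    "Movie C": np.array([0, 0, 1, 1]),  # fantasy + drama
    "Movie D": np.array([1, 0, 0, 1]),  # action + drama
}


def user_profile(liked):
    """Assumed rule: the user profile is the mean of the liked movies' vectors."""
    return np.mean([ITEM_PROFILES[t] for t in liked], axis=0)


def score_unseen(liked):
    """Rank movies the user has not seen by cosine similarity to the user profile."""
    profile = user_profile(liked)
    scores = {}
    for title, vec in ITEM_PROFILES.items():
        if title in liked:
            continue  # skip movies the user already liked
        cos = vec @ profile / (np.linalg.norm(vec) * np.linalg.norm(profile))
        scores[title] = float(cos)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Suppose a user liked Movie A and Movie C. Then score_unseen(["Movie A", "Movie C"]) ranks Movie D first with a score of 1/√3 ≈ 0.577, because it shares the action and drama flags with the profile. The comedy-only Movie B scores 0.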
We can use a utility matrix for content-based methods. A utility matrix records the user's preference for certain items. With the data gathered from the user, we can find relations between the items the user likes and those the user dislikes, and the utility matrix is well suited to this purpose. Each user-item pair is assigned a value known as the degree of preference. The resulting matrix of users against items shows their preference relationships. For example, suppose I am a fan of the Harry Potter series and watch only such movies on the internet. When my data is gathered from Google or Wikipedia, it will show that I am a fan of fantasy movies. My recommendations will therefore be filled with fantasy movies. Among all these movies, the ones best suited to me will be curated and then recommended to me.

Challenges with content based filtering

Content-based methods tend to suffer far less from the cold-start problem than collaborative approaches. New users or items can be described by their characteristics (their content), so relevant suggestions can be made for these new entities. Only new users or items with previously unseen features will logically suffer from this drawback, and once the system is trained enough, this is unlikely to happen. Basically, the approach hypothesizes that if a user was interested in an item in the past, they will be interested in the same kind of item in the future. Similar items are usually grouped based on their features. User profiles are constructed from historical interactions or by explicitly asking users about their interests. Some other systems, not considered purely content-based, also use users' personal and social data.
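The utility matrix described above is simply a user-by-item table of preference degrees. Here is a minimal pandas sketch that assumes a hypothetical ratings log; the users, titles and the 1 to 5 scale are invented for illustration. Pivoting the log gives one row per user and one column per movie. Cells with no recorded interaction remain NaN: these are the unknown preferences a recommender tries to fill.

```python
import pandas as pd

# Hypothetical interaction log: one row per (user, movie, degree of preference).
ratings = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u3"],
    "movie": ["Harry Potter", "The Hobbit", "Harry Potter", "Titanic", "Titanic"],
    "rating": [5, 4, 4, 2, 5],
})

# Pivot into a utility matrix: rows are users, columns are movies.
# Pairs with no recorded interaction stay NaN (unknown preference).
utility = ratings.pivot_table(index="user", columns="movie", values="rating")
print(utility)
```

Row u1 holds high values only for the two fantasy titles. This is the kind of pattern the Harry Potter example relies on.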
3.4 Collaborative filtering

The collaborative filtering method for recommender systems produces new recommendations based solely on the past interactions recorded between users and items. Collaborative filtering finds what similar users like. It groups users into clusters of similar types and recommends items to each user according to the preferences of its cluster. The main idea behind collaborative methods is that past user-item interactions are sufficient to detect similar users or similar items, and to make predictions from these estimated similarities. Memory-based approaches work directly with the values of the recorded interactions and are essentially based on nearest-neighbour search. They find the users closest to a user of interest and suggest the most popular items among these neighbours. Model-based approaches instead assume that an underlying "generative" model explains the user-item interactions, and they try to discover it in order to make new predictions.

Collaborative filtering recommends an item to user A based on the interests of a similar user B. It does not need the features of the items to be given. Every user and item is described by a feature vector or embedding, and these embeddings can be learned automatically, without relying on hand-engineered features. The standard method used by collaborative filtering is the nearest-neighbour algorithm. There are several variants, such as user-based and item-based collaborative filtering. Take user-based collaborative filtering as an example. Suppose we have an $n \times m$ matrix of ratings with users $u_i$, $i = 1, \dots, n$, and items $p_j$, $j = 1, \dots, m$. We want to predict the rating $r_{ij}$ for an item $j$ that the target user $i$ has not watched or rated. The process is to calculate the similarities between target user $i$ and all other users, select the top $X$ most similar users, and take the weighted average of the ratings from these $X$ users, with the similarities as weights. For example, the system can identify all of the products that a customer and users with similar behavior have purchased and/or rated positively.

Challenges with collaborative filtering

A key issue with embedding-based collaborative filtering is that the model's prediction for a given (user, item) pair is the dot product of the corresponding embeddings. If an item is not seen during training, the system cannot create an embedding for it and hence cannot query the model with this item. This issue is known as the cold-start problem.

3.5 Theoretical framework

This paper presents a framework for presenting, developing and evaluating a recommendation system. Recommendation systems are a subclass of information systems whose aim is to provide the most relevant information to a user by discovering patterns in a dataset. The two main ingredients of a recommendation system are a dataset and an algorithm that performs the recommendation. There are several recommendation algorithms; here we use cosine similarity. A dataset is a collection of data that a computer can treat as a single unit for analysis and prediction. Here we use a dataset from Kaggle. Cosine similarity is a metric that measures the cosine of the angle between two vectors projected in a multi-dimensional space. Mathematically, it is the dot product of the two vectors divided by the product of their Euclidean norms (magnitudes). We modified the previous algorithms to use a list of movies instead of genre combinations. Implementing this requires a platform and a technology to control it.
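The user-based collaborative filtering prediction described above can be written as $\hat{r}_{ij} = \sum_{k \in N_X(i)} \mathrm{sim}(i,k)\, r_{kj} \,/\, \sum_{k \in N_X(i)} \lvert \mathrm{sim}(i,k) \rvert$. Here $N_X(i)$ is the set of the $X$ users most similar to user $i$ among those who rated item $j$. The sketch below uses cosine similarity between users' full rating rows as $\mathrm{sim}$. The rating matrix, the convention that 0 means "not rated", and $X = 2$ are assumptions for illustration, not details given in the paper.

```python
import numpy as np

# Hypothetical n x m rating matrix (rows: users u_i, columns: movies p_j).
# Assumption: 0 means "not watched/rated".
R = np.array([
    [5, 4, 0, 1],   # target user i = 0 has not rated movie j = 2
    [4, 5, 3, 1],
    [1, 1, 5, 4],
    [5, 4, 4, 0],
], dtype=float)


def cosine(u, v):
    """Cosine similarity of two rating rows (unrated cells count as 0)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def predict_rating(R, i, j, x=2):
    """Similarity-weighted average of r_kj over the x users most similar to i who rated j."""
    raters = [k for k in range(R.shape[0]) if k != i and R[k, j] > 0]
    sims = {k: cosine(R[i], R[k]) for k in raters}
    top = sorted(raters, key=lambda k: sims[k], reverse=True)[:x]
    weight_sum = sum(abs(sims[k]) for k in top)
    if weight_sum == 0:
        return float("nan")  # no usable neighbours
    return sum(sims[k] * R[k, j] for k in top) / weight_sum


print(predict_rating(R, 0, 2))
```

For user 0 and movie 2, the two nearest raters are users 1 and 3 (cosine ≈ 0.886 and ≈ 0.838). The predicted rating is about 3.49, pulled between their ratings of 3 and 4.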
As the technology, we used Python and its Streamlit library to build the platform. Python is a high-level, general-purpose programming language often used to build websites and software, automate tasks and conduct data analysis. Streamlit is an open-source Python library for creating and sharing web apps for data science and machine learning projects.

Algorithm used and Equations

The algorithm used in this movie recommendation system is cosine similarity. [3] Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It is defined as the cosine of the angle between them. This equals the inner product of the two vectors after both are normalized to length 1. We can use cosine similarity to work out the similarity between two things and then use the computed similarity as part of a recommendation query. For example, we can get movie recommendations based on the preferences of users who have given similar ratings to other movies that you have seen.

Cosine similarity computes the L2-normalized dot product of vectors. That is, if $x$ and $y$ are row vectors, their cosine similarity $k$ is defined as

$$k(x, y) = \cos(x, y) = \frac{x\, y^{\top}}{\lVert x \rVert \, \lVert y \rVert}$$

It is called cosine similarity because Euclidean (L2) normalization projects the vectors onto the unit sphere, and their dot product is then the cosine of the angle between the points denoted by the vectors. This kernel is a popular choice for computing the similarity of documents represented as tf-idf vectors. The cosine_similarity function in scikit-learn accepts scipy.sparse matrices. (Note that the tf-idf functionality in sklearn.feature_extraction.text can produce normalized vectors, in which case cosine_similarity is equivalent to linear_kernel, only slower.)
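Here is a minimal sketch of the recommendation step with scikit-learn. It assumes each movie is described by a short string of tags; the titles and tags below are hypothetical. The paper does not say how its movie vectors are built, and TF-IDF is used here only because the text mentions it. The sketch computes the pairwise cosine-similarity matrix once and returns the five most similar titles, matching the five recommendations described in the results. It also checks the cosine_similarity/linear_kernel equivalence noted above.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical mini-catalogue: each movie is described by a bag of "tags"
# (e.g. genres and keywords concatenated into one string).
titles = ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E", "Movie F", "Movie G"]
tags = [
    "fantasy magic wizard school",
    "fantasy magic dragon quest",
    "space alien war",
    "romance ship drama",
    "space robot future",
    "wizard dragon quest fantasy",
    "drama romance war",
]

# TfidfVectorizer L2-normalises each row by default (norm="l2").
tfidf = TfidfVectorizer().fit_transform(tags)  # sparse matrix, one row per movie
similarity = cosine_similarity(tfidf)          # dense (n_movies, n_movies) array

# Because the TF-IDF rows already have unit length, linear_kernel (a plain
# dot product) yields the same matrix.
assert np.allclose(similarity, linear_kernel(tfidf))


def recommend(title, k=5):
    """Return the k titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    ranked = np.argsort(similarity[idx])[::-1]  # highest similarity first
    return [titles[j] for j in ranked if j != idx][:k]


print(recommend("Movie A"))
```

recommend("Movie A") puts the two other fantasy titles, Movie B and Movie F, first. The remaining three slots go to movies whose similarity to Movie A is 0, so their order carries no meaning.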

III. RESEARCH METHODOLOGY

The methodology section outlines the plan and methods by which the study is conducted. This includes the universe of the study, the sample of the study, the data and sources of data, the study's variables and the analytical framework. The details are as follows.

3.1 Data collection

The dataset for the project is the TMDB dataset, which is available on Kaggle. The dataset contains the following columns:
Budget: the budget of the film
Genre: the genre of the film, e.g. action or comedy
Homepage: the movie's website
Id: the movie id
Keywords
Original language: the language the movie is in
Original title: the title of the movie
Overview: a short summary of the movie
Popularity: the popularity of the movie
Production companies
Production countries
Release date
Revenue
Runtime
Spoken languages
Status
Tagline
Title
Vote average
Vote count

Data collection is the first step in any machine learning problem. After collection, the data is imported into a Jupyter notebook using the Python pandas library.

3.2 Exploratory data analysis

Exploratory data analysis is the process of analyzing the data to summarize its main characteristics. It gives more insight into the data and into how each column is correlated with the other columns. It also helps us determine whether the chosen statistical techniques for data analysis are appropriate. The tools used for data visualization are seaborn and matplotlib, which are Python libraries. The main visualization techniques we use are:

Univariate analysis

In univariate analysis, summary statistics are computed for each feature separately. Each feature's distribution can then be examined to see whether it is normal, left-skewed or right-skewed.
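Loading the data and running a first univariate check can be sketched with pandas. The sketch reads a tiny in-memory CSV standing in for the Kaggle file, so the rows and the column subset are assumptions. It also assumes the genres column is a JSON-encoded list of {"id", "name"} objects, as in the widely used TMDB 5000 release. Treating an absolute skewness below 0.5 as roughly symmetric is only a rule of thumb.

```python
import io
import json

import pandas as pd

# Stand-in for the Kaggle CSV (hypothetical rows, a subset of the columns listed above).
CSV_TEXT = '''id,title,budget,genres
1,Movie A,1000000,"[{""id"": 14, ""name"": ""Fantasy""}]"
2,Movie B,2000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""name"": ""Adventure""}]"
3,Movie C,3000000,"[{""id"": 35, ""name"": ""Comedy""}]"
4,Movie D,50000000,"[{""id"": 18, ""name"": ""Drama""}]"
'''

# With the real file this would be pd.read_csv(<path to the TMDB CSV>).
movies = pd.read_csv(io.StringIO(CSV_TEXT))

# Assumption: genres is a JSON list of {"id": ..., "name": ...}; keep only the names.
movies["genre_names"] = movies["genres"].apply(lambda s: [g["name"] for g in json.loads(s)])


def skew_label(series, tol=0.5):
    """Classify a numeric column by its sample skewness (tol is a rule of thumb)."""
    skew = series.skew()
    if skew > tol:
        return "right-skewed"
    if skew < -tol:
        return "left-skewed"
    return "approximately symmetric"


print(movies["budget"].describe())   # univariate summary statistics
print(skew_label(movies["budget"]))  # one very large budget makes this right-skewed
```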

Multivariate analysis

In multivariate analysis, we determine how the features are correlated with each other and how each feature is related to the target feature. If two features are highly correlated with each other, one of them can be dropped. Both contribute roughly the same information about the target, so keeping both adds little. Libraries like seaborn and matplotlib make exploratory data analysis and data visualization easier.

3.3 Feature Engineering

Feature engineering is an important part of machine learning. In this step we check the number of missing values in the data and decide whether to remove them or fill them, for example with the mean or median. For a model to work on unseen data, it must be trained on well-prepared data. Feature engineering can also produce new features with the goal of speeding up data transformations and increasing accuracy. Feature engineering is essential in machine learning because a poor feature in the data will hurt the model. Through feature engineering, new artificial features are designed into an algorithm because such features can improve model performance.

Outlier detection

An outlier is a value that lies far away from the other values in a random sample from a population. Outliers shift the sample mean and skew the distribution, which in turn affects model performance. However, in some cases outliers play a vital role in a model's predictions, as in credit card fraud detection. Linear regression models are particularly sensitive to outliers.
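The multivariate-analysis, feature-engineering and outlier steps can be sketched in pandas on a hypothetical numeric frame, with values invented for illustration. The median fill, the Pearson correlation and the 1.5 × IQR fence used to flag outliers are common conventions assumed here, not choices reported by the authors. The last three lines show removal, capping and discretization applied to the flagged values.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features with one missing value and one extreme budget.
df = pd.DataFrame({
    "budget":     [10.0, 12.0, 11.0, 13.0, np.nan, 300.0],
    "revenue":    [20.0, 25.0, 22.0, 27.0, 24.0, 310.0],
    "popularity": [5.0, 4.0, 6.0, 5.0, 7.0, 6.0],
})

# Feature engineering: fill missing values with each column's median.
df = df.fillna(df.median(numeric_only=True))

# Multivariate analysis: pairwise Pearson correlations between features.
corr = df.corr()

# Outlier detection with the 1.5 * IQR rule (an assumed, common convention).
q1, q3 = df["budget"].quantile([0.25, 0.75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (df["budget"] < low) | (df["budget"] > high)

# Three ways of handling the flagged values:
removed = df[~is_outlier]                                              # removal
capped = df.assign(budget=df["budget"].clip(lower=low, upper=high))    # capping at the fences
binned = pd.cut(df["budget"], bins=3, labels=["low", "mid", "high"])   # discretization
```

Here the budget/revenue correlation is close to 1 because both columns are dominated by the same extreme row. In a real dataset, a correlation that strong would suggest dropping one of the two features.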
There are several methods to handle outliers:
Removal: as the name suggests, the outliers are removed. However, if many outliers are removed, a lot of data is lost, which can reduce model accuracy.
Capping: values beyond a chosen upper or lower limit (for example, a percentile) are replaced by that limit.
Discretization: a widely used method that, as the name suggests, converts continuous variables, models or functions into discrete ones. This is done by creating bins within specific ranges.

3.4 Modeling

In this step, the data is first split into training and testing sets. The model is trained on the training set and tested on the test set. In this project, the cosine similarity algorithm is used to work out the similarity between two movies.

3.5 Evaluation

After the model is trained on the training set and tested on the test set, its accuracy is calculated by comparing the predictions with the original values. This accuracy can be increased through hyperparameter tuning. Hyperparameter tuning means selecting the best parameters for the model so that its accuracy increases without overfitting.
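The split, tuning and evaluation steps can be sketched with scikit-learn's train_test_split, GridSearchCV and RandomizedSearchCV. The estimator (k-nearest neighbours on scikit-learn's bundled iris data), the parameter ranges and the 25% test split are assumptions chosen only to show the APIs. They are not the authors' setup.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data and estimator only (not the recommender described in this paper).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# GridSearchCV: try every value in a predefined grid, scored by 5-fold cross-validation.
grid = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5)
grid.fit(X_train, y_train)

# RandomizedSearchCV: sample n_iter candidate values from a distribution instead.
rand = RandomizedSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": randint(1, 20)},  # integers 1..19
    n_iter=10,
    cv=5,
    random_state=0,
)
rand.fit(X_train, y_train)

# Evaluate the refit best grid-search model on the held-out test set (accuracy).
test_accuracy = grid.score(X_test, y_test)
print(grid.best_params_, rand.best_params_, test_accuracy)
```

Grid search is exhaustive but its cost grows with every added value. Randomized search fixes the budget (n_iter) regardless of how wide the ranges are.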

Hyperparameter tuning can be done with two methods:
(1) GridSearchCV: loops through a predefined set of hyperparameter values and fits the estimator (model) on the training set for each combination, using cross-validation. At the end, the best parameters can be selected from the listed values.
(2) RandomizedSearchCV: instead of a discrete set of values for each hyperparameter, we provide a statistical distribution or list. Values for the different hyperparameters are then picked at random from it.

3.6 Website

After the model is saved and dumped with pickle, a frontend website is built using the Streamlit library for Python. Streamlit is an open-source Python library for creating web apps specifically for data science and machine learning projects. To run a Streamlit Python file, the following command is used:

streamlit run filename.py

3.7 Deployment

The website is hosted on the Heroku cloud platform so that the link is live and accessible to everyone. To deploy the website, the following steps are followed:
(1) Run the Streamlit app locally.
(2) Create a GitHub repository.
(3) Create requirements.txt, setup.sh and a Procfile.
(4) Connect to Heroku and execute the commands given there. After all commands execute successfully, the website link is generated and can be shared with everyone.
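The saving and loading step described in section 3.6 can be sketched with Python's pickle module. The artifact layout (a dict holding the titles and the similarity matrix), the file name and the matrix values are hypothetical. Pre-computing the matrix helps because Streamlit re-executes the script from top to bottom on every user interaction. The app then only has to load the matrix rather than rebuild it.

```python
import os
import pickle
import tempfile

import numpy as np

# Hypothetical artifacts from the modelling step: titles and their similarity matrix.
titles = ["Movie A", "Movie B", "Movie C"]
similarity = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pkl")  # hypothetical file name
    # "Dump" the artifacts once, after training...
    with open(path, "wb") as f:
        pickle.dump({"titles": titles, "similarity": similarity}, f)
    # ...and load them back, as the web app would at start-up.
    # Only unpickle files you created yourself: pickle can execute arbitrary code.
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(loaded["titles"])
```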

The flowchart of the above process is given below:
(1) Importing the dataset from Kaggle
(2) Loading and reading the dataset
(3) Exploratory data analysis and feature engineering
(4) Removing the outliers from the dataset
(5) Splitting and modelling the data
(6) Predicting the test data and comparing it with the original test data
(7) Evaluation of the model
(8) Saving and exporting the data
(9) Making a website where we can see the practical application of the project
(10) Deploying the model on the Heroku cloud platform

IV. RESULTS AND SIMULATION

4.1 Snapshots of the frontend web page

[Figure: Page 1 (before prediction)]
[Figure: Page 2 (after prediction)]

Simulation process
Step 1: Click on the link. The website loads and displays the movie recommendation system page.
Step 2: Go to the search bar and type the movie name, or click on one of the suggested movie names.
Step 3: After selecting the movie, click on the related movies. Five related movies, similar to the movie given as input, are then displayed.

Conclusion

Movie recommender systems play a very important role in recommending movies to users based on the genres they like. They are powerful tools for extracting valuable information that benefits both businesses and users. They will continue to be researched and developed to bring a better experience to users. This movie recommender system is based on content-based recommendation: movies are recommended to the user based on the user's previous choices and ratings. This helps improve the accuracy of the model.

REFERENCES
[1] Ashrita Kashyap (2020). A Movie Recommender System: MOVREC using Machine Learning Techniques.
[2] Rahul Pradhan (2021). A Study on Movie Recommendations using Collaborative Filtering.
[3] Faisal Rahutomo (2014). Semantic Cosine Similarity.
