Car Price Prediction Using Machine Learning


International Journal of Computer Sciences and Engineering
Open Access Research Paper, Vol.-7, Issue-5, May 2019, E-ISSN: 2347-2693

Car Price Prediction Using Machine Learning

Ashish Chandak1*, Prajwal Ganorkar2, Shyam Sharma3, Ayushi Bagmar4, Soumya Tiwari5

1,2,3,4,5 Information Technology, Shri Ramdeobaba College of Engineering, Rashtrasant Tukadoji Maharaj Nagpur University, Nagpur, India

*Corresponding Author: chandakav@rknec.edu, Tel.: 8237851429

DOI: https://doi.org/10.26438/ijcse/v7i5.444450
Available online at: www.ijcseonline.org
Accepted: 20/May/2019, Published: 31/May/2019

Abstract: Because of new computing technologies, machine learning today is not like machine learning of the past. It was born from pattern recognition and the theory that computers can learn without being programmed to perform specific tasks; researchers interested in artificial intelligence wanted to see if computers could learn from data. The iterative aspect of machine learning is important because, as models are exposed to new data, they are able to adapt independently. They learn from previous computations to produce reliable, repeatable decisions and results. It is a science that is not new, but one that has gained fresh momentum. While there are countless applications of machine learning in real life, one of the most prominent is prediction. Prediction can be applied to various topics, and one such application is the focus of this project. Websites recommending items you might like based on previous purchases are using machine learning to analyze your buying history and promote other items you'd be interested in.
This ability to capture data, analyze it and use it to personalize a shopping experience (or implement a marketing campaign) is the future of retail.

Keywords: Environment Quality, Data Analysis, Business Intelligence, Power BI, SQL Server 2016, Air Quality, Water Quality, Tree Cover, Forest Cover, Predictions, NLP, forecasting, k-means clustering, ARIMA.

© 2019, IJCSE All Rights Reserved

I. INTRODUCTION

For a very long time, commodities have been continuously traded. Earlier, these transactions took the form of a barter system, which later gave way to a monetary system. With these changes, the pattern of re-selling items was affected as well. There are two ways in which an item is re-sold: offline and online. In offline transactions, there is a mediator in between who is vulnerable to corruption and may make overly profitable transactions. The second option is online, wherein a platform lets the user find the price he might get if he decides to sell.

• Kilometers traveled: The number of kilometers traveled by a vehicle plays a huge role when the vehicle is put up for sale. The more the vehicle has traveled, the older it is.
• Fiscal power: This is the power output of the vehicle. More output yields better value out of a vehicle.
• Year of registration: This is the year when the vehicle was registered with the Road Transport Authority. The newer the vehicle, the better value it will yield. With every passing year, the value depreciates.
• Fuel type: There were two fuel types present in our dataset, petrol and diesel. Fuel type was relatively less dominant.

It is due to the above factors that we need a self-learning, machine learning-based system. This was the basis on which a set of objectives was formulated.
One thing that was pre-determined was that this would be a real-time project.

OBJECTIVE

• To build a supervised machine learning model for forecasting the value of a vehicle based on multiple attributes.
• The system being built must be feature based, i.e. feature-wise prediction must be possible.
• To provide graphical comparisons for a better view.

MOTIVATION

The automotive industry is composed of a few top global multinational players and several retailers. The multinational players are mainly manufacturers by trade, whereas the retail market features players who deal in both new and used vehicles. The used car market has demonstrated significant growth in value, contributing to the larger share of the overall market. The used car market in India accounts for nearly 3.4 million vehicles per year.

FEATURES

There will be two major features provided in the project:

• Re-sale platform: A centralized platform for car resale that will predict prices.
• Feature selection: Feature-based search and prediction.

Section I contains the introduction of our module, followed by the objective, motivation and features of our model. Section II contains the literature review, Section III describes the various machine learning technologies used, Section IV explains the methodology, Section V describes the results and discussion, and Section VI contains the conclusion and future work.

II. LITERATURE REVIEW

In this chapter, we discuss various applications and methods which inspired us to build our project. We did a background survey of the basic ideas of our project and used those ideas to collect information such as the technological stack, algorithms, and shortcomings of existing work, which led us to build a better project.

CARS24

Cars24 is a web platform where sellers can sell their used cars. It is an Indian start-up with a simplified user interface which asks the seller for parameters like car model, kilometers traveled, year of registration and vehicle type (petrol, diesel) [1]. These allow the web model to run certain algorithms on the given parameters and predict the price.

GET VEHICLE PRICE

Get Vehicle Price is an Android app which works on parameters similar to those of Cars24. This app predicts vehicle prices based on various parameters like fiscal power, horsepower, and kilometers traveled. It uses a machine learning approach to predict the price of a car, bike, electric vehicle or hybrid vehicle. This app can predict the price of any vehicle because of its smartly optimized algorithm.

CARWALE

The CarWale app is one of the top-rated car apps in India for new and used car research. It provides accurate on-road prices of cars along with genuine user and expert reviews. It can also compare different cars with its car comparison tool. The app also helps you connect with your nearest car dealers for the best offers available.

CARTRADE

CarTrade is a web and Android platform where a user can research new cars in India by exploring car prices, car specs, images, mileage, reviews, and car comparisons. On this app one can sell a used car to genuine buyers with ease. One can list a used car for sale along with details like image, model, year of purchase and kilometers so that it is displayed to lakhs of interested car buyers in their city. Users can read user reviews and expert car reviews with images that help in finalizing a new car buying decision.

III. TECHNOLOGY USED

Python was the major technology used for the implementation of machine learning concepts, the reason being that there are numerous inbuilt methods in the form of packaged libraries available in Python. The following are the prominent libraries and tools we used in our project.

NUMPY

NumPy is a general-purpose array-processing package [1]. It provides a high-performance multidimensional array object and tools for working with these arrays. It is the fundamental package for scientific computing with Python.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data types can be defined using NumPy, which allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

SCIPY

SciPy is a free and open-source Python library used for scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

SciPy builds on the NumPy array object and is part of the NumPy stack, which includes tools like Matplotlib, pandas, and SymPy, and an expanding set of scientific computing libraries. This NumPy stack has similar users to other applications such as MATLAB, GNU Octave, and Scilab. The NumPy stack is also sometimes referred to as the SciPy stack [2]. The SciPy library is currently distributed under the BSD license, and its development is sponsored and supported by an open community of developers. It is also supported by NumFOCUS, a community foundation for supporting reproducible and accessible science.

SCIKIT-LEARN

Scikit-learn provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python. It is licensed under a permissive simplified BSD license and is distributed under many Linux distributions, encouraging academic and commercial use. The library is built

JUPYTER NOTEBOOK

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text [3]. Its uses include data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

ENTHOUGHT CANOPY

Enthought Canopy is a Python distribution and analysis environment for scientific and analytic computing; its package manager uses the Jupyter Notebook as a presentation layer.
Anaconda tries to solve the dependency hell in Python, where different projects require different versions of the same dependencies, by keeping each project's dependencies separate so that they do not interfere with each other.

IV. METHODOLOGY

In this chapter, we discuss the algorithms that were implemented and the dataset that was required to build this module. A dataset containing more than 3 lakh tuples was used for training the model. Attributes such as kilometers traveled, year of registration, fuel type and fiscal power determine the worth of an automobile. Since the target, price, is a continuous value, this is a regression problem. We implemented two algorithms, K Nearest Neighbour (KNN) and Classification and Regression Trees (CART), and compared the two on different models of vehicles.

To implement these algorithms we use Enthought Canopy [4].

K-MEANS ALGORITHM

K-means clustering is one of the simplest and most popular unsupervised machine learning algorithms. Typically, unsupervised algorithms make inferences from datasets using only input vectors, without referring to known or labelled outcomes. A cluster refers to a collection of data points aggregated together because of certain similarities. You define a target number k [5], which refers to the number of centroids you need in the dataset. A centroid is the imaginary or real location representing the center of the cluster. Every data point is allocated to one of the clusters by reducing the in-cluster sum of squares.
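The K-means procedure described in this section can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names `nearest_centroid` and `k_means`, the `max_iter` and `seed` parameters, and the two synthetic blobs of points are all assumptions made for illustration.

```python
import numpy as np

def nearest_centroid(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    # dists[i, j] is the Euclidean distance from point i to centroid j.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def k_means(points, k, max_iter=100, seed=0):
    """Plain K-means: random initial centroids, then assign/update loops."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        labels = nearest_centroid(points, centroids)
        # Move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids have stabilized
        centroids = new_centroids
    return centroids, nearest_centroid(points, centroids)

# Synthetic example data (hypothetical): two tight blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0.0, 0.1, (20, 2)), rng.normal(5.0, 0.1, (20, 2))])
centroids, labels = k_means(data, k=2)
print(centroids)
```

The loop stops either when the centroids no longer move or when `max_iter` iterations have run. Note that K-means is an unsupervised clustering method, whereas the KNN model used for prediction in this paper is a supervised method.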
The K-means algorithm identifies k centroids and then allocates every data point to the nearest cluster, while keeping the in-cluster sum of squares as small as possible. The "means" in K-means refers to averaging of the data; that is, finding the centroid.

To process the learning data, the K-means algorithm starts with a first group of randomly selected centroids, which are used as the beginning points for every cluster, and then performs iterative (repetitive) calculations to optimize the positions of the centroids. It halts creating and optimizing clusters when either:

• The centroids have stabilized: there is no change in their values because the clustering has been successful.
• The defined number of iterations has been reached.

DECISION TREE REGRESSION

Decision tree learning uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves). It is one of the predictive modeling approaches used in statistics, data mining and machine learning [6]. Tree models where the target variable can take a discrete set of values are called classification trees; in these tree structures, leaves represent class labels and branches represent conjunctions of features that lead to those class labels. Decision trees where the target variable can take continuous values (typically real numbers) are called regression trees.

The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf. A decision tree is a simple representation for classifying examples [7] [8].

V. IMPLEMENTATION

In this chapter, we discuss the steps and methods used in our module. This includes the statistical analysis of our dataset through scatter plots, a violin plot, comparison charts, and bar graphs to study which algorithm performs best.

We first perform pre-processing and data cleaning on our dataset. We found that 15% of the tuples had null values, and we pruned those tuples. We built a heat map comparing kilometers traveled, year of registration, price and fiscal power.

The dataset was split into 80% for training and 20% for testing. Using the scikit-learn library in Python, we build the KNN (k = 7) and CART models for predicting the value of a vehicle.
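The pipeline described above (pruning tuples with null values, an 80/20 split, then KNN with k = 7 and CART) can be sketched with scikit-learn as follows. This is a hedged illustration, not the authors' code from Figures 4.8 and 4.9: the data is synthetic, the price rule is invented so that the example runs, and the variable names are assumptions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (NOT the paper's dataset). Columns:
# kilometers traveled, year of registration, fiscal power.
rng = np.random.default_rng(0)
n = 1000
km = rng.uniform(5_000, 250_000, n)
year = rng.integers(1990, 2019, n).astype(float)
power = rng.uniform(50, 300, n)
X = np.column_stack([km, year, power])
# Hypothetical price rule plus noise, used only to make the example run.
y = 500 * (year - 1990) - 0.03 * km + 40 * power + rng.normal(0, 1_000, n)

# Simulate missing values in roughly 15% of rows, then prune those rows.
X[rng.random(n) < 0.15, 0] = np.nan
keep = ~np.isnan(X).any(axis=1)
X, y = X[keep], y[keep]

# 80% of the rows for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "KNN (k=7)": KNeighborsRegressor(n_neighbors=7),
    "CART": DecisionTreeRegressor(random_state=0),
}
rmse = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # Root mean square error on the held-out 20%.
    rmse[name] = float(np.sqrt(mean_squared_error(y_test, predictions)))
    print(f"{name}: RMSE = {rmse[name]:.2f}")
```

The features are left unscaled here because the paper does not mention scaling; since KNN relies on distances, standardizing the columns first is a common choice and may change the comparison.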
The value of k was not fixed in advance; rather, we ran the prediction model with different values of k and compared the results. The year of registration was slightly more dominant.

Figure 4.1: Scatter Plot of price and year of model
Figure 4.1 shows a scatter plot of price against model year. Most used cars in the dataset are priced between 0 and 20000 and fall between the years 1990 and 2010.

Figure 4.2: Frequency of TOP 20 Brands Distribution
Figure 4.2 shows the frequency distribution of the top 20 brands in our dataset. The Volkswagen brand has the most cars in the database, followed by BMW, Mercedes Benz, and Audi.

Figure 4.3: Scatter Plot of price and fiscal power
Figure 4.3 shows a scatter plot of price against fiscal power. Cars priced between 0 and 20000 have a fiscal power of 100 to 600.

Figure 4.4: Strip Plot of price versus fuel type
Figure 4.4 shows a strip plot of price versus fuel type. Our dataset contains mostly petrol and diesel cars priced up to 20000, followed by LPG and hybrid cars.

Figure 4.5: Correlation Matrix
Figure 4.5 shows a correlation matrix between different attributes of the dataset. A positive relationship exists between year of registration and price, and an inverse relationship exists between price and kilometers traveled.

Figure 4.6: Snapshot of the dataset header
Figure 4.6 shows a snapshot of the dataset header with its attributes after pre-processing; data cleaning has been performed and the dataset does not contain any null values.

Figure 4.7: Accuracy mapping for different k-values
Figure 4.7 plots the root mean square error for different values of k. The errors for different numbers of neighbors do not deviate much from each other.

Figure 4.8: Code Snippet
Figure 4.8 shows the code for the KNN algorithm. Here we set k = 7, so the seven nearest neighbors are used to learn from the dataset and predict the value.

Figure 4.9: Code Snippet 2
Figure 4.9 shows the code for the decision tree regression algorithm; we use the scikit-learn library to implement these algorithms. The code learns from the dataset, applies the model to predict values, and plots predicted values against residuals.

Figure 4.10: Residual vs Predicted comparison graph
Figure 4.10 shows residual values against predicted values for the decision tree regression algorithm. The distance of each point from the zero line represents the deviation of the predicted value from the actual value.
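The comparison of different k values summarized in Figure 4.7 can be sketched as a simple sweep. This is an illustrative example under stated assumptions rather than the authors' code: the data is synthetic, and the candidate list `k_values` and the helper `rmse_for_k` are hypothetical names.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic features and target (NOT the paper's dataset).
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(600, 3))
y = 10_000 * X[:, 1] - 4_000 * X[:, 0] + 2_000 * X[:, 2] + rng.normal(0, 500, 600)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

def rmse_for_k(k):
    """Fit KNN with k neighbors and return the RMSE on the test split."""
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    residuals = y_test - model.predict(X_test)  # actual minus predicted
    return float(np.sqrt(np.mean(residuals ** 2)))

k_values = [3, 5, 7, 9, 11]
rmse_by_k = {k: rmse_for_k(k) for k in k_values}
best_k = min(rmse_by_k, key=rmse_by_k.get)

for k, value in rmse_by_k.items():
    print(f"k = {k:2d}: RMSE = {value:.2f}")
print("lowest RMSE at k =", best_k)
```

The `residuals` array inside `rmse_for_k` is the same quantity plotted against the predicted values in the residual graphs; plotting it would need an additional plotting library such as Matplotlib.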

Figure 4.11: Residual vs Predicted graph 2
Figure 4.11 shows residual values against predicted values for the KNN algorithm. The distance of each point from the zero line represents the deviation of the predicted value from the actual value.

Figure 4.12: Different statistical aggregations in dataset
Figure 4.12 shows statistical aggregations of the dataset, describing the count, mean, standard deviation, and the minimum and percentile values of the data.

VI. CONCLUSION

In this chapter, we discuss the results and the observations we made while implementing this module. We successfully implemented the machine learning algorithmic paradigms using prominent algorithms from Python libraries. We first performed pre-processing and data cleaning on our dataset. We found that 15% of the tuples had null values, and we pruned those tuples. The results showed a positive correlation between price and year of registration and a negative correlation between price and kilometers traveled, in line with the correlation matrix in Figure 4.5.

A positive correlation corresponds to the concept of direct proportion, whereas a negative correlation corresponds to the concept of inverse proportion. Three lakh tuples were used for training the model. The year of registration was slightly more dominant. K Nearest Neighbour (KNN) and Classification and Regression Trees (CART) are compared on two different models of vehicles. We found that the root mean square error for KNN with k = 7 is 5581.96 and for CART is 4961.64; the actual price was 4999.

Figure 5.1: KNN vs CART vs Actual Price

FUTURE SCOPE

As a part of future work, we aim to widen the choice of algorithms used in the project. We could explore only two algorithms, whereas many other algorithms exist that might be more accurate. More specifications will be added to the system to provide more accurate prices, i.e.

1) Horsepower
2) Battery power
3) Suspension
4) Cylinder
5) Torque

Technologies are improving day by day, and car technology is advancing as well, so our next upgrade will include hybrid cars, electric cars, and driverless cars.

REFERENCES

[1] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, D. Kumar, C. Lever, Z. Ma, J. Mason, D. Menscher, C. Seaman, N. Sullivan, K. Thomas, and Y. Zhou, "Understanding the mirai botnet," in
