Credit Card Fraud Detection Using Machine Learning


IJSART - Volume 7 Issue 5 – MAY 2021
ISSN [ONLINE]: 2395-1052

Credit Card Fraud Detection Using Machine Learning Algorithms

Dr. A. S. Muthanandha Murugavel (1), P. Jeevitha (2), R. Rajana (3), R. M. Danushree (4)
(1) Assistant Professor, Dept of Information Technology
(2, 3, 4) Dept of Information Technology
(1, 2, 3, 4) Dr. Mahalingam College of Engineering and Technology, Pollachi.

Abstract- Credit cards are easy and attractive targets for fraud. Credit card fraud refers to the loss of sensitive credit card information. E-commerce and many other online sites have increased the use of online payment modes, increasing the risk of online fraud. Many machine learning algorithms can be used to detect fraud. This research shows several algorithms that can be used to classify transactions as fraudulent or genuine. The Credit Card Fraud Detection dataset was used in the research. The main aim of the paper is to design and develop a fraud detection method for transaction data by analysing the past transaction details of the customers. This paper investigates and checks the performance of Decision tree, Artificial Neural Network (ANN), XGBoost and Logistic Regression algorithms on a highly skewed credit card fraud dataset. The results indicate that the accuracies of the Decision tree, Artificial Neural Network (ANN), XGBoost and Logistic Regression classifiers are 90.6, 88.3, 96.2 and 97.5 respectively.

Keywords- Credit card fraud detection, logistic regression, XGBoost, decision tree, ANN

I. INTRODUCTION

Credit card fraud transactions are unauthorized and unwanted usage of an account by someone other than the owner of that account. In today's world, heavy dependence on internet technology has increased credit card transactions, but credit card fraud has also accelerated in both online and offline transactions. Credit cards are easy targets. Without any risk, a significant amount can be withdrawn without the owner's knowledge in a short period. Fraudsters always try to make every fraudulent transaction look legitimate, which makes fraud detection a very challenging task. Necessary prevention measures can be taken to stop this abuse, and the behaviour of such fraudulent practices can be studied to minimize it and protect against similar occurrences in the future. There are many fraud detection solutions and software products which prevent fraud in businesses such as credit card, retail, e-commerce, insurance, and industries.

Credit card fraud detection classifies transactions into two classes: legitimate transactions and fraudulent transactions. Several techniques have been designed and implemented for credit card fraud detection, such as the genetic algorithm, the migrating birds optimization algorithm and the local outlier factor, together with machine learning algorithms such as Isolation Forest Algorithm, Forest Artificial Neural Network, Fuzzy Logic, Genetic Algorithm, Logistic Regression, Decision Tree, Support Vector Machines, Bayesian Networks, Hidden Markov Model and K-Nearest Neighbour.

These algorithms are employed to analyse all the authorized transactions and report the suspicious ones. These reports are investigated by professionals who contact the cardholders to confirm whether the transaction was genuine or fraudulent. The investigators provide feedback to the automated system, which is used to train and update the algorithm and so improve the fraud-detection performance over time. At the end of this paper, conclusions about the results of the algorithms are drawn and collated.

Figure 1

II. LITERATURE SURVEY

Multiple supervised and semi-supervised machine learning techniques are used for fraud detection. In this paper we compare certain machine learning algorithms for the detection of fraudulent transactions and find the accuracy of each algorithm.

Many supervised machine learning algorithms like Isolation Forest Algorithm, Forest Artificial Neural Network, Fuzzy Logic, Genetic Algorithm, Logistic Regression, Decision Tree, Support Vector Machines, Bayesian Networks, Hidden Markov Model and K-Nearest Neighbour are used to detect fraudulent transactions in real-time datasets. A feedback mechanism is used to solve the problem of concept drift. By using a Naïve Bayes classifier, the size of the training dataset is aggregate model when compared to other models. Two kinds of random forest are used to learn the behavioural features of normal and abnormal transactions: random-tree-based random forest and CART-based random forest. Even though random forest obtains good results on small datasets, there are still some problems in the case of imbalanced data. Future work will focus on solving the above-mentioned problem. The random forest algorithm itself should also be improved.

The performance of Logistic Regression, K-Nearest Neighbour and Naïve Bayes has been analysed on highly skewed credit card fraud data. Research has also been carried out on meta-classifiers and meta-learning approaches for handling highly imbalanced credit card fraud data. Although supervised learning methods can be used, they may fail to detect fraud in certain cases.

III. DATASET DESCRIPTION

The dataset used in this paper is obtained from Kaggle. The dataset contains transactions made by credit cards in September 2013 by European cardholders.

This dataset presents transactions that occurred in two days, where we have 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data cannot be provided. Features V1, V2, ..., V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. The feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable and it takes value 1 in case of fraud and 0 otherwise.

IV. METHODOLOGY

4.1 Reading dataset and preprocessing

Using the read.csv function, the credit card csv dataset is read into the data frame variable creditcard data.

We have used the head function and the tail function to display the first five rows and the last five rows.

We have data composed of attributes with varying scale, so we have to re-scale the attributes to the same range. We scale our data using the scale() function, and we apply it to the Amount component of our credit card data. Scaling is also known as feature standardization.

With the help of scaling, the data is structured according to a specified range. Therefore, there are no extreme values in our dataset that might interfere with the functioning of our model.

The scale() function in the R language is a generic function which centers and scales the columns of a numeric matrix. The center parameter takes either a numeric-alike vector or a logical value. If a numeric vector is provided, then each column of the matrix has the corresponding value from center subtracted from it.

Figure 2 Reading dataset

4.2 Splitting into training and testing sets

We split the data frame into a training set and a testing set using the function sample.split.

The sample.split function splits the data using a given ratio, in our case 80% for the training set and 20% for the testing set.
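As a rough illustration of the 80/20 split, the sketch below uses plain Python rather than the R used in this paper. It assumes that sample.split here is the caTools function, whose documentation describes it as preserving the relative ratios of the labels in both parts; the function name stratified_split, the seed value 123 and the toy labels are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_ratio=0.8, seed=123):
    """Return (train_idx, test_idx), keeping each class's share in both parts."""
    rng = random.Random(seed)            # fixed seed: the same split on every run
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)            # group row indices by class label
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                 # shuffle within each class
        cut = round(len(idx) * train_ratio)
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels (hypothetical): 10 frauds (1) among 1,000 transactions
labels = [1] * 10 + [0] * 990
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))                 # sizes of the two parts
print(sum(labels[i] for i in train_idx),
      sum(labels[i] for i in test_idx))              # frauds in each part
```

Splitting within each class matters on a dataset as skewed as this one: a purely random 20% sample could by chance contain very few of the rare fraud rows.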

The dim() function returns the dimensions of the training set and the testing set.

The set.seed() function sets the starting number used to generate a sequence of random numbers. It ensures that you get the same result each time you run the same process with the same seed.

4.3 Training the model

The model is trained using four different machine learning algorithms, namely XGBoost, decision tree, Artificial Neural Network (ANN) and logistic regression.

glm() is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.

The family parameter provides a convenient way to specify the details of the models used by such functions.

RPART stands for Recursive Partitioning And Regression Trees.

predict() is used to return a vector of predicted responses from an rpart object.

Figure 3 Training the dataset for decision tree algorithm

Figure 4 Training the dataset for logistic regression algorithm

Figure 6 Training the dataset for XGBoost algorithm

The approach that this paper proposes uses machine learning algorithms: Decision tree, Artificial Neural Network (ANN), XGBoost and Logistic Regression. First of all, we obtained our dataset from Kaggle, a data analysis website which provides datasets; it has 492 frauds out of 284,807 transactions. In this part of the R data science project, using the read.csv function, the credit card csv dataset is read into the data frame variable creditcard data. We have used the head function and the tail function to display the first five rows and the last five rows. Functions like table(), summary(), names(), var() and sd() are also used. We have data composed of attributes with varying scale, so we have to re-scale the attributes to the same range. To accomplish that we use scale(). The scale() function in the R language is a generic function which centers and scales the columns of a numeric matrix.
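As a minimal sketch of what centering and scaling a single column means, the Python snippet below subtracts the column mean and divides by the sample standard deviation, which is what R's scale() does with its default arguments. It is written in Python only for illustration; the helper name standardize and the amounts are hypothetical.

```python
from statistics import mean, stdev

def standardize(values):
    """Center a column to mean 0 and scale it to sample standard deviation 1."""
    m = mean(values)
    s = stdev(values)                    # n - 1 denominator, like R's sd()
    return [(v - m) / s for v in values]

# Hypothetical transaction amounts
amounts = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize(amounts)

# After standardization the mean is (numerically) 0 and the spread is 1
print(round(mean(scaled), 10), round(stdev(scaled), 10))
```

Note that standardization changes the units of the column, not its shape: an amount far above the mean still maps to a large positive value.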
The center parameter takes either a numeric-alike vector or a logical value. If a numeric vector is provided, then each column of the matrix has the corresponding value from center subtracted from it.

We split the data frame into a training set and a testing set using the function sample.split(). This function splits the data using a given ratio, in our case 80% for the training set and 20% for the testing set. The dim() function returns the dimensions of the training set and the testing set. The set.seed() function sets the starting number used to generate a sequence of random numbers. It ensures that you get the same result each time you run the same process with the same seed. The model is trained using four different machine learning algorithms, namely XGBoost, decision tree, Artificial Neural Network (ANN) and logistic regression.

Figure 5 Training the dataset for ANN algorithm
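Each trained model in this paper is scored by the area under its ROC curve (AUC). The following is a minimal Python sketch of that metric, not the R plotting routine the authors used: it sweeps the decision threshold over the model's scores, records the false positive rate and true positive rate at each step, and integrates with the trapezoidal rule. The function names, labels and scores are hypothetical toy values.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) points obtained by sweeping the decision threshold."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)   # highest score first
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item in a group of tied scores
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]                  # 1 = fraud, 0 = genuine (toy data)
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]      # hypothetical model outputs
print(round(auc(roc_points(labels, scores)), 3))   # 0.889
```

In this toy case 8 of the 9 possible fraud/genuine pairs are ranked correctly, which is why the area comes out as 8/9. An AUC of 1.0 would mean every fraud is scored above every genuine transaction, and 0.5 corresponds to uninformative scores.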

glm() is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution. The family parameter provides a convenient way to specify the details of the models used by such functions. predict() is used to return a vector of predicted responses from an rpart object. A plotting function then creates Receiver Operating Characteristic (ROC) plots for one or more models.

A ROC curve plots the false alarm rate against the hit rate for a probabilistic forecast over a range of thresholds. The area under the curve (AUC) is viewed as a measure of accuracy. The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. These are the steps that take place in our project to calculate the accuracy level.

Figure 7 Decision tree

Figure 8 AUC for XGBoost algorithm

Figure 9 AUC for ANN algorithm

Figure 10 AUC for Logistic regression algorithm

Figure 11 AUC for Decision tree algorithm

V. RESULT

Since the entire dataset consists of transaction records, only a fraction of the data can be made available if this project were to be used on a commercial scale. The ROC curve is plotted with the true positive rate (TPR) on the y-axis and the false positive rate (FPR) on the x-axis. Being based on machine learning algorithms, the program will only increase its efficiency over time as more data is put into it.

AUC for our models:
a. Artificial neural network (ANN): 88.3
b. XGBoost algorithm: 96.2
c. Logistic regression algorithm: 97.5
d. Decision tree algorithm: 90.27

Figure 12 Comparison of algorithms

VI. CONCLUSION

Credit card fraud leads to the loss of sensitive information and is considered an act of criminal dishonesty. This article has listed the most common methods of fraud along with their detection methods. This paper has also explained in detail how machine learning can be applied to get better results in fraud detection, along with the experimentation results. This paper shows that the Logistic Regression algorithm reaches 97.5% accuracy, which is higher than XGBoost (96.2), Decision tree (90.27) and Artificial Neural Network (ANN) (88.3). This high percentage of accuracy occurred in spite of the imbalance between the number of fraudulent and the number of genuine transactions in the dataset. Being based on machine learning algorithms, the program will only increase its efficiency over time as more data is put into it. We finally observed that logistic regression gave the best results.
