Credit Card Fraud Detection Using Machine Learning


IJSART - Volume 7 Issue 5 – MAY 2021, ISSN [ONLINE]: 2395-1052

Credit Card Fraud Detection Using Machine Learning Algorithms

Dr. A. S. Muthanandha Murugavel1, P. Jeevitha2, R. Rajana3, R. M. Danushree4
1 Assistant Professor, Dept of Information Technology
2, 3, 4 Dept of Information Technology
1, 2, 3, 4 Dr. Mahalingam College of Engineering and Technology, Pollachi.

Abstract- Credit cards are easy and attractive targets for fraud, which involves the loss of sensitive credit card information. E-commerce and many other online sites have increased the use of online payment modes, increasing the risk of online fraud. Many machine learning algorithms can be used for the detection of fraud. This research presents several algorithms that can be used to classify transactions as fraudulent or genuine. The Credit Card Fraud Detection dataset was used in the research. The main aim of the paper is to design and develop a fraud detection method for transaction data by analysing the past transaction details of the customers. This paper investigates and checks the performance of Decision Tree, Artificial Neural Network (ANN), XGBoost and Logistic Regression algorithms on a highly skewed credit card fraud dataset. The results indicate that the accuracies of the Decision Tree, Artificial Neural Network (ANN), XGBoost and Logistic Regression classifiers are 90.6, 88.3, 96.2 and 97.5 respectively.

Credit card fraud detection classifies transactions into two classes, legitimate and fraudulent. Several techniques have been designed and implemented for credit card fraud detection, such as the genetic algorithm, the migrating birds optimization algorithm and the local outlier factor, as well as machine learning algorithms such as Isolation Forest, Artificial Neural Network, Fuzzy Logic, Genetic Algorithm, Logistic Regression, Decision Tree, Support Vector Machines, Bayesian Networks, Hidden Markov Model and K-Nearest Neighbour.
These algorithms are employed to analyse all authorized transactions and report the suspicious ones. These reports are investigated by professionals, who contact the cardholders to confirm whether the transaction was genuine or fraudulent. The investigators provide feedback to the automated system, which is used to train and update the algorithm and so improve fraud-detection performance over time. At the end of this paper, the results of the algorithms are collated and conclusions are drawn.

Keywords- Credit card fraud detection, logistic regression, XGBoost, decision tree, ANN

I. INTRODUCTION

Credit card fraud transactions are unauthorized and unwanted uses of an account by someone other than the owner of that account. In today's world, high dependency on internet technology has increased credit card transactions, but credit card fraud has also accelerated, in both online and offline transactions. Credit cards are easy targets. Without much risk, a significant amount can be withdrawn in a short period without the owner's knowledge. Fraudsters always try to make every fraudulent transaction look legitimate, which makes fraud detection a very challenging task. Necessary prevention measures can be taken to stop this abuse, and the behaviour of such fraudulent practices can be studied to minimize it and protect against similar occurrences in the future. There are many fraud detection solutions and software products that prevent fraud in businesses such as credit cards, retail, e-commerce, insurance and industry.

Figure 1

II. LITERATURE SURVEY

Multiple supervised and semi-supervised machine learning techniques are used for fraud detection. In this paper we compare certain machine learning algorithms for the detection of fraudulent transactions and find the accuracy of each algorithm. Many machine learning algorithms, such as Isolation Forest, Artificial Neural Network, Fuzzy Logic, Genetic Algorithm, Logistic Regression, Decision Tree, Support Vector Machines, Bayesian Networks, Hidden Markov Model and K-Nearest Neighbour, are used to detect fraudulent transactions in real-time datasets. A feedback mechanism has been used to address the problem of concept drift. With a Naïve Bayes classifier, the training dataset size of the aggregate model has been compared with that of other models.

Two kinds of random forest have been used to train on the behavioural features of normal and abnormal transactions: random-tree-based random forest and CART-based random forest. Even though random forest obtains good results on small datasets, there are still problems in the case of imbalanced data. Future work is expected to focus on solving this problem, and the random forest algorithm itself should be improved.

The performance of Logistic Regression, K-Nearest Neighbour and Naïve Bayes has been analysed on highly skewed credit card fraud data, and research has been carried out on meta-classifiers and meta-learning approaches for handling highly imbalanced credit card fraud data. Though supervised learning methods can be used, they may fail to detect fraud in certain cases.

III. DATASET DESCRIPTION

The dataset used in this paper is obtained from Kaggle. It contains transactions made with credit cards in September 2013 by European cardholders. The dataset presents transactions that occurred in two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for 0.172% of all transactions.

It contains only numerical input variables, which are the result of a PCA transformation. Unfortunately, due to confidentiality issues, the original features and more background information about the data cannot be provided. Features V1, V2, ..., V28 are the principal components obtained with PCA; the only features which have not been transformed with PCA are 'Time' and 'Amount'. Feature 'Time' contains the seconds elapsed between each transaction and the first transaction in the dataset. Feature 'Amount' is the transaction amount; this feature can be used for example-dependent cost-sensitive learning. Feature 'Class' is the response variable, and it takes value 1 in case of fraud and 0 otherwise.

IV. METHODOLOGY

4.1 Reading dataset and preprocessing

Using the read.csv() function, the credit card csv dataset is read into the dataframe variable creditcard_data. We have used the head() and tail() functions to display the first five rows and the last five rows.

The data is composed of attributes with varying scales, so we have to re-scale the attributes to the same range. We scale the data using the scale() function, applied to the Amount column of creditcard_data. Scaling is also known as feature standardization.

With the help of scaling, the data is brought into a common range, so that large raw values do not interfere with the functioning of the model.

The scale() function in the R language is a generic function which centers and scales the columns of a numeric matrix. The center parameter takes either a numeric-alike vector or a logical value. If a numeric vector is provided, then each column of the matrix has the corresponding value from center subtracted from it.

Figure 2 Reading dataset

4.2 Splitting into training and testing sets

We split the dataframe into a training set and a testing set using the function sample.split(). The sample.split() function splits the data using a given ratio, in our case 80% for the training set and 20% for the testing set.
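The preprocessing and splitting steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' script: the data frame is synthetic (the Kaggle file is not read here), the name creditcard_data follows the text, and the split uses base R sample() as a stand-in for sample.split(), which also stratifies on the label.

```r
# Synthetic stand-in for the Kaggle data; column names mirror the dataset,
# but all values are made up for illustration.
set.seed(123)
n <- 1000
creditcard_data <- data.frame(
  Time   = sort(sample(0:172800, n, replace = TRUE)),
  V1     = rnorm(n),
  V2     = rnorm(n),
  Amount = round(rexp(n, rate = 1 / 90), 2),
  Class  = rbinom(n, 1, 0.05)
)

# Inspect the first and last five rows, as done with head() and tail().
head(creditcard_data, 5)
tail(creditcard_data, 5)

# Standardise Amount: scale() subtracts the column mean and divides by the
# column standard deviation; as.numeric() drops the matrix attributes.
creditcard_data$Amount <- as.numeric(scale(creditcard_data$Amount))

# 80/20 split with base R sample() (assumption: a simple random split,
# used here instead of sample.split()).
train_idx  <- sample(seq_len(n), size = floor(0.8 * n))
train_data <- creditcard_data[train_idx, ]
test_data  <- creditcard_data[-train_idx, ]

dim(train_data)
dim(test_data)
```

With a class as rare as fraud, a stratified split is usually preferable so that both sets contain fraud cases; the simple random split here only keeps the sketch dependency-free.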

dim() is a function that returns the dimensions of the training set and the testing set.

The set.seed() function sets the starting number used to generate a sequence of random numbers. It ensures that you get the same result if you start with that same seed each time you run the same process.

4.3 Training the model

The model is trained using four different machine learning algorithms, namely XGBoost, decision tree, Artificial Neural Network (ANN) and logistic regression.

glm() is used to fit generalized linear models, specified by giving a symbolic description of the linear predictor and a description of the error distribution.

The family parameter provides a convenient way to specify the details of the models used by functions such as glm().

rpart (Recursive Partitioning And Regression Trees) is used to fit the decision tree. predict() is used to return a vector of predicted responses from an rpart object.

Figure 3 Training the dataset for decision tree algorithm
Figure 4 Training the dataset for logistic regression algorithm
Figure 5 Training the dataset for ANN algorithm
Figure 6 Training the dataset for XGBoost algorithm

In summary, the approach that this paper proposes uses the Decision Tree, Artificial Neural Network (ANN), XGBoost and Logistic Regression algorithms. We obtained our dataset from Kaggle, a data analysis website which provides datasets; it contains 492 frauds out of 284,807 transactions. The dataset is read with read.csv() and inspected with head() and tail(); functions like table(), summary(), names(), var() and sd() are also used. The attributes are re-scaled with scale(), the dataframe is split 80%/20% into training and testing sets with sample.split(), and the four models are then trained on the training set.
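As a rough sketch of the training step described above, the following fits a logistic regression with glm() and a binomial family, then scores the held-out rows with predict(). The data is synthetic and the variable names (dat, logit_model, fraud_prob) are hypothetical; it is not the paper's own code.

```r
set.seed(42)
n <- 2000

# Synthetic, illustrative data: fraud is made more likely when V1 is low.
V1    <- rnorm(n)
V2    <- rnorm(n)
Class <- rbinom(n, 1, plogis(-3 - 1.5 * V1 + 0.5 * V2))
dat   <- data.frame(V1, V2, Class)

# 80/20 random split into training and testing sets.
train_idx  <- sample(seq_len(n), size = 0.8 * n)
train_data <- dat[train_idx, ]
test_data  <- dat[-train_idx, ]

# Logistic regression: the binomial family supplies the logit link.
logit_model <- glm(Class ~ ., data = train_data, family = binomial())

# type = "response" returns predicted probabilities of Class == 1.
fraud_prob <- predict(logit_model, newdata = test_data, type = "response")
summary(fraud_prob)

# A decision tree could be fitted in the same way with the rpart package:
# rpart::rpart(Class ~ ., data = train_data, method = "class")
```

The same formula interface (Class ~ .) carries over to rpart, which is why the tree is only shown as a comment here; the XGBoost and ANN models need their own packages and data formats and are left out of this sketch.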

Figure 7 Decision tree

The ROC plotting function creates Receiver Operating Characteristic (ROC) plots for one or more models. A ROC curve plots the false alarm rate against the hit rate of a probabilistic forecast for a range of thresholds. The area under the curve (AUC) is viewed as a measure of accuracy: the higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s. These are the steps that take place in our project to calculate the accuracy level.

Figure 8 AUC for XGBoost algorithm
Figure 9 AUC for ANN algorithm
Figure 10 AUC for Logistic regression algorithm
Figure 11 AUC for Decision tree algorithm

V. RESULT

Since the entire dataset consists of transaction records, only a fraction of the data could be made available if this project were to be used on a commercial scale.
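As a rough illustration of how ROC points and an AUC value of the kind reported in this section can be computed, the sketch below uses base R only. The helper names auc_rank and roc_points are hypothetical, the scores and labels are toy values, and the AUC uses the rank-sum (Mann-Whitney) identity rather than whichever plotting package produced the figures.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(scores, labels) {
  r     <- rank(scores)            # average ranks handle ties
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical helper: one (FPR, TPR) point per distinct threshold.
roc_points <- function(scores, labels) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  tpr <- sapply(thresholds, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thresholds, function(t) mean(scores[labels == 0] >= t))
  data.frame(fpr = c(0, fpr), tpr = c(0, tpr))
}

# Toy predicted probabilities and true classes (1 = fraud, 0 = genuine).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1)
labels <- c(1,   1,   0,   1,   0,   0,   1,   0)

roc <- roc_points(scores, labels)
head(roc)

auc_rank(scores, labels)   # 0.75: 12 of the 16 fraud/genuine pairs are ranked correctly
```

AUC is the probability that a randomly chosen fraud receives a higher score than a randomly chosen genuine transaction, so it is not the same quantity as classification accuracy at a fixed threshold; on data as imbalanced as this dataset the two can differ substantially.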

The ROC curve is plotted with TPR against FPR, where TPR is on the y-axis and FPR is on the x-axis. Being based on machine learning algorithms, the program is expected to become more effective over time as more data is put into it.

AUC for our models:
a. Artificial neural network (ANN) - 88.3
b. XGBoost algorithm - 96.2
c. Logistic regression algorithm - 97.5
d. Decision tree algorithm - 90.27

Figure 12 Comparison of algorithms

VI. CONCLUSION

Credit card fraud leads to the loss of sensitive information and is considered an act of criminal dishonesty. This article has listed the most common methods of fraud along with their detection methods. This paper has also explained in detail how machine learning can be applied to get better results in fraud detection, along with the experimentation results. The results show that the Logistic Regression algorithm reaches 97.5% accuracy, which is higher than XGBoost (96.2), Decision tree (90.27) and the Artificial neural network (ANN) (88.3). This high percentage of accuracy occurred in spite of the imbalance between the number of fraudulent and genuine transactions in the dataset. Being based on machine learning algorithms, the program is expected to become more effective over time as more data is put into it. We finally observed that logistic regression gave the best results.
