Inside the SCAM Jungle: A Closer Look at 419 Scam Email Operations


2013 IEEE Security and Privacy Workshops

Inside the SCAM Jungle: A Closer Look at 419 Scam Email Operations

Jelena Isacenkova, Olivier Thonnard†, Andrei Costin, Davide Balzarotti, Aurelien Francillon
Eurecom, France
† Symantec Research Labs

Abstract: Nigerian scam is a popular form of fraud in which the fraudster tricks the victim into paying a certain amount of money under the promise of a future, larger payoff. Using a public dataset, in this paper we study how these forms of scam campaigns are organized and evolve over time. In particular, we discuss the role of phone numbers as important identifiers to group messages together and depict the way scammers operate their campaigns. In fact, since the victim has to be able to contact the criminal, both email addresses and phone numbers need to be authentic, and they are often unchanged and re-used for a long period of time. We also present in detail several examples of Nigerian scam campaigns, some of which last for several years, representing them graphically and discussing their characteristics.

I. INTRODUCTION

Nigerian scam, also called “419 scam” in reference to section 419 of the Nigerian penal code, has been a known problem for several decades. Originally, the scam phenomenon started by postal mail, and then evolved into a business run via fax first, and email later. The prosecution of such criminal activity is complicated [4] and can often be evaded by criminals. As a result, reports of such crime still appear in social media, and online communities, e.g. 419scam.org [1], exist to mitigate the risk and help users to identify scam messages.

Nowadays, 419 scam is often perceived as a particular type of spam. However, while most spam is now sent in bulk by botnets and compromised machines, Nigerian scam activities are still largely performed manually.
Moreover, the underlying business and operation models differ. Spammers trap their victims through engineering effort, whereas scammers rely on human factors: pity, greed and social engineering techniques. Scammers use very primitive tools (if any) compared with other forms of spam, where operations are often completely automated. Even though today 419 scam messages are eclipsed by the large amount of spam sent by botnets, they are still a problem that causes substantial financial losses for a number of victims all around the world.

2013, Jelena Isacenkova. Under license to IEEE. DOI 10.1109/SPW.2013.15

A distinctive characteristic of email fraud is the communication channel set up to reach the victim: from this point of view, scammers tend to use emails and/or phone numbers as their main contacts [5], while other forms of spam are more likely to forward their victims to specific URLs. For instance, a previous study of spam campaigns [9] (in which scam was considered a subset of spam) indicates that 59% of spam messages contain a URL.

The traditional spam and scam (non-Nigerian) scenarios have already been thoroughly studied (e.g. [9], [3]). Costin et al. [5] describe the use of phone numbers in a number of malicious activities. The authors show that the phone numbers used by scammers are often active for a long period of time and are reused over and over in different emails, making them an attractive feature to link together scam messages and identify possible campaigns. In this work, we test this hypothesis by using phone numbers and other email features to automatically detect and study scam campaigns in a public dataset. In particular, we apply a multi-dimensional clustering technique to group together similar messages to identify criminals and study their operations.
To the best of our knowledge, this is the first in-depth study of 419 campaigns.

Our analysis identifies over 1,000 different campaigns and, for most of them, phone numbers represent the cornerstone that allows us to link the different pieces together. Our experiments also show that it is possible to identify macro-clusters, i.e. large groups of scam campaigns probably run by the same criminal groups.

The rest of the paper is organized as follows. We start by describing the scam dataset (Section III), to which we apply our cluster analysis technique to extract scam campaigns, and compare the usage of email addresses and phone numbers (Section IV). In Section V we focus on a number of individual campaigns to present their characteristics. Finally, we draw our conclusions in Section VI.

II. RELATED WORK

Scammers employ various techniques to harvest money from ingenuous victims. Tive [14] introduces the tricks of Nigerian fee fraud and the philosophy of the tricksters behind it. Stajano and Wilson [10] studied a number of scam techniques and showed the importance of security engineering operations. A brief summary of Nigerian scam schemes was presented by Buchanan and Grant [4], indicating that Internet growth has facilitated the spread of cyber fraud. They also emphasize the difficulties of prosecuting adversaries, one of the main reasons why Nigerian scam is still an issue today. A more recent work by Oboh et al. [8] discusses the same problem of prosecution in a more global context, taking the Netherlands as an example.

Another work by Goa et al. [7] proposes an ontology model for 419 scam email text mining, demonstrating high precision in detection. A work by Pathak et al. [9] analyses email spam campaigns sent by botnets, describing their patterns and characteristics. The authors also show that 15% of the spam messages contained a phone number. A recent patent has been published by Coomer [2] on a technique that detects scam and spam emails through phone number analysis. This is the first mention of phone numbers being used for identifying scam. Costin et al. [5] studied the role of phone numbers in various online fraud schemes and empirically demonstrated its significance in the 419 scam domain. Our work extends Costin’s study by focusing on scam campaign characterization, and relies on phone numbers and email addresses used by scammers.

III. DATASET

In this section we describe the dataset we used for analyzing 419 scam campaigns and provide some insights into the scam messages. There are various sources of scam, often reported by users and aggregated afterwards by dedicated communities, forums, and other online activity groups.
The data chosen for our analysis come from 419scam.org, a 419 scam aggregator, as it provides a large set of preprocessed data: email bodies, headers, and some already extracted email attributes, like the scam category and the phone numbers. We downloaded the data from the 419scam.org website for a period spanning from January 2009 until August 2012.

The resulting dataset consists of 36,761 419 scam messages with 11,768 unique phone numbers. The general statistics of the data are shown in Table I. A first thing to notice is that the number of messages is three times bigger than the number of phone numbers. We did not notice any significant bursts of scam messages (verified on a monthly basis) during the three-year span, suggesting that the email messages were evenly distributed over time. It is also important to note that the dataset is mostly limited to the European and African regions (with a few Asian samples), which is due to the way the website owners collect and classify the data.

[Table I: General statistics table. Rows: Scam messages, Unique messages, Total email addresses, Unique email addresses, Total phone numbers, Total unique phone numbers.]

Phone numbers can also be used to identify a geographical location, typically the country where the phone is registered. Although this does not prove the origin of the scam, it still references a country and gives victims a certain level of confidence in the message content. For example, receiving a new partnership offer from the UK could seem strange if the phone contact has a Nigerian prefix. Moreover, as shown in a previous study [5], mobile phone numbers are precise in indicating the country of residence of the phone owner, as few roaming cases were found.
Therefore, the phone attribute is precise in indicating geographical origins and could reliably be used in the study of 419 scam.

We then look at the time during which emails and phones were advertised by scammers in scam messages. 71% of the email addresses in our dataset were used for only one day. The remaining ones were used for an average duration of 79 days each. Phone numbers have greater longevity than email addresses: 51% of the phone numbers were used for only one day, and the rest were used on average for 174 days (around 6 months). This is an important feature in our data clustering analysis.

Table II summarizes the geographical distribution of phone numbers. UK numbers are twice as common as Nigerian ones, and three times more common than the ones from Benin, the third biggest group. The Netherlands and Spain are the leading countries in Europe. Note that the UK should be considered a special case. As reported by 419scam.org and Costin et al. [5], all UK phone numbers in this dataset belong to personal numbering services, i.e. services used for forwarding phone calls to other phone numbers, thereby masking the callee's real destination. In our dataset, 44% of the phone numbers are of this kind (all with a UK prefix), another 44% are mobile phone numbers, and 12% are fixed lines [5].

The dataset is also labeled with a scam category. Around 64% of the emails are assigned to the category “419 scam” (general scam category). Most of the remaining emails (24%) belong to “Fake lottery”.
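The longevity figures reported in this section (the share of identifiers seen on a single day, and the mean lifetime of the rest) can be computed with a few lines of code. The following is a minimal sketch, not the authors' tooling: the function name `longevity_stats`, the `(identifier, date)` input layout, and the choice to count a lifetime inclusively from first to last sighting are all assumptions made for illustration.

```python
from datetime import date


def longevity_stats(sightings):
    """Return (share seen on one day only, mean lifetime in days of the rest).

    `sightings` is an iterable of (identifier, date) pairs, e.g. one pair per
    phone number or email address found in a scam message (assumed layout).
    """
    first, last = {}, {}
    for ident, day in sightings:
        first[ident] = min(first.get(ident, day), day)
        last[ident] = max(last.get(ident, day), day)

    # Assumption: lifetime is counted inclusively, so a single day gives 1.
    spans = [(last[i] - first[i]).days + 1 for i in first]
    multi = [s for s in spans if s > 1]

    share_single = (len(spans) - len(multi)) / len(spans) if spans else 0.0
    mean_multi = sum(multi) / len(multi) if multi else 0.0
    return share_single, mean_multi


# Made-up sightings, for illustration only.
example = [
    ("phone-1", date(2010, 1, 1)),
    ("phone-1", date(2010, 1, 11)),
    ("phone-2", date(2010, 3, 5)),
    ("phone-3", date(2011, 1, 1)),
    ("phone-3", date(2011, 1, 3)),
]
share, mean_days = longevity_stats(example)
```

Running the same function once over phone numbers and once over email addresses would give the two pairs of figures compared in the text.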

[Table II: Phones by countries. Columns: Country, Total phones, Total in %. Countries listed: United Kingdom, Nigeria, Benin, South Africa, Spain, Netherlands, Ivory Coast, China, Senegal, Togo, Indonesia.]

[Figure 1: Scam email categories over time]

However, this distribution has changed over time, as shown in Figure 1. In particular, a big difference can be observed between 2009 and 2011, when “419 scam” became the dominant category. As of August 2012, there were five times as many “419 scam” emails as “fake lottery” letters. This might be due to an outdated categorization process, as scam topics, like spam, may evolve over time. For this reason, in the next section we describe our process to automatically identify the scam topic based on the frequency of words in the messages. We also observe that most of the “fake lottery” scams are associated with European phone numbers, suggesting a more targeted audience. In the majority of “419 scam” cases, scammers use African phone numbers, with the UK share being equivalent to the Nigerian one.

IV. DATA ANALYSIS

A. Scam email clustering

To identify groups of scam emails that are likely part of a campaign orchestrated by the same group of people, we have clustered all scam messages using TRIAGE, a software framework for security data mining that takes advantage of multi-criteria data analysis to group events based on subsets of common elements (features). Thanks to this multi-criteria clustering approach, TRIAGE identifies complex patterns in data, unveiling even varying relationships among series of connected or disparate events. TRIAGE is best described as a security tool designed for intelligence extraction, helping to determine the patterns and behaviors of the intruders, and highlighting “how” they operate rather than “what” they do. The framework [11] has already demonstrated its utility in the analysis of various threats, e.g., rogue AV campaigns [6], spam botnets [13] and targeted attacks [12].

[Figure 2: TRIAGE workflow example on scam dataset]

Figure 2 illustrates the TRIAGE workflow, as applied to our scam dataset. First, we select the email features, defined as decision criteria for linking the emails. In our experiment we used the sender email address (the From), email subject, date, Reply-To address, scammer phone number and email address found in the message body. Then, relationships among all email samples are built with respect to the selected features using appropriate comparison methods integrated in the framework. At the third step, the aggregation model fuses all features based on a set of weights defined to reflect feature importance and interactions during data fusion. We set the weights based on the insights gained from a previous study of scam phone numbers [5]. Hence, we assigned higher importance to phone, subject and reply address, and lower importance to the email found in the body and the sending date.

As its outcome, TRIAGE provides multi-dimensional clusters (MDC) of scam emails linked by at least a minimum number of common traits. As explained in [11], the user can specify a threshold at which a link between clusters is created and that controls the relevance of the data within the same cluster. In our analysis, we choose a threshold of 0.30, by which any group of emails linked by a coalition of two similar features that includes at least the phone number, or by at least three similar features (no matter which combination), will exceed the threshold and thus create a cluster.
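The effect of the 0.30 threshold can be illustrated with a small sketch. This is not the TRIAGE aggregation model described in [11]: it swaps in a plain weighted sum over exact feature matches, followed by connected components. The weights below are hypothetical values (in percent) picked only so that the stated rule holds: the phone number plus any one other feature, or any three features, exceed the threshold, while two non-phone features do not. The names `WEIGHTS`, `link_score` and `cluster` are likewise illustrative.

```python
from itertools import combinations

# Hypothetical weights in percent (they sum to 100); not taken from the paper.
WEIGHTS = {
    "phone": 38,
    "subject": 14,
    "reply_to": 14,
    "from": 12,
    "body_email": 11,
    "date": 11,
}
THRESHOLD = 30  # the 0.30 threshold, expressed in percent


def link_score(a, b):
    """Sum the weights of the features on which two emails agree exactly."""
    return sum(
        weight
        for feature, weight in WEIGHTS.items()
        if a.get(feature) is not None and a.get(feature) == b.get(feature)
    )


def cluster(emails):
    """Group email indices into connected components of the link graph."""
    parent = list(range(len(emails)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(emails)), 2):
        if link_score(emails[i], emails[j]) > THRESHOLD:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(emails)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up emails: 0 and 1 share phone and subject; 2 shares only
# subject and Reply-To with 0, which stays below the threshold.
emails = [
    {"phone": "+2340000001", "subject": "Urgent transfer", "reply_to": "a@example.com"},
    {"phone": "+2340000001", "subject": "Urgent transfer", "reply_to": "b@example.com"},
    {"phone": "+4470000002", "subject": "Urgent transfer", "reply_to": "a@example.com"},
]
clusters = cluster(emails)
```

Integer weights are used to avoid floating-point edge cases at the threshold. A graded similarity (for example on subjects or dates) would be closer to the comparison methods the framework is said to integrate, at the cost of a longer example.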

B. Clustering results

We identified 1,040 clusters with TRIAGE that consist of at least 5 correlated scam messages. Because of the multi-criteria aggregation, we hypothesize that these clusters quite likely reflect different scam campaigns organized by the same individuals, as emails within the same cluster share several common traits. The clusters, though, give no indication of the actual number of individuals behind each campaign. Based on the topologies of those campaigns, we anticipate there could be more than a single person in most cases. We look at this aspect in more detail in Section V.

[Table III: Global statistics for the top 250 clusters. Rows: Nr emails, Nr from, Nr reply, Nr subjects, Nr phones, Duration (in days), Nr dates.]

Table III provides some global statistics computed across the top-250 largest scam campaigns. In over half of these campaigns, scammers are using only two distinct phone numbers, but they still make use of more than 5 different mailboxes to get the answers from their victims. Most scam campaigns are rather long-lived (lasting on average about a year). We note that cluster sizes are small on average, indicating that there are many small, isolated campaigns and that only a few dozen messages belong to the same campaign. This might also be an artefact of the data collection process; nevertheless, we anticipate that it could also reflect the behavior of scammers, who may want to stay “under the radar”. Indeed, bulk amounts of the same emails would be more likely to compromise their scamming operations, as they would become too visible to content-based scam filters and, hence, would get blocked at earlier stages of email filtering.

To confirm our intuition about the importance of certain features (phone numbers and, to a lesser extent, email addresses) and their effective role in identifying campaigns, we look at all similarity links within clusters. We observe that the features mainly responsible for linking scam messages in the clusters involve phone numbers (in 88% of cases), followed by the reply email address (for 66% of the links). Not surprisingly, the From address (which can be easily spoofed) changes much more often and is used as a linking feature in only 46% of clusters.

One could wonder about the longevity of these features, hence we also looked at phone numbers and email addresses from a time perspective. Figure 3 represents the usage of the same email addresses and phone numbers over time. The Y axis shows the density of the features, indicating their distribution over time on a 100% scale. As mentioned before, many of them are used for only one day, so there is a slight concentration on the left side of the plot. However, phone numbers are reused over time more often than email addresses. This could be explained by the easy access to new mailboxes offered by many free email providers. Phone numbers, by contrast, probably still require some financial investment. We checked the domain names of email addresses used in our scam dataset and found that the top 100 belong to webmail providers from all over the world. This finding suggests that email messages sent from such accounts would bypass the sender-based anti-spam techniques that are widely deployed today.

[Figure 3: Duration of phone numbers and emails used by scammers, in days. X axis: duration of emails and phones (days); Y axis: density.]

C. Content categorization

419scam.org [1], as mentioned, also categorizes the scam emails into 10 categories. We presented their shares in the dataset section (Section III). Since the provided categorization is too broad, we decided to evaluate the categories in our dataset ourselves by measuring the word frequency in the body of the scam messages. To extract some more generalized knowledge from the clustered data, we create a list of the most frequent keywords (after removing all the stop words) and group them into meaningful categories. As a result, we identified three big categories within clusters: money transfer and bank related fraud (54%), lottery scam (22%), and fake delivery services (11%). The rest is uncategorized and accounts for 13% of the clusters. The distribution is similar to the one provided by the data source, except that delivery services are split out into a separate category. The so-called general “419 scam” category corresponds to letters about lost bank payments, compensations, and investment proposals. We grouped them together as they are very difficult to separate due to a number of keywords in common.
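The keyword-based categorization described above can be sketched as follows. This is an illustrative approximation only: the stop-word list and the keyword sets in `CATEGORY_KEYWORDS` are hypothetical, since the paper does not list the keywords that were actually used, and the function names are made up for the example.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "you", "your",
    "for", "this", "we", "i", "my", "with", "has", "have", "at", "from",
}

# Hypothetical keyword sets for the three categories named in the text.
CATEGORY_KEYWORDS = {
    "money transfer / bank fraud": {"bank", "transfer", "fund", "funds", "account", "investment"},
    "lottery scam": {"lottery", "winner", "prize", "draw", "won"},
    "fake delivery services": {"delivery", "courier", "parcel", "package", "shipment"},
}


def word_counts(texts):
    """Count lower-cased words across message bodies, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


def categorize(texts):
    """Label a cluster with the category whose keywords occur most often."""
    counts = word_counts(texts)
    scores = {
        category: sum(counts[word] for word in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


label = categorize(["You have won the lottery prize draw", "Claim your lottery prize"])
```

In the workflow described in the text the keyword groups were built by inspecting the most frequent words; here `word_counts(texts).most_common(20)` would be the corresponding inspection step before fixing the keyword sets.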

V. CHARACTERIZATION OF CAMPAIGNS

This section provides deeper insights into 419 scam campaign orchestration. We present a few typical scam campaigns and we show connections between clusters, possibly run by the same group of scammers.

A. Scam campaign examples

Figures 4, 5, 6 show examples of scam campaigns identified by TRIAGE, depicted with graph visualization tools developed in the VIS-SENSE project. Those graphs are drawn with a circular layout that represents the various dates on which scam messages were sent. The dates are laid out starting from 9 o’clock (far left in the graph) and growing clockwise. Then, the cluster nodes are drawn with a force-directed placement algorithm. The big nodes on the graphs are mostly phone numbers and From email addresses. Smaller nodes represent mostly subjects and email addresses found in the Reply-To header or the message content.

Figure 4 is an example of a campaign impersonating a private company in South Africa, ESKOM Holdings. The ESKOM campaign was initially a fake lottery scam (upper left corner of Figure 4), but later switched to a different scam, while still re-using the same phone number. A noteworthy aspect of this campaign, shared with some other campaigns we found, is that it relies on few From email addresses (i.e., the bigger nodes in the figure). The other email addresses are used with a larger number of emails and change over time.

Another campaign, presented in Figure 5b, illustrates the roles of email addresses and phone numbers in 419 scam over time. This campaign, which lasted for 1.5 years, changed topic over time (every 1 to 2 months), which is clearly visible by looking at the larger subgroups placed around the circle. These shorter campaigns were most probably run by the same scammers. We see that they almost completely changed the email addresses between different scam runs, but kept the same phone number. The email addresses were often selected to match the campaign topic and subjects.

Unfortunately, graphical interpretation of the campaigns is not always straightforward, as can be seen in Figure 5a. This graph was generated from a cluster of a recent campaign of iPhone-related scams that lasted for 1.5 years. The communication infrastructure of these scammers is much more diverse. The campaign relies on a large number of

