Machine Learning: How To Build A Better Threat Detection Model



By Madeline Schiappa

At Sophos, we’re focused on protecting our customers from threats from every possible attack vector. And here in the Data Science Group, we’re challenged every day to come up with new and better techniques to address these cyber threats in a scalable way that not only improves protection, but changes the paradigm of how emerging threats are addressed. That’s why we’re focusing on new deep learning and machine learning methods to be leveraged across our entire portfolio.

One of our first challenges is supplementing reactive, human-based malware research with predictive machine learning models. This challenge is unique, and can be an afterthought in traditional machine learning cybersecurity literature.

In this article, we describe the process we use to develop our models. To help explain the concepts, we’ll work through the development and evaluation of a toy model meant to solve the very real problem of detecting malicious URLs.

Detecting Malicious URLs the Traditional Way

Let’s start with how the problem of malicious URL detection can be traditionally solved using signatures, and then take a closer look at how we would design a detection model.

Say we have reports of the following malicious URLs:

[Sample URLs not recoverable from the transcription; surviving fragments: 098f8cfc54cd872a35192a82ac3\?entrypop, acebook.com/]

Blacklisting

A traditional protection method would be to add the malicious URLs to a blacklist that is then either pushed out to customers directly or updated in a cloud-based blacklist service leveraged by an endpoint product.

The problem with these solutions is that the sheer daily volume of malicious URLs found on the internet means that updates can grow relatively large in size, which naturally leads to decreased performance on end users’ machines due to increased disk and memory usage. Furthermore, since the internet is used to either push updates to customers or pull updates from cloud services, if connections get interrupted or updates don’t complete correctly, customers can remain unprotected against newly listed URLs. Additionally, in the instance of cloud-based lookups, round-trip latency can negatively impact the user experience. And perhaps the biggest issue with this method is that it’s reactive: malicious URLs must be detected and protections published prior to users navigating to them.

Regular Expressions (RegEx) and Signatures

Another traditional method is to create regex-based signatures meant to capture malicious URLs and their variants. Similar to blacklisting, after signatures are created, they are delivered to customers either via an update or pushed from the cloud, meaning the same issues of connectivity and memory usage apply.

The main concern, however, is whether we base the regex match on the domain itself or include what comes after or before the domain as well. In the example URLs, we use several Facebook links as the target URLs being exploited. These examples demonstrate the importance of analysis, because if we were to simply create a signature that blocks traffic based on “facebook.com” in general, we would wholesale block a commonly used, popular, clean site. Facebook, and those who visit it often, would be very unhappy.

However, we still want to protect customers from malicious content that may have been attached to the Facebook domain and related keywords. For example, we could write a signature that uses regex to block URLs that match “facebook.com” but that are followed by a period with more text after the initial “.com” portion of the URL. This would block two of our five sample URLs.

We could further finesse this signature to block URLs containing any periods, hyphens, or text that directly followed “facebook.com” without the presence of a slash. The regex “/ facebook\.com[\-\.\w\/\?\ ] /” would block three out of five of our sample URLs.

Because it only blocks three of the five URLs, an additional signature would be needed to capture the remaining two. These signatures each take about five minutes to write, meaning that two signatures will require an initial investment of 10 minutes just to block five URLs. When there are thousands of malicious URLs, this time adds up quickly. We also need to test the signatures to ensure they don’t block the clean URLs, which can take another five minutes each.

As you can see, we’re now up to 20 minutes. And this doesn’t include the time it takes to find and validate which URLs are clean and which are malicious. Each time we receive a new set of malicious and clean URLs, our human analysts have to undergo the process all over again.

In summary, with this method, human analysts are constantly analyzing URLs, creating signatures, and pushing out updates. This causes several areas of concern:

1. This is a reactive method, so some customers may visit malicious sites before we know about them. Additionally, they may not be protected from zero-day malware.
2. An individual signature can only match on so many variants of a domain, resulting in the need for many, many signatures to cover only a portion of malicious URLs.
3. Because updates are pushed through an internet connection, there’s always a risk of an interruption to an update, meaning customers may be unprotected from the latest malicious content. These updates also consume a lot of memory on the endpoint.
4. The manual generation and subsequent maintenance of these signatures is not only slow, but requires a large investment of time and resources.
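To make the signature approach concrete, here is a minimal Python sketch. The pattern and every URL in it are hypothetical, invented for illustration; they are not the article's actual signature or sample set. The pattern encodes the rule described above: block “facebook.com” when it is directly followed by a period, hyphen, or other text rather than a slash.

```python
import re

# Hypothetical signature: "facebook.com" immediately followed by a period,
# hyphen, or word character, i.e. extra text with no slash in between.
SIGNATURE = re.compile(r"facebook\.com[.\-\w]")

# Hypothetical sample URLs, invented for illustration only.
MALICIOUS = [
    "http://facebook.com.login-check.example/verify",
    "http://facebook.com-security.example/update",
    "http://facebook.community-alerts.example/",
    "http://example.test/facebook/login.php",
    "http://fb-login.example/facebook.com/",
]
CLEAN = [
    "https://www.facebook.com/",
    "https://www.facebook.com/help/?ref=home",
    "https://facebook.com",
]


def is_blocked(url: str) -> bool:
    """Return True if the signature matches anywhere in the URL."""
    return SIGNATURE.search(url) is not None


blocked = [u for u in MALICIOUS if is_blocked(u)]
missed = [u for u in MALICIOUS if not is_blocked(u)]
false_positives = [u for u in CLEAN if is_blocked(u)]

print(f"blocked {len(blocked)} of {len(MALICIOUS)} malicious URLs")
print(f"missed: {missed}")
print(f"clean URLs wrongly blocked: {false_positives}")
```

With these made-up inputs, one signature catches only part of the malicious set while leaving the clean URLs alone, so each miss requires another hand-written and hand-tested pattern.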

A Brief Introduction to Machine Learning

Machine learning “learns” by using mathematical models instead of being explicitly programmed to address the particularities of a specific problem. Using large amounts of data, we generate a general model that is able to accurately describe the data it’s ingesting. However, since we’re dealing with general models in order to try to explain specific phenomena, we never know if our machine learning model has learned to predict properly. As such, any model that we develop is always coupled with a rigorous set of evaluations.

Here at Sophos, we focus specifically on deep learning, which is a kind of machine learning that most similarly mimics the human brain. Deep learning involves many layers of neurons to form an artificial neural network. Both a brain-based neural network and an artificial neural network ingest some sort of input, manipulate the input in some way, and then output information to other neurons. The major difference is that the human brain contains approximately 100 billion neurons, while an artificial neural network contains a minuscule fraction of that.

[Figure: Machine Learning. Labels: Input Data; Algorithms, Techniques; Relationship, Patterns, Dependencies, Hidden structures; Optimum Model; Output Information (Answers)]

In order to develop a meaningful deep learning model, we need to feed it large amounts of data, translate the data into a language that the model can understand, build the underlying architecture to support the model, and then finally train, test, and evaluate the model.

In our malicious URLs example, we can leverage large sets of data to recognize characteristics of benign and malicious URLs automatically. Eventually, our model will be able to predict the likelihood that a given URL is malicious without storing signatures or blacklists on the local machine. We’re left with a generalized model that covers the entire distribution of data, whereas signatures can only detect small subsets of samples.

[Figure: Machine Learning, with Input Data and Output]

With our research, we are able to automate detection processes and push updates less frequently. Instead of analyzing a suspicious URL against many signatures for a possible match, it can be passed through our URL model and assigned a score based upon how malicious it appears. If the score is above a certain threshold, the URL will be blocked.

Customer machines don’t need to be connected to the internet to receive updates every day in order to be protected. With deep learning, updates are just newly trained models based on the same feature engineering techniques; therefore, we can continuously improve the architecture of our model without redesigning its features. Features are extracted continuously and easily without requiring changes to our collection method, and changes to the model itself are largely unnecessary. We simply retrain the model so it can predict what’s next in the current landscape.

Feature Engineering in Machine Learning

Before creating a machine learning model, it’s important to prepare our data. Preparing the data requires translating it into a language our model can understand. This is referred to as feature engineering.

Artificial neural network models take in data as a vector of numbers. A URL is not a vector, so the model can’t process it without some manipulation. There are countless ways that samples can be translated into features, though it takes some domain knowledge to do so. Using the URL example again, one way to translate a URL into a usable language is through a combination of ngramming and hashing. Ngrams are a popular method in DNA sequencing research. For example, the three-gram ngrams for the URL “https://sophos.com/company/careers.aspx” would be:

['htt', 'ttp', 'tps', 'ps:', 's:/', '://', '//s', '/so', 'sop', 'oph', 'pho', 'hos', 'os.', 's.c', '.co', 'com', 'om/', 'm/c', '/co', 'com', 'omp', 'mpa', 'pan', 'any', 'ny/', 'y/c', '/ca', 'car', 'are', 'ree', 'eer', 'ers', 'rs.', 's.a', '.as', 'asp', 'spx']

Once the ngrams are calculated, we need to translate them into a numerical representation. This can be done through a hashing mechanism. We will create an n-length vector, say 1,000 units long, and hash each ngram using a hashing algorithm. The number that results from hashing a particular ngram is the index at which we add 1. For example, if the first ngram ‘htt’ results in a hash of three and our vector is five units long, the result would be [0, 0, 1, 0, 0] (counting positions from one). We continue this process for every ngram and for every URL until we have the list of URLs completely transformed into individual n-length vectors.
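The ngram-and-hash transformation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production feature extractor: the article does not name a hashing algorithm, so MD5 from the standard library is assumed here purely because it is deterministic across runs, and the function names `ngrams` and `hash_features` are hypothetical. The hash is reduced modulo the vector length so every ngram lands on a valid index, and indices are zero-based, unlike the one-based example in the text.

```python
import hashlib


def ngrams(text: str, n: int = 3) -> list[str]:
    """Return every overlapping run of n characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def hash_features(url: str, dim: int = 1000, n: int = 3) -> list[int]:
    """Turn a URL into a dim-length count vector using the hashing trick."""
    vector = [0] * dim
    for gram in ngrams(url, n):
        # Assumption: MD5 is used only as a stable hash, not for security.
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % dim
        vector[index] += 1  # add 1 at the hashed index
    return vector


url = "https://sophos.com/company/careers.aspx"
grams = ngrams(url)
features = hash_features(url)

print(len(grams), grams[:5])
print(len(features), sum(features))
```

Because different ngrams can hash to the same index, some information is lost to collisions; a longer vector reduces collisions at the cost of a larger model input.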
When using this method for our toy model, these vectors will be 1,000 units long.

Artificial Neural Networks

Deep learning typically refers to three major components that, when combined, allow for the creation of very powerful predictive models:

1. A connected graph of layers wherein each layer takes input from a parent layer, mixes the data together in some predefined way, and outputs it to the next layer in the graph
2. A loss function that measures how accurately the model makes its predictions
3. An algorithm that optimizes the loss function over the training dataset

Layers

Layers are made of interconnected nodes, or neurons. Each layer is some differentiable function that takes in a set of input weights, does some basic manipulation, and outputs the result as a set of output weights. Layers can be split into two categories: (1) layers that mix together input weights or (2) activation functions that independently act upon each input weight.

A layer that mixes input weights together is known as a dense layer. A dense layer exists when all the neurons in a particular layer are connected to all those in the next layer. For example, one neuron in this layer could mix together inputs [1, 2, 3] with weights [.5, .5, 1]: multiplying the inputs by the weights gives [.5, 1, 3], and the neuron sums these products (4.5) to produce its output. In Figure 1, the weights input into a neuron are displayed next to the letter W alongside the arrows pointing to the neurons.

Figure 1: Sigmoid and ReLU are both commonly used activation functions

The next layer is known as the activation layer. The results from the previous layer are fed into the activation function associated with this layer to provide an output. Available activation functions include softmax, ReLU, tanh, ELU, sigmoid, linear, softplus, softsign, and hard sigmoid. For hidden layers, sigmoid and ReLU are both commonly used activation functions. Sigmoid ranges from 0 to 1, while ReLU ranges from 0 to infinity. Deep learning commonly uses ReLU because it handles certain constraints better than sigmoid.

Simply combining the layers described above typically results in overfitting. Overfitting occurs when our model learns only the training data but does not perform well on any new data. This is why almost every deep neural network is regularized in some way. We can regularize the network by either directly regularizing the weights inside a layer (for example, L1 or L2 regularization), or we can put regularization layers in between standard layers.
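The dense-layer arithmetic and the two activation functions discussed above can be checked with a short Python sketch. It assumes the usual convention that a neuron sums its weighted inputs before the activation is applied; the function names are hypothetical, and the bias term is omitted for simplicity.

```python
import math


def neuron_output(inputs, weights):
    """Multiply each input by its weight, then sum the products."""
    products = [x * w for x, w in zip(inputs, weights)]
    return products, sum(products)


def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    """Pass positive values through unchanged; clamp negatives to zero."""
    return max(0.0, z)


# Inputs and weights from the example in the text.
products, z = neuron_output([1, 2, 3], [0.5, 0.5, 1])

print(products, z)          # element-wise products and their sum
print(sigmoid(z), relu(z))  # the same pre-activation through each function
print(sigmoid(-2.0), relu(-2.0))
```

Sigmoid flattens out for large positive or negative inputs, while ReLU keeps growing for positive inputs, which is one practical difference between the two.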

Two commonly used regularization layers are the dropout layer and the batch normalization layer. The dropout layer reduces overfitting against the training set and helps the model generalize its predictions when working with new datasets.

Dropout works by randomly setting a designated percentage of weights to zero, which helps neurons learn different things from the data. By combining the neurons, the model produces a stronger classifier and ensures the overall network will not depend on one neuron alone.

Figure 2: The impact of dropouts on neural networks

Batch normalization normalizes batches of input before sending them to the next layer, resulting in each batch having a mean of zero and a standard deviation of one. This can accelerate learning and improve accuracy by removing certain outliers.

Loss Function

Once we lay out a model graph, we need to train the model to accurately classify the results. In our example here, we need to train our model to properly distinguish between good and bad URLs. The first thing we need is a way to measure how successful our model is during each step of the training process. This measurement, which needs to be differentiable, is referred to as a loss function. Various loss functions can be used for the same model, and each can potentially yield somewhat different results. For classification tasks, such as URL detection, the most common loss function used is cross-entropy.

Cross-entropy is used to quantify the difference, or loss, between the distribution of a model’s predictions and the actual label’s distribution. We are measuring how far away the model is from the optimal solution, where the prediction distribution and the actual distribution match.

When we use a sigmoid output as our final layer, we get the probability that the URL is malicious; the probability that the URL is benign is one minus that value. This gives two probabilities for each URL. Let’s assume the threshold in this scenario is 0.5, meaning a malicious probability greater than or equal to 0.5 is predicted malicious and anything less is predicted benign. We can then calculate the cross-entropy loss for each URL as depicted in the example below:

Prediction [benign malicious] | Predicted Label | Actual Label | Correct | Cross-Entropy Loss
[0.3 0.7] | Malicious | [0 1] (malicious) | True | -(log(0.3)*0 + log(0.7)*1) = -log(0.7) = 0.36
[0.6 0.4] | Benign | [1 0] (benign) | True | -(log(0.6)*1 + log(0.4)*0) = -log(0.6) = 0.51
[0.2 0.8] | Malicious | [1 0] (benign) | False | -(log(0.2)*1 + log(0.8)*0) = -log(0.2) = 1.61

Here log is the natural logarithm. What the model uses as the cross-entropy error is the average over all training samples. In this case, the average cross-entropy is:

-(log(0.7) + log(0.6) + log(0.2))/3 = 0.83

Our goal is to minimize the average cross-entropy loss to improve the trustworthiness of our model.

Optimization

Optimization is the process of adjusting model weights in a way that minimizes the average loss over all the training samples. Imagine weights on horizontal axes and loss on a vertical axis, and for simplicity, assume the loss function looks like a parabolic bowl. The goal is to find the weights at the bottom of the bowl, as depicted in the far-right image of Figure 3.

Figure 3

This method is called Stochastic Gradient Descent and is a process that updates weights through the gradient of the loss. The method by which we calculate the gradient of our loss function is called backpropagation.

This can be described in three steps:

1. Feed the model input and measure the error of the output using the loss function.
2. Update weights using the gradients; in other words, adjust them in a way that reduces the error.
3. Repeat this process for all training samples until the weights are no longer changing.

The mathematics behind this process are beyond the scope of this article, so we will not go into further detail. However, additional resources on the topic can be found here: https://www.youtube.com/watch?v o.gl/vUokCZ8
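The worked cross-entropy values and the three optimization steps can both be checked with a small Python sketch. Two simplifications are assumed: the loss uses the natural logarithm, and the optimizer is plain gradient descent on a hypothetical one-weight bowl, loss(w) = (w - 3)^2, rather than stochastic gradient descent with backpropagation through a real network. The function names are illustrative.

```python
import math


def cross_entropy(predicted, actual):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    return -sum(a * math.log(p) for p, a in zip(predicted, actual))


# Rows of the worked example: ([P(benign), P(malicious)], one-hot actual label).
samples = [
    ([0.3, 0.7], [0, 1]),
    ([0.6, 0.4], [1, 0]),
    ([0.2, 0.8], [1, 0]),
]
losses = [cross_entropy(p, a) for p, a in samples]
average_loss = sum(losses) / len(losses)


def gradient_descent(gradient, start, learning_rate=0.1,
                     tolerance=1e-8, max_steps=10_000):
    """Step against the gradient until the weight stops changing."""
    weight = start
    for _ in range(max_steps):
        step = learning_rate * gradient(weight)  # use the gradient of the loss
        weight -= step                           # adjust to reduce the error
        if abs(step) < tolerance:                # weight no longer changing
            break
    return weight


# Hypothetical parabolic bowl: loss(w) = (w - 3)^2, so the gradient is 2(w - 3).
best_weight = gradient_descent(lambda w: 2 * (w - 3), start=0.0)

print([round(loss, 2) for loss in losses], round(average_loss, 2))
print(round(best_weight, 4))
```

The confidently wrong prediction (0.8 malicious for a benign URL) contributes by far the largest loss, which is what pushes the optimizer to correct it first.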

Optimizing the model requires feeding the model batches of data and running over that data a certain number of times; each full pass over the data is known as an epoch. Feeding the model is done in batches because the data is too large to be exposed to the algorithm all at once. The higher the batch size, the more memory needed. A single epoch means the algorithm has seen every input once. If the number of epochs is set to 10, for example, the model will see each input 10 times. Batch size and number of epochs are two parameters that are decided before the fitting of the model begins. Once we have trained and optimized the model, we must then evaluate its performance to determine if it is ready for deployment to customers.

Evaluating the Performance

When a model predicts a URL as malicious, there is always a chance that the model is incorrect. Conversely, there is also a chance that the model predicts a URL as benign when it is actually malicious. Knowing how much to trust a model’s decision is an important aspect of evaluating its performance.

When a URL is predicted to be malicious but is actually benign, the event is considered a false positive (FP). When a URL is predicted to be benign but is actually malicious, the event is considered a false negative (FN). Correctly classified malicious URLs are true positives (TP) and correctly classified benign URLs are true negatives (TN). These four categories are combined to create metrics that help evaluate our models.

Precision is one of the measures that gives us an idea about how trustworthy the model is. Precision is calculated using the following formula:

Precision = TP / (TP + FP)

Recall is a metric used to understand how many bad URLs the model missed, which gives a better picture of how well the model detects bad URLs. Recall is also known as the true positive rate (TPR). TPR indicates, of all the bad URLs the model has seen, how many it correctly labeled as bad.
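As a rough illustration of how the four categories combine, the Python sketch below counts TP, FP, TN, and FN for a handful of scored URLs and then applies the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The scores, labels, and the 0.5 threshold are hypothetical values chosen for the example, not results from a real model.

```python
def confusion_counts(scores, labels, threshold=0.5):
    """Count TP, FP, TN, FN, treating score >= threshold as 'malicious'."""
    tp = fp = tn = fn = 0
    for score, is_malicious in zip(scores, labels):
        predicted_malicious = score >= threshold
        if predicted_malicious and is_malicious:
            tp += 1  # malicious URL correctly flagged
        elif predicted_malicious:
            fp += 1  # benign URL wrongly flagged
        elif is_malicious:
            fn += 1  # malicious URL missed
        else:
            tn += 1  # benign URL correctly passed
    return tp, fp, tn, fn


def precision(tp: int, fp: int) -> float:
    """Of the URLs flagged as malicious, the fraction that really were."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Of the truly malicious URLs, the fraction the model flagged."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical model scores and ground-truth labels (True = malicious).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.6, 0.1]
labels = [True, True, False, True, False, False, True, False]

tp, fp, tn, fn = confusion_counts(scores, labels)
print(tp, fp, tn, fn)
print(precision(tp, fp), recall(tp, fn))
```

Rerunning `confusion_counts` with a different `threshold` shows how the same scores yield different counts, and therefore different precision and recall.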
Recall is calculated using the following formula:

Recall = TP / (TP + FN)

Before deploying a model, a decision threshold is set. If the probability output from the model for the URL is greater than or equal to the threshold, the URL is predicted malicious; if it is less than the threshold, it is predicted benign. We decide the threshold based on the desired false positive rate (FPR) that results when applied to the test dataset. The false positive rate is the rate at which the model flags a URL as malicious when it is actually benign. When we change that threshold, precision and recall will change as well because the
