EPP, or Endpoint Protection Platform, is a (fairly) new viewpoint on cybersecurity that prioritizes the attack paths to and from an endpoint rather than focusing on an individual device. With computer networks encompassing far more than personal computers and network-bound peripherals (e.g., IDS/IPS appliances, physical firewalls, routers, printers, etc.), it’s painfully obvious that safeguarding the routes leading in and out of those devices is far more efficient than single-device (cyber)security.

Endpoint protection covers more than antivirus protection – EPP and its counterpart EDR (Endpoint Detection and Response) take into account all the known and soon-to-be-known attack vectors (i.e., the pathways to a device connected to the primary corporate network) and provide actionable detection, response, and remediation solutions.

Although the human factor plays a key role in both EPP and EDR, operating under one’s own steam cannot possibly account for all the variables and subsequent outcomes. This is where Artificial Intelligence interjects; AI allows for better malicious content predictions whilst decreasing the cost associated with EPP and EDR. In this article, I’m going to cover Machine-Learning in Endpoint Protection and the various ML methodologies employed in detection & remediation. Of course, no ML repartee would be complete without a rundown of the ML algorithms used to model threat data, so we’re going to have those as well. Enjoy!

What is Machine-Learning?

More often than not, ML is mistaken for Artificial Intelligence. Since this article’s mostly focused on the technical aspect of ML-facing methodologies, I’m going to give you a quick rundown on the difference between Machine-Learning and Artificial Intelligence. So, AI is the machine’s ability to carry out human-like reasoning tasks, while ML is the approach used to teach a machine to think, act, and react just like a human being would. Some would be inclined to trivialize this aspect – how hard can it be to, well, think?

Let’s not get lost in metaphysics here – sapience is something a machine cannot comprehend. A machine needs instructions; it needs to know the context, the input, the output, the content, and everything in between. They’re quite Sartrian in their own right, n’est-ce pas? Machines need instructions to operate and, of course, the human is the one who provides them. Artificial Intelligence was founded on the premise that human-made thinking machines can and will achieve this so-called sapiency simulacrum through learning.

A human can learn, so why can’t a machine? Welcome to the field of Machine-Learning or the closest thing we have to a positronic brain. What’s the difference between AI and ML? AI is the fire-eyed pupil, eager to learn everything the world has to offer, while ML is the ‘teacher’.

In more technical terms, Machine-Learning is an AI sub-discipline that:

“(…) aims to automatically improve the performance of the computer algorithms designed for particular tasks using experience (…) derived from training data, which may be defined as the sample data collected on previously recorded observations or live feedbacks.”

– Orhan G. Yalçın, 4 Machine Learning Approaches that Every Data Scientist Should Know

The goal of this endeavor is to ‘teach’ the machine to use this gathered or human-provided experience to “learn and build mathematical models to make predictions and decisions.”

In Machine-Learning, large amounts of data need to be fed into the algorithm for the machine to derive a pattern.

Naturally, the type and quantity of data supplied depend on the goal of the ‘training session’. When it comes to ML coaching, the most cited example is height prediction. As you may remember from your biology classes, a person’s height correlates with gender, age, and weight. There’s even a formula for that, but I can’t seem to remember it at the moment. Anyway, you can teach an ML algorithm to predict the height of any Tom, Dick, or Harry by supplying it with data pertaining to people’s age, gender, weight, and, yes, even height.

The machine will extract the relationship between all those variables and use the ‘fruits’ of its experience to tell the height of a randomly chosen person. And because we’re also going to talk about machine-learning approaches, this method is called supervised learning.

In supervised learning, the machine is taught how to recognize the relationship between a given input and a given output by studying similar pairings. This method is very useful when a large pool of labeled data is available. Supervised learning has two types of algorithms: classification and regression.

Classification algorithms are used to, well, classify an observation or concept based on the value of each variable. I’m personally very fond of the “it’s a cat/it’s not a cat” exercise, but there are many other classification examples out there.
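To make the “it’s a cat/it’s not a cat” idea concrete, here’s a minimal, library-free sketch of a classifier – a nearest-centroid model in plain Python. The feature values (weight, ear-to-head ratio) are invented purely for illustration:

```python
# Nearest-centroid classification: label a new sample by the closest class average.
def centroid(points):
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def classify(sample, labeled):
    # labeled: {label: list of feature vectors}
    best, best_dist = None, float("inf")
    for label, pts in labeled.items():
        c = centroid(pts)
        dist = sum((a - b) ** 2 for a, b in zip(sample, c))
        if dist < best_dist:
            best, best_dist = label, dist
    return best

# Hypothetical training data: [weight in kg, ear-to-head ratio]
training = {
    "cat":     [[4.0, 0.30], [5.1, 0.25], [3.8, 0.35]],
    "not_cat": [[20.0, 0.10], [25.0, 0.12], [18.0, 0.09]],
}
print(classify([4.5, 0.28], training))  # -> cat
```

Real classifiers (decision trees, SVMs, neural networks) draw far more sophisticated boundaries, but the supervised recipe is the same: labeled pairs in, a decision rule out.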


Regression, the second type of supervised learning algorithm, is used to compute a certain value based on the relationship between other variables. For instance, a regression algorithm can be used to forecast the sale value of a house based on one or more input variables, producing a continuous output variable.
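The house-price idea can be sketched with a one-variable least-squares fit in plain Python; the sizes and prices below are made-up numbers, not real market data:

```python
# Ordinary least-squares fit of y = a*x + b from (x, y) training pairs.
def fit_line(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical data: house size (sqm) vs. price (thousands)
sizes  = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
a, b = fit_line(sizes, prices)
print(a * 100 + b)  # predicted price for a 100 sqm house -> 300.0
```

The output is continuous (any price, not a fixed label), which is exactly what separates regression from classification.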

I won’t go into too many details regarding classification and regression algorithms because it defeats the purpose of this article. If you’re interested in seeing them in action, you may want to check out Machine Learning Mastery’s blog post on regression and classification.

The second approach to machine learning is called unsupervised learning. What’s the difference between supervised and unsupervised learning? Feedback. Well, technically speaking, there are more things that set those two apart, but, fundamentally, unsupervised learning means giving no kind of feedback to the machine.


Oh, yes…forgot to mention – in supervised learning, you have to give the machine many pats on the back (or swats across the nose) in order to increase the algorithm’s efficiency. Getting back to unsupervised learning: we have no feedback, tons of unlabeled data, and a machine left to fend for itself. This approach leans on two types of algorithms: clustering and dimensionality reduction.

Clustering algorithms – yes, there are many kinds of clustering methods – basically band together items into clusters or groups based on their similarities or lack thereof. Clustering analysis has numerous applications, especially in the medical fields (e.g., in radiology, clustering can be used to enhance the sensitivity of PET scanners, greatly increasing their abilities to differentiate between tissue types).

The second unsupervised learning algorithm is called dimensionality reduction. As I pointed out, each dataset packs variables – smaller datasets have a handful of them, while more complex models incorporate hundreds of them. Variables, which are also called features, help the algorithm create a mathematical model.

However, too many variables or features can confuse the machine. The dimensionality reduction method allows the machine to form relationships between features that can be correlated. This greatly increases the machine’s efficiency by cutting off all unnecessary variables (i.e., noise).
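One of the simplest ways to cut that noise – and a hedged stand-in for fancier techniques like PCA – is to drop features whose values barely change, since a near-constant column tells the model nothing. A minimal sketch, with invented data:

```python
# Variance-threshold feature pruning: drop columns that carry no information.
def variance(col):
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)

def drop_low_variance(rows, threshold=0.01):
    cols = list(zip(*rows))
    keep = [i for i, c in enumerate(cols) if variance(c) > threshold]
    return [[row[i] for i in keep] for row in rows], keep

data = [
    [1.0, 0.5, 3.2],
    [2.0, 0.5, 1.1],   # column 1 is constant across all rows
    [3.0, 0.5, 2.7],
]
reduced, kept = drop_low_variance(data)
print(kept)  # -> [0, 2] (the constant column gets dropped)
```

Methods such as PCA go further by combining correlated features into fewer composite ones, but the goal is the same: a smaller, cleaner feature space.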

The last two ML methods are called semi-supervised and reinforcement learning. As you would imagine, semi-supervised learning is a combination of supervised and unsupervised learning, meaning that the machine might be fed with both labeled and unlabeled data and, yes, even given some feedback. Reinforcement learning is as close to human-like learning as machines will ever get (for now). Software agents (i.e., autonomous computer programs) are unleashed upon an environment, and it’s their job to get the job done (right) in order to seize the reward.

This particular machine-learning methodology is called agent-environment-action-reward. Basically, one or more agents are given a certain task to perform in an environment. The agent or the ‘swarm’ will try to effect changes on the environment in order to receive the reward – which can be positive or negative depending on the outcome of their actions and the impact on the environment.
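The agent-environment-action-reward loop can be sketched with a tiny Q-learning example – a toy, deterministic version, not any production RL setup. The agent lives on a five-position corridor, the reward sits at the far end, and repeated replays of every (state, action) transition teach it which way to walk:

```python
# Tabular Q-learning on a 5-state corridor: reward 1.0 at the rightmost state.
n_states, goal = 5, 4
actions = [-1, +1]                      # step left / step right
Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
alpha, gamma = 1.0, 0.9                 # learning rate, discount factor

for _ in range(20):                     # sweep every transition repeatedly
    for s in range(n_states):
        if s == goal:
            continue
        for a in actions:
            s2 = min(max(s + a, 0), n_states - 1)   # environment response
            r = 1.0 if s2 == goal else 0.0          # reward only at the goal
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in actions)
                                  - Q[(s, a)])

# The greedy policy steps right from every state, toward the reward.
policy = [max(actions, key=lambda a: Q[(s, a)]) for s in range(goal)]
print(policy)  # -> [1, 1, 1, 1]
```

The reward signal is the only ‘feedback’ the agent ever gets – no labels, no corrected answers – which is what sets reinforcement learning apart from the supervised approach.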

Machine-Learning algorithms in malware detection and threat-hunting

Now it’s time to see how Machine-Learning algorithms facilitate the malware detection and threat-hunting processes.

Android Malware Detection using the Random Forest Algorithm

The random forest algorithm can be used to make uncannily accurate predictions by combining several decision trees. In malware detection – specifically Android malware – the random forest model can be successfully employed to quickly identify applications with malicious intentions. IJITEE’s paper on dynamic malware detection reveals a 0.9% error margin in malware recognition when using the random forest model.

For the purpose of this endeavor, the researchers fed the algorithm a blend of malicious and benign Android applications (i.e., the dataset contained 15,000 malicious apps and around 30,000 applications labeled as benign). Maldroid – the name of the random forest algorithm – was tested against other approaches such as DREBIN, DroidAPIMiner, and MUDFLOW. Throughout the session, the algorithm was asked two questions: “is this app malware?” and “is this app benign?” Answers were provided through the Android App Features extraction flow (see image below).


At the end of the experiment, Maldroid outperformed its peers, missing only 51 malicious applications out of a malicious dataset containing over 13,000 apps.
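This is not Maldroid itself, but the ensemble idea behind a random forest – many weak learners trained on bootstrap samples, voting – can be sketched with one-feature decision stumps. The app features below (count of dangerous permissions, whether the app sends SMS) are hypothetical:

```python
import random
random.seed(1)

# Toy forest: threshold stumps trained on bootstrap samples vote on whether
# an app profile looks malicious (1) or benign (0).
# Hypothetical features: [number of dangerous permissions, sends SMS (0/1)]
data = [
    ([1, 0], 0), ([2, 0], 0), ([1, 0], 0), ([3, 0], 0),   # benign apps
    ([8, 1], 1), ([9, 1], 1), ([7, 1], 1), ([10, 1], 1),  # malicious apps
]

def train_stump(sample):
    # Pick the (feature, threshold) split with the fewest errors on the sample.
    best = None
    for f in range(2):
        for t in sorted({x[f] for x, _ in sample}):
            err = sum((x[f] > t) != bool(y) for x, y in sample)
            if best is None or err < best[0]:
                best = (err, f, t)
    _, f, t = best
    return lambda x: int(x[f] > t)

# Each stump sees a different bootstrap resample of the data.
forest = [train_stump(random.choices(data, k=len(data))) for _ in range(15)]

def predict(x):
    votes = sum(tree(x) for tree in forest)
    return int(votes > len(forest) / 2)

print(predict([9, 1]), predict([2, 0]))  # -> 1 0
```

A real random forest grows full multi-split trees and also subsamples features at each split, but the bagging-plus-voting mechanic is the same.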

One-Class SVM (Support Vector Machines)

OC-SVM is an algorithm used in unsupervised learning to detect and classify abnormal patterns in network traffic. In the paper “Anomaly Detection Using Similarity-based One-Class SVM for Network Characterization”, the authors fed real network traffic data to the OC-SVM algorithm in an attempt to discover malicious patterns in ingress and egress traffic. The results were gauged with four pre-defined KPIs: Total Incoming Traffic, Total Outgoing Traffic, Server Delay, and Network Delay – experimentation was performed on 70 network segments.

As for the experiment’s outcome, the OC-SVM algorithm managed to isolate all the anomalies in both inbound and outbound traffic, proving the efficiency of this particular approach to characterizing network traffic once its hyper-parameters are properly tuned. For more information about the OC-SVM approach, I highly encourage you to read the paper.
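A proper OC-SVM requires a quadratic-programming solver, but its core idea – fit a boundary around normal traffic only, then flag anything that falls outside – can be sketched with a much simpler centroid-plus-radius stand-in (explicitly not an SVM). The traffic samples below are hypothetical [incoming MB/s, outgoing MB/s] pairs:

```python
# One-class anomaly detection, simplified: learn a boundary from normal data
# only, then flag points outside it. (A real OC-SVM learns a far more flexible
# boundary via kernels; this centroid-plus-radius model is just the intuition.)
normal = [[10, 4], [11, 5], [9, 4], [10, 5], [12, 6], [11, 4]]

def fit(points, quantile=1.0):
    c = [sum(p[i] for p in points) / len(points) for i in range(len(points[0]))]
    dists = sorted(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5 for p in points)
    radius = dists[int(quantile * (len(dists) - 1))]  # boundary around normal data
    return c, radius

def is_anomaly(x, model):
    c, radius = model
    return sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5 > radius

model = fit(normal)
print(is_anomaly([10, 5], model), is_anomaly([60, 40], model))  # -> False True
```

Note that the model never sees a single malicious sample during training – that one-class setup is exactly what makes the approach useful when attack traffic is rare or unlabeled.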

Rooting out Malicious Apps and Executables using E-BIRCH

E-BIRCH, which is short for Enhanced Balanced Iterative Reducing and Clustering using Hierarchies, is a cluster-type approach to detecting malicious payloads and executables. BIRCH and E-BIRCH are cornerstones of threat data mining, used with indisputable efficiency in the classification and detection of both signature-based and signatureless malware.

The main advantage of using the enhanced version of the BIRCH algorithm lies in resource-usage efficiency. For instance, on Windows systems, the average detection time of a malicious executable is around 40 seconds for BIRCH and circa 30 seconds for E-BIRCH. Moreover, studies have revealed that E-BIRCH modeling is more efficient than BIRCH in the area of malware detection – average malware identification time was around 500 seconds (8.3 minutes) for E-BIRCH versus 800 seconds (13.3 minutes) for BIRCH.

K-Means Clustering

K-Means Clustering is an unsupervised learning method very useful in detecting Windows registry changes associated with malicious activity. Most modern IDS solutions incorporate K-Means Clustering techniques for efficient threat-hunting.

For instance, K-Means Clustering and Euclidean distance-based classifiers produce high true-positive rates in a C2 detection model with malware binaries as input data. Integrating K-Means Clustering into an existing IDS can bolster malware detection efficiency up to 96%.
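To show the mechanic, here is a minimal K-Means implementation in plain Python. The feature vectors are hypothetical per-host telemetry (registry writes per minute, new autorun keys) – hosts landing in the ‘noisy’ cluster would be candidates for closer inspection:

```python
import random
random.seed(0)

# Hypothetical per-host features: [registry writes/min, new autorun keys]
points = [[1, 0], [2, 0], [1, 1], [2, 1],      # quiet hosts
          [40, 9], [42, 10], [41, 11]]         # suspiciously noisy hosts

def kmeans(points, k=2, iters=10):
    centers = random.sample(points, k)         # pick k initial centers
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            clusters[i].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            [sum(p[d] for p in c) / len(c) for d in range(len(points[0]))]
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(points)
print(sorted(len(c) for c in clusters))  # -> [3, 4]
```

The quiet and noisy hosts end up in separate clusters with no labels involved – which is the ‘unsupervised’ part that makes K-Means attractive for threat-hunting over raw telemetry.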

Takeaways and parting thoughts

To sum up everything we’ve talked about so far – can a machine think? Provided that you give it enough instructions and a gentle ‘nudge’, a machine can be taught to sing and dance and reason just like a human being. Of course, there are limitations to what a ‘sapient’ machine can do, but that’s a matter of perspective – should you encourage Pinocchio to dream away his life thinking that he will one day become a real boy, or just tell him that he’s unique?

As to Machine-Learning – we have four major ML approaches (supervised, unsupervised, semi-supervised, and reinforcement), each of them suitable for carrying out certain types of tasks. In endpoint protection, ‘coached’ deep networks are more than suitable to ‘police’ the pathways leading in and out of those endpoints.

Now, as regards the methods used in efficient threat-hunting, K-Means Clustering is great for IDS extension, BIRCH and E-BIRCH can quickly pick out malicious applications and executables, while One-Class SVMs work wonders against malicious packets hidden in network traffic. Let us not forget about the random forest algorithm, an approach that significantly increases the true-positive detection rate of any IDPS system.

AI and ML are both budding computer science fields, with tons of discoveries to be made along the way. In the meantime, I advise you to perk up your defenses in order to secure all those pathways. Heimdal™ Security’s E-PDR suite is the perfect ally in your battle against malware. With ML and vulnerability detection capabilities, Heimdal’s EDR will help you secure your network on all fronts.

I hope you’ve enjoyed my article on ML and endpoint protection and, as always, for rants, questions, or beer donations, head to the comments section. Stay safe!
