Machine Learning in Cyber Security 💻

9 min read · Jun 6, 2021

Machine learning has become a vital technology for cybersecurity. It helps stamp out cyber threats before they land and bolsters security infrastructure through pattern detection, real-time cybercrime mapping, and thorough penetration testing.

Emerging technologies put cybersecurity at risk, and even the newest defensive strategies of security professionals fail at some point. Because offensive and defensive innovations chase each other in a never-ending cycle, the complexity and volume of cyberattacks keep increasing. Combining the strength of artificial intelligence (AI) with cybersecurity gives security professionals additional resources to defend vulnerable networks and data from cyber attackers, and the near-instant insights it provides reduce response times. A Capgemini report on AI in cybersecurity found that 42% of the companies studied had seen a rise in security incidents through time-sensitive applications. It also found that two out of three organizations were planning to adopt AI solutions by 2020.

Challenges and Promises of Artificial Intelligence in Cybersecurity

While cybersecurity experts have accepted AI as the future of the industry, the problems it introduces have not yet been adequately addressed. AI is a solution, but it is also a considerable threat to businesses.

AI can efficiently analyze user behaviors, deduce a pattern, and identify all sorts of abnormalities or irregularities in the network. With such data, it’s much easier to identify cyber vulnerabilities quickly. In spite of being a security risk to businesses, AI will continue to take over routine security responsibilities with high-quality results. AI automation will be able to identify recurring incidents and even remediate them. It will also help manage insider threats and devices.

AI Adopters Inspiring to Make a Shift

AI has already been adopted to strengthen the security infrastructure of organizations. There are numerous real-life examples where AI-powered solutions are significantly improving cybersecurity.

● Gmail uses machine learning to block 100 million spam messages a day. It has developed a system that filters emails efficiently and offers a spam-free environment.

● IBM’s Watson uses cognitive training based on machine learning to detect cyber threats and to support other cybersecurity solutions.

● Google is using Deep Learning AI on its Cloud Video Intelligence platform. On this platform, the videos stored on the server are analyzed based on their content and context. The AI algorithms send security alerts whenever something suspicious is found.

● The Balbix platform uses AI-powered risk predictions to protect IT infrastructure against data and security breaches.

Understanding Confusion Matrix

Once the data has been cleaned, pre-processed, and wrangled, the first step is to feed it to a model and get output, usually as probabilities. But how can we measure the effectiveness of that model? The more effective the model, the better its performance, and that is what we want. This is where the confusion matrix comes into the limelight: it is a performance measurement for machine learning classification.

There are multiple ways of measuring error in a machine learning model. For a single prediction, the error is the difference between the actual and the predicted value, y - ŷ. The Mean Absolute Error (MAE), used as an error/cost function, averages the absolute value of that difference over the data set; minimizing it trains the model in the correct direction by driving the distance between actual and predicted values toward 0.

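Mean Squared Error (MSE): The error for each point in the data set is squared first, and then the mean of the squared errors is taken. Squaring keeps positive and negative errors from cancelling out and penalizes large errors more heavily.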
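To make the two error measures concrete, here is a minimal Python sketch. The function names and the sample values are made up for illustration and are not taken from any particular library or data set.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |y - y_hat| over all samples."""
    return sum(abs(y - y_hat) for y, y_hat in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of (y - y_hat)^2 over all samples."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / len(y_true)


# Made-up actual and predicted values, for illustration only.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # errors: 1, 0, -2, -1
mse = mean_squared_error(y_true, y_pred)

print("MAE:", mae)
print("MSE:", mse)
```

Note how the single error of size 2 contributes more to the MSE than to the MAE, because it is squared before averaging.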

In binary classification models, errors are examined with the help of a confusion matrix.

A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary classifier, it is a table with the 4 different combinations of predicted and actual values.

It is extremely useful for measuring Recall, Precision, Specificity, and Accuracy, and it is the building block of AUC-ROC curves, which are traced from the confusion matrices obtained at different classification thresholds.

Understanding Confusion Matrix in a simpler manner:

Let’s start with an example confusion matrix for a binary classifier (though it can easily be extended to the case of more than two classes):

n = 165       | Predicted: NO | Predicted: YES
Actual: NO    | 50            | 10
Actual: YES   | 5             | 100

What can we learn from this matrix?

● There are two possible predicted classes: “yes” and “no”. If we were predicting the presence of a disease, for example, “yes” would mean they have the disease, and “no” would mean they don’t have the disease.

● The classifier made a total of 165 predictions (e.g., 165 patients were being tested for the presence of that disease).

● Out of those 165 cases, the classifier predicted “yes” 110 times, and “no” 55 times.

● In reality, 105 patients in the sample have the disease, and 60 patients do not.

Let’s now define the most basic terms, which are whole numbers (not rates):

● true positives (TP): These are cases in which we predicted yes (they have the disease), and they do have the disease.

● true negatives (TN): We predicted no, and they don’t have the disease.

● false positives (FP): We predicted yes, but they don’t have the disease. (Also known as a “Type I error.”)

● false negatives (FN): We predicted no, but they do have the disease. (Also known as a “Type II error.”)
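The four terms above can be counted directly from paired lists of actual and predicted labels. The sketch below is one possible way to do it; the helper name `confusion_counts` and the six sample labels are hypothetical and are not the 165-patient example.

```python
def confusion_counts(y_true, y_pred, positive="yes"):
    """Count TP, TN, FP and FN for a binary classifier."""
    tp = tn = fp = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1  # predicted yes, actually yes
            else:
                fp += 1  # predicted yes, actually no (Type I error)
        else:
            if actual == positive:
                fn += 1  # predicted no, actually yes (Type II error)
            else:
                tn += 1  # predicted no, actually no
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels for six patients.
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "yes", "yes", "no"]

counts = confusion_counts(y_true, y_pred)
print(counts)
```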

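I’ve added these terms to the confusion matrix, and also added the row and column totals:

n = 165       | Predicted: NO | Predicted: YES | Total
Actual: NO    | TN = 50       | FP = 10        | 60
Actual: YES   | FN = 5        | TP = 100       | 105
Total         | 55            | 110            | 165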

This is a list of rates that are often computed from a confusion matrix for a binary classifier:

Accuracy: Overall, how often is the classifier correct?

o (TP+TN)/total = (100+50)/165 = 0.91

Misclassification Rate: Overall, how often is it wrong?

o (FP+FN)/total = (10+5)/165 = 0.09

o equivalent to 1 minus Accuracy

o also known as “Error Rate”

True Positive Rate: When it’s actually yes, how often does it predict yes?

o TP/actual yes = 100/105 = 0.95

o also known as “Sensitivity” or “Recall”

False Positive Rate: When it’s actually no, how often does it predict yes?

o FP/actual no = 10/60 = 0.17

True Negative Rate: When it’s actually no, how often does it predict no?

o TN/actual no = 50/60 = 0.83

o equivalent to 1 minus False Positive Rate

o also known as “Specificity”

Precision: When it predicts yes, how often is it correct?

o TP/predicted yes = 100/110 = 0.91

Prevalence: How often does the yes condition occur in our sample?

o actual yes/total = 105/165 = 0.64
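As a quick check of the arithmetic in this list, the rates can be recomputed from the four counts of the example (TP = 100, TN = 50, FP = 10, FN = 5). The variable names below are just illustrative choices.

```python
# Counts from the 165-patient example.
TP, TN, FP, FN = 100, 50, 10, 5

total = TP + TN + FP + FN   # 165
actual_yes = TP + FN        # 105
actual_no = TN + FP         # 60
predicted_yes = TP + FP     # 110

rates = {
    "accuracy": (TP + TN) / total,
    "misclassification_rate": (FP + FN) / total,
    "true_positive_rate": TP / actual_yes,    # sensitivity / recall
    "false_positive_rate": FP / actual_no,
    "true_negative_rate": TN / actual_no,     # specificity
    "precision": TP / predicted_yes,
    "prevalence": actual_yes / total,
}

for name, value in rates.items():
    print(f"{name:>24}: {value:.2f}")
```

Each printed value, rounded to two decimals, should match the corresponding figure in the list.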

A couple of other terms are also worth mentioning:

Null Error Rate: This is how often you would be wrong if you always predicted the majority class. (In our example, the null error rate would be 60/165=0.36 because if you always predicted yes, you would only be wrong for the 60 “no” cases.) This can be a useful baseline metric to compare your classifier against. However, the best classifier for a particular application will sometimes have a higher error rate than the null error rate, as demonstrated by the Accuracy Paradox.

Cohen’s Kappa: This is essentially a measure of how well the classifier performed as compared to how well it would have performed simply by chance. In other words, a model will have a high Kappa score if there is a big difference between its accuracy and the accuracy expected by chance.

F Score: This is the harmonic mean of the true positive rate (recall) and precision. The more general F-beta score weights one of the two more heavily than the other.
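The null error rate, Cohen’s Kappa, and the F score can all be derived from the same four counts. The sketch below uses the standard textbook formulas (Kappa as observed agreement versus agreement expected by chance, and F1 as the harmonic mean of precision and recall); the variable names are illustrative.

```python
# Counts from the 165-patient example.
TP, TN, FP, FN = 100, 50, 10, 5
total = TP + TN + FP + FN

actual_yes, actual_no = TP + FN, TN + FP        # 105, 60
predicted_yes, predicted_no = TP + FP, TN + FN  # 110, 55

# Null error rate: error made by always predicting the majority class.
null_error_rate = min(actual_yes, actual_no) / total

# Cohen's Kappa: observed agreement vs. agreement expected by chance.
observed = (TP + TN) / total
expected = (predicted_yes * actual_yes + predicted_no * actual_no) / total ** 2
kappa = (observed - expected) / (1 - expected)

# F1 score: harmonic mean of precision and recall.
precision = TP / predicted_yes
recall = TP / actual_yes
f1 = 2 * precision * recall / (precision + recall)

print(f"null error rate: {null_error_rate:.2f}")
print(f"kappa:           {kappa:.2f}")
print(f"F1:              {f1:.2f}")
```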

ROC Curve: This is a commonly used graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as you vary the threshold for assigning observations to a given class.
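To illustrate how the curve is traced, the sketch below sweeps the threshold over a handful of made-up scores and records one (False Positive Rate, True Positive Rate) point per threshold. The helper names `roc_points` and `area_under_curve` and the sample data are assumptions for this example; the area is estimated with the trapezoidal rule.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, sweeping the threshold from strict to lenient."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical labels (1 = positive) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
auc_value = area_under_curve(points)

for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc_value:.3f}")
```

Each point corresponds to one confusion matrix, which is why a single confusion matrix describes only one spot on the curve.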

Precision: The precision metric shows the accuracy of the positive predictions. It measures how likely a prediction of the positive class is to be correct.

The maximum score is 1, reached when every positive prediction is correct. Precision alone is not very helpful because it says nothing about the positive cases the classifier missed. The metric is therefore usually paired with the Recall metric. Recall is also called sensitivity or true positive rate.

Sensitivity: Sensitivity computes the ratio of positive cases that are correctly detected. This metric shows how good the model is at recognizing the positive class.

Is it necessary to check recall or precision if you already have a high accuracy?

We cannot rely on accuracy alone in classification when the classes are imbalanced. For example, suppose we have a dataset of 100 patients in which 5 have diabetes and 95 are healthy. If our model only predicts the majority class, i.e. that all 100 people are healthy, it still achieves a classification accuracy of 95% even though it fails to detect a single diabetic patient.
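The diabetes example can be reproduced in a few lines. This is only a sketch of the scenario described above, with 1 standing for "has diabetes" and 0 for "healthy"; the variable names are illustrative.

```python
# 5 diabetic patients (1) and 95 healthy patients (0).
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the majority class: everyone is healthy.
y_pred = [0] * 100

tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 0)
fn = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)

print(f"accuracy: {accuracy:.2f}")  # high, despite a useless model
print(f"recall:   {recall:.2f}")    # no diabetic patient is detected
```

Precision is left out here on purpose: the model never predicts the positive class, so TP + FP is zero and precision is undefined.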

When to use Accuracy / Precision / Recall / F1-Score?

● Accuracy is used when the True Positives and True Negatives are more important. Accuracy is a better metric for Balanced Data.

● Use Precision whenever False Positives are much more costly than False Negatives.

● Use Recall whenever False Negatives are much more costly than False Positives.

● F1-Score is used when the False Negatives and False Positives are important. F1-Score is a better metric for Imbalanced Data.

Why do we need a Confusion matrix?

Here are the pros/benefits of using a confusion matrix.

● It shows how any classification model is confused when it makes predictions.

● The confusion matrix not only gives you insight into the errors being made by your classifier but also the types of errors that are being made.

● This breakdown helps you to overcome the limitation of using classification accuracy alone.

● Every column of the confusion matrix represents the instances of that predicted class.

● Each row of the confusion matrix represents the instances of the actual class.

Cyberattack detection is a classification problem, in which we distinguish the normal pattern of the system from the abnormal pattern (attack).
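Since attack detection is a classification problem, the same row and column layout (rows are actual classes, columns are predicted classes) extends to more than two classes. The sketch below builds such a matrix for a hypothetical detector; the class names "normal", "dos", and "probe" and the sample labels are invented for illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


# Hypothetical traffic classes and detector output.
labels = ["normal", "dos", "probe"]
y_true = ["normal", "normal", "dos", "dos", "probe", "normal", "dos", "probe"]
y_pred = ["normal", "dos", "dos", "dos", "normal", "normal", "probe", "probe"]

matrix = confusion_matrix(y_true, y_pred, labels)

# Per-class recall: diagonal entry divided by the row total.
recall_per_class = {
    label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)
}

for label, row in zip(labels, matrix):
    print(f"{label:>7}: {row}")
print(recall_per_class)
```

Off-diagonal cells show which classes are confused with each other, for example an attack that is reported as normal traffic.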

Cyber Attack Detection and Classification Using Parallel Support Vector Machine

Cyber-attacks are becoming a critical issue for organizational information systems. Several cyber-attack detection and classification methods have been introduced, with different levels of success, as countermeasures to preserve data integrity and system availability. Classifying attacks against computer networks is becoming a harder problem to solve in the field of network security.

Soon, AI-powered systems will be an integral part of cybersecurity solutions. AI will also be used by cybercriminals to harm organizations, which will leave automated programs that rely on AI susceptible to advanced threats. Like any other cybersecurity solution, AI is not 100% foolproof. It is a double-edged sword, and yet, with its ability to limit cyber-attacks and automate mundane routine tasks, it is a blessing. The automation wave will take over everyday tasks, and the same technology will reduce human error and negligence.

I hope you liked this article.💖

I would love to hear your views and feedback on this so that I can improve in future articles. 🙌 Comment your views below.

You can also check my LinkedIn profile and connect with me.

Follow me on Medium, as I will be publishing articles on various technologies like Cloud Computing, DevOps, Automation, and their integration.

Rahul Sil - Certified Kubernetes Application Developer ( CKAD ) - ARTH - The School of Technologies…Hello everyone, I am Rahul Sil, a technology enthusiast. I love playing with technology and to try out various…
www.linkedin.com

That’s all for now. Thank You !! 😊✌

Written by Rahul Sil