Machine Learning in Cyber Security – Part 1

Cybersecurity is a shared responsibility, and it boils down to this: the more systems we secure, the more secure we all are.

The year 2016 witnessed advances in artificial intelligence across self-driving cars, language translation, and big data. That same period, however, also saw ransomware and botnets rise as popular forms of malware attack, with cybercriminals continually expanding their methods (e.g., scripts attached to phishing emails, and randomization), according to the Malwarebytes State of Malware report. To complement the skills and capacities of human analysts, organizations are turning to machine learning (ML) in hopes of providing a more forceful deterrent. ABI Research forecasts that “machine learning in cybersecurity will boost big data, intelligence, and analytics spending to $96 billion by 2021.” At the SEI, machine learning has played a critical role across several technologies and practices that we have developed to reduce the opportunity for, and limit the damage of, cyber-attacks.

AI vs. Machine Learning

Before jumping into the details, what’s the difference between AI and machine learning? Put simply, AI is a field of computing, of which machine learning is one part. Specifically, AI encompasses any case where a machine is designed to complete tasks which, if done by a human, would require intelligence. Within AI there are a variety of technologies, including:

Machine learning — Machines which “learn” while processing large quantities of data, enabling them to make predictions and identify anomalies.

Knowledge representations — Systems of data representation that enable machines to solve complex problems.

Rule-based systems — Machines that process inputs based on a set of predetermined rules.

As the volume and complexity of threats have increased rapidly in recent years, security vendors have begun to leverage one or more elements of AI in an attempt to reduce the burden on human analysts. 

Machine learning refers to systems that can automatically improve with experience. Traditionally, no matter how many times you use software to perform the same task, the software won’t get any smarter. Always launch your browser and visit the same website? A traditional browser won’t “learn” that it should probably just bring you there by itself when first launched. With ML, the software can learn from previous observations, both to make inferences about future behaviour and to guess what you want to do in new scenarios. From thermostats that optimize heating around your daily schedule, to autonomous vehicles that customize your ride to your location, to advertising agencies seeking to keep ads relevant to individual users, ML has found a niche in many aspects of our daily life.
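The browser example can be made concrete with a minimal sketch. It assumes a hypothetical StartPagePredictor helper (not part of any real browser) that simply counts which site is opened first in each session and suggests the most frequent one once it has been seen often enough. Real ML systems use far richer models, but the idea of improving with experience is the same.

```python
from collections import Counter


class StartPagePredictor:
    """Hypothetical helper: learns which site a user tends to open first."""

    def __init__(self, min_visits=3):
        # min_visits is an illustrative threshold, not a recommended value.
        self.first_visits = Counter()
        self.min_visits = min_visits

    def observe(self, url):
        """Record the first site visited in a browsing session."""
        self.first_visits[url] += 1

    def predict(self):
        """Return the most common first site, or None if evidence is thin."""
        if not self.first_visits:
            return None
        url, count = self.first_visits.most_common(1)[0]
        return url if count >= self.min_visits else None


predictor = StartPagePredictor()
for url in ["news.example", "mail.example", "news.example", "news.example"]:
    predictor.observe(url)

suggestion = predictor.predict()  # the site seen at least min_visits times
```

With no observations the predictor returns None, which mirrors the "traditional" software that never gets smarter; each call to observe gives it more experience to draw on.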

To understand how ML works, we first need to understand the fuel that makes ML possible: data. Consider an email spam detection algorithm. Early spam filters would simply blacklist certain addresses and allow other mail through. ML enhanced this considerably by comparing verified spam emails with verified legitimate emails and seeing which “features” were present more frequently in one or the other. For example, intentionally misspelt words (“F@ceb00k”), hyperlinks to known malicious websites, and virus-laden attachments are features more likely to indicate spam than legitimate email. This process of automatically inferring a label (i.e., “spam” vs. “legitimate”) is called classification and is one of the major applications of ML techniques. One other very common technique is forecasting, the use of historical data to predict future behaviour.
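The feature-frequency comparison described above can be sketched as a tiny naive Bayes classifier. This is an illustrative sketch only, not the method any particular spam filter uses: the training emails, the feature names, and the train and classify helpers are all hypothetical, and each email is assumed to have already been reduced to the set of features found in it.

```python
import math
from collections import Counter

# Hypothetical hand-labelled training data. Feature names are illustrative.
TRAINING = [
    ({"misspelt_brand", "known_bad_link"}, "spam"),
    ({"known_bad_link", "suspicious_attachment"}, "spam"),
    ({"misspelt_brand", "suspicious_attachment"}, "spam"),
    ({"meeting_invite"}, "legitimate"),
    ({"meeting_invite", "internal_sender"}, "legitimate"),
    ({"internal_sender"}, "legitimate"),
]


def train(examples):
    """Count how often each feature appears under each label."""
    label_counts = Counter()
    feature_counts = {}
    vocabulary = set()
    for features, label in examples:
        label_counts[label] += 1
        feature_counts.setdefault(label, Counter()).update(features)
        vocabulary |= features
    return label_counts, feature_counts, vocabulary


def classify(features, model):
    """Return the label whose feature frequencies best match the email."""
    label_counts, feature_counts, vocabulary = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        # Start from how common the label is overall (log prior).
        score = math.log(n / total)
        for feature in vocabulary:
            # Laplace smoothing keeps unseen features from giving log(0).
            p = (feature_counts[label][feature] + 1) / (n + 2)
            score += math.log(p if feature in features else 1 - p)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING)
verdict = classify({"misspelt_brand", "known_bad_link"}, model)
```

Unlike a blacklist, nothing here names a specific sender: the label is inferred from how often each feature appeared in verified spam versus verified legitimate mail, so adding more labelled examples changes the result without changing the code.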
