Machine Learning For Improved Data Security

The reality these days is that malware is constantly reinventing itself, so the threat to stored data never shrinks as much as we’d like it to. Indeed, data breaches have been a major issue for company IT departments for as long as they have been storing data. Nowadays, every new wrinkle in malware development and distribution reminds us that the threat is as present as ever.

However, there is a new technology that is genuinely slowing the malware threat in countless industries, and data security stands to benefit from it considerably. Like any Canadian web hosting provider, we’re very attuned to the need for better security for big data, and especially so considering the ever-increasing level of sensitive and personal information being stored in large data centers. We tip our hats to those who have the expertise needed to counter the growth and advances of malware.

The technology we’re talking about is machine learning, a branch of artificial intelligence (AI). Many insiders claim it will revolutionize the way we go about protecting data. As it is now, companies are dealing with more and more attacks as their networks and the volume of data they handle grow.

Machine Learning From Antivirus Data

One specific area of data security shows especially strong promise for AI. Traditional antivirus (AV) software uses the specific signature of a piece of malware to identify it, but that method is not ideal, mainly because signatures are easy to change. By making small changes to their malware to alter the signature slightly, hackers could in many cases slip it past AV software undetected.
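To make the signature problem concrete, here is a minimal sketch. It assumes, purely for simplicity, that a "signature" is a SHA-256 hash of the whole file; real AV signatures are more varied than that. The byte strings and the `is_flagged` helper are made up for illustration.

```python
import hashlib


def signature(payload: bytes) -> str:
    """Return a SHA-256 hex digest, standing in for an AV signature."""
    return hashlib.sha256(payload).hexdigest()


# Toy byte strings standing in for a known sample and a lightly modified variant.
known_sample = b"example-payload: copy files; contact server; encrypt disk"
variant = known_sample + b" "  # one extra byte, same behavior

# The "signature database" only knows about the original sample.
known_signatures = {signature(known_sample)}


def is_flagged(payload: bytes) -> bool:
    """Flag a payload only if its exact signature is already known."""
    return signature(payload) in known_signatures


print(is_flagged(known_sample))  # True
print(is_flagged(variant))       # False
```

A single added byte produces a completely different hash, so the variant goes unflagged even though nothing about its behavior has changed.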

Current AI antivirus technology promises a far more sophisticated solution, despite not being AI in the traditional sense. It uses machine learning (ML) to train a program on a large collection of malware data. Eventually the program becomes able to recognize the characteristics of potential malware threats in general, and isn’t limited to looking for signatures as the identifier of particular malware.
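As a rough illustration of learning characteristics rather than signatures, the sketch below trains a nearest-centroid classifier, one of the simplest ML methods and not necessarily what any AV vendor uses. The three behavioral features and all of the numbers are hypothetical toy data.

```python
from math import dist
from statistics import mean

# Hypothetical feature vectors, each value between 0.0 and 1.0:
# (writes to system directories, opens network sockets, encrypts many files)
training = {
    "malware": [(0.9, 0.8, 0.9), (0.8, 0.9, 0.7), (1.0, 0.7, 0.8)],
    "benign": [(0.1, 0.6, 0.0), (0.0, 0.2, 0.1), (0.2, 0.4, 0.0)],
}


def train(samples):
    """Compute one centroid (the mean feature vector) per label."""
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in samples.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(training)

# A sample the classifier has never seen, with no signature on file.
unseen = (0.85, 0.75, 0.95)
print(classify(centroids, unseen))  # malware
```

The unseen sample is flagged because it behaves like the malware in the training set, not because it matches any stored signature.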

This means that, provided the programs are kept up to date with new malware so that they are constantly relearning and being challenged, even malware with completely new signatures can be rendered ineffective, and the software doesn’t need to be updated as strictly as signature-based AV does.

This is a perfect fit, as there is already a large body of data to train the programs on, and the bulk of new malware is not really ‘new’, because it builds off the foundations of other malicious programs. If your machine learning program has encountered a number of other malware programs with most of the same core functionality, the hacker will need to invest a massive amount of time into creating malware that can disguise itself sufficiently.
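The point about shared foundations can be sketched with a simple similarity measure. The example below compares byte n-grams using Jaccard similarity; this is a stand-in for the intuition, not a trained model, and the three byte strings are invented for the example.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte substrings of the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard(a: bytes, b: bytes, n: int = 4) -> float:
    """Share of n-grams two byte strings have in common (0.0 to 1.0)."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Toy stand-ins: a known sample, a reworked variant, and an unrelated program.
known = b"open socket; download key; walk directories; encrypt files; delete backups"
variant = b"open socket; fetch key; walk directories; encrypt files; wipe backups"
unrelated = b"render the invoice as a PDF and email it to the customer"

# The variant overlaps with the known sample far more than the unrelated program does.
print(jaccard(known, variant) > jaccard(known, unrelated))  # True
```

Even with a couple of steps rewritten, the variant still shares most of its content with the known sample, which is what makes disguising reused code so time-consuming.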

Many cybersecurity AI firms will claim this is an all-powerful solution, but that is probably too grand a claim. It does provide enough of a deterrent to protect against most typical threats, primarily because hackers aren’t inclined to create a full malware program completely from scratch.

One important point to understand, however, is that without a large enough set of data these programs won’t be able to train themselves as effectively. Currently there is not quite enough data from network attacks to train machine learning programs as reliably as IT security professionals would like. There have been several hopeful attempts to find a suitable dataset, but so far that’s not been accomplished.

AI: The Solution for Human Error

As you might expect, all of these technological advances can be rendered ineffective if human error comes into play. If an authorized person is the one facilitating the breach, even the best security tools won’t be of assistance, and this of course does happen fairly often.

The majority of data breaches are not the result of malware forcing its way through firewalls undetected. Most breaches are the result of a simple mistake, often negligence or an untimely oversight. Victims will commonly say it’s an unfortunate reflection of the fact that they’re understaffed, underfunded, and undertrained.

Social engineering education is the solution here. Why? Because it trains employees to recognize and defend against common social engineering hacking tactics. Once employees are trained, machine learning can be a strong and effective complement to best practices.

One particular tool that carries a big stick in this regard is Onfido. It prevents identity fraud by verifying the login with a photo ID, a selfie, and machine learning algorithms. It identifies whether the right person is trying to log in, and then crawls the Web for any potential problems with that identity. With this technology, fraudulent data access can be prevented even if passwords are compromised.

Monitoring Behavior Patterns

Another machine learning variation is capable of identifying the baseline online behavior for a particular identity, and then any deviation from standard patterns is flagged as indicating a possible malware threat. It’s not unlike your credit card company calling when someone makes a charge on your card on a different continent, but in the digital landscape instead.
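Here is a minimal sketch of the baseline idea, using a plain z-score check rather than a trained model. The daily download figures, the `is_anomalous` helper, and the threshold of 3 standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, value, threshold=3.0):
    """Flag a value that sits more than `threshold` standard deviations from the baseline."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily download volumes (in MB) for one account.
baseline_mb = [120, 135, 110, 128, 140, 117, 125]

print(is_anomalous(baseline_mb, 130))   # False
print(is_anomalous(baseline_mb, 2400))  # True
```

A typical day passes quietly, while a sudden 2,400 MB download stands out against the account's own history, much like that charge from another continent.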

It’s very promising for people like us to see how machine learning, and perhaps eventually true AI, can deliver the type of complement that effective data security practices need in response to the new realities of IT security risks. It’s only just beginning to open its lungs and breathe in full, but hopefully once it starts to roar we’ll be able to rest a little bit easier when it comes to knowing data is secure in our data centers.


