Machine Learning and Application Security

The Future of ML and How it Impacts Application Security

Application architecture hasn’t really changed all that much over the last 50 years. While we have gone from monoliths to client-server, to SOAs, and now to containers and microservices, each shift has brought new levels of abstraction and automation built on the same age-old concepts. Those concepts rest on the assumption that computers are “dumb” machines, unable to make any decisions without explicit input from a user.

Machine Learning aims to change that: computers are now not only capable of making decisions without input, but also capable of “learning” without being explicitly programmed. This brings brand-new levels of abstraction and automation to the world of DevOps and container orchestration. In addition to editing code to make it more efficient, research suggests ML can now be used to generate entirely new programs.

Bigger Better Data

With new technology come new complexities, and with them, additional vulnerabilities and security concerns. To fully comprehend the security implications of using ML to develop and deploy software, it’s important to understand the scale at which these algorithms operate. Datasets used to train ML algorithms can range from a few gigabytes to several terabytes, and sometimes even petabytes.

To further elaborate, training an ML algorithm doesn’t just require massive volumes of data; it requires massive volumes of highly specific, contextualized data. The ‘garbage-in-garbage-out’ principle holds true here, which is why data scientists spend roughly 40% of their time gathering and cleaning data rather than on higher-level tasks such as modeling.
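To make that cleaning step concrete, here is a minimal sketch of the kind of hygiene checks that happen before training. The file name and column names are hypothetical placeholders:

```python
import pandas as pd

# Load a hypothetical tabular training set.
df = pd.read_csv("events.csv")

# Drop exact duplicates and rows missing required fields.
df = df.drop_duplicates()
df = df.dropna(subset=["user_id", "timestamp", "label"])

# Reject values outside the expected range rather than silently training on them.
valid = df["label"].isin([0, 1])
print(f"Dropping {(~valid).sum()} rows with unexpected labels")
df = df[valid]
```

Even this trivial pass shows why cleaning dominates a data scientist’s time: every check encodes domain knowledge about what “good” data looks like.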

The bigger and more valuable the target, the bigger the risk. Collecting such large volumes of user-specific data and using it to train ML algorithms increases exposure considerably. Care should be taken when handling this data to ensure that its complexity and scale don’t compromise security or user privacy.

Data Engineering

In a future where machines can code and most of the development process is automated, collecting and managing the data that feeds AI is where the new challenges will lie. Data can be manipulated, and without proper security, attackers can feed false data to your algorithms. This can be done in a number of ways, from direct dataset tampering (data poisoning) to social engineering attacks that trick insiders into introducing corrupted data, and ML pipelines are especially vulnerable to both.
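As a simple illustration of screening for poisoned inputs, the sketch below flags training samples that sit far outside the rest of the feature distribution. This z-score heuristic is deliberately crude, a stand-in for real poisoning defenses; the feature matrix and the injected sample are synthetic:

```python
import numpy as np

def flag_outliers(X, threshold=4.0):
    """Return a boolean mask of samples with any extreme per-feature z-score."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-9          # avoid division by zero
    z = np.abs((X - mu) / sigma)
    return (z > threshold).any(axis=1)

X = np.random.randn(1000, 8)              # stand-in for a real feature matrix
X[0] = 50.0                               # a crudely injected "poisoned" row
suspicious = flag_outliers(X)
print(f"{suspicious.sum()} samples flagged for manual review")
```

In practice, poisoned data is often crafted to look statistically normal, which is exactly why dataset provenance and access control matter as much as statistical screening.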

Additional complications arise from the fact that ML workflows typically rely on three different datasets, for training, testing, and production; and while only the production set is usually considered “valuable,” all three contain sensitive information and need to be guarded.

Another problem arises from the use of public datasets that accept open contributions. While these datasets are a great shared resource, few measures are in place to prevent attackers from “contributing” to, and thereby corrupting, them.
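One basic safeguard when consuming a public dataset is to verify the download against a checksum published by its maintainers, so a tampered copy never reaches training. A minimal sketch, with a hypothetical file name and digest:

```python
import hashlib

EXPECTED_SHA256 = "replace-with-the-maintainers-published-digest"

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

if sha256_of("public_dataset.tar.gz") != EXPECTED_SHA256:
    raise RuntimeError("Dataset checksum mismatch; refusing to train on it")
```

A checksum only proves your copy matches what the maintainers published; it cannot catch malicious “contributions” that were accepted upstream, which still requires curation and review.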

Managing Secrets

DevSecOps “secrets” refer to all the information embedded in your code that you don’t want to share: usernames and passwords, encryption keys, SSH keys, API tokens, and the like. Managing secrets for humans is fairly straightforward, since you’re essentially just permitting or restricting access. For non-human access, however, the numbers keep climbing with every VM, server, script, and container, and it quickly becomes unsustainable for IT teams to keep up.
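The classic anti-pattern is embedding a secret directly in source, where it inevitably leaks through version control. A minimal sketch of the alternative, reading the secret from the runtime environment (the variable name DB_PASSWORD is a hypothetical placeholder):

```python
import os

# Anti-pattern: a literal secret in code ends up in every clone of the repo.
# DB_PASSWORD = "s3cr3t-do-not-commit"

# Better: inject the secret at runtime via the environment or a secrets manager.
db_password = os.environ.get("DB_PASSWORD")
if db_password is None:
    raise RuntimeError("DB_PASSWORD is not set; refusing to start")
```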

This is why it’s important to automate secret generation by allowing machines to generate their own “dynamic secrets.” Modern environments are ephemeral, and it makes little sense to maintain centralized, static, shared secrets across them. Dynamic secrets are temporary and unique to an individual or an event; an added advantage of this approach is visibility, since each action can be audited on a per-secret basis. Conjur from CyberArk is a popular secrets-management option that supports dynamic secrets.
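To illustrate the shape of the workflow, here is a sketch of a service requesting a short-lived credential at startup. The endpoint, payload, and response format are entirely hypothetical; consult your secrets manager’s documentation (Conjur’s, for example) for the real API:

```python
import requests

# Hypothetical endpoint that mints a temporary, single-use DB credential.
resp = requests.post(
    "https://secrets.example.com/v1/dynamic/db-readonly",
    headers={"Authorization": "Bearer <service-token>"},
    timeout=5,
)
resp.raise_for_status()
lease = resp.json()  # assumed shape: {"username": ..., "password": ..., "ttl": 300}

# Use the credential immediately; it expires once the TTL elapses, and the
# issuance itself is an auditable, per-secret event.
print("Lease expires in", lease["ttl"], "seconds")
```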

AI in App Architecture

When we talk about how ML and AI will affect application architecture and security, an important factor to remember is that attackers have ML too, and are often ahead in deploying it. While ML is being used defensively for pattern recognition and anomaly detection, humans are still required to recognize new threat vectors. On the other hand, there are already a number of reports of hackers weaponizing AI, for example using it to conceal malicious code until AI-based triggers activate it. Countering this challenge takes ML-powered security solutions that scan various threat vectors and return vulnerability scores and real-time alerts for different parts of the system.
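As a taste of what ML-assisted detection looks like, the sketch below trains scikit-learn’s IsolationForest on historical events and scores new ones, alerting on outliers. Feature extraction from real telemetry is assumed away; the arrays here are synthetic:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

train = np.random.randn(5000, 4)              # stand-in for "normal" history
model = IsolationForest(contamination=0.01, random_state=0).fit(train)

new_events = np.vstack([np.random.randn(3, 4), [[8.0, 8.0, 8.0, 8.0]]])
scores = model.decision_function(new_events)  # lower = more anomalous
for event, score in zip(new_events, scores):
    if score < 0:                             # below the model's learned threshold
        print("ALERT: anomalous event", event, "score:", round(float(score), 3))
```

This is exactly the pattern-recognition role described above: the model surfaces candidates, and humans still triage genuinely novel threat vectors.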

What’s clear from all this is that as the use of ML grows in application development, the future of application security is, not surprisingly, ML itself. Organizations that recognize the power of ML, and the vendor solutions that leverage it, are the ones that will capitalize on the opportunity ahead.

In conclusion, with top organizations like Facebook, Yelp, Snapchat, Netflix, and eBay all using ML algorithms to improve their user experience, ML can’t be an afterthought when it comes to application security. It may take some time for this trend to trickle down to the broader enterprise ecosystem, but ML is well on its way to revolutionizing the future of application security.

Join the Conversation on the CyberArk Commons

If you’re interested in this and other open-source content, join the conversation on the CyberArk Commons Community. Secretless Broker, Conjur, and other open-source projects are part of the CyberArk Commons Community, an open community dedicated to developers, engineers, cybersecurity researchers, and other technically-minded people. To discuss Kubernetes, Secretless Broker, Conjur, and CyberArk Threat Research, join me on the CyberArk Commons discussion forum.