Data Security: A Machine Learning Perspective!

Here is an article by one of our Top 5 Budding Data Scientists, Dr. Chitra Desai. Read her refreshing take on Data Security. Dr. Desai is also ranked among INSAID’s Top Budding Data Scientists.

Data Science, Artificial Intelligence and Machine Learning are adding new dimensions to businesses, which evolve over time to enhance productivity and improve their processes. These computational practices are useful for identifying causal and temporal relationships in massive data sets. From traditional machine learning to federated learning, a paradigm shift in computing is under way.

While every small piece of data being captured plays a significant role in data science, the security of that data is emerging as a big challenge when data is shared. Data security, itself the science and study of protecting data, has grown rapidly since 1975. Cryptography is one way of ensuring data security.

Ronald Rivest, in his 1991 paper Cryptography and Machine Learning, referred to these two fields as sister fields. The concept of machine learning dates back almost 60 years. For a long time it evolved mainly through theoretical development with future prospects in mind; it started gaining momentum around 2010 and is now at the core of both Data Science and Artificial Intelligence. This is because data and processes are now built around newer technology, which has helped create an ecosystem in the true sense.

The ecosystem has become mature enough that using machine learning in a business context is now possible. The relationship between data science and machine learning is often stated as follows: when machine learning is used to solve a business problem, it becomes data science. One such business problem is data security, which has brought in the FUD concern (Fear, Uncertainty and Doubt) about how to invest in solving security-related business problems. This depends on access to the right data, and it is data science that opens up access to unprecedented amounts of data, making it possible to move from assumptions to facts.

The biggest challenge from the data security perspective is not only to know the reasons for a breach of the system but also to secure the system against future attacks. The role of data science is to offer a newer approach to solving data security problems.

“The big challenge that the whole security industry and the chief security officers have right now is that they’re always chasing yesterday’s attack… It’s flawed, because the attackers keep changing the attack vector.”

Nicole Eagan

Therefore, learning about events in real time and using Artificial Intelligence to recommend actions, even for an attack that has never been seen before, is where machine learning comes in through Data Science and Artificial Intelligence.

While defenders use machine learning to find solutions to data security problems and protect their systems from newer attacks, intruders are at the same time using machine learning to mount smarter attacks. To be more precise, the power of machine learning algorithms is being misused. This can be explained well with reference to cryptography.

Cryptography is the science of converting information or a message into a non-readable form that is secure and resistant to attack. In cryptography, the sender uses a key to encrypt the original message (plaintext) before transmitting it over an insecure communication channel. The receiver receives this encrypted message as ciphertext and decrypts it using a key.

Thus, a cryptosystem comprises algorithms for key generation, encryption and decryption. If the sender and receiver use the same key for encryption and decryption, the algorithms are referred to as symmetric key algorithms. If they use different keys for encryption and decryption, the algorithms are referred to as asymmetric key or public key cryptography algorithms.
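The three parts of a cryptosystem named above (key generation, encryption and decryption) can be sketched for the symmetric case in a few lines of Python. This is only a toy illustration: the XOR scheme, the function names `generate_key` and `xor_bytes`, and the sample message are assumptions made for this example and are not taken from any particular product or standard.

```python
import secrets

def generate_key(length: int) -> bytes:
    """Key generation: draw `length` random bytes from a secure source."""
    return secrets.token_bytes(length)

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte.

    XOR is its own inverse, so the same function (with the same key)
    performs both encryption and decryption.
    """
    if len(key) < len(data):
        raise ValueError("key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

plaintext = b"attack at dawn"                 # hypothetical message
key = generate_key(len(plaintext))            # shared by sender and receiver

ciphertext = xor_bytes(plaintext, key)        # sender encrypts
recovered = xor_bytes(ciphertext, key)        # receiver decrypts with the same key
```

Because both parties hold the same `key`, this is a symmetric scheme. The sketch is only meaningful if the key is as long as the message and is never reused.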

The strength of any cryptographic algorithm is based on the difficulty of solving an underlying mathematical problem. Public key cryptography uses one-way functions that have a trapdoor. These functions can be based on integer factorization, discrete logarithms, elliptic curves, etc.

A function which is easy to compute but hard to invert is called a one-way function. A cryptanalyst aims to obtain all possible information about a cryptosystem in order to break it; this is basically the problem of learning an unknown function based on prior knowledge of the class of possible functions.
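As a rough illustration of the "easy to compute, hard to invert" asymmetry, the following Python sketch uses modular exponentiation, a commonly cited candidate one-way function. The modulus `p`, base `g` and exponent `x` are small values chosen only for this example, and the brute-force search stands in for an attacker who has no shortcut.

```python
# Toy parameters (assumptions for illustration; real systems use far larger values).
p = 104729   # a small prime modulus
g = 2        # base
x = 71234    # the secret exponent

# Forward direction: fast even for very large numbers (square-and-multiply).
y = pow(g, x, p)

def brute_force_discrete_log(g: int, y: int, p: int) -> int:
    """Invert y = g**k mod p by trying every exponent k in turn."""
    value = 1                      # g**0
    for k in range(p):
        if value == y:
            return k               # smallest exponent that reproduces y
        value = (value * g) % p
    raise ValueError("no exponent found")

# Inverse direction: the search may need up to p steps.
recovered = brute_force_discrete_log(g, y, p)
```

The forward step costs a number of multiplications proportional to the bit length of `x`, while this naive inversion grows with `p` itself. Faster inversion methods exist, but for suitably chosen large parameters no efficient classical method is publicly known.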

To elaborate, consider a public key cryptographic algorithm based on integer factorization, such as the well-known RSA algorithm. Techniques such as trial division, the Pollard rho algorithm and the Pollard p-1 algorithm exist for finding non-trivial factors of composite numbers. In RSA, two large primes p and q give a large composite number n = pq, and the entire security of the algorithm depends on keeping p and q secret. Let e be the encryption key and d be the decryption key. e and n are publicly available, while d is private and is also called the unknown secret key. If one succeeds in factoring n, then d can easily be obtained and the cryptosystem is compromised. The cryptanalyst basically aims to determine the unknown secret key, that is, to identify the unknown cryptographic function exactly.
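The dependence of d on the factors of n can be seen in a toy Python sketch. The primes p = 61 and q = 53, the exponent e = 17 and the message 65 are small textbook-style values assumed here purely for illustration. Real RSA uses primes hundreds of digits long together with padding, which this sketch omits, and trial division would be hopeless against such sizes.

```python
from typing import Tuple

# --- Key generation (done by the key owner) ---
p, q = 61, 53                   # secret primes (toy-sized assumption)
n = p * q                       # public modulus
phi = (p - 1) * (q - 1)         # Euler's totient of n, kept secret
e = 17                          # public encryption exponent, coprime to phi
d = pow(e, -1, phi)             # private decryption exponent (Python 3.8+)

# --- Encryption and decryption ---
message = 65                    # hypothetical message, must be smaller than n
ciphertext = pow(message, e, n)
decrypted = pow(ciphertext, d, n)

# --- Attack: the cryptanalyst knows only e and n ---
def trial_division(n: int) -> Tuple[int, int]:
    """Return a non-trivial factor pair of n by testing divisors from 2 upward."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f, n // f
        f += 1
    raise ValueError("n has no non-trivial factors")

fp, fq = trial_division(n)
recovered_d = pow(e, -1, (fp - 1) * (fq - 1))   # same computation as the owner's
```

Once `n` is factored, the attacker repeats the owner's own key-generation step, so `recovered_d` equals `d` and every ciphertext can be decrypted.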

Machine learning can be used in cryptography both ways, that is, by the makers of a cryptosystem and by its breakers. M. Abadi and D. G. Andersen of the Google Brain team came up with an approach that implements a cryptosystem consisting of three artificial neural networks interacting adversarially to learn to protect their communication. Each of the three neural networks has a specific goal: to encrypt, to decrypt, or to intercept the encrypted message. The machine learning aspect is that none of the networks is given a specific encryption or decryption algorithm, so they learn and optimize their own algorithms over time in order to communicate privately. The intruder attempts to learn to intercept the message and decrypt it to recover the original plaintext. If the intruder succeeds, the encryption and decryption algorithms are improved further. The cryptosystem uses both supervised and unsupervised machine learning approaches.

Last but not least, the well-known cryptographic algorithms that ensure data security today are threatened by the strength of quantum computers, which can solve certain mathematical problems at a much faster rate. So just as quantum computation has given rise to quantum cryptography, quantum machine learning will find its niche.