Extraction of Credentials from Texts Captured Using Keyloggers

The aim of the project is to automate the process of finding useful data, particularly credentials, in the thousands of lines of text generated by keyloggers. Specifically, we propose a machine learning model based on a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM), a de facto standard architecture for text. To train the model, we collected data from CrackStation, which contains 2 million leaked passwords.

The proposed methodology is novel in its use of an RNN with LSTM architecture for analyzing text data generated by keyloggers. This approach has proven effective in other natural language processing tasks and is particularly suitable for text data that is generated over time.

We will inject our spyware into the victim’s system using some sort of social engineering, receive their key logs, and feed them to our model to extract their credentials.

Text, especially in the form of sentences, is a type of sequential data in which the order of words is important. In password recognition, the order of characters in a password is equally crucial. Traditional models built for tabular or matrix-type data treat each input as a fixed set of independent features and so are poorly suited to sequential data, which requires a different approach. Long Short-Term Memory networks (LSTMs) are a type of recurrent neural network (RNN) specifically designed to handle sequence data. LSTMs consist of units with a recurrent loop that lets them retain information about earlier elements of a sequence, making them particularly effective for text data.
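
The recurrent loop mentioned above can be written out explicitly. The equations below are the standard textbook LSTM update, given only as an illustration and not necessarily the exact variant used in this project. All symbols are assumptions introduced here: x_t is the input at step t (for example, an encoded character), h_t is the hidden state, c_t is the cell state, W, U, and b are learned parameters, \sigma is the logistic sigmoid, and \odot is element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Because h_{t-1} and c_{t-1} feed into the computation of h_t and c_t, the output at each step depends on every earlier element of the sequence and on the order in which those elements arrived.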