The goal of this post is to walk you through the steps to create and train a deep learning neural network for anomaly detection using Python, Keras, and TensorFlow. I will not delve too deeply into the underlying theory and will assume the reader has some basic knowledge of the underlying technologies. However, I will provide links to more detailed information as we go, and you can find the source code for this study in my GitHub repo.
We will use vibration sensor readings from the NASA Acoustics and Vibration Database as our dataset for this study. In the NASA study, sensor readings were taken on four bearings that were run to failure under constant load over multiple days. Our dataset consists of individual files that are 1-second vibration signal snapshots recorded at 10 minute intervals. Each file contains 20,480 sensor data points per bearing that were obtained by reading the bearing sensors at a sampling rate of 20 kHz.
You can download the sensor data here. You will need to unzip them and combine them into a single data directory.
Anomaly detection is the task of determining when something has deviated from the “norm”. Anomaly detection with neural networks is typically modeled in an unsupervised / self-supervised manner, as opposed to supervised learning, where there is a one-to-one correspondence between input feature samples and their corresponding output labels. The presumption is that normal behavior, and hence the bulk of the available data, represents the norm, while anomalies are rare exceptions, which makes modeling “normalcy” possible.
We will use an autoencoder deep learning neural network model to identify vibrational anomalies from the sensor readings. The goal is to predict future bearing failures before they happen.
I will be using an Anaconda distribution Python 3 Jupyter notebook for creating and training our neural network model. We will use TensorFlow as our backend and Keras as our core model development library. The first task is to load our Python libraries. We then set our random seed in order to create reproducible results.
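The setup described above can be sketched as follows; the exact import list is an assumption based on the steps in this post, so adjust it to your environment.

```python
# Core libraries for data handling, plotting, and modeling.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

# Fix the random seeds so that results are reproducible across runs.
# The seed value itself is arbitrary.
SEED = 10
np.random.seed(SEED)
tf.random.set_seed(SEED)
```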
The assumption is that the mechanical degradation in the bearings occurs gradually over time; therefore, we will use one datapoint every 10 minutes in our analysis. Each 10-minute data file sensor reading is aggregated by using the mean absolute value of the vibration recordings over the 20,480 datapoints. We then merge everything together into a single Pandas dataframe.
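The per-file aggregation can be sketched like this, with synthetic data standing in for a real NASA snapshot (the actual files are tab-separated, 20,480 rows with one column per bearing, and the timestamp is encoded in the file name):

```python
import numpy as np
import pandas as pd

def aggregate_snapshot(raw: pd.DataFrame) -> pd.Series:
    """Reduce one 1-second snapshot to the mean absolute value per bearing."""
    return raw.abs().mean()

# Stand-in for one file's contents: 20,480 readings for 4 bearings.
rng = np.random.default_rng(0)
snapshot = pd.DataFrame(rng.normal(0, 0.1, size=(20480, 4)))

row = aggregate_snapshot(snapshot)
print(row.shape)  # one aggregated value per bearing
```

In the full script, you would loop over the sorted files in the data directory, parse each timestamp from the file name (e.g. with `pd.to_datetime(name, format='%Y.%m.%d.%H.%M.%S')`), and concatenate the resulting rows into a single dataframe.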
Dataset shape: (984, 4)

Output:

```
                     Bearing 1  Bearing 2  Bearing 3  Bearing 4
2004-02-12 10:32:39   0.058333   0.071832   0.083242   0.043067
2004-02-12 10:42:39   0.058995   0.074006   0.084435   0.044541
2004-02-12 10:52:39   0.060236   0.074227   0.083926   0.044443
2004-02-12 11:02:39   0.061455   0.073844   0.084457   0.045081
2004-02-12 11:12:39   0.061361   0.075609   0.082837   0.045118
```
Next, we define the datasets for training and testing our neural network. To do this, we perform a simple split where we train on the first part of the dataset, which represents normal operating conditions. We then test on the remaining part of the dataset that contains the sensor readings leading up to the bearing failure.
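The temporal split can be sketched as below. The cut-off timestamp is an assumption (pick a point safely before any sign of bearing degradation), and a random dataframe stands in for the merged sensor readings:

```python
import numpy as np
import pandas as pd

# Stand-in for the merged dataframe of mean-absolute sensor readings.
rng = np.random.default_rng(0)
index = pd.date_range('2004-02-12 10:32:39', periods=984, freq='10min')
merged_data = pd.DataFrame(rng.random((984, 4)), index=index,
                           columns=['Bearing 1', 'Bearing 2',
                                    'Bearing 3', 'Bearing 4'])

# Train on the early, healthy portion; test on the run-up to failure.
train = merged_data.loc[:'2004-02-15 12:52:39']
test = merged_data.loc['2004-02-15 12:52:39':]
print(train.shape, test.shape)
```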
To complete the pre-processing of our data, we will first normalize it to a range between 0 and 1. Then we reshape our data into a format suitable for input into an LSTM network. LSTM cells expect a 3-dimensional tensor of the form [data samples, time steps, features]. Here, each sample input into the LSTM network represents one step in time and contains 4 features — the sensor readings for the four bearings at that time step.
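A sketch of this pre-processing step, using random stand-ins for the train/test dataframes (note that the scaler is fitted on the training data only and then applied to both):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Stand-ins for the train/test splits of the merged sensor data.
rng = np.random.default_rng(0)
train, test = rng.random((447, 4)), rng.random((537, 4))

scaler = MinMaxScaler()                # scales each feature to [0, 1]
X_train = scaler.fit_transform(train)  # fit on training data only
X_test = scaler.transform(test)

# Reshape to [samples, time steps, features]: each sample is a single
# time step carrying 4 features (one reading per bearing).
X_train = X_train.reshape(X_train.shape[0], 1, X_train.shape[1])
X_test = X_test.reshape(X_test.shape[0], 1, X_test.shape[1])
print(X_train.shape, X_test.shape)
```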
One of the advantages of using LSTM cells is the ability to include multivariate features in your analysis. Here, that is the four sensor readings per time step. In an online fraud anomaly detection analysis, however, the features could be the time of day, dollar amount, item purchased, and internet IP address per time step.
Neural Network Model
We will use an autoencoder neural network architecture for our anomaly detection model. The autoencoder architecture essentially learns an “identity” function. It will take the input data, create a compressed representation of the core / primary driving features of that data, and then learn to reconstruct it again. For instance, given an image of a dog, it will compress that data down to the core constituents that make up the dog picture and then learn to recreate the original picture from the compressed version of the data.
The rationale for using this architecture for anomaly detection is that we train the model on the “normal” data and determine the resulting reconstruction error. Then, when the model encounters data that is outside the norm and attempts to reconstruct it, we will see an increase in the reconstruction error as the model was never trained to accurately recreate items from outside the norm.
We create our autoencoder neural network model as a Python function using the Keras library.
We then instantiate the model and compile it using Adam as our neural network optimizer and mean absolute error for calculating our loss function.
Finally, we fit the model to our training data and train it for 100 epochs. We then plot the training losses to evaluate our model’s performance.
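The model definition, compilation, and training steps described above can be sketched end-to-end as follows. The LSTM layer sizes and batch size are assumptions rather than the article's exact architecture, synthetic data stands in for the scaled sensor readings, and the epoch count is reduced from the article's 100 to keep the sketch quick:

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

def autoencoder_model(X):
    """LSTM autoencoder: compress each sequence, then reconstruct it."""
    inputs = Input(shape=(X.shape[1], X.shape[2]))
    encoded = LSTM(16, activation='relu', return_sequences=False)(inputs)
    decoded = RepeatVector(X.shape[1])(encoded)
    decoded = LSTM(16, activation='relu', return_sequences=True)(decoded)
    outputs = TimeDistributed(Dense(X.shape[2]))(decoded)
    return Model(inputs=inputs, outputs=outputs)

# Stand-in for the scaled training data: [samples, 1 time step, 4 features].
rng = np.random.default_rng(0)
X_train = rng.random((100, 1, 4)).astype('float32')

model = autoencoder_model(X_train)
model.compile(optimizer='adam', loss='mae')  # Adam + mean absolute error

# The autoencoder learns to reconstruct its own input (X -> X).
history = model.fit(X_train, X_train, epochs=5, batch_size=10,
                    validation_split=0.05, verbose=0)
print(history.history['loss'][-1])
```

Plotting `history.history['loss']` (and `val_loss`) with matplotlib then shows whether the training converged.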
By plotting the distribution of the calculated loss in the training set, we can determine a suitable threshold value for identifying an anomaly. In doing this, one can make sure that this threshold is set above the “noise level” so that false positives are not triggered.
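Threshold selection can be sketched like this, with synthetic errors standing in for the real per-sample reconstruction loss (in the full script, something like `np.mean(np.abs(X_train - model.predict(X_train)), axis=(1, 2))`). The article chooses the threshold visually from the loss distribution; mean plus three standard deviations is one common, simple stand-in rule:

```python
import numpy as np

# Stand-in for the per-sample training reconstruction error.
rng = np.random.default_rng(0)
train_loss = np.abs(rng.normal(0.0, 0.05, size=447))

# Place the threshold above the training "noise level" so that normal
# fluctuations do not trigger false positives.
threshold = train_loss.mean() + 3.0 * train_loss.std()
flagged = train_loss > threshold
print(threshold, flagged.mean())
```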
Finally, we save both the neural network model architecture and its learned weights in the h5 format. The trained model can then be deployed for anomaly detection.
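A minimal sketch of the save/load step, with a tiny stand-in model in place of the trained autoencoder and a hypothetical file name:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Tiny stand-in model; in the real script this is the trained autoencoder.
model = models.Sequential([layers.Input(shape=(4,)), layers.Dense(4)])

# The h5 (HDF5) format stores both the architecture and the weights.
model.save('anomaly_model.h5')

# The saved model can later be re-loaded for deployment.
restored = tf.keras.models.load_model('anomaly_model.h5')
print(restored.predict(np.zeros((1, 4)), verbose=0).shape)
```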
In the next article, we'll explore another approach to anomaly detection using an LSTM autoencoder in Keras.