Anomaly Detection with CNN Autoencoder using Keras

The goal of this post is to walk you through the steps to create and train a deep learning neural network for anomaly detection using Python, Keras and TensorFlow. I will not delve too deeply into the underlying theory and will assume the reader has some basic knowledge of the technologies involved. However, I will provide links to more detailed information as we go, and you can find the source code for this study in my GitHub repo.

We will use vibration sensor readings from the NASA Acoustics and Vibration Database as our dataset for this study. In the NASA study, sensor readings were taken on four bearings that were run to failure under constant load over multiple days. Our dataset consists of individual files, each a 1-second vibration signal snapshot recorded at 10-minute intervals. Each file contains 20,480 sensor data points per bearing, obtained by reading the bearing sensors at a sampling rate of 20 kHz.

You can download the sensor data here. You will need to unzip them and combine them into a single data directory.
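
If you want to sanity-check the raw files before processing them, you can read a single snapshot. Below is a minimal sketch, assuming the files were extracted to the data/IMS/2nd_test directory used later in this post:

import os
import pandas as pd

data_dir = "data/IMS/2nd_test"
sample_file = sorted(os.listdir(data_dir))[0]                              # first 1-second snapshot
sample = pd.read_csv(os.path.join(data_dir, sample_file), sep='\t', header=None)
print(sample.shape)   # expected: (20480, 4) -- 20,480 readings, one column per bearing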

Anomaly detection is the task of determining when something has gone astray from the “norm”. Anomaly detection with neural networks is modeled in an unsupervised / self-supervised manner, as opposed to supervised learning, where there is a one-to-one correspondence between input feature samples and their corresponding output labels. The presumption is that normal behavior dominates the available data and that anomalies are rare exceptions, which makes it possible to model what “normal” looks like.

We will use an autoencoder deep learning neural network model to identify vibrational anomalies from the sensor readings. The goal is to predict future bearing failures before they happen.

I will be using an Anaconda distribution Python 3 Jupyter notebook for creating and training our neural network model. We will use TensorFlow as our backend and Keras as our core model development library. The first task is to load our Python libraries. We then set our random seed in order to create reproducible results.

import warnings
warnings.filterwarnings('ignore')
import os
import pandas as pd
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import MinMaxScaler
import seaborn as sns
sns.set(color_codes=True)
import matplotlib.pyplot as plt
%matplotlib inline
from numpy.random import seed

from tensorflow.keras.layers import Conv1D, GlobalMaxPool1D, Dense, Flatten, Input, TimeDistributed, Dropout
from tensorflow.keras.models import Sequential, Model, load_model
from tensorflow.keras import regularizers

# set random seeds for reproducible results (the seed value itself is arbitrary)
seed(10)
tf.random.set_seed(10)

 

The assumption is that the mechanical degradation in the bearings occurs gradually over time; therefore, we will use one datapoint every 10 minutes in our analysis. Each 10-minute data file sensor reading is aggregated by using the mean absolute value of the vibration recordings over the 20,480 datapoints. We then merge everything together into a single Pandas dataframe.

# load, average and merge sensor samples
data_dir = "data/IMS/2nd_test"
frames = []

for filename in os.listdir(data_dir):
    dataset = pd.read_csv(os.path.join(data_dir, filename), sep='\t')
    dataset_mean_abs = np.array(dataset.abs().mean())               # mean absolute vibration per bearing
    dataset_mean_abs = pd.DataFrame(dataset_mean_abs.reshape(1, 4))
    dataset_mean_abs.index = [filename]
    frames.append(dataset_mean_abs)

merged_data = pd.concat(frames)   # DataFrame.append was removed in recent pandas versions

merged_data.columns = ['Bearing 1', 'Bearing 2', 'Bearing 3', 'Bearing 4']

# transform data file index to datetime and sort in chronological order
merged_data.index = pd.to_datetime(merged_data.index, format='%Y.%m.%d.%H.%M.%S')
merged_data = merged_data.sort_index()
merged_data.to_csv('Averaged_BearingTest_Dataset.csv')
print("Dataset shape:", merged_data.shape)
merged_data.head()
Output:
Dataset shape: (984, 4)

                     Bearing 1  Bearing 2  Bearing 3  Bearing 4
2004-02-12 10:32:39   0.058333   0.071832   0.083242   0.043067
2004-02-12 10:42:39   0.058995   0.074006   0.084435   0.044541
2004-02-12 10:52:39   0.060236   0.074227   0.083926   0.044443
2004-02-12 11:02:39   0.061455   0.073844   0.084457   0.045081
2004-02-12 11:12:39   0.061361   0.075609   0.082837   0.045118

Next, we define the datasets for training and testing our neural network. To do this, we perform a simple split where we train on the first part of the dataset, which represents normal operating conditions. We then test on the remaining part of the dataset that contains the sensor readings leading up to the bearing failure.

train = merged_data['2004-02-12 10:52:39': '2004-02-15 12:52:39']
test = merged_data['2004-02-15 12:52:39':]
print("Training dataset shape:", train.shape)
print("Test dataset shape:", test.shape)
Output:
Training dataset shape: (445, 4)
Test dataset shape: (538, 4)

To complete the pre-processing of our data, we will first normalize it to a range between 0 and 1. Then we reshape it into the format the network expects. The Conv1D layers expect a 3-dimensional tensor of the form [data samples, time steps, features]. Here, each sample input into the network represents one step in time and contains 4 features: the sensor readings for the four bearings at that time step.

# scale the readings to [0, 1], fitting the scaler only on the training data
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(train)
X_test_scaled = scaler.transform(test)

# reshape to [samples, time steps, features]
X_train = X_train_scaled.reshape(X_train_scaled.shape[0], 1, X_train_scaled.shape[1])
print("Training data shape:", X_train.shape)
X_test = X_test_scaled.reshape(X_test_scaled.shape[0], 1, X_test_scaled.shape[1])
print("Test data shape:", X_test.shape)

Output:
Training data shape: (445, 1, 4)
Test data shape: (538, 1, 4)

 

One of the advantages of using convolutional (CNN) layers is the ability to include multivariate features in your analysis. Here, that means the four sensor readings per time step. In an online fraud anomaly detection analysis, for example, the features could instead be the time of day, dollar amount, item purchased and originating IP address for each time step.

We will use an autoencoder neural network architecture for our anomaly detection model. The autoencoder architecture essentially learns an “identity” function. It takes the input data, creates a compressed representation of the core / primary driving features of that data and then learns to reconstruct it again. For instance, given an image of a dog as input, it will compress that data down to the core constituents that make up the dog picture and then learn to recreate the original picture from that compressed representation.

The rationale for using this architecture for anomaly detection is that we train the model on the “normal” data and determine the resulting reconstruction error. Then, when the model encounters data that is outside the norm and attempts to reconstruct it, we will see an increase in the reconstruction error as the model was never trained to accurately recreate items from outside the norm.
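
To make this concrete, here is a tiny illustration, with made-up numbers, of how a single sample is scored by its reconstruction error:

import numpy as np

# Illustrative only -- the numbers below are invented, not taken from the bearing data.
original       = np.array([0.06, 0.07, 0.08, 0.04])   # scaled readings for the four bearings
reconstruction = np.array([0.21, 0.19, 0.25, 0.16])   # a poor reconstruction of unfamiliar data
error = np.mean(np.abs(original - reconstruction))    # mean absolute error for this sample
print(error)   # ~0.14, much higher than the error the model achieves on "normal" data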

We create our autoencoder neural network model as a Python function using the Keras library.

def autoencoder_model(X):
    # encoder: compress the 4 sensor features into a narrower representation
    inputs = Input(shape=(X.shape[1], X.shape[2]))
    L1 = Conv1D(16, activation='relu', kernel_size=4, padding='same',
                kernel_regularizer=regularizers.l2(0.00))(inputs)
    L2 = Conv1D(4, activation='relu', kernel_size=4, padding='same')(L1)
    # decoder: expand back out and reconstruct the original input
    L3 = Conv1D(4, activation='relu', kernel_size=4, padding='same')(L2)
    L4 = Conv1D(16, activation='relu', kernel_size=4, padding='same')(L3)
    output = TimeDistributed(Dense(X.shape[2]))(L4)
    model = Model(inputs=inputs, outputs=output)
    return model

 

We then instantiate the model and compile it using Adam as our neural network optimizer and mean absolute error as our loss function.

model = autoencoder_model(X_train)
model.compile(optimizer='adam', loss='mae', metrics=["accuracy"])
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 1, 4)              0         
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 1, 16)             272       
_________________________________________________________________
conv1d_2 (Conv1D)            (None, 1, 4)              260       
_________________________________________________________________
conv1d_3 (Conv1D)            (None, 1, 4)              68        
_________________________________________________________________
conv1d_4 (Conv1D)            (None, 1, 16)             272       
_________________________________________________________________
time_distributed_1 (TimeDist (None, 1, 4)              68        
=================================================================
Total params: 940
Trainable params: 940
Non-trainable params: 0
_________________________________________________________________

 

Next, we fit the model to our training data and train it for 100 epochs. We then plot the training losses to evaluate our model’s performance.

nb_epochs = 100
batch_size = 10

import datetime
t_ini = datetime.datetime.now()

history = model.fit(X_train, X_train, epochs=nb_epochs, batch_size=batch_size,
                    validation_split=0.05).history

t_fin = datetime.datetime.now()
print('Training time: {} seconds'.format((t_fin - t_ini).total_seconds()))
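
Using the history dictionary returned by fit above, the training and validation losses can be plotted; a minimal sketch (figure size and styling are arbitrary):

# plot training and validation loss across epochs
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(history['loss'], label='Training loss')
ax.plot(history['val_loss'], label='Validation loss')
ax.set_xlabel('Epoch')
ax.set_ylabel('Loss (MAE)')
ax.legend()
plt.show()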

By plotting the distribution of the calculated loss in the training set, we can determine a suitable threshold value for identifying an anomaly. In doing this, one can make sure that this threshold is set above the “noise level” so that false positives are not triggered.

# reconstruct the training set and calculate the reconstruction loss per sample
X_pred = model.predict(X_train)
X_pred = X_pred.reshape(X_pred.shape[0], X_pred.shape[2])
X_pred = pd.DataFrame(X_pred, columns=train.columns)
X_pred.index = train.index

scored = pd.DataFrame(index=train.index)
Xtrain = X_train.reshape(X_train.shape[0], X_train.shape[2])
scored['Loss_mae'] = np.mean(np.abs(X_pred - Xtrain), axis=1)

# set the anomaly threshold 10% above the maximum training loss
mx = round(max(scored['Loss_mae']), 2)
th = round(((mx * 10) / 100) + mx, 3)
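
The threshold above is set roughly 10% above the maximum training loss; plotting the training loss distribution, as described earlier, is a good way to sanity-check that choice. A minimal sketch (the bin count is arbitrary, and sns.histplot requires seaborn 0.11 or later):

# visualize the distribution of the training reconstruction loss
plt.figure(figsize=(10, 5))
sns.histplot(scored['Loss_mae'], bins=20, kde=True)
plt.axvline(th, color='red', linestyle='--', label='Threshold')
plt.xlabel('Reconstruction loss (MAE)')
plt.legend()
plt.show()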

# score the test set against the threshold learned from the training data
X_pred = model.predict(X_test)
X_pred = X_pred.reshape(X_pred.shape[0], X_pred.shape[2])
X_pred = pd.DataFrame(X_pred, columns=test.columns)
X_pred.index = test.index

scored = pd.DataFrame(index=test.index)
Xtest = X_test.reshape(X_test.shape[0], X_test.shape[2])
scored['Loss_mae'] = np.mean(np.abs(X_pred - Xtest), axis=1)
scored['Threshold'] = th
scored['Anomaly'] = scored['Loss_mae'] > scored['Threshold']
scored.head()

scored['Anomaly'].value_counts()

 

Out:
                     Loss_mae  Threshold  Anomaly
2004-02-15 12:52:39  0.067486      0.187    False
2004-02-15 13:02:39  0.072280      0.187    False
2004-02-15 13:12:39  0.037381      0.187    False
2004-02-15 13:22:39  0.043859      0.187    False
2004-02-15 13:32:39  0.022220      0.187    False

Out:
True     453
False     85
Name: Anomaly, dtype: int64
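
To see when the reconstruction loss first crosses the threshold as the bearings degrade, the scored test set can be plotted over time; a minimal sketch:

# plot the test reconstruction loss against the fixed threshold over time
scored[['Loss_mae', 'Threshold']].plot(figsize=(12, 5), color=['blue', 'red'])
plt.ylabel('Reconstruction loss (MAE)')
plt.show()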

Finally, we save both the neural network model architecture and its learned weights in the h5 format. The trained model can then be deployed for anomaly detection.

model.save("CNN_model.h5")
print("Model saved")

In the next article, we will take a different approach to anomaly detection, using an LSTM autoencoder with Keras.
