An Intelligent Framework for Streaming Sensor Data Analytics and Management
Abstract
Finding regular and irregular patterns in streaming big data has the potential to provide insights for many data domains. Representative pattern extraction helps remove noise from the data and extract key features for analysis and decision support. The goals of this research are: a) to propose models for classification of time series sensor data from the Internet of Things (IoT) in the domain of Human Activity Recognition (HAR), and b) to store representative patterns as Data-Event Profiles (DEPs), rather than the entire incoming stream, for concise storage of massive streaming data. Accordingly, our proposed intelligent data analytics framework consists of four components: a) streaming data ingestion and b) pattern extraction and recognition, which together validate models for real-time HAR; and c) pattern representation as DEPs and d) pattern reconstruction, which demonstrate the effectiveness of our storage-reduction approach in retaining informative patterns in DEPs for HAR with reconstructed data. For streaming data ingestion, pattern extraction, and recognition, we implement and compare the performance of multiple deep learning models for HAR using the MobiAct and UCI-HAR datasets. Our Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM) model achieves around 96% accuracy for multi-class classification. We explore a variety of autoencoder models to extract representative patterns that reduce storage consumption for concise storage of IoT streaming time series data. Our Multi-Layer Perceptron (MLP) autoencoder achieves a storage reduction of 90.18%, compared to the three other autoencoders, namely the CNN, LSTM, and CNN-LSTM autoencoders, which achieve storage reductions of 11.18%, 49.99%, and 72.35%, respectively. The encoded features produced by the autoencoders have smaller sizes and lower dimensionality, which reduces the required storage space.
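The storage-reduction figures above follow directly from the ratio of the latent code's size to the raw window's size. A minimal NumPy sketch of this bottleneck idea is below; the window shape, latent dimension, and single-layer encoder are illustrative assumptions, not the thesis's actual architecture or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# One UCI-HAR-style window: 128 timesteps x 9 sensor channels, flattened.
# (Shape chosen for illustration; the actual preprocessing may differ.)
window = rng.standard_normal(128 * 9).astype(np.float32)

# Hypothetical one-layer MLP encoder mapping the window to a small latent code.
latent_dim = 113  # illustrative value giving roughly a 90% reduction
W_enc = (0.01 * rng.standard_normal((latent_dim, window.size))).astype(np.float32)
code = np.tanh(W_enc @ window)  # the compact representation stored as a DEP

# Storage reduction = fraction of bytes saved by keeping the code, not the window.
reduction = 100.0 * (1 - code.nbytes / window.nbytes)
print(f"stored {code.nbytes} bytes instead of {window.nbytes} "
      f"({reduction:.2f}% storage reduction)")
```

The same arithmetic explains the spread across autoencoders: architectures with larger latent representations (e.g., convolutional feature maps) save fewer bytes per window.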
However, we demonstrate that infrequent patterns are lost in the most heavily reduced feature representation obtained from the MLP autoencoder: data reconstructed by the MLP decoder achieves only 24% HAR accuracy. The CNN autoencoder, by contrast, extracts higher-dimensional representative features; its storage reduction is lower, but by retaining more relevant information it achieves a much higher HAR accuracy of 95.28%.
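The compression-versus-fidelity trade-off described above can be illustrated with a PCA-style linear autoencoder on synthetic data: as the latent dimension grows, reconstruction error falls, mirroring why the CNN autoencoder's larger representation preserved HAR accuracy while the heavily compressed MLP representation did not. The data, window length, and tied linear encoder/decoder here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 synthetic sensor windows of 64 samples sharing rank-8 structure plus noise.
basis = rng.standard_normal((8, 64))
X = rng.standard_normal((200, 8)) @ basis + 0.05 * rng.standard_normal((200, 64))

def reconstruction_mse(X, latent_dim):
    """Encode to latent_dim principal components, decode, return mean squared error."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:latent_dim]        # tied linear encoder/decoder weights
    X_hat = (Xc @ V.T) @ V     # encode, then reconstruct
    return float(np.mean((Xc - X_hat) ** 2))

# A tighter bottleneck loses more of the signal than a wider one.
for d in (2, 8):
    print(f"latent_dim={d}: reconstruction MSE={reconstruction_mse(X, d):.4f}")
```

The wider bottleneck recovers nearly all of the structured signal, while the narrow one discards components that a downstream classifier would need, which is the same failure mode the abstract reports for the MLP autoencoder's DEPs.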

