Empirical Evaluation Of Edge AI Deployment Strategies Involving Black-Box And White-Box Operators

Abstract

Edge AI enables deploying models across Mobile, Edge, and Cloud (MEC) tiers using a wide range of ML model transformation operators. Although these operators are broadly categorized into white-box (training-based) and black-box (non-training-based) techniques, deciding which type of operator to use in an Edge AI setup to gain a performance advantage is largely left to the personal judgment of MLOps engineers.
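The black-box/white-box distinction can be made concrete with static post-training quantization (SPTQ), a black-box operator: trained float weights are mapped to 8-bit integers with no retraining involved. The sketch below is purely illustrative and does not reflect the thesis's actual tooling; all names are hypothetical.

```python
# Minimal sketch of static post-training quantization (SPTQ), a black-box
# operator: trained float weights are affine-mapped to int8 codes without
# any retraining. Illustrative only; not the thesis's actual pipeline.

def quantize_int8(weights):
    """Quantize a list of floats to int8 codes with a single scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [round(w / scale) for w in weights]  # codes lie in [-127, 127]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

weights = [0.42, -1.30, 0.07, 0.99]
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)

# Quantization is lossy: restored weights differ slightly from the
# originals, which is the root cause of SPTQ's accuracy drop.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, round(max_error, 4))
```

A white-box operator such as QAT differs precisely in that it simulates this rounding during training, letting the model compensate for the error, at the cost of requiring access to the training loop.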

This study involves inference experiments with three black-box (i.e., Partitioning, SPTQ, Early Exiting) and three white-box (i.e., QAT, Pruning, Knowledge Distillation) operators, and their combinations, across the three MEC deployment tiers on four Computer Vision (CV) and two Natural Language Processing (NLP) models. We used a reproducible Docker-based simulation of the MEC tiers, in which sequential inference requests with widely varying input sizes (i.e., images and texts) were issued to measure the latency introduced by each deployment strategy.
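The measurement protocol described above can be sketched as follows. The fake model and its timings are hypothetical stand-ins for real inference across the MEC tiers; only the structure (sequential requests, per-request wall-clock latency) mirrors the described setup.

```python
import time
from statistics import mean, median

def fake_inference(payload):
    """Hypothetical stand-in for a real model call on a given tier; the
    sleep below is a placeholder for inference + network transfer time."""
    time.sleep(0.001 * len(payload) / 1000)  # latency grows with input size
    return "prediction"

def measure_latency(requests):
    """Issue requests sequentially and record per-request latency, as in
    the Docker-based MEC simulation described above (simplified)."""
    latencies = []
    for payload in requests:
        start = time.perf_counter()
        fake_inference(payload)
        latencies.append(time.perf_counter() - start)
    return latencies

# Varying input sizes, mimicking images/texts of different byte lengths.
requests = ["x" * n for n in (1_000, 10_000, 100_000)]
lats = measure_latency(requests)
print(f"mean={mean(lats)*1e3:.2f} ms  median={median(lats)*1e3:.2f} ms")
```

Using `time.perf_counter` (monotonic, high resolution) rather than `time.time` avoids clock adjustments skewing per-request measurements.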

Findings suggest that for CV models, Edge deployment using the hybrid SPTQ Early Exit black-box operator is preferable when lower latency is the priority (1.17x/1.45x faster than SPTQ/Early Exit alone), at a medium accuracy loss in terms of effect size. However, if minimizing accuracy loss is the priority, the SPTQ black-box operator on the Edge tier should be used.

For models with large input data samples (ResNet, ResNeXt, DUC), an Edge tier with higher network/computational capabilities is more viable than partitioning or Mobile/Cloud deployment strategies. A network-constrained Cloud tier is the better alternative for models with small input data samples (FCN, BERT, RoBERTa).

Regarding the white-box operators, the Distilled operator achieves lower latency than QAT/Pruning in the Mobile (3.36x/3.34x) and Edge (2.66x/3.31x) tiers at the cost of a small-to-medium accuracy loss in terms of effect size. Moreover, the hybrid Distilled SPTQ operator should be preferred over the non-hybrid operators (i.e., Distilled/SPTQ/QAT/Pruned) when lower latency (1.52x/2.89x/3.93x/5.17x faster) is the priority in the Edge tier, at a small-to-medium accuracy loss in terms of effect size.

This thesis aims to be a stepping stone for the field of MLOps by evaluating the benefits and trade-offs of deployment strategies with respect to latency and accuracy.

Keywords

Edge AI, Deployment Strategies, Inference Latency, Model Performance
