Asm2Seq: Explainable Assembly Code Functional Summary Generation
Abstract
Technology is at the forefront of nearly every aspect of the modern world. Humans write code so that technology functions as desired, but the low-level code that actually controls the computer is often poorly understood without significant human analysis. This research aims to bridge that gap by producing human-readable summaries of the functionality of the assembly code a computer executes. Vulnerability datasets serve as the starting datasets for the model because finding and understanding vulnerabilities in a program is important for software maintenance, software analysis, and software development. Source code files exhibiting various vulnerabilities are compiled to produce their assembly code counterparts, which are used as input to the model. Descriptions of how the vulnerabilities function are extracted from the source code files and used as the desired output for the model. Experiments with various neural network architectures in an encoder-decoder configuration are run to determine the best model. Each experiment undergoes significant training in order to produce accurate predictions. Attention is added in order to understand which aspects of the assembly code have the biggest effect on generating the summary. The models achieved high accuracy and high Bilingual Evaluation Understudy (BLEU) scores, both of which are indicative of a well-performing network. Comparisons between model predictions and the true descriptions showcase the favorable results.
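For readers unfamiliar with the BLEU metric mentioned in the abstract, the following is a minimal sketch of how a predicted summary could be scored against a true description. It assumes the standard sentence-level definition (clipped n-gram precision up to 4-grams, uniform weights, a brevity penalty, one reference, no smoothing). The function names and the example sentences are hypothetical and are not taken from the thesis, which may have used a different BLEU variant or library.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against a single reference, without smoothing.

    Illustrative helper only (hypothetical name); returns a value in [0, 1].
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clipping: an n-gram is credited at most as often as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives a zero score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Invented example sentences, standing in for a true description and a model prediction.
true_description = "copies input into a fixed size buffer without checking its length".split()
prediction = "copies input into a buffer without checking length".split()

print(bleu(true_description, true_description))  # identical text scores highest
print(bleu(prediction, true_description))        # partial overlap scores lower
```

A score closer to 1 means the prediction shares more word sequences with the true description; a prediction with no overlapping 4-gram scores 0 under this unsmoothed variant.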

