Compilation Integrity Assurance through Deep Code Alignment
Lior Wolf
Hardware is expected to be the root of trust in most products, and embedded threats are the “new black” in system security. Hardware Trojans, on which we focus, are both persistent and extremely hard to detect. In this project, we address the problem of executable component addition, substitution, and re-programming in the supply chain.
We propose a novel approach for detecting hardware Trojans. Consider, as an accessible example, firmware that the hardware designer provides as C code and that is compiled at the foundry. We obtain the binaries, from the foundry or by other means. These binaries are expected to largely match the source code provided by the hardware designer, with some unavoidable additions inserted to support debugging and QA and to comply with manufacturing constraints. We then apply the tools we propose to identify, for every line of the binaries (viewed as assembly code), the matching line in the original C code. Following this step, insertions and other forms of modification can be easily identified. Engineers at the supplier company or at any other verifying agency can then readily track these modifications and tag each one as malicious or benign.
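To make the review step concrete, the following minimal sketch assumes the alignment has already been computed and is available as a list in which entry i holds the index of the C line matched to assembly line i, or None when no match was found. This representation, the function name unmatched_runs, and the sample data are hypothetical and used only for illustration; they are not the project's actual tooling. The sketch groups consecutive unmatched assembly lines into ranges that an engineer could then inspect and tag.

```python
from typing import List, Optional, Tuple


def unmatched_runs(alignment: List[Optional[int]]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) ranges of consecutive assembly lines
    that have no matching source line (entries equal to None)."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, src_line in enumerate(alignment):
        if src_line is None:
            if start is None:
                start = i  # a new unmatched run begins here
        elif start is not None:
            runs.append((start, i - 1))  # the run ended on the previous line
            start = None
    if start is not None:
        runs.append((start, len(alignment) - 1))  # run reaches the last line
    return runs


# Hypothetical alignment: assembly line i maps to source line alignment[i].
alignment = [0, 0, 1, None, None, 2, 3, None]
for first, last in unmatched_runs(alignment):
    print(f"assembly lines {first}..{last} have no source counterpart")
```

Reporting ranges rather than single lines reflects the fact that an inserted routine usually spans several consecutive instructions, so the reviewer sees one candidate modification per range.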
Detecting hardware Trojans after manufacturing is almost impossible: modern ICs have millions of nodes and billions of possible states, exhibit high system complexity, and are fabricated at the nanometer scale. Detection is especially difficult for unknown threats, for which no signatures exist and which are triggered with very low probability. Inserting malicious code as part of the compilation process at the foundry is relatively easy and very hard to prevent. While there are other means of inserting hardware Trojans, none is as cheap and straightforward. Verification of the compilation process is therefore of immense importance.
Most facilities for advanced IC fabrication and electronics assembly have migrated offshore due to economic pressure. This move has been accompanied by the dominance of the fabless model, in which thousands of electronics manufacturing services suppliers hand over control of their designs to two dozen foundries, mostly in the Far East. Trust cannot be guaranteed in this model. The few remaining suppliers that still use the integrated device manufacturer (IDM) model, in which fabrication is done internally, can no longer provide the performance and variety of ICs that are needed. Establishing trust as part of the hardware manufacturing process is therefore expected to become increasingly critical, and the tools developed in this project could be adopted very quickly into IC fabrication and firmware compilation. In addition to Trojan detection, the tools we are developing would support and automate a wide range of code analysis tasks that are currently handled by a large number of engineers working for defense agencies. By building on these tools and modifying them, e.g., to align binary code with recompiled binary code, one could analyze executables as they shift from one version to the next, and analyze electronic devices as models are replaced.
We address the task of statement-by-statement alignment of source code and compiled object code. An explicit approach to this alignment problem is infeasible, since the complexity of modeling it directly approaches that of building the compiler itself, and the modeling would have to be repeated for each compiler. Instead, we propose to employ a deep neural network that maps each statement to a context-dependent representation vector, and then to compare these vectors across the two code domains: source and object.
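The comparison step can be illustrated with a minimal sketch. It assumes the representation vectors are already available (in the project they would come from the trained network; here they are hand-written toy values), and it assumes cosine similarity with a fixed acceptance threshold as the comparison rule. Neither the similarity measure nor the threshold is specified in the text above, and the names cosine and align are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def align(object_vecs: Sequence[Sequence[float]],
          source_vecs: Sequence[Sequence[float]],
          threshold: float = 0.8) -> List[Optional[int]]:
    """For each object-code vector, return the index of the most similar
    source vector, or None if the best similarity is below the threshold."""
    result: List[Optional[int]] = []
    for obj in object_vecs:
        best_idx: Optional[int] = None
        best_sim = -1.0  # lower bound of cosine similarity
        for j, src in enumerate(source_vecs):
            sim = cosine(obj, src)
            if sim > best_sim:
                best_idx, best_sim = j, sim
        # Assumed rule: reject weak matches so that inserted code stays unmatched.
        result.append(best_idx if best_sim >= threshold else None)
    return result


# Toy vectors standing in for learned statement representations.
source_vecs = [[1.0, 0.0], [0.0, 1.0]]
object_vecs = [[0.9, 0.1], [0.1, 0.9], [-1.0, -1.0]]
print(align(object_vecs, source_vecs))
```

In this toy setting the third object-code vector resembles neither source statement and is left unmatched, which is the kind of entry a reviewer would then examine as a possible insertion.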