Scaling Symbolic Reasoning for Executable Code via Summarization and Interaction

Tags:
computers

Researchers

Prof. Shmuel Sagiv

Modern society fundamentally relies on software systems, with some of its most vital processes, such as communication and banking implemented in software. Hence, the security of these systems is paramount, as otherwise a malicious party can take down critical national infrastructures or lead them to incorrect, possibly disastrous, behaviors. A key reason for the vulnerability of software systems is that ensuring that they behave correctly for any given input is provably impossible. This opens the door for sophisticated users to craft an unexpected, yet seemingly benign inputs, that can derail the attacked system into an undesired execution path.

Our goal is to develop tools, techniques, and methodologies that help detect vulnerabilities in realistic software systems by synthesizing inputs that can force a program to go from one given program point to another, or determine that no such input exists, and hence that this particular vulnerability never occurs. We plan to do so by expressing the feasibility of an execution path using a logical formula and harnessing the power of modern symbolic SAT solvers, e.g., Z3, to identify a satisfying assignment, i.e., an input scenario which exposes the vulnerability, or determine that none exists. Indeed, tools such as SAGE, KLEE, and uc-KLEE show that such symbolic reasoning can be very useful in generating tricky inputs. However, scaling these tools to realistic software is difficult: Firstly, determining the existence of an execution path is an undecidable problem and even checking whether a given path formula is feasible is NP-complete. Secondly, the limits of the logical theories underlying the solver make it challenging to handle many aspects of low level code including non-linear arithmetic, pointers, loops, and indirect jumps. Finally, the code can be simply too big for current techniques.

In this project, we will scale the ability of tools such as KLEE to more realistic situations by pursuing several intertwined directions: (i) simplifying the generated logical formulae using sound information obtained from static program analysis, (ii) designing domain-specific theories suitable for reasoning about low-level code, (iii) making the reasoning modular by developing symbolic procedure summaries, and (iv) leveraging high-level guidance from the user to identify promising code parts to explore.