[12-02] Verified Separate Compilation of C Programs into Binary Object Files

Title: Verified Separate Compilation of C Programs into Binary Object Files (从C程序到二进制对象文件分离编译的验证方法研究)

Speaker: Dr. Yuting Wang, Associate Professor, John Hopcroft Center for Computer Science, Shanghai Jiao Tong University

Time: 3:00pm, December 2nd (Wednesday), 2020

Venue: Lecture room (334), Building 5, SKLCS, Institute of Software, CAS

Abstract: The object file formats (e.g., Executable and Linkable Format or ELF) are standard binary interfaces through which system software is connected. They are naturally the final target of production compilers such as GCC. However, this is not the case for formally verified compilers. For instance, the complete compilation chain of CompCert—the state-of-the-art verified C compiler—ends at assembly code. Further compilation of those assembly code to binary object files is handled by unverified external tools (e.g., the GNU assembler and linker). A key obstacle to the verification of compilation to binary machine code in CompCert is its representation of the stack as an unbounded list of memory blocks: because this representation is quite different from the finite and continuous stack on the actual hardware, CompCert has to introduce pseudo assembly instructions for manipulating its stack that do not exist in any CPU architecture.

We propose an approach to addressing the above shortcomings of CompCert. The key idea is to enrich CompCert’s block-based memory model with an abstract data type that bounds the stack consumption. By introducing this abstract stack, we are able to extend CompCert with verified compiler passes that merge the individual stack blocks into a finite and continuous stack and eliminate the pseudo instructions. Furthermore, by developing a verified assembler targeting the ELF format and connecting it with CompCert, we get CompCertELF, the first verified compiler that transforms C programs all the way to a standard object file format.

CompCert supports large-scale verification via verified separate compilation: C modules can be written and compiled separately, and then linked together to get a target program that refines the semantics of the program linked from the source modules. However, verified separate compilation in CompCert only works for compilation to assembly programs, not to object files. For the latter, the main difficulty is to bridge the two different views of linking: one for CompCert’s programs that allows arbitrary shuffling of global definitions by linking and the other for object files that treats blocks of encoded definitions as indivisible units.

We propose a lightweight solution to the above problem without any modification to CompCert’s framework for verified separate compilation: by introducing a notion of syntactical equivalence between programs and proving that it commutes with the two different kinds of linking operations, we are able to transit from the more abstract linking operation in CompCert to the more concrete one for the ELF object files. By applying this approach to CompCertELF, we obtain the first compiler that supports verified separate compilation of C programs into standard object files.

Bio: Yuting Wang is a tenure-track Associate Professor in John Hopcroft Center for Computer Science at Shanghai Jiao Tong University. Prior to that, he was an associate research scientist at Yale University under the supervision of Zhong Shao. He obtained a Ph.D. degree in Computer Science at the University of Minnesota, Twin Cities under the guidance of Gopalan Nadathur. His research interests include the theories of programming languages and formal verification (including proof theory, type theory and category theory) and their application to the construction and verification of software artifacts—especially system software such as compilers and operating systems. He is also interested in developing specification and reasoning frameworks based on the theories of formal verification. He is a core developer for Abella, an interactive theorem prover particularly suitable for the formalization and verification of meta-properties of programming languages and logics.