From Compilation Database to Compiler-Based Tools.

Panda is a compilation-independent scheduler for pipelining compiler-based tools in parallel based on the JSON compilation database. It allows you to execute various tools on translation units by replaying the compilation process. An introduction video to this tool is available at https://youtu.be/dLG2tEzuaCw.

The advantages of Panda include:

  1. Customizable execution of various compiler-based tools;
  2. No interference with the build system;
  3. Dependency-free scheduling of tool executions to take full advantage of the system resources.

Installation

Panda is a standalone Python script. You can install it by downloading the file panda directly from this repo.

$ curl -fsSL https://github.com/Snape3058/panda/raw/demo/panda | sudo tee /usr/bin/panda >/dev/null
$ sudo chmod +x /usr/bin/panda

GitHub repo for the ICSE 2024 tool demo revision: https://github.com/Snape3058/panda/tree/demo. Please note that the content of the demo branch is ahead of the main branch. Its functionalities will be merged into the main branch after the tool paper is accepted.

Usage

Scheduling the execution of compiler-based tools requires a JSON Compilation Database. Users can set up the environment following the instructions from Clang (https://clang.llvm.org/docs/HowToSetupToolingForLLVM.html) or use tools such as Bear (Build EAR).
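To illustrate what "replaying the compilation process" starts from, here is a minimal Python sketch (not Panda's own code) that parses a compile_commands.json into one argument list per translation unit. The directory, file, command, and arguments fields are those of Clang's JSON Compilation Database format; the names CompileEntry and load_compilation_database are hypothetical.

```python
import json
import shlex
from dataclasses import dataclass


@dataclass
class CompileEntry:
    """One translation unit taken from compile_commands.json (hypothetical type)."""
    directory: str   # working directory of the original compilation
    file: str        # source file of the translation unit
    argv: list       # full compiler invocation as an argument list


def load_compilation_database(text):
    """Parse a JSON compilation database into replayable entries.

    Each entry carries either an "arguments" list or a single "command"
    string; the latter is split with shell-like quoting rules.
    """
    entries = []
    for raw in json.loads(text):
        if "arguments" in raw:
            argv = list(raw["arguments"])
        else:
            argv = shlex.split(raw["command"])
        entries.append(CompileEntry(raw["directory"], raw["file"], argv))
    return entries
```

In practice the text would come from reading the compile_commands.json file generated by CMake or Bear.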

Running Panda requires the CBT Execution Configurations (Section 2.2) to be scheduled, as well as optional settings such as the number of parallel workers and the output path.

$ panda <configurations> [-f CDB] [-j JOBS] [-o OUTPUT] [options]
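The -j JOBS option caps the number of parallel workers. As a rough sketch of bounded parallel execution (an assumption about the general technique, not Panda's actual worker implementation), a thread pool can run external commands with at most JOBS child processes at a time. The function run_all is hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, jobs):
    """Run each (cwd, argv) pair with at most `jobs` concurrent processes.

    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd):
        cwd, argv = cmd
        # Each thread only waits on its own child process, so threads are
        # sufficient: the real work happens in the external tool.
        return subprocess.run(argv, cwd=cwd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Two trivial jobs: the current interpreter exiting with 0 and with 3.
    demo = [(".", [sys.executable, "-c", "pass"]),
            (".", [sys.executable, "-c", "raise SystemExit(3)"])]
    print(run_all(demo, jobs=2))  # [0, 3]
```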

Panda provides built-in configurations that cover most scenarios of executing analyzers and generating the inputs they require. The built-in configurations fall into three categories: Compiler Tool (TCompiler) Configurations, Frontend Tool (TFrontend) Configurations, and Compilation Database Configurations. The first two categories are described in the paper; the last category generates output directly from the compilation database.

$ panda -YM -j 16 -o /tmp/csa-ctu-scan
$ panda --plugin /tmp/check/plugin.json -o /tmp/check

Built-in Compilation Database Configurations

The compilation database configurations transform the input compilation database to generate the output file, or summarize the output of other TFrontend configurations.

Built-in Compiler Tool Configurations

The TCompiler Configurations mainly generate inputs in desired formats for different analyzers.

Built-in Frontend Tool Configurations

The TFrontend configurations mainly invoke Clang Tooling-based tools.

Plugins

Users can execute customized TCompiler and TFrontend tools through plugins, each defined as a CBT execution configuration in JSON format. In this description, field comment is a free-form comment string, field type determines the type of the configuration (Compiler or Frontend), and object action defines the CBT Execution Configuration object.

{
    "comment": "Example plugin for Panda driver.",
    "type": "Compiler",
    "action": {
        "prompt": "Generating dependency file",
        "tool": {
            "c": "clang",
            "c++": "clang++"
        },
        "args": ["-fsyntax-only", "-w", "-M"],
        "extension": ".d",
        "outopt": "-MF"
    }
}

For a TCompiler configuration, object action has five fields. Field prompt defines the prompt string printed while the tool is executing. Field tool maps each source language (c and c++) to the compiler to be invoked. Field args is a list of command line arguments added during execution. Field extension determines the extension of the output file. Field outopt is the option used to specify the output file.
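The text above does not spell out how Panda assembles the final command, so the following Python sketch only shows one plausible reading of a TCompiler action, under stated assumptions: the original compiler is swapped for the one selected by language, the original flags are kept except -o, the plugin's args are appended, and outopt plus an output path ending in extension come last. LANG_BY_SUFFIX, strip_output, compiler_command, and the output layout are all hypothetical.

```python
import json
import os

# The TCompiler plugin shown above, inlined for a self-contained example.
PLUGIN = """
{
    "comment": "Example plugin for Panda driver.",
    "type": "Compiler",
    "action": {
        "prompt": "Generating dependency file",
        "tool": {"c": "clang", "c++": "clang++"},
        "args": ["-fsyntax-only", "-w", "-M"],
        "extension": ".d",
        "outopt": "-MF"
    }
}
"""
ACTION = json.loads(PLUGIN)["action"]

# Hypothetical mapping from source suffix to the language keys used by "tool".
LANG_BY_SUFFIX = {".c": "c", ".cc": "c++", ".cpp": "c++", ".cxx": "c++"}


def strip_output(argv):
    """Drop '-o <file>' from an argument list; the action supplies its own output."""
    kept, skip = [], False
    for arg in argv:
        if skip:
            skip = False
        elif arg == "-o":
            skip = True
        else:
            kept.append(arg)
    return kept


def compiler_command(action, original_argv, source, output_root):
    """Rebuild one translation unit's compile command for a TCompiler action.

    The original compiler is replaced by the one the plugin selects for the
    source language, the original flags (minus -o) are kept, the plugin's
    args are appended, and the output file is passed with "outopt".
    """
    lang = LANG_BY_SUFFIX[os.path.splitext(source)[1]]
    output = os.path.join(output_root, source.lstrip("/") + action["extension"])
    return ([action["tool"][lang]] + strip_output(original_argv[1:])
            + action["args"] + [action["outopt"], output])
```

For example, compiler_command(ACTION, ["cc", "-c", "-DX", "a.c", "-o", "a.o"], "a.c", "/tmp/out") yields a clang invocation with -fsyntax-only -w -M -MF /tmp/out/a.c.d appended.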

{
    "comment": "Example plugin for Panda driver",
    "type": "Frontend",
    "action": {
        "prompt": "Generating raw external function map",
        "tool": "clang-tidy",
        "args": ["--config-file=/path/to/output/config.txt"],
        "extension": ".clang-tidy",
        "source": "stderr"
    }
}

For a TFrontend configuration, object action has five fields. Fields prompt, args, and extension have the same meaning as in a TCompiler configuration. Field tool names the tool to be executed, and field source selects the stream (e.g., stderr) whose output is stored in the output file. Note that the string /path/to/output is always replaced with the actual output path specified by option -o during execution.
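A small Python sketch of the two TFrontend behaviors described above: substituting the /path/to/output placeholder with the -o path, and keeping only the stream named by source. How Panda passes each translation unit's compilation flags to the tool is not covered here and is not assumed; expand_args and run_frontend are hypothetical names.

```python
import subprocess

# Placeholder string that is replaced by the path given with -o.
PLACEHOLDER = "/path/to/output"


def expand_args(args, output_dir):
    """Substitute every occurrence of the placeholder with output_dir."""
    return [arg.replace(PLACEHOLDER, output_dir) for arg in args]


def run_frontend(action, source, output_dir):
    """Run a TFrontend tool on one source file and return the selected stream.

    action["source"] must be "stdout" or "stderr"; writing the returned text
    to a file ending with action["extension"] is left to the caller.
    """
    if action["source"] not in ("stdout", "stderr"):
        raise ValueError(f"unsupported stream: {action['source']}")
    argv = [action["tool"], *expand_args(action["args"], output_dir), source]
    done = subprocess.run(argv, capture_output=True, text=True)
    return getattr(done, action["source"])
```

With the plugin above and -o /tmp/out, the argument --config-file=/path/to/output/config.txt would become --config-file=/tmp/out/config.txt.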

Data Presentation and Open-Access

The Gantt Chart in Figure 6 can be generated with the analyze-log script. All data from the experiments are available in the Google Spreadsheet below.

For the tool demo paper revision, execution logs are dumped to the output path in the following format:

/path/to/output/logs-<strategy>-<key>-<timestamp>

where <strategy> is one of the sorting strategies described in Section 2.4: fifo for First-Come-First-Served and ljf for Longest-Processing-Time-First. <key> is the key used to sort the worklist. As described in Section 2.3, the number of semicolons (semicolon) is used by default, and the number of code lines (loc) is available as an alternative.
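The effect of the two strategies can be sketched in Python: fifo keeps the input order, while ljf dispatches the units with the largest key first, and a greedy simulation (each job goes to the worker that becomes free first) shows why that can shorten the overall run. This is an illustration of the classic LPT idea, not Panda's scheduler; counting non-blank lines for loc is an assumption, and all function names are hypothetical.

```python
import heapq


def count_semicolons(text):
    """Default key described above: the number of ';' in a source file."""
    return text.count(";")


def count_loc(text):
    """Alternative key: lines of code (assumed here to mean non-blank lines)."""
    return sum(1 for line in text.splitlines() if line.strip())


def order_worklist(units, key, strategy):
    """Return translation units in dispatch order for "fifo" or "ljf"."""
    if strategy == "fifo":
        return list(units)
    if strategy == "ljf":
        # Largest estimated cost first, so long jobs start early.
        return sorted(units, key=key, reverse=True)
    raise ValueError(f"unknown strategy: {strategy}")


def makespan(costs, workers):
    """Greedy list scheduling: each job goes to the worker that frees up first."""
    finish = [0] * workers
    heapq.heapify(finish)
    for cost in costs:
        heapq.heappush(finish, heapq.heappop(finish) + cost)
    return max(finish)
```

With toy costs [1, 1, 1, 1, 4] on 2 workers, fifo finishes at time 6 while ljf finishes at time 4, because the long job no longer starts last. In a real run, key would compute count_semicolons or count_loc from each source file's contents.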

To summarize a previous execution and present the Gantt Chart of all workers, please use the analyze-log script provided only in this branch.

$ analyze-log /path/to/output/logs-<strategy>-<key>-<timestamp>

The analyze-log requires Matplotlib to generate the Gantt Chart. If the Python interpreter fails to import this module, the analyze-log script will NOT report an error and exit.
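The log format read by analyze-log is not documented here, so the following Python sketch assumes hypothetical (worker, start, end) records in seconds. It groups them into one row of bars per worker and draws them with Matplotlib's broken_barh, importing Matplotlib optionally so that a missing module only disables plotting. gantt_bars and plot_gantt are hypothetical names, and this is not the analyze-log implementation.

```python
from collections import defaultdict

try:  # Matplotlib is optional: without it, only the plot is skipped.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt
except ImportError:
    plt = None


def gantt_bars(records):
    """Group (worker, start, end) records into per-worker (start, duration) bars."""
    bars = defaultdict(list)
    for worker, start, end in records:
        bars[worker].append((start, end - start))
    return dict(bars)


def plot_gantt(records, path):
    """Draw one row per worker into `path`; return False if Matplotlib is missing."""
    if plt is None:
        return False
    rows = sorted(gantt_bars(records).items())
    fig, ax = plt.subplots()
    for row, (_, spans) in enumerate(rows):
        ax.broken_barh(spans, (row - 0.4, 0.8))
    ax.set_yticks(list(range(len(rows))))
    ax.set_yticklabels([str(worker) for worker, _ in rows])
    ax.set_xlabel("time (s)")
    ax.set_ylabel("worker")
    fig.savefig(path)
    plt.close(fig)
    return True
```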

An example Gantt Chart can be found in Figure 6 of the paper.

Selection of Key to Sort the Worklist

We select the key for sorting the worklist using the Pearson correlation coefficient. The detailed data used for this calculation is presented in the Google Spreadsheet below.

It may take a while to load the data.
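For reference, the Pearson correlation coefficient between a candidate key (such as semicolon or loc counts per translation unit) and the measured execution times can be computed as below. This Python sketch only illustrates the statistic; best_key, which picks the candidate with the highest r, is a hypothetical helper and not part of Panda, and the spreadsheet remains the source of the actual data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_key(candidates, times):
    """Return the name of the candidate key with the highest r against times."""
    return max(candidates, key=lambda name: pearson(candidates[name], times))
```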

Detailed Data of Evaluation

The detailed data of the evaluation in Section 3 is presented in the Google Spreadsheet below.

It may take a while to load the data.

Acknowledgments

Let me know if Panda helps you. Thanks.