Panda is a compilation-independent scheduler that pipelines compiler-based tools in parallel, driven by the JSON compilation database. It allows you to execute various tools on translation units by replaying the compilation process. An introductory video for this tool is available at https://youtu.be/dLG2tEzuaCw.
The advantages of Panda include:
Panda is a standalone Python script. You can install it by directly downloading the panda file from this repository:
$ curl -fsSL https://github.com/Snape3058/panda/raw/demo/panda | sudo tee /usr/bin/panda >/dev/null
$ sudo chmod +x /usr/bin/panda
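A quick sanity check that the script is installed and executable (assuming the standard --help option of the Python argument parser is available):
$ panda --help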
GitHub repository for the ICSE 2024 tool demo revision: https://github.com/Snape3058/panda/tree/demo. Please note that the content on the demo branch is ahead of the main branch; the functionality on this branch will be merged into the main branch after the tool paper is accepted.
Scheduling the execution of compiler-based tools requires a JSON compilation database. Users can set up the environment according to the instructions from Clang (https://clang.llvm.org/docs/HowToSetupToolingForLLVM.html) or by using tools such as Bear (Build EAR).
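For example, a compilation database can be generated with CMake or Bear (a sketch; the project layout and the exact Bear syntax depend on your setup and Bear version):
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -S . -B build   # CMake writes build/compile_commands.json
$ bear -- make                                             # Bear 3.x syntax; older releases use: bear make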
Executing Panda requires the CBT Execution Configurations (Section 2.2) to be scheduled, together with optional settings such as the number of parallel workers and the output path.
$ panda <configurations> [-f CDB] [-j JOBS] [-o OUTPUT] [options]
Panda provides built-in configurations that cover most scenarios of executing analyzers and generating the desired inputs for them. The built-in configurations can be categorized as Compiler Tool (TCompiler) Configurations, Frontend Tool (TFrontend) Configurations, and Compilation Database Configurations. The first two categories have been mentioned in the paper, and the last category is used to generate output directly from the compilation database.
Example: generating the invocation list and the external definition map (-Y and -M) for Cross Translation Unit Analysis of the Clang Static Analyzer, with output to /tmp/csa-ctu-scan under a concurrency of 16 processes.

$ panda -YM -j 16 -o /tmp/csa-ctu-scan
Example: executing a customized plugin defined in /tmp/check/plugin.json and storing the output to path /tmp/check sequentially.

$ panda --plugin /tmp/check/plugin.json -o /tmp/check
The compilation database configurations transform the input compilation database to generate the output file, or summarize the output of other TFrontend configurations.
* -L (or --gen-input-file-list): generate a list of all unique file entries with absolute paths.
* -F (or --gen-source-file-list): generate a list of all unique source files and the header files they include.
* -Y (or --gen-invocation-list): generate the invocation list for Cross Translation Unit Analysis of the Clang Static Analyzer under the on-demand-parsing strategy.
* -M (or --gen-extdef-mapping): generate the external definition map for Cross Translation Unit Analysis of the Clang Static Analyzer under the on-demand-parsing strategy.
* -P (or --gen-extdef-mapping-ast): generate the external definition map from AST files for Cross Translation Unit Analysis of the Clang Static Analyzer under the AST-loading strategy.
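For instance, the following sketch generates the source file list into /tmp/lists (the output file name is chosen by Panda; add -f to point at a specific compilation database):
$ panda -F -o /tmp/lists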
The TCompiler configurations mainly generate inputs in desired formats for different analyzers.

* -X (or --syntax): invoke the compiler with -fsyntax-only -Wall
* -C (or --compile): invoke the compiler with -c
* -E (or --preprocess): invoke the compiler with -E
* -A (or --gen-ast): invoke the clang compiler with -emit-ast
* -B (or --gen-bc): invoke the clang compiler with -emit-llvm
* -R (or --gen-ll): invoke the clang compiler with -emit-llvm -S
* -S (or --gen-asm): invoke the compiler with -S
* -D (or --gen-dep): invoke the compiler with -M
* --analysis: execute the Clang Static Analyzer
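For example, the following sketch generates AST files for all translation units into /tmp/ast with 8 parallel workers (assuming the clang compiler is available on PATH):
$ panda -A -j 8 -o /tmp/ast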
The TFrontend configurations mainly invoke Clang Tooling based tools.

Users can execute customized TCompiler and TFrontend tools as plugins, each defined by a CBT Execution Configuration in JSON format. In the configuration, field comment is a string describing the plugin, field type determines the type of the configuration, and object action defines the CBT Execution Configuration itself.
Example: a TCompiler plugin that generates dependency files, equivalent to the built-in configuration (-D or --gen-dep).

{
"comment": "Example plugin for Panda driver.",
"type": "Compiler",
"action": {
"prompt": "Generating dependency file",
"tool": {
"c": "clang",
"c++": "clang++"
},
"args": ["-fsyntax-only", "-w", "-M"],
"extension": ".d",
"outopt": "-MF"
}
}
For a TCompiler configuration, object action has the fields shown above. Field prompt defines the prompt string printed while executing the tool. Field tool specifies the compiler executables to invoke for each language (clang for C and clang++ for C++ in the example). Field args is a list of command-line arguments added during execution. Field extension determines the extension of the output file. And field outopt is the option used to specify the output file.
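Assuming the configuration above is saved as /tmp/deps/plugin.json (a hypothetical path), the plugin can be scheduled like a built-in configuration:
$ panda --plugin /tmp/deps/plugin.json -j 16 -o /tmp/deps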
Example: a TFrontend plugin that executes clang-tidy with configuration file config.txt located in the output directory, storing the command-line output of the stderr stream to the output file.

{
"comment": "Example plugin for Panda driver",
"type": "Frontend",
"action": {
"prompt": "Generating raw external function map",
"tool": "clang-tidy",
"args": ["--config-file=/path/to/output/config.txt"],
"extension": ".clang-tidy",
"source": "stderr"
}
}
For a TFrontend configuration, object action has five fields. Fields prompt, args, and extension have the same meanings as in a TCompiler configuration. Field tool determines the tool to be executed. And field source determines which output stream of the tool (stderr in the example above) is stored to the output file. Please note that the string /path/to/output will always be replaced with the actual output path determined by option -o during execution; with -o /tmp/check, for instance, --config-file=/path/to/output/config.txt becomes --config-file=/tmp/check/config.txt.
The Gantt chart in Figure 6 can be generated with the analyze-log script, and all data from the experiments are available in the Google Spreadsheets below.
For the tool demo paper revision, execution logs are dumped to the output path in the format
/path/to/output/logs-<strategy>-<key>-<timestamp>
where <strategy> refers to the sorting strategies mentioned in Section 2.4 (fifo for First-Come-First-Serve and ljf for Longest-Processing-Time-First), and <key> represents the key used to sort the worklist. As mentioned in Section 2.3, the number of semicolons (semicolon) is used by default, whereas the number of code lines (loc) is available as an alternative.
To summarize a previous execution and present the Gantt chart of all workers, please use the analyze-log script, which is provided only on this branch.
$ analyze-log /path/to/output/logs-<strategy>-<key>-<timestamp>
The analyze-log script requires Matplotlib to generate the Gantt chart. If the Python interpreter fails to import this module, the analyze-log script will exit WITHOUT reporting an error.
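If Matplotlib is missing, it can be installed with pip, for example:
$ python3 -m pip install matplotlib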
An example Gantt chart can be found in Figure 6 of the paper.
We selected the key for sorting the worklist based on the Pearson correlation coefficient. The detailed data behind this calculation is presented in the Google Spreadsheet below.
It may take a while to load the data.
The detailed data of the evaluation in Section 3 is presented in the Google Spreadsheet below.
It may take a while to load the data.
Let me know if Panda helps you. Thanks.