Tutorial: Getting Started with Coccinelle – Julia Lawall, Inria

This was a two-hour workshop about how to use Coccinelle.

Coccinelle is used to:

  • Find and fix programming errors;
  • Make API changes.

A tool like this needs to abstract away irrelevant detail. Using .* in grep doesn’t quite cut it, because the code of interest may be split over several lines, and the substrings may even appear in a different order. In general, the pieces to match may be pretty far apart.

Coccinelle understands C, so its abstractions follow the structure of the code rather than its text. It also knows enough of the semantics to distinguish, for example, a constant from a generic expression. It works on unpreprocessed C code, so you can write rules in terms of macros instead of having to deal with their expanded values. It can also patch the code. Whitespace is abstracted away completely.

The syntax of the Coccinelle language (SmPL, Semantic Patch Language) is similar to C. Changes are expressed with patch-like annotations, and abstraction over arbitrary code is expressed with three dots (...). See the examples in the slides.
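As a rough illustration of that syntax, here is a minimal semantic patch. The function names acquire_res, release_res_old and release_res are hypothetical, chosen only for this sketch. Lines starting with - are removed, lines starting with + are added, and ... stands for arbitrary arguments (inside the call) or arbitrary intervening code (between the statements).

```smpl
// Hypothetical API change: release_res_old() is renamed to release_res().
// All three function names are made up for this example.
@@
expression res;
@@
  res = acquire_res(...);
  ...
- release_res_old(res);
+ release_res(res);
```

Assuming this is saved as rename.cocci (a file name picked here), running spatch --sp-file rename.cocci file.c should print the proposed changes as a diff.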

Metavariables abstract over pieces of code while constraining what those pieces can be. E.g. an expression metavariable only matches an expression, not a sequence of statements. Additional constraints can be specified on the metavariables.
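For instance, the sketch below uses two kinds of metavariable to catch a precedence slip: since ! binds tighter than &, !E & C negates E before applying the mask. Whether that is really a bug in a given code base is an assumption, so each result needs review. The point is only that C can match nothing but a constant, while E matches any expression.

```smpl
// E matches any expression; C matches only a constant.
// Assumes that !E & C was meant as !(E & C); review each result.
@@
expression E;
constant C;
@@
- !E & C
+ !(E & C)
```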

Transformations use + and - in the leftmost column. You can also use * to just report what was matched, without changing anything.
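A small sketch of the * form: the rule below only reports call sites and modifies nothing. The legacy_ prefix is a hypothetical naming convention, and the regular expression on the identifier is one example of a metavariable constraint.

```smpl
// Report every call to a function whose name starts with legacy_
// (a hypothetical prefix). Nothing is modified; * only marks matches.
@@
identifier f =~ "^legacy_";
expression list args;
@@
* f(args)
```

When run through spatch, the matched lines should show up in diff form, so they are easy to locate in the source.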

A good approach for writing semantic patches is to start from a complete example diff, and then make it more and more generic.
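That workflow could look roughly as follows, using the well-known kmalloc plus memset to kzalloc conversion as the example. The names dev, buf and len stand in for whatever the original diff happened to contain. The two rules are successive drafts of the same semantic patch, so the final file would keep only the second one.

```smpl
// Step 1: the concrete change, copied from a diff. It only matches
// code that uses exactly these names (dev, buf, len are hypothetical).
@@
@@
- dev->buf = kmalloc(len, GFP_KERNEL);
- memset(dev->buf, 0, len);
+ dev->buf = kzalloc(len, GFP_KERNEL);

// Step 2: the same change with the specifics replaced by metavariables,
// so it applies to any pointer, size and allocation flags.
@@
expression x, size, flags;
@@
- x = kmalloc(size, flags);
- memset(x, 0, size);
+ x = kzalloc(size, flags);
```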

Additional Coccinelle features:

  • Isomorphisms are automatic equivalences. For instance, if you check for x == NULL, you also want to match !x when x is a pointer. This particular case is still easy to cater for with disjunctions (= alternatives), but e.g. matching (n + d - 1)/d with all possible parenthesizations would be quite tiresome. Standard isomorphisms are defined in std.iso; you can add more of them, you can disable them for a specific rule, etc.
  • Dots are used to search for code fragments separated by arbitrary code paths. E.g. a kmalloc that is not followed by a NULL check: the check may come much later in the function. You then use ‘when’ clauses to specify what must not appear in the elided code.
  • It’s possible to write a rule that executes a piece of Python code on every match instead of printing out the match. The Python code has access to the matched strings and their positions.
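The last two features can be combined in one semantic patch. The sketch below is deliberately simplified: it looks for a field access on a kmalloc result with no x == NULL test in between, and then formats a message in Python. The rule name r and the message text are choices made here. A real check would likely need more when clauses (for instance for x != NULL tests or reassignments of x) to avoid false positives.

```smpl
// Rule r: find a kmalloc() result whose field is accessed on some path
// without an intervening x == NULL test.
@r exists@
expression x;
identifier fld;
position p;
@@
  x = kmalloc(...);
  ... when != x == NULL
  x->fld@p

// For each match of r, run Python instead of printing the matched code.
@script:python@
p << r.p;
x << r.x;
@@
print("%s:%s: %s dereferenced without a NULL check" % (p[0].file, p[0].line, x))
```

Here exists asks for a match along at least one control-flow path, and the position metavariable p carries the file name and line number into the Python rule.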

 
