Lockfish - parse and analyze C Programs in Python

Lockfish is a tool which goes open-source today. With it you can parse C files from Python and perform simple syntactic analysis on the result.

Included are caller table and caller graph as well as pointer analysis, to find pointers to functions. We apply lockfish to search for not-taken locks in OpenBSD kernel, but you can set lockfish and implement your own checks on your C code base.

Disclaimer On this web site you might read about or get access to various kinds of software and technology, including but not limited to libraries, operating systems, software for communications, mobile phones and tablets, Android software and Linux, even cars and motorcycles, security and penetration testing software, software used in security research and forensics, some samples of software which can be used (elsewhere) for malicious or illegal purposes. You will read about or be provided with the ways to change it, to operate it and to use it. You might find advice and recommendations, which are only an opinion, and not a legal advice or commercial recommendation..
Bear in mind, please, that everything you do, you do solely at your own risk and responsibility. In no way the author of this web site, information, graphics and other materials presented here or related to it can be made liable or anyhow else responsible for your own actions as well as actions of any third party and their direct or indirect results or consequences with or without the use of this information as well as the software, technology and systems mentioned and/or presented here, no matter if developed by the author or by any third party.
In no way it is guaranteed that you will meet any suitability for any particular purpose, safety, security, legality or even simply functioning of the software and systems described here. You have to make sure each time yourself, whether what you do, is really what you intend to do, and that you are ready to be yourself responsible for. All the recommendations and experiences described here are the opinions of corresponding authors and are to be taken with care and own full responsibility.
The software provided on or through this web site, linked to from this web site or anyhow else related to this web site is provided by the corresponding authors on their own terms. We provide all the software here as is without any guarantees to you. You are responsible for deciding whether it is suitable for you or not. You are also responsible for all direct or indirect consequences of using this software.
Other web sites linked to from the current one are out of the author's control, we can not guarantee anything about their content, its quality or even legality. We can not be liable for any use of the linked to web sites or of the information presented there.
We reasonably try to keep this website running smoothly and to deliver information to the best of our knowledge corresponding to the state of the art at the times when the information is composed, usually presented together with the information, and out of good intents. We can not however guarantee and can not be liable for this website being temporarily or permanently unavailable, presenting unreliable information or software, or any other similar or not malfunctioning or functioning not up to your expectations as well as any consequences which might result from this site's operation.

Little history of this micro-project

Alexander Bluhm, an OpenBSD committer, and a software architect at genua, was busy one day fishing out a problem of not-taken locks in OpenBSD kernel.

Some functions in the kernel shall only be executed after a lock is taken. The lock shall eventually be released.

Example:


NET_LOCK(s);
bug_me_not();
NET_UNLOCK(s);

The bug_me_not() function was called here during the lock was taken, and this is a correct way to call it.

The problem is now to find all the places the function is called and to check whether the lock is taken there too. Now if another function is called after the lock has been taken, and it then calls bug_me_not(), it is also a correct situation. We want to find only incorrect ones.

This was the problem, which I got from Alexander, as he knew, I understand (at least he thought so) program analysis.

It came to my mind immediately, that I know of no tool, which would easily parse OpenBSD kernel and let me conveniently perform this search as requested. At least syntactically, ignoring the actual C semantic.

After a smooth surf on the Internet, I found that clang has Python bindings, and decided to try them out. Now, the thing is, the bindings are rather low-level. Formulating algorithms on them: searches, queries, etc. - is not easy.

Before solving one particular problem in low-level abstractions, I made my life a bit easier writing the lockfish. And then, one day in Starnberg, Alexander just published it on github. So I had no way but fork it, make it at least a bit more world-presentable and write this article.

The locking problem Alexander had originally was solved by this tool by the way.

Getting lockfish and installing it

For this you just have to go to my github repository and clone, then follow, please, the installation instructions: https://github.com/qutorial/lockfish

Some of the innerworkings are describe in this additional README.md: link

Parsing C and analyzing it from Python

In particular, lockfish can help you parse a C program in Python.

There are a few examples on how to begin with it here: link

Here is an example, to parse your C and build a caller graph. Six-liner!

It should be as simple as this:


# Parse all C files in a folder of your choice
tus =  parse_folder("some c folder, ext = ".c")
cursors = get_cursors(tus)
contents = ncl(cursors)

# Get all functions quickly
allfuncs = contents.ofkind(CursorKind.FUNCTION_DECL).shallow().maxdepth(1)

# Build a caller table
callertable = build_caller_table_obsd(allfuncs)

# Build a caller graph
cg = build_call_graph(callertable, allfuncs, "main")

So if you too solve the problem how to parse a C program in Python or how to analyze a C program in Python you might want to reuse this our little work.

Hacking with lockfish

Basically, apart of the bugs found in OpenBSD kernel, I am happy about the lazy collections on which the analysis is built.

Lazy node collections allow you to specify conditions on the abstract syntax tree produced by clang, and then iterate the matching nodes lazily.

E.g. you can formulate a request to fund a call to tdb_walk() in a function stored in f like this (we assume you found the f before):


get_all_descendants(f).filter(lambda n: n.kind == CursorKind.CALL_EXPR).spelled("tdb_walk"):

The example above is more complex, and shows you also how to build a caller graph, if you need one.

Using lockfish interactively

You can use lockfish interactively from the Python interpreter.

First, activate and go python:


$ . activate.sh 
(env2) :~/lockfish$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10) 
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

Now we will play with such file from lockfish tests:


$ cat tests/takes_lock_c/main.c

void a(){
}

int netlock;

void _rw_enter_write(int n){
}

void _rw_exit_write(int n){
}

void c(){
 _rw_enter_write(netlock);
 a();
 _rw_exit_write(netlock);
}


void b(){
 a();
}

void pointer(){
 a;
 b;
 c;
}

There is a function of our interest, a(), it is getting called from b() and from c(). Only c() takes the lock properly, as it should be in the OpenBSD kernel. The function pointer() just references all the functions above.


>>> from lockfish.clangparser import *
>>> from lockfish.lazy import *
>>> from clang.cindex import CursorKind
>>>
>>> # We will parse all files in tests/takes_lock_c which have .c extension
>>> tus = parse_folder('tests/takes_lock_c', '.c')
Parsing 1/1: main.c
 - Done
>>>
>>> # Now let's engage lockfish'es lazy collections to wrap the results 
>>> cursors = get_cursors(tus) # iterable cursors
>>> contents = ncl(cursors) # wrapped in a lazy collection for performance
>>> contents

>>> 
>>> # As you can see now, we work with ncl now
>>> # Let's play with it a bit
>>> >>> contents.pprint()
tests/takes_lock_c/main.c
pointer

c
c
b
b
a
....
>>>
>>> # This iterates over the whole set of all nodes in depth 
>>> # Not directly useful, ha?
>>> # Let's engage the tooling from lockfish
>>> # For example, let's find all functions
>>> allfuncs = contents.ofkind(CursorKind.FUNCTION_DECL).shallow().maxdepth(1)
>>> allfuncs.pprint()
pointer
b
c
_rw_exit_write
_rw_enter_write
a
>>> 
>>> # Now we will build a caller table, to know, who calls a() eventually
>>> #  but first we have to import more from lock fish:
>>> from lockfish.cgutils import *
>>> callertable = build_caller_table(allfuncs)
Analyzing declaration #0: pointer
 - Done
>>> {'a': [, 
            ], 
    '_rw_enter_write': [], 
    '_rw_exit_write': []}
>>> # Caller table maps function names to objects which call them.
>>>
>>> # We have four calls in this example, as expected. a() is called two times.
>>> # It shall be from b() and from c(), let's check it:
>>> callertable['a'][0].spelling
'b'
>>> callertable['a'][1].spelling
'c'
>>> # -exactly. Let's go ahead and build a caller graph next.
>>> # This feature knows more about OpenBSD, and this is why we have
>>> #  to import more, but don't worry, it will work for your code too.
>>> from lockfish.obsdanalysis import *
>>> cg = build_call_graph(callertable, allfuncs, 'a')
Searching for the root function...
New caller added: b at depth: 2
>>> 
>>> # Now that we have a caller graph (like a call graph, but inverted)
>>> #  we can pretty print it.
>>> cg.pprint()
 |- a
    |- b
    |- c
>>>
>>> # Nice visualization, now we see, who call a().
>>> # Finally, let's get lock analysis to work.
>>> # It runs on the call graph.
>>> lock_analysis(cg)
No lock: [a, b]
>>>
>>> # So, we see, that the lock was not taken when a() was called from b()
>>> #   it would be a bug in OpenBSD, had we found it there..
>>> # Unless b() is called by pointer somewhere, and the lock is taken there.
>>> # Let's perform pointer analysis:
>>> pointer_analysis('b', contents)
Building pointers table...
Analyzing declaration #0: pointer
 - Done
Pointers table:  {'b': ['pointer']}
b pointed from: ['pointer']
>>> # Now we know that pointer() references b() and we can have a look there.
>>> #   pointer() does not call b(), so for now our story is over.

Instead of a conclusion

Lockfish is a work in progress now, It helps us improve on OpenBSD's quality.

You might find it easy to implement some syntactic checks in your project using lockfish. Let us know, if you experiment with it!

Please, also do not hesitate to contact me, if you have questions or a support request for lockfish.



Thanks for reading my blog!
Created: 17/07/2017
Last edited on: 17/07/2017
Your comment: