First pypy-cli-jit benchmarks
Originally published on the PyPy blog.
As the readers of this blog already know, I've been working on porting the JIT to CLI/.NET for the past few months. Now that it's finally possible to get a working pypy-cli-jit, it's time to do some benchmarks.
Warning: as usual, all of this has to be considered an alpha version: don't be surprised if you get a crash when trying to run pypy-cli-jit. Of course, things are improving very quickly, so it should become more and more stable as the days pass.
This time, I decided to run four benchmarks. Note that for all of them we run the main function once in advance, to let the JIT recognize the hot loops and emit the corresponding code. Thus, the reported results do not include the time spent by the JIT compiler itself, but they give a good measure of how good the code generated by the JIT is. At this point in time, I know that the CLI JIT backend spends far too much time compiling, but this issue will be fixed soon.
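The warmup-then-measure methodology can be sketched as follows. This is only a minimal illustration, assuming a modern Python 3 interpreter (it uses `time.perf_counter`); it is not the harness actually used for the numbers in this post, and `integer_workload` and `bench` are hypothetical names.

```python
import time

def integer_workload(n):
    # Hypothetical integer-heavy loop, standing in for a benchmark's main function.
    total = 0
    for i in range(n):
        total += i & (i >> 1)
    return total

def bench(main, *args, repeat=1):
    """Time main(*args) after one untimed warmup call."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    main(*args)  # warmup: gives the JIT a chance to find and compile the hot loops
    start = time.perf_counter()
    for _ in range(repeat):
        result = main(*args)
    elapsed = time.perf_counter() - start
    # Average seconds per timed run; the untimed warmup call is not counted.
    return result, elapsed / repeat
```

For example, `result, seconds = bench(integer_workload, 1_000_000, repeat=5)`. Because the first call is discarded, whatever compilation it triggers stays out of the measurement, which is why the figures measure the quality of the generated code rather than compilation speed.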
- f1.py: this is the classic PyPy JIT benchmark. It is just a function that does some computationally intensive work with integers.
- floatdemo.py: this is the floating-point benchmark that has already been described in a previous blog post.
- oodemo.py: this is just a microbenchmark doing object-oriented stuff such as method calls and attribute access.
- richards2.py: a modified version of the classic richards.py, with a warmup call before starting the real benchmark.
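The source of oodemo.py is not reproduced in this post. As a purely hypothetical illustration of the kind of code such an object-oriented microbenchmark exercises (instance creation, bound-method calls and attribute reads in a hot loop), consider this sketch; `Point` and `oo_loop` are invented names, not taken from the actual benchmark.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def add(self, other):
        # One method call, four attribute reads and one allocation per invocation.
        return Point(self.x + other.x, self.y + other.y)

def oo_loop(n):
    """Hypothetical OO microbenchmark: repeatedly call a method that reads attributes."""
    p = Point(0, 0)
    step = Point(1, 2)
    for _ in range(n):
        p = p.add(step)
    return p.x, p.y
```

In a plain interpreter each iteration pays for method lookup, attribute lookups and a fresh object; these are exactly the costs a JIT can try to reduce once the loop is hot.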
The benchmarks were run on a Windows machine with an Intel Pentium Dual Core E5200 2.5GHz and 2GB RAM, both with .NET (CLR 2.0) and Mono 2.4.2.3.
Because of a known mono bug, if you use a version older than 2.1 you need to pass the option -O=-branch to mono when running pypy-cli-jit, else it will just loop forever.
For comparison, we also ran the same benchmarks with IronPython 2.0.1 and IronPython 2.6rc1. Note that IronPython 2.6rc1 does not work with mono.
So, here are the results (expressed in seconds) with Microsoft CLR:
| Benchmark | pypy-cli-jit | ipy 2.0.1 | ipy 2.6 | ipy 2.0.1 / pypy | ipy 2.6 / pypy |
|---|---|---|---|---|---|
| f1 | 0.028 | 0.145 | 0.136 | 5.18x | 4.85x |
| floatdemo | 0.671 | 0.765 | 0.812 | 1.14x | 1.21x |
| oodemo | 1.25 | 4.278 | 3.816 | 3.42x | 3.05x |
| richards2 | 1228 | 442 | 670 | 0.36x | 0.54x |
And with Mono:
| Benchmark | pypy-cli-jit | ipy 2.0.1 | ipy 2.0.1 / pypy |
|---|---|---|---|
| f1 | 0.042 | 0.695 | 16.54x |
| floatdemo | 0.781 | 1.218 | 1.55x |
| oodemo | 1.703 | 9.501 | 5.31x |
| richards2 | 720 | 862 | 1.20x |
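The ratio columns are the IronPython time divided by the pypy-cli-jit time, so a value above 1 means pypy-cli-jit is faster and a value below 1 means it is slower by the reciprocal factor (0.36x is roughly "2.8x slower"). A small sketch of that arithmetic, using the CLR timings from the table; `ratio` and `describe` are illustrative helper names, not part of any tool used here.

```python
# CLR timings from the first table: (pypy-cli-jit, ipy 2.0.1, ipy 2.6)
CLR = {
    "f1": (0.028, 0.145, 0.136),
    "floatdemo": (0.671, 0.765, 0.812),
    "oodemo": (1.25, 4.278, 3.816),
    "richards2": (1228, 442, 670),
}

def ratio(ipy_time, pypy_time):
    # Same definition as the table's ratio columns: > 1 means pypy-cli-jit is faster.
    return ipy_time / pypy_time

def describe(r):
    # Turn a ratio into "Nx faster" or, for ratios below 1, "Nx slower" via the reciprocal.
    return f"{r:.2f}x faster" if r >= 1 else f"{1 / r:.2f}x slower"

for name, (pypy, ipy20, ipy26) in CLR.items():
    print(name, "vs ipy 2.0.1:", describe(ratio(ipy20, pypy)),
          "| vs ipy 2.6:", describe(ratio(ipy26, pypy)))
```

Rounding the ratios to two decimals this way can differ from the table in the last digit (for instance 0.136 / 0.028 rounds to 4.86 rather than 4.85).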
These results are very interesting: under the CLR, we are between 5x faster and 3x slower than IronPython 2.0.1, and between 4.8x faster and 1.8x slower than IronPython 2.6. On the other hand, on mono we are consistently faster than IronPython, by up to 16x. It is also interesting to note that pypy-cli runs faster on CLR than on mono for all benchmarks except richards2.
I haven't investigated yet, but I think that the culprit is the terrible behaviour of tail calls on CLR: as I already wrote in another blog post, tail calls are ~10x slower than normal calls on CLR, while being only ~2x slower than normal calls on mono. richards2 is probably the benchmark that makes the most use of tail calls, which would explain why we get a much better result on mono than on CLR.
The next step is probably to find an alternative implementation that does not use tail calls: this will probably also reduce the time spent by the JIT compiler itself, which is not included in the numbers above but is, so far, definitely too high to be acceptable. Stay tuned.
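The post does not say what the tail-call-free implementation will look like. One well-known general technique for avoiding tail calls is a trampoline: each block of code returns the next block to run to a small dispatch loop, instead of tail-calling it. The Python sketch below only illustrates that technique; `loop_block`, `exit_block` and `run_trampoline` are hypothetical names unrelated to the actual CLI backend.

```python
def loop_block(state):
    # Hypothetical "compiled loop": sums i for i in [0, limit).
    i, acc, limit = state
    if i >= limit:
        # Instead of tail-calling exit_block, hand it back to the dispatcher.
        return exit_block, (i, acc, limit)
    return loop_block, (i + 1, acc + i, limit)

def exit_block(state):
    # Returning None as the next block tells the dispatcher to stop.
    return None, state

def run_trampoline(block, state):
    """Dispatch loop: each jump is a return plus a call, so the stack never grows."""
    while block is not None:
        block, state = block(state)
    return state
```

`run_trampoline(loop_block, (0, 0, 100000))` performs 100000 jumps without growing the call stack, whereas the equivalent chain of real calls would exceed CPython's default recursion limit. The trade-off is an extra return and indirect call per jump, which may or may not beat a slow tail call on a given runtime.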