Optimizing brainfuck compiler
This Python script translates a brainfuck source file into a C/Java/Python source file. Along the way, it performs optimizations on the brainfuck program, described below. Afterward, you use your C/Java/Python compiler or interpreter on the newly generated source code.
Download: bfc.py
Usage: python bfc.py BrainfuckFile OutputFile.c/java/py
Tip: It can be tedious to compile and run a brainfuck program using this tool. For example for C, you might want to put all the commands on one line: python bfc.py myprog.b myprog.c && gcc myprog.c -o myprog && ./myprog. Alternatively, you can write a Makefile with rules for compiling, running, cleaning, etc.
Example and benchmark
- Brainfuck (input): mandelbrot.b.txt
- C (output): mandelbrot.c
- Java (output): mandelbrot.java
- Python (output): mandelbrot.py
Benchmark:
| Unoptimized | Optimized | |
|---|---|---|
| C, -O0 | 24.70 s | 3.73 s |
| C, -O1 | 1.66 s | 1.19 s |
| C, -O2 | 1.77 s | 1.25 s |
| C, -O3 | 1.77 s | 1.21 s |
| Java | 67.80 s | 29.00 s |
| Python | 1070.00 s | 396.00 s |
All benchmarks above were performed on an Intel Core 2 Q6600 CPU (single-threaded). The non-optimized source codes are not published because they are uninteresting. Numbers are reported to 3 significant figures. For C, GCC 4.4.3 running on Ubuntu 10.04 x86 was used. For Java, Oracle Java 1.6.0_22 running on Windows XP Professional SP3 was used. For Python, CPython 2.6.5 running on Ubuntu 10.04 x86 was used.
Optimizations
I will use this notation: unsigned char *p points to the current cell (which corresponds to the C output from the script).
- Fusing increments/decrements
A real computer can add by not just one, but by any number. So for example, we can translate
+++intop[0] += 3;.- Fusing left/right movements
Similarly, we can move the pointer by more than one – for example, we can translate
>>intop += 2;.- Fusing movements into adds
We can add/subtract from cells other than the current one. For example, we can translate
>++<intop[1] += 2;, which no longer contains pointer movements.- Postponing movements
In a block of brainfuck commands, if the block consists of only increments/decrements, left/right, and input/output, then we can keep track of the virtual movement and then perform it in one shot at the end of the block. For example, we can translate
>+>->intop[1]+=1; p[2]-=1; p+=3;. Note that we have to actually perform the movements before entering a loop/if and before exiting a loop/if.- Simple loops
A moment’s thought should make it clear that
[-]translates top[0] = 0;. But we can generalize even further than that. If the loop body has no subloops and no input/output, all the movements add up to 0, and all the increments/decrements atp[0]add up to −1, then we are basically running the loop bodyp[0]times. So for all the increments/decrements outside ofp[0], some multiple ofp[0]is added to that location. For example, we can translate[->+>---<<]intop[1]+=p[0]; p[2]-=p[0]*3; p[0]=0;.- Assign followed by add
Obviously,
p[i] = 0;followed byp[i] += v;can be replaced byp[i] = v;. As well,p[i] = 0;followed byp[i] += p[j] * v;can be replaced byp[i] = p[j] * v;.- Complex loops
Extending the simple loop optimization, if a location other than the loop counter is used as the source for multiply-add and this location is cleared to 0 within the loop, then the loop can be turned into an if-statement. For example, we can optimize
while(p[0]!=0){ p[0]--; p[1]+=2; p[2]+=p[3]*5; p[3]=0; }intoif(p[0]!=0){ p[1]+=p[0]*2; p[0]=0; p[2]+=p[3]*5; p[3]=0; }.
Notes
Technically speaking, this script is called a source-to-source compiler, which has a somewhat different design compared to a typical source-to-machine-code compiler.
To disable optimizations, simply remove the lines that call the
optimize()function.The generated source code uses a large but fixed-size array for the brainfuck memory, with no explicit bounds checking. Thus, the rare brainfuck program that uses a lot of memory will either crash precisely in the Java and Python versions, or exhibit undefined (arbitrary) behavior in the C version. This implementation choice is because I didn’t want to deal with the complexity of (for example) resizing the memory array to uphold the illusion of infinite memory for the brainfuck program.
For Java output, large brainfuck programs may exceed the class file format’s 64 KiB per method size limit. This can be fixed by modifying the Java code generator (in the Python script) to be able to split the brainfuck program into multiple methods based on a conservative estimate of the output bytecode size.