Visualizing (non-POSIX) Regular Expressions
█▄▄▂▃▇▄▅▆▆▄▁▃▃▃▄▁▅▇▃▁▂▃▄▂▆▁▆▂▁▇▅▇▃█▆▆█▅▅▅▁▃▂▇▃█▆▇▂█▂▅▇█▇▅▇▅▆▁▁▆▄
« 📅 published on 27/Nov/2013
»
🔖 tagged code
Introduction
I recently came across the pyFSA project and played with it for sometime. It includes a built-in API to render finite state machines as dotstring. Regular expressions can be represented as FSMs and then be rendered to a dot-string for visual introspection. I quickly wrote a proof-of-concept tool to implement this idea and hence re2dotgraph was born.
Installation and Testing
First, the pyFSA module has to be installed. Download the package, unzip it and follow the standard Python module installation steps:
$ python setup.py build
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
copying FSA.py -> build/lib.linux-x86_64-2.7
copying NumFSAUtils.py -> build/lib.linux-x86_64-2.7
copying FSChartParser.py -> build/lib.linux-x86_64-2.7
copying reCompiler.py -> build/lib.linux-x86_64-2.7
$
$ sudo python setup.py install
running install
running build
running build_py
running install_lib
copying build/lib.linux-x86_64-2.7/FSChartParser.py -> /usr/local/lib/python2.7/dist-packages
copying build/lib.linux-x86_64-2.7/FSA.py -> /usr/local/lib/python2.7/dist-packages
copying build/lib.linux-x86_64-2.7/reCompiler.py -> /usr/local/lib/python2.7/dist-packages
copying build/lib.linux-x86_64-2.7/NumFSAUtils.py -> /usr/local/lib/python2.7/dist-packages
byte-compiling /usr/local/lib/python2.7/dist-packages/FSChartParser.py to FSChartParser.pyc
byte-compiling /usr/local/lib/python2.7/dist-packages/FSA.py to FSA.pyc
byte-compiling /usr/local/lib/python2.7/dist-packages/reCompiler.py to reCompiler.pyc
byte-compiling /usr/local/lib/python2.7/dist-packages/NumFSAUtils.py to NumFSAUtils.pyc
running install_data
warning: install_data: setup script did not provide a directory for 'README.txt' -- installing right in '/usr/local'
copying README.txt -> /usr/local
warning: install_data: setup script did not provide a directory for 'LICENSE.txt' -- installing right in '/usr/local'
error: can't copy 'LICENSE.txt': doesn't exist or not a regular file
Ignore the LICENSE.txt
copy error. Now you need to install the graphviz
package which includes the dot
tool useful in rendering .dot
files into images. Once the dependencies are successfully installed, you can use re2dotgraph
and test it with a sample regex:
$ ./re2dotgraph.py 'a*b?c+'
[+] Generating dot string ...
[+] Label: a*b?c+
[+] states: [0, 1, 2]
[+] states count: 3
[+] initialState: 0
[+] finalStates: [2]
[+] alphabet: None
[+] transitions:
(0, 0, <CharacterSet a>)
(0, 1, <CharacterSet b>)
(0, 2, <CharacterSet c>)
(1, 2, <CharacterSet c>)
(2, 2, <CharacterSet c>)
[+] Generating dot graph ...
[+] regex: 'a*b?c+' -> regex.png
The input regex is represented using a FSM which is then rendered to a dot
file. This dot
file is then rendered to an image name regex.png
(name can be changed by passing a custom value as the second argument) using the installed dot
program, called natively from inside Python runtime using the os.system
method. Here is how the graph looks like:
I would like to point out that representing regular expressions as FSMs is really tricky. There are certain aspects of a regex that can never be represented (like backreferences and lookaheads) due to the fact that FSMs don't have memory to keep note of what was matched previously. This highlights the fact that any regexes, with backreferences or lookaheads won't be rendered correctly. Also, the pyFSA page mentions that the project currently is not fully compliant with POSIX regex guidelines. This severely limits the real-world use-cases:
$ ./re2dotgraph.py 'a\dc'
[+] Generating dot string ...
[+] Label: a\dc
[+] states: [2, 0, 3, 1]
[+] states count: 4
[+] initialState: 0
[+] finalStates: [3]
[+] alphabet: None
[+] transitions:
(0, 1, <CharacterSet a>)
(1, 2, <CharacterSet \d>)
(2, 3, <CharacterSet c>)
[+] Generating dot graph ...
[+] regex: 'a\dc' -> regex.png
And here's the generated image:
Conclusion
As is evident from the above image, the \d
escape sequence was not identified and the graph as such is incorrect. However, this still is an interesting project and a really easy way of visualizing regular expressions. For POSIX compliant solutions, refer these projects: RegExVis, Debuggex and Regexper
You can get the source files through the project repository on GitHub.