15 Dec 2013
Virex: a Vim-flavored Regex playground on the web
/ vi[RE] / x
a tool for exploring regular expressions in vim
I've been goofing around with Vim and Erlang lately, and since two great tastes taste great together, I made you a thing! My latest toy - Virex, where you can experiment with Vim's regex on the web. It's like Rubular but for Vim regex. Have fun!
I've been having a lot of fun mucking around with learning and tweaking my Vim-related toolset ever since I started working at Case Commons, so when a coworker asked why there wasn't anything like Rubular for Vim's regex, I jumped at the excuse to throw this together.
Using Virex is slower than just experimenting in Vim locally, natch, but it was fun to build and it comes in handy when you're out with your friends arguing about regular expressions in Vim with only your phone handy for trying to prove your point. (SHUT UP, this happens.)
The interesting part turned out to be thinking about security, which is what the rest of this post will mostly be about.
Erlang is handy for sending messages between processes (shocking, n/n?)
I wanted to delegate the user-input test strings and regex patterns directly to Vim rather than try to reimplement Vim's regex perfectly, while avoiding sending anything through the shell (danger zone like whoa, obvs). Erlang's open_port/2 function was the perfect solution.
(Okay, I admit, I also really just wanted an excuse to play with Erlang some more. No Starch Press offered to send me free books to review a few months ago, and on a whim I asked for a copy of Learn You Some Erlang for Great Good!. I've been having a lot of fun exploring Erlang on the side ever since the book arrived.
It's a pretty fantastic book, overall - concepts are explained clearly and thoroughly, in a way that I find very intuitive. My only caveat is that I found some of the examples used by the author distractingly offensive - I was really put off by bound variables being illustrated by a sadface dude in a suit standing next to a smiley lady in a white dress. Also, binary gender examples much? So, problematic. But "I like things, and some of those things are problematic." It's definitely also clear, thorough, and informative.)
Waaah don't shell out via Vim please
So, great, user input is bypassing the shell when being sent directly from my Erlang server to Vim. But wait, it's possible to shell out from Vim in various ways! Oh noes, we can't have that.
I'm highlighting matches by using Vim's regex substitution, %s/PATTERN/REPLACEMENT/g. The risky aspect of this is that the REPLACEMENT section can take any vimscript expression, so I don't want users to be able to escape the PATTERN section and potentially get arbitrary code executed that could let them shell out and cause trouble.
This led to a truly absurd bit of Erlang that rejects any user-input pattern with a forward slash preceded by an odd number of contiguous backslashes. Tsk tsk, don't go trying to escape my slash and causing trouble.
Regular Expressions Denial of Service attacks
The other big security risks I fretted over were Regular Expressions Denial of Service attacks. Regular expressions are pretty powerful, and can be written to run dangerously slowly and consume large amounts of memory.
I had a lot of fun testing Virex with this list of Evil Regexes I found on Wikipedia and this fabulous post Dave sent me, In search of an exponential time regex.
The biggest ReDoS problem Vim's regex seems to be susceptible to is greedy quantifiers along the lines of \(.*\)\{1,32000\} - that hung forever. Bummer.
After a bit of poking around, I determined that \{99,\} and \{,99\} were safe, but \{999,\} and \{,999\} are not. So, Virex rejects repetitions that are 3 digits are longer.
Here's the function I'm using to test whether a user-input regex pattern is safe:
As you can see, I'm limiting patterns to 80 characters on basic principle - if you need to test a longer regex than that, you can do it when you get home.
If the pattern is short enough, that long regex I've got there does two other checks - it makes sure that no quantifiers have repeats that are 3 digits are longer, and that no forward slashes are immediately preceded by an even number of backslashes.
(Why so many backslashes? Blame Erlang. I feel like half the time I spent on this little project was focused on making sure I was escaping characters properly as they went through Erlang, Vim, and oh god you got your regex in my regex.)
So.
To sum up - Virex is a webmachine app, with nginx acting as a reverse proxy and serving the static content, which sanitizes the user-input regex patterns and sends them off to Vim to test them out. Alex Feinman designed that awesome logo for me. The source code is here.
I adore Erlang's syntax, and I had a lot of fun exercising the paranoid portion of my brain and exploring evil regexes - hopefully I caught them all. If you can think of anything else I ought to test for, please let me know! (Ideally via twitter or pull requests, not by crashing my server, thankyouverymuch. ^^)