7

Today I've been investigating a freeze in our app. It took me many hours to narrow it down to the text field validation regex, and it turned out to be a "catastrophic backtracking" issue.
I'm a regex noob, so I don't have a clue how exactly it occurs. But I'm a bit perplexed by how much damage a seemingly innocent regex can cause.

For me, this has become one more argument against regex.
I've rewritten the regex as readable code and the freeze is gone.

I could try to fix the regex but… nah. The code is better anyway.
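
A rewrite of this kind might look like the sketch below. The post does not show the original regex or its rules, so the rule used here (3 to 20 ASCII letters or digits, with single `-` or `_` separators allowed only between them) and the name `is_valid_username` are purely hypothetical. The point is only that one pass over the input does a fixed amount of work per character, so there is nothing to backtrack into.

```python
def is_valid_username(text: str) -> bool:
    """Hypothetical field rule: 3-20 chars, ASCII letters/digits,
    single '-' or '_' separators allowed only between them."""
    if not 3 <= len(text) <= 20:
        return False

    # Treat the start of the string like a separator, so the first
    # character has to be a letter or digit.
    previous_was_separator = True
    for ch in text:
        if ch.isascii() and ch.isalnum():
            previous_was_separator = False
        elif ch in "-_" and not previous_was_separator:
            previous_was_separator = True
        else:
            # Unknown character, leading separator, or doubled separator.
            return False

    # A trailing separator is not allowed either.
    return not previous_was_separator


if __name__ == "__main__":
    for candidate in ["john-doe", "a--b", "-abc", "abc-", "ab"]:
        print(candidate, is_valid_username(candidate))
```

Each rejection has its own branch, which is what makes this kind of code easy to step through in a debugger.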

Comments
  • 4
    I think it's best to always set a timeout for regex evaluations, and also to avoid evil regexes...

    But we also have some weird regex usages in our code that feel pretty shaky and also unnecessary...^^'
  • 3
    Regex can be a problem because some kinds of patterns cause exponential growth in the internal search tree.

    It can be tricky to see whether a regex can cause this, so, as mentioned, a timeout can be a good thing.

    Early versions of Internet Explorer had this problem in the HTML parser: a tag of, I think, eight letter a's with no closing > would crash the browser.

    A warning sign is nested matching (a repeated group that itself contains repetition), especially if the nested parts have similar patterns or wildcards.

    Those can easily match the same thing in millions of different ways, all valid.
  • 3
    https://regular-expressions.info/ca...

    https://regular-expressions.info/to...

    ...

    At the bottom of the page there are more pages.

    IMHO, that is basic knowledge that should be understood before writing any regex.

    It's just like the saying "Don't hand-roll your own cryptography": "Don't write regexes you don't understand".
  • 1
    @IntrusionCM yeah, I've read that. To be honest, I had trouble understanding the details. I know what backtracking is, but I don't know why it needs to backtrack.

    The regex was not written by me and I have little interest in learning and understanding it.

    I'd rather write the validation logic in code than have to deal with regex.

    Code is easier to read and easier to debug.

    The only advantage of regex that I see is that the backend can provide it to the different clients as a piece of validation logic written in a common language. A tiny piece of shared code, so to speak.
  • 2
    @Lensflare that wasn't meant against you :)

    It was more an agreement that a regex is either excellent or a doomsday machine, with nothing in between.

    If it can be done via (few) simple string operations, I'd pick the string operations.

    RegExes only shine when it's either too complex for string operations or the matching is clean and precise…

    Or when nothing matters (e.g. logging) and you just fire a Gatling gun into a fish barrel and hope to hit something. XD
  • 1
    @Lensflare regex shines for well-defined formats and for ad hoc parsing where you would not take the time to write a program.

    But once the regex starts to get complex, it becomes VERY hard to know if it will work for all inputs, and I agree: rolling your own custom string parsing is often better for readability and maintainability.

    But for simple field validation or finding possible data in text it can be an easy way, just make sure to keep it simple ;)
  • 1
    Regex is like a chainsaw. It can be intimidating and dangerous, but used properly it can power through many jobs that would have taken much longer and required much more effort.
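
To make the "exponential growth" and "nested matching" points from the comments concrete, here is a small Python sketch. The pattern `(a+)+` is a textbook illustration, not the regex from the post, and `count_partitions` is a hypothetical helper written for this example. It counts how many ways the nested pattern can carve a run of n identical characters into groups. When the overall match fails, a backtracking engine can end up trying every one of those ways before giving up.

```python
import re

# Nested repetition: the inner a+ and the outer (...)+ compete for the same characters.
NESTED = re.compile(r"(a+)+")
# Flat pattern that accepts exactly the same strings, with only one way to match them.
FLAT = re.compile(r"a+")


def count_partitions(n: int) -> int:
    """Ways to split a run of n characters into one or more non-empty groups."""
    ways = [0] * (n + 1)
    ways[0] = 1  # the empty run has exactly one (empty) split
    for length in range(1, n + 1):
        # The last group can have any size from 1 up to the whole run.
        ways[length] = sum(ways[length - size] for size in range(1, length + 1))
    return ways[n]


if __name__ == "__main__":
    for n in (4, 8, 16, 24):
        print(n, count_partitions(n))  # doubles with every extra character

    # Kept deliberately short so the nested pattern still returns quickly.
    text = "a" * 10 + "b"
    print(NESTED.fullmatch(text), FLAT.fullmatch(text))  # both reject the input
```

The count is 2^(n-1), so a run of 30 characters already allows over half a billion splits, which is how a short input can freeze a UI. The flat pattern rejects the same input after a single scan.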