6
azuredivay
280d

TIL "Regular Expression DoS" (ReDoS) is a thing
I thought the OS/server would be smart enough to cut short long-running, CPU-intensive session threads without affecting others; that's their job after all

I overestimated the OS level, I guess :v

https://security.snyk.io/vuln/... //ref

Comments
  • 2
    I'm trying to be understanding ...

    But what the hell do you mean by "cut short long running CPU intensive session-threads"...

    You really thought a kernel would terminate a process thread based on its duration, did ya?
  • 0
    There is an OOM killer, but no CPU-thread-hogger killer

    Perhaps the regexp impl could provide configuration options for such a limit, but as usual there are no perfect defaults
  • 1
    .*(?:.*=.*)
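
    Reading the pattern above as an example of overlapping wildcards, here is a minimal Python sketch for timing it on input that can never match. The helper name `time_failed_search` and the input sizes are made up for illustration. The text contains no "=", so as far as I can tell the engine has to try many ways of splitting it between the `.*` parts before giving up, and the time should grow faster than linearly with the length.

```python
import re
import time

# The pattern from the comment above, compiled once.
PATTERN = re.compile(r".*(?:.*=.*)")


def time_failed_search(n: int) -> float:
    """Return the seconds spent searching a string of n 'a's (no '=' in it)."""
    text = "a" * n  # cannot match: the pattern requires a literal "="
    start = time.perf_counter()
    result = PATTERN.search(text)
    elapsed = time.perf_counter() - start
    assert result is None  # sanity check: the search really failed
    return elapsed


if __name__ == "__main__":
    # Doubling the input length lets you eyeball the growth rate.
    for n in (100, 200, 400, 800):
        print(n, f"{time_failed_search(n):.4f}s")
```

    The exact numbers depend on the machine and the regex engine, so treat this as a way to measure, not as a benchmark result.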
  • 1
    @IntrusionCM that's what IIS allowed you to do: if a user/session thread took up too much CPU time or memory, it'd simply terminate it so it doesn't affect others
    On an OS level perhaps not then, but the server side does try to ensure service availability over handling each request to completion

    @asgs wouldn't long-running threads trigger a flag? This is why they're usually wrapped in CancellationTokens with an upper time bound in .NET (so for something like a regex, which you know should take 2-3s at most, if it takes longer than that, cancel it and return a default true/false)

    my assumption was that anything "server-OS" related would internally follow the same mantra of prioritising service availability
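
    The "upper bound, then return a default" idea can be sketched outside .NET as well. Below is a minimal Python sketch, assuming it is acceptable to pay for one child process per match: Python's built-in `re` takes no timeout argument, so the pattern runs in a separate interpreter that gets killed once the budget is used up. `search_with_timeout`, `timeout_s` and `default` are hypothetical names for illustration, not anything from IIS or .NET.

```python
import subprocess
import sys

# Script run in the child interpreter: exit code 0 means "matched".
_CHILD_SCRIPT = (
    "import re, sys\n"
    "pattern, text = sys.argv[1], sys.argv[2]\n"
    "sys.exit(0 if re.search(pattern, text) else 1)\n"
)


def search_with_timeout(pattern: str, text: str,
                        timeout_s: float = 1.0,
                        default: bool = False) -> bool:
    """Run re.search in a child process; return `default` if it overruns."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_SCRIPT, pattern, text],
            timeout=timeout_s,  # subprocess.run kills the child on timeout
        )
    except subprocess.TimeoutExpired:
        return default
    # Any non-zero exit (no match, or an invalid pattern) counts as "no match".
    return proc.returncode == 0


if __name__ == "__main__":
    print(search_with_timeout(r"a+b", "xxaaab", timeout_s=5.0))
    # A classic catastrophic-backtracking pattern; expected to hit the timeout.
    print(search_with_timeout(r"^(a+)+$", "a" * 40 + "!", timeout_s=1.0))
```

    Spawning a process per request is expensive and the text is passed on the command line, so this only suits small inputs; it is meant to show the shape of the idea, not a production setup.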
  • 1
    @azuredivay

    You are aware that any CPU process that is not IO bound or V-synced will always take all available CPU power, right?

    The OS killing such processes would be utterly retarded.

    A *webserver* can make such assumptions and handle subprocesses it launches.

    An OS can not afford that.

    (And by the way, if people bothered to read the vulnerability warnings npm gives them, they would learn about regex DoS way earlier.)
  • 1
    @CoreFusionX Yes, but an OS whose sole job is to handle 1 webserver with all its CPU/memory/networking, doing nothing but being a backend, I assumed would be aligned with how a webserver behaves (flagging to differentiate between its own processes and time-limited server ones)

    And if no such OS with tweaked behaviour exists, then oh well

    I'd assume companies would want to milk the most out of their machines and would dedicate a few kernel-tweaked derivatives to run on bare metal, doing their 1 job and doing it most efficiently
  • 1
    @azuredivay

    Again, that's the *webserver's* job if anything.

    OSs can not afford to make assumptions about the processes they manage.

    And you can most definitely configure this in basically any somewhat mature cgi-bin gateway.
  • 1
    @CoreFusionX hmm
    In one of the companies I worked with, the DB team tweaked the OS kernel with proprietary code and wrote their own lean DBMS to make the internal distributed DB fast, since time-to-respond to a request directly affected their income (b2b)

    So I assumed the same would be done for a server OS that's handling long-running requests
    Not in the configuration layer but at the core

    But I guess not
  • 1
    @azuredivay

    You *could*, of course. But then what's the point of an OS that can only do one thing?

    Let the OS handle processes, which is what they are supposed to do, and let those processes manage their own children if they need to.

    In your example, say, if you wanted to ssh into the machine to diagnose whatever, boom. Your connection gets dropped since the process is long (potentially forever) running.

    Do you see the clownship in that?
  • 1
    @CoreFusionX that's how it is for high-perf single-use servers though

    A few of the machines we had, no ssh, nothing, u make a YAML conf that goes into their semi-writable ROM, takes ages to boot, but once it's on, it doesn't shut down and u can't touch it

    it does its thing as part of the process pipeline
    if u want to diagnose anything u rely on logs it writes to a diff ssh-able server

    anyway i get ur point but if u "can" do it, that's more than enough for me

    I'll try to find if anyone's done that yet
  • 1
    @azuredivay k, there are ways to achieve your behaviour, especially in Linux.

    For example CGroups.

    But killing things is usually avoided. Even the OOM Killer is usually disabled.

    Seems like you had some interesting work places :-) 👍

    The trouble with RegEx, and the reason it cannot be solved in the implementation imho, is that a RegEx is a simple grammar.

    A regex directed engine just takes the grammar and then tries to find part by part of a match in the input string - going backwards if a match fails, otherwise forward to the next expression.

    There is no way to short circuit this.

    One can only avoid patterns which increase backtracking and - most importantly - don't stick shit into a RegEx engine.

    By shit I mean literally shit.

    E.g. password validation: Adding a check that the password shouldn't be a 2 Megabyte emoticon filled blob is a great idea.

    For that reason many CMSes limit passwords to a specific length, as either RegEx or later cryptographic hashing could lead to severe resource usage if the input has no size constraint.

    The problem is twofold: time and resources.

    One can limit the resource usage and the duration by e.g. shutting down threads exceeding a specific runtime...

    But given that today you can get a Raspberry Pi farm with hundreds of SIM cards for fun... the attempt to apply resource constraints is mostly futile.

    As soon as enough clients poke the hole, the system will just be swamped and not usable.

    As the resource constraints pose a lot of risks, for example unwanted shutting down of processes or underutilization of hardware, the approach isn't common in my opinion.
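
    The size-constraint point above can be sketched in a few lines. This is a minimal Python sketch, assuming a made-up policy: the limits `MIN_PASSWORD_CHARS` and `MAX_PASSWORD_CHARS`, the "must contain a digit" rule and the name `check_password` are all hypothetical. The only point is the ordering: the cheap length check runs before any regex (or hashing) sees the input.

```python
import re

MIN_PASSWORD_CHARS = 8     # hypothetical policy value
MAX_PASSWORD_CHARS = 128   # hypothetical policy value

# A deliberately simple pattern with no nested quantifiers.
_HAS_DIGIT = re.compile(r"[0-9]")


def check_password(candidate: str) -> bool:
    """Reject oversized input before it ever reaches the regex engine."""
    # len() is O(1) on a Python str, so a 2 MB blob is rejected immediately.
    if not MIN_PASSWORD_CHARS <= len(candidate) <= MAX_PASSWORD_CHARS:
        return False
    return _HAS_DIGIT.search(candidate) is not None


if __name__ == "__main__":
    print(check_password("correct-horse-battery-9"))
    print(check_password("x1" * 1_000_000))
```

    A length cap does not make a bad pattern safe, it only bounds how much work one request can cause.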
  • 0
    @IntrusionCM I see, so it's more on "avoiding it as much as possible" with prevention by design

    coz if someone really wishes to clog your server, they will lol

    I'll take RegEx more seriously after all this discussion ._.
    I was going to push all live-user-created data through RegEx defined by Admin(s) for a pet project without much thought on stress,
    thinking either the OS or my personal server constraints would keep em in check
  • 1
    @azuredivay

    https://regular-expressions.info/re...

    The site not only explains ReDoS in great detail, but also regular expressions in general.

    The problem though isn't RegEx per se (usual rule applies: use a tool when it's needed and does the job well, not to be cool) - rather a general "more defensive" approach regarding programming.

    Timing / resource attacks are pretty common. Be it RegEx, hashing of input like a password, a streaming file upload, a comment box for text, ...

    When a user provides input, assume that they want to murder you.

    Zero trust is good advice imho.

    RegExes are useful. But only if string operations cannot do the job or do the job too inefficiently, and only if the input of the regex can be defined in a sane manner.

    E.g. email validation by regex is utter nonsense...

    Checking whether a string consists only of blanks by regex is nonsense, just iterate and break etc.
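
    The "iterate and break" check can be sketched like this in Python. It assumes "blank" means any whitespace character, and the name `is_blank` is made up for illustration.

```python
def is_blank(text: str) -> bool:
    """Return True if text contains nothing but whitespace characters."""
    for ch in text:
        if not ch.isspace():
            return False  # stop at the first non-blank character
    return True  # also True for the empty string


if __name__ == "__main__":
    print(is_blank("   \t "))
    print(is_blank("  x  "))
```

    The loop is linear at worst and stops early on the first real character, with no regex engine involved.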
  • 0
    @IntrusionCM I thought of giving admins the power of censoring comments via their own regex

    coz saves me from the trouble of paying-for and calling APIs to stop ppl from spamming objectionable words

    was thinking of extending it to live websockets, but seems like I'd def have to rethink that ~_~

    media/file-uploads and simple text inputs I spent my time sanitising, quarantining and limiting the time/size/number of requests

    regex being a possible exploit came outa the blue lol

    will re-think the process :3
  • 1
    @azuredivay the easiest way to do spam filtering based on word tokens is tokenization plus an inverted index.

    Other approaches use a variant of the Boyer-Moore algorithm.

    For example: a hashmap of forbidden words initialized once, with each token checked for presence in the hashmap. Lucene makes this easier, but in the end it's the same approach.
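
    The hashmap idea can be sketched in plain Python. This is a minimal sketch, assuming whitespace tokenization with basic punctuation stripping, which is much cruder than what Lucene's analyzers do. The word list `FORBIDDEN_WORDS` and the function `contains_forbidden` are placeholders for illustration.

```python
# Built once at startup; membership tests on a frozenset are O(1) on average.
FORBIDDEN_WORDS = frozenset({"spam", "scam"})  # placeholder word list

_PUNCTUATION = ".,!?;:\"'()[]"


def contains_forbidden(comment: str) -> bool:
    """Return True if any whole token of the comment is a forbidden word."""
    for token in comment.lower().split():
        # Strip surrounding punctuation so "SCAM!" is checked as "scam".
        if token.strip(_PUNCTUATION) in FORBIDDEN_WORDS:
            return True
    return False


if __name__ == "__main__":
    print(contains_forbidden("This is a SCAM!"))
    print(contains_forbidden("The scampi was tasty"))
```

    The cost is linear in the comment length no matter how many words the admins add, and no admin-supplied pattern ever reaches a regex engine. Only whole tokens match, so obfuscated spellings slip through.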