IntrusionCM

13981

2y

I'm trying to be understandable ...

But what the hell do you mean by "cut short long running CPU intensive session-threads"...

You really think a kernel would terminate a process thread based on its duration, did ya?
0

asgs

10963

2y

There is an oom killer but no cputhread hogger killer

Perhaps the regexp impl could provide configuration options to specify a limit, but as usual there are no perfect defaults
1

lungdart

3513

2y

.*(?:.*=.*)
1

azuredivay

1224

2y

@IntrusionCM that's what IIS allowed you to do: if a user/session thread took up too much CPU time or memory, it'd simply terminate it so it doesn't affect others
On an OS level perhaps not then, but the server side does try to ensure service availability over handling each request to completion

@asgs wouldn't long-running threads trigger a flag? this is why they're usually wrapped in CancellationTokens with an upper bound of time in .NET (so for something like a regex which you know should take 2-3s at most, if it takes more than that, cancel it and return a default true/false)
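
For readers outside .NET, here is a minimal sketch of the same "upper bound, then return a default" idea in Python. Python's standard `re` module has no timeout parameter, so this sketch assumes it is acceptable to pay for a child process per match and kill it when the time limit passes. The names `search_with_timeout`, `timeout_s` and `default` are made up for illustration.

```python
import json
import subprocess
import sys

# Code run by the child interpreter: read a JSON request from stdin,
# run the regex, and print true/false as JSON.
_CHILD_CODE = (
    "import json, re, sys\n"
    "req = json.load(sys.stdin)\n"
    "print(json.dumps(re.search(req['pattern'], req['text']) is not None))\n"
)


def search_with_timeout(pattern, text, timeout_s=2.0, default=False):
    """Return True/False for a regex search, or `default` if it takes too long.

    The match runs in a separate Python process, because a thread stuck
    inside the regex engine cannot be interrupted from the outside.
    """
    payload = json.dumps({"pattern": pattern, "text": text})
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD_CODE],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return default
    if proc.returncode != 0:  # e.g. the pattern failed to compile
        return default
    return json.loads(proc.stdout)
```

Spawning a process per match is expensive, so this only makes sense for rare, untrusted patterns. A long-lived worker pool would be the next step, at the cost of more code.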

my assumption was that anything "server-OS" related would internally follow the same mantra of prioritising service-availability
1

CoreFusionX

3566

2y

@azuredivay

You are aware that any CPU process that is not IO bound or V-synced will always take all available CPU power, right?

The OS killing such processes would be utterly retarded.

A *webserver* can make such assumptions and handle subprocesses it launches.

An OS can not afford that.

(And by the way, if people bothered to read the vulnerability warnings npm gives them they would learn about regex DoS way earlier.)
1

azuredivay

1224

2y

@CoreFusionX Yes, but an OS whose sole job is to handle 1 webserver with all its CPU/memory/networking, doing nothing but being a backend, I assumed would be aligned to how a webserver behaves (flagging to differentiate between its own processes vs timed server ones)

And if no such OS with tweaked behaviour exists, then oh well

I'd assume companies would want to milk the most out of their machines and would dedicate a few kernel-tweaked derivatives to run on bare metal, doing their 1 job and doing it most efficiently
1

CoreFusionX

3566

2y

@azuredivay

Again, that's the *webserver's* job if anything.

OSs can not afford to make assumptions about the processes they manage.

And you can most definitely configure this in basically any somewhat mature cgi-bin gateway.
1

azuredivay

1224

2y

@CoreFusionX hmm
In one of the companies I worked with, the DB team tweaked the OS kernel with proprietary code and wrote their own lean DBMS to make the internal distributed DB fast, since time-to-respond to a request directly affected their income (b2b)

So I assumed the same would be done for a server OS that's handling long-running requests
Not in configuration layer but at the core

But i guess not
1

CoreFusionX

3566

2y

@azuredivay

You *could*, of course. But then what's the point of an OS that can only do one thing?

Let the OS handle processes, which is what they are supposed to do, and let those processes manage their own children if they need to.

In your example, say, if you wanted to ssh into the machine to diagnose whatever, boom. Your connection gets dropped since the process is long (potentially forever) running.

Do you see the clownship in that?
1

azuredivay

1224

2y

@CoreFusionX thats how it is for high-perf single-use servers though

A few of the machines we had, no ssh nothing, u make a YAML conf that goes into their semi-writable ROM, takes ages to boot, but once it's on, it doesnt shut down and u cant touch it

it does its thing as part of the process pipeline
if u want to diagnose anything u rely on logs it writes to a diff ssh-able server

anyway i get ur point but if u "can" do it, thats more than enough for me

I'll try to find if anyone's done that yet
1

IntrusionCM

13981

2y

@azuredivay k, there are ways to achieve your behaviour, especially in Linux.

For example CGroups.

But killing things is usually avoided. Even the OOM Killer is usually disabled.

Seems like you had some interesting work places :-) 👍

The trouble with RegEx is - which is why it cannot be solved imho in implementation - a RegEx is a simple grammar.

A regex-directed engine just takes the grammar and tries to match the input string part by part: going backwards (backtracking) if a match fails, otherwise forward to the next expression.

There is no way to short circuit this.
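
To make the backtracking cost concrete, here is a small sketch that times Python's `re` engine (a backtracking engine) on a match that must fail. The pattern `(a+)+$` is a textbook example picked for illustration and is not taken from this thread, and `time_failed_match` is a made-up helper name. Absolute timings depend on the machine, so none are quoted here.

```python
import re
import time

# Nested quantifiers: the engine can split a run of 'a's between the inner
# and outer '+' in exponentially many ways before giving up.
PATTERN = re.compile(r"(a+)+$")


def time_failed_match(n):
    """Time a match that cannot succeed: n 'a' characters followed by '!'."""
    text = "a" * n + "!"
    start = time.perf_counter()
    result = PATTERN.match(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing '!' rules out any match
    return elapsed


if __name__ == "__main__":
    # Each extra character roughly doubles the work, so keep n small.
    for n in (8, 12, 16, 20):
        print(f"n={n:2d}  {time_failed_match(n):.4f}s")
```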

One can only avoid patterns which increase backtracking and - most importantly - don't stick shit into a RegEx engine.

By shit I mean literally shit.

E.g. password validation: Adding a check that the password shouldn't be a 2 Megabyte emoticon filled blob is a great idea.

For that reason many CMSs limit passwords to a specific length - as either RegEx or later cryptographic hashing could lead to severe resource usage if the input has no size constraint.
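
A minimal sketch of that size constraint, assuming a made-up policy (8 to 128 characters, at least one letter and one digit). The limits and the name `validate_password` are placeholders, not taken from any particular CMS. The point is only the order: the cheap length check runs before any regex or hashing sees the input.

```python
import re

MIN_PASSWORD_LEN = 8     # hypothetical policy values
MAX_PASSWORD_LEN = 128

HAS_DIGIT = re.compile(r"\d")
HAS_LETTER = re.compile(r"[A-Za-z]")


def validate_password(password: str) -> bool:
    """Reject oversized input first, then run the pattern checks."""
    # len() is O(1) for a Python str, so a 2 MB blob is rejected immediately.
    if not MIN_PASSWORD_LEN <= len(password) <= MAX_PASSWORD_LEN:
        return False
    return bool(HAS_DIGIT.search(password)) and bool(HAS_LETTER.search(password))
```

Note that `len()` counts code points, not bytes, so a byte-based limit at the HTTP layer is still worth having on top.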

The problem is two fold: Time and resources.

One can limit the resource usage and the duration by e.g. shutting down threads exceeding a specific runtime...

But given that today you can get a Raspberry PI farm with hundreds of sim cards for fun... The attempt to apply resource constraints is mostly futile.

As soon as enough clients poke the hole, the system will just be swamped and not usable.

As the resource constraints pose a lot of risks, for example unwanted shutting down of processes or underutilization of hardware, the approach isn't common in my opinion.
0

azuredivay

1224

2y

@IntrusionCM I see, so it's more on "avoiding it as much as possible" with prevention by design

coz if someone really wishes to clog your server, they will lol

I'll take RegEx more seriously after all this discussion ._.
I was going to push all live-user-created data through RegEx defined by Admin(s) for a pet project without much thought on stress,
thinking either the OS or my personal server constraints will keep em in check
1

IntrusionCM

13981

2y

@azuredivay

https://regular-expressions.info/re...

The site not only explains ReDoS in great detail, but also regular expressions in general.

The problem though isn't RegEx per se (usual rule applies: use a tool when it's needed and does the job well, not to be cool) - rather a general "more defensive" approach regarding programming.

Timing / resource attacks are pretty common. Be it RegEx, hashing of input like a password, a streaming file upload, a comment box for text, ...

When a user provides input, assume that they want to murder you.

Zero trust is good advice imho.

RegExes are useful. But only if string operations cannot do the job or do the job too inefficiently, and only if the input of the regex can be defined in a sane manner.

E.g. email validation by regex is utter nonsense...

Checking whether a string consists only of blanks by regex is nonsense, just iterate and break etc.
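
The "iterate and break" version, sketched in Python. `is_all_blanks` is a made-up name, and this sketch assumes "blank" means the plain space character and that an empty string counts as all blanks.

```python
def is_all_blanks(text: str) -> bool:
    """Return True if every character is a space; stop at the first that is not."""
    for ch in text:
        if ch != " ":
            return False  # early exit: no need to scan the rest
    return True
```

If any whitespace should count, the built-in `str.isspace()` does the same job, but it returns False for the empty string.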
0

azuredivay

1224

2y

@IntrusionCM I thought of giving admins the power of censoring comments via their own regex

coz saves me from the trouble of paying-for and calling APIs to stop ppl from spamming objectionable words

was thinking of extending it to live websockets, but seems like id def have to rethink that ~_~

media/file-uploads and simple text inputs I spent my time sanitising, quarantining and limiting the time/size/number of requests

regex being a possible exploit came outa the blue lol

will re-think the process :3
1

IntrusionCM

13981

2y

@azuredivay the easiest way to do spam filtering based on word tokens is tokenization utilising an inverted index.

Other attempts utilize a variant of the Boyer-Moore algorithm.

Hashmap of forbidden words initialized once, each token checked if present in hashmap for example - Lucene makes this easier, but in the end: same approach.
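
A minimal sketch of the hashmap approach described above, using a Python `frozenset` as the lookup table. The word list, the tokenizer (whitespace split plus punctuation stripping) and the function names are all placeholders. This is not an inverted index and not how Lucene tokenizes, just the "check each token against a set" idea.

```python
import string

# Built once at startup; membership checks are O(1) on average.
FORBIDDEN_WORDS = frozenset({"spam", "scam"})  # placeholder list


def tokenize(comment: str):
    """Yield lowercased words with surrounding punctuation stripped."""
    for raw in comment.split():
        token = raw.strip(string.punctuation).casefold()
        if token:
            yield token


def contains_forbidden(comment: str) -> bool:
    """Return True as soon as any token is in the forbidden set."""
    return any(token in FORBIDDEN_WORDS for token in tokenize(comment))
```

The work is linear in the comment length no matter what the admin puts in the list, which is the property the regex version lacked. It only matches whole tokens, so "spammer" would not be caught by "spam".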