Me: Optimize a sort & match method in backend because users complain it's a bit slow. Coworker: These algorithms ar

Ranter

bittersweet

45304

Comments

37

RememberMe

13617

5y

This. Also when people think just throwing more cores at a problem helps indefinitely. Amdahl's Law, folks. Performance scaling vs. other metrics is what you should be looking at.
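
A minimal sketch of the Amdahl's Law point, for anyone who wants numbers. The function name `amdahl_speedup` and the 90% parallel workload are assumptions made up for illustration; they are not from the rant.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when only `parallel_fraction` of the work
    can be spread across `workers` cores (Amdahl's Law)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime parallelizes perfectly.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, "cores ->", round(amdahl_speedup(0.9, workers), 2), "x")
```

With 10% of the work stuck being serial, the speedup can never exceed 10x, however many cores you add.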
38

Voxera

10883

5y

Quite impressive that your coworker could see they were both O(n) and still fail to see the reason. That's some very selective brilliance ;)
22

bittersweet

45304

5y

@RememberMe

Factorio is amazing at demonstrating things like this.

It's not just the amount of factories you build... it's also about belt speed, how fast your little robot inserter arms turn around to transfer items, how much fits in your chests, how often trains arrive at the local supply station, how big those trains are, how fast those trains drive, how energy/space efficient your whole factory is, etc.
15

bittersweet

45304

5y

@Voxera Yeah.

Many developers have this thing where they learn a new concept, and then over-apply it.

This guy just posts his expert Big-O analysis advice on absolutely everything, regardless of whether it actually makes sense.
7

ddit

570

5y

@bittersweet this is the exact reason why I love Factorio and have played it for a large number of hours
5

tiltedpanda

389

5y

Well, on inspection you can guess that it's O(n), but the computer architecture may implement it as O(n^2). Man some people do not fucking understand that big O is a gross surface level abstraction.
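
One hedged illustration of "looks O(n), runs as O(n^2)". This sketch shows a hidden cost at the library level rather than the hardware level, which is a narrower case than the comment describes. Both function names are made up for the example.

```python
def dedupe_with_list(items):
    """Reads like a single O(n) loop, but `in` on a list is itself a
    linear scan, so the whole thing is O(n^2) in the worst case."""
    seen = []
    for item in items:
        if item not in seen:  # hidden inner loop over `seen`
            seen.append(item)
    return seen


def dedupe_with_set(items):
    """Same loop shape, but set membership is O(1) on average."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
```

Both return the same order-preserving result; only the cost of the membership test differs.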
5

sladuled

6993

5y

@bittersweet Did you reeeeaaaalllyyy talk slow?! If yes, kudos to you.. I'd probably just yell..
3

hinst

1262

5y

using Zoom eww
12

LLAMS

3611

5y

If someone just said “nope” and closed my PR without consulting me first I think I would have something to say.
0

Lensflare

21821

5y

I get your point but in practice, big O is all that matters.
If you have a significant speed difference with the same O complexity, then your constant factor must be really huge. But how can you have such a huge constant factor?
That would require some intensive nonsense computations that do not depend on the input size, otherwise it wouldn’t be the same complexity.
I really can not think of an example where a constant factor would matter.
Unless, of course, we are talking about tiny input sizes.
11

bittersweet

45304

5y

@Lensflare In big-O, O(1000n + 1000) and O(n) are both just O(n).

So if you have 100 items in a simple, single for loop where each iteration takes 10ms, after which the output is reduced to a result in 100ms, the total operation takes 1.1 seconds. If you can cut each iteration to 2ms and the reduce step to 15ms, so the related user page ends up loading in 215ms, that's a significant improvement.

Of course it still SCALES at a pretty evil O(n) -- if your dataset becomes 2000 items, your page is 20 times as slow, because O(n) represents a linear relationship.

But you have sets which scale, like "the set of all registered users on my platform" and you have sets which will predictably never scale (much), such as "the set of all zipcode areas in my country".

So whether the constant factors (O(2n)) and less significant terms (O(n+10)) matter really depends on the case.

Sometimes, optimizing the performance of each iteration is very useful regardless of whether the scalability improves.
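
The arithmetic above as a tiny sketch. The helper `page_time_ms` is hypothetical, and it assumes the reduce step is a fixed cost that does not grow with the number of items.

```python
def page_time_ms(items: int, per_item_ms: float, reduce_ms: float) -> float:
    """Total time for a linear loop plus a fixed-cost reduce step."""
    return items * per_item_ms + reduce_ms


if __name__ == "__main__":
    # Numbers taken from the comment above: 100 items.
    print("before:", page_time_ms(100, 10, 100), "ms")  # 100*10 + 100
    print("after: ", page_time_ms(100, 2, 15), "ms")    # 100*2 + 15
    # Same O(n) in both cases; at 2000 items the loop part is 20x larger.
    print("before at 2000 items:", page_time_ms(2000, 10, 100), "ms")
    print("after at 2000 items: ", page_time_ms(2000, 2, 15), "ms")
```

Both versions scale linearly, but the optimized one stays roughly five times faster at every size.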
3

bittersweet

45304

5y

@halfflat Exactly. Performance and scalability are related, but separate concepts.

Especially if you can't escape the fact that you need some algorithm which scales terribly (like brute forcing a password), it becomes all the more important to make sure the iterations themselves are optimized.
0

Lensflare

21821

5y

@bittersweet I just don’t see how you would reduce 10ms to 2ms for each iteration in practice. I mean, what useless crap would the code have to be doing so that it could be optimized that extremely? I think it’s not realistic.

Of course you can think of any theoretical example where it would matter but in practice, it doesn’t.
2

bittersweet

45304

5y

@Lensflare You'd be surprised what you find in a Laravel codebase.

Sometimes, you're sorting user profiles, and they have 10 megabyte selfies attached to them. Sometimes, you're trying to match two collections of objects, one of which calls a postgres database within every iteration through "magic", simply because you're reading a property. Sometimes you think you're handling some collection of strings, but in reality it's a LazyCollection of Closures which resolve to become strings.

Depends a lot on how opaque and "helpful" (read: apply complex magic for the sake of making things easy rather than simple) the framework is.
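
A rough sketch of the "reading a property secretly calls the database" case, written in Python rather than Laravel/PHP. `FakeDatabase`, `User`, and the `team` property are all invented for illustration and only count round trips; nothing here is real framework API.

```python
class FakeDatabase:
    """Stand-in for a real database: counts round trips, does no I/O."""

    def __init__(self, teams_by_user):
        self.teams_by_user = teams_by_user
        self.query_count = 0

    def fetch_team(self, user_id):
        self.query_count += 1  # one round trip per call
        return self.teams_by_user[user_id]

    def fetch_teams(self, user_ids):
        self.query_count += 1  # one round trip for the whole batch
        return {uid: self.teams_by_user[uid] for uid in user_ids}


class User:
    def __init__(self, user_id, db):
        self.user_id = user_id
        self._db = db

    @property
    def team(self):
        # Looks like a plain attribute read, but queries on every access.
        return self._db.fetch_team(self.user_id)


def teams_lazy(users):
    """One hidden query per user: same O(n), huge constant factor."""
    return [user.team for user in users]


def teams_batched(users, db):
    """Load everything in one query, then loop over plain data."""
    teams = db.fetch_teams([user.user_id for user in users])
    return [teams[user.user_id] for user in users]
```

Both functions are a single O(n) loop on paper; the difference is n round trips versus one.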
4

IntrusionCM

13820

5y

@bittersweet @Lensflare

What @bittersweet said applies to any language... Small nuances can make a huge difference.

Eg (scroll to perf for win) this gem...

https://kate-editor.org/post/2021/...

Most of the time, be it PHP / Python / C / Java / ... or whatever tickles ya fetish, the O notation is a partial lie, as you only evaluate what you're aware of.

What really goes on, down in the assembled code, is hard to account for.

Eg. string functions. Do you account for the charset? UTF-8 vs ASCII vs UTF-32... Big difference - yet you might not be aware of it, as you look at the algorithm and not at the internal representation of the string and how the string gets treated.
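
A small sketch of the charset point, assuming you work on raw encoded bytes. The helpers `encoded_sizes` and `utf8_byte_offset` are made up for this example. In a fixed-width encoding like UTF-32, finding the n-th character is a multiplication; in UTF-8 it needs a scan.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of the same text under three encodings."""
    return {enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")}


def utf8_byte_offset(data: bytes, char_index: int) -> int:
    """Byte offset of the char_index-th code point in UTF-8 data.
    Variable-width encoding, so this has to walk the bytes: O(n)."""
    seen = 0
    for offset, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: new code point
            if seen == char_index:
                return offset
            seen += 1
    raise IndexError(char_index)


def utf32_byte_offset(char_index: int) -> int:
    """Fixed 4 bytes per code point, so the offset is O(1)."""
    return 4 * char_index


if __name__ == "__main__":
    print(encoded_sizes("héllo"))
    print(utf8_byte_offset("héllo".encode("utf-8"), 2), utf32_byte_offset(2))
```

Same "index into a string" operation on paper, different cost depending on the representation underneath.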

There can be quite a lot of shenanigans going on under the hood - O notation most of the time only includes what you are aware of.

Looking at profiling, fuzzing input, tracing, toolchain changes and so on can open a whole other dimension.

To sum it up: O notation is an indicator - not a summary and not the full evidence.
2

Voxera

10883

5y

@bittersweet I love Factorio and Satisfactory :D
1

Voxera

10883

5y

@RememberMe It's an easy misconception to understand, since usually more is more, but it's also easy to disprove with simple examples like having a baby: no matter how many women there are, a baby still takes about 9 months ;)
0

ronswansonator

130

5y

@RememberMe So true. This thread reminds me of aggregate and window functions in relational databases, and how slow they can be depending on design/implementation constraints. For example: I want to see what's in a warehouse's inventory right this minute, in an almost overly relational DB model. Calculating across zones, lots, and pallets for customers and products, using specific profiles that can apply at any level, is expensive. Recursively checking whether a profile rule applies takes time, and recursion itself is slow. The point is that there's a time and place for optimization, and it's usually motivated by how visible the performance degradation is to end users.
1

YADU

1347

5y

@Lensflare

Think of iterating over a vector vs a linked list.

Both O(n), but the vector is going to be sooooo much faster because you have so few cache misses compared with the linked list.

In fact, in practice, the constants involved in a linked list are so much bigger than those for a vector that you basically just want to use a vector all the time, even when the Big O for the linked list is better.
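
A rough way to try this yourself, sketched in Python. The `Node` class and both sum functions are made up for the example. Note the assumption: in CPython the gap you measure comes largely from interpreter overhead and pointer chasing through objects, so this only hints at the cache-miss effect you would see in C or C++.

```python
import timeit


class Node:
    """Singly linked list node; each one is a separate heap object."""
    __slots__ = ("value", "next")

    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def build_linked_list(values):
    """Build a linked list holding `values` in the same order."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def sum_linked_list(head):
    """O(n) traversal that follows a pointer on every step."""
    total = 0
    node = head
    while node is not None:
        total += node.value
        node = node.next
    return total


def sum_array(values):
    """O(n) traversal over a contiguous array of references."""
    total = 0
    for value in values:
        total += value
    return total


if __name__ == "__main__":
    values = list(range(100_000))
    head = build_linked_list(values)
    print("array      :", timeit.timeit(lambda: sum_array(values), number=20))
    print("linked list:", timeit.timeit(lambda: sum_linked_list(head), number=20))
```

Both traversals are O(n) and return the same total; any timing difference is pure constant factor.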
