Comments
Reminds me of a famous story from the old days: they used a profiler and even added new CPU instructions for the most heavily run code path. Afterwards: no speedup.
They had optimised the idle loop of the OS.
@pk76 Don't just make new implementations to see whether that spot can squeeze some more performance or not :) That's a waste of time.
Yes, you have to make informed decisions. You have to know what tools you are using and what alternatives you have. You have to see where you might have taken a wrong approach and where the logic can be altered to perform faster.
profiler, threads monitor, fastthread.io -- these are the ABC tools we use as perf engineers :)
btw, which language?
pk76: @netikras you just don't get it no matter how many times it's said, do you? I do those very things you're saying. There's one case out of fourteen where, after gathering insight and making an informed decision, I was wrong. The other thirteen changes made gains.
I don't know how I can make this any clearer.
@pk76 yeah, because I got the notifications flowing in from all the threads. Sorry if I offended you somehow, or made you feel harassed or attacked.
Kiss and make up?
endor: If you haven't seen this talk already, you really should (even if you don't work with C++): https://youtu.be/r-TLSBdHe1A
Watch the whole thing, then get ready to pick your jaw up off the floor.
endor: @pk76 I don't know what language you're working with, but perhaps there's a way for you to try something similar to what coz does, since ultimately all code ends up running on the same CPU.
Making a single measurement of performance in a single environment (with all the variable factors that come with it, including execution path and environment variables) is not enough to establish how much you *actually* sped up the execution. You need more measurements before you can say that your fix made the code run faster, slower, or the same.
Maybe there's a way to scramble things around in other languages too?
(Though I realize that building such a tool from scratch is not a realistic expectation for one person who's already working on something else. But hey, food for thought!)
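A minimal sketch of the "take more than one measurement" idea above. It is written in Python rather than C#, and the names `summarize` and `looks_faster`, the threshold `z=2.0`, and the sample timings are all hypothetical choices, not part of coz, Stabilizer, or any .NET benchmarking tool. The idea: collect several timings per variant and only call it a speedup when the drop in mean time clearly exceeds the run-to-run noise.

```python
import statistics
from math import sqrt


def summarize(timings_ms):
    """Return (mean, sample standard deviation) of repeated runs; needs >= 2 runs."""
    return statistics.mean(timings_ms), statistics.stdev(timings_ms)


def looks_faster(baseline_ms, candidate_ms, z=2.0):
    """Rough check: treat the candidate as faster only if its mean time drops by
    more than z standard errors of the difference between the two means."""
    mean_b, sd_b = summarize(baseline_ms)
    mean_c, sd_c = summarize(candidate_ms)
    # Standard error of the difference of two independent means (unequal variances).
    se_diff = sqrt(sd_b ** 2 / len(baseline_ms) + sd_c ** 2 / len(candidate_ms))
    return (mean_b - mean_c) > z * se_diff


# Hypothetical timings in ms, one per run (made-up numbers for illustration only).
baseline = [100, 101, 99, 100]
candidate = [80, 81, 79, 80]
print(looks_faster(baseline, candidate))  # True
```

This is a rough z-style heuristic, not a proper significance test; with very noisy runs it will (deliberately) refuse to call a small gain an improvement.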
pk76: @endor oh, I'm not doing single-machine measurements. I've got five here, including two on different architectures. When I claim an "X increase", I mean the average across those. If there's a massive variance, I'm hesitant to even call it an improvement. Two of these machines are "in regular use"; the other three are clean machines that only ever run benchmarks/tests and get reset after each run. It's not ideal, but it's hardly one single machine.
This project is in C#.
pk76: @endor oh, I should also add: the benchmarks are taken on .NET Framework, .NET Core, and .NET Core AoT. I plan on adding Mono as well, but that requires much more setup. They're run on three Windows machines and two Linux machines (no .NET Framework results there, obviously).
It's not the extent of Stabilizer, which I would love to be using. But it does add some randomization, so a result is less likely to be just a quirk of something else.
pk76: @endor so after looking into this, it seems like every .NET runtime does memory reorganization during GC phases, which should (in an ideal world) mean memory layout issues aren't going to happen. AoT would still be subject to layout effects, though, and I'm not seeing a huge variance there either (in fact the kurtosis is nearly identical to the .NET Core results).
However, the other things coz/Stabilizer check for are still potential pitfalls. I'll look into ways to check branch prediction misses and other things.
As a somewhat unrelated note, I did find that JITs in general are really bad at unrolling loops in a way that cooperates with the superscalar pipelines in basically every CPU now. So I can potentially get a boost in a major hot path by manually unrolling to those sizes.
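A rough sketch of the manual unrolling described above. The project is C#, but this uses Python only to show the shape of the transformation; CPython will not run it any faster. The unroll factor of 4 and the function names are assumptions, since the comment doesn't say which sizes were used. The point is splitting one serial accumulator into several independent ones, so consecutive additions don't have to wait on each other and a superscalar CPU can overlap them in compiled code.

```python
def sum_simple(values):
    """Baseline: every addition depends on the previous one (one dependency chain)."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled4(values):
    """Same sum, unrolled by 4 with four independent accumulators."""
    a0 = a1 = a2 = a3 = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 that is <= n
    for i in range(0, limit, 4):
        # These four additions don't depend on each other.
        a0 += values[i]
        a1 += values[i + 1]
        a2 += values[i + 2]
        a3 += values[i + 3]
    for i in range(limit, n):  # leftover 0-3 elements
        a0 += values[i]
    # Combine the partial sums pairwise.
    return (a0 + a1) + (a2 + a3)
```

For integers both functions return the same result. For floating-point input, the unrolled version adds in a different order, so results can differ in the last bits.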
I spent four days doing a rewrite for a possible performance boost that yielded nothing.
I spent an hour this morning implementing something that boosted parsing of massive files by 22% and eliminated memory allocations during parsing.
Work effort does not translate into gains.
rant