41

Dev: This could be sooooo easily optimized...

Me: Uhm. Don't think so. What's your idea?

Dev: Just use threads.

Me: Nope. Problem requires 3 shared resources per process step, it won't be faster by threading. Shared resource will only lead to locking contention, decreasing performance.

Dev: I don't think that will happen. Can you PROVE to ME that this will happen?

Me: It was your suggestion, so you should prove me wrong. Nice try, but no thanks.

Dev: Yeah, but it's too slow and it should run faster.

Me: If you cannot find a better approach than the current one, it runs as fast as it can while providing correct results. That's not slow. That's just working as intended and designed.

Dev: Yeah, but it's still slow.

....

You know these conversations where you just wanna rip some people's face off, stick it in the shit hole they use to talk and toss them out of the window....

Yeah. Had those conversations today.
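
For anyone who wants the contention argument in code, here is a minimal sketch. The `SharedStats` class and its three fields are hypothetical stand-ins for the three shared resources, not the real system. Every process step has to touch all three, so the whole step ends up inside one lock and the threads mostly take turns instead of running side by side.

```python
import threading
from collections import Counter


class SharedStats:
    """Hypothetical stand-in for the three shared resources from the rant."""

    def __init__(self):
        self.lock = threading.Lock()
        self.property_names = set()
        self.distinct_values = {}
        self.occurrences = Counter()

    def process_step(self, prop, value):
        # All three shared structures are updated per step, so the lock
        # covers the entire step and the threads serialize on it.
        with self.lock:
            self.property_names.add(prop)
            self.distinct_values.setdefault(prop, set()).add(value)
            self.occurrences[(prop, value)] += 1


def run_threaded(rows, n_threads):
    stats = SharedStats()
    # Deal the rows out round-robin, one chunk per thread.
    chunks = [rows[i::n_threads] for i in range(n_threads)]

    def worker(chunk):
        for prop, value in chunk:
            stats.process_step(prop, value)

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stats
```

The results stay correct with any thread count, but there is nothing left outside the lock that could actually run in parallel.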

Comments
  • 15
    Have you tried adding blockchain? AI? ML? DevOps? One of those has to be the solution, otherwise that sales rep wouldn't be talking about them all the time!
  • 4
    No problem - go ahead and add a ticket to the backlog, and estimate it. Ticket should cover:
    Some baseline measurements to show improvements upon.
    Creating a "close to production" benchmark test scenario to use in testing this.
    Improving the code however you like, Threads, Processes, CSP, parallel hamsters with mice - Whatever.

    Get Product and QA to agree to the change - And only then you can Optimize.

    Now go away.
  • 2
    I was actually on the "dev" side of this conversation before, except it wasn't about performance optimization but information encoding, and in the end I was actually right and rewrote the code to be much cleaner AND shorter with much less branching

    Though the shared resources would be a problem with threading, is there any chance that you can pre-fetch or cache them to avoid or minimize that? Or do it as a pipelined parallelization, so at least the individual steps outside of the shared resource space are done simultaneously while another shared resource is being accessed?
  • 7
    @magicMirror

    The thing that I hate here is the approach.

    Reminder that I'm a manager.

    If you as a dev want a ticket for research / improvements, fine.

    But don't start such a bullshit conversation. Deliver some good input / feedback, convincing me you've thought about this for more than 5 minutes....

    And never pull the "Can YOU prove it to ME?" card if you want ME to be convinced that you can solve the problem more efficiently.

    It's just a big fucking waste of time.

    I will not endorse such behaviour by e.g. creating tickets and wasting more time and money.

    You have a plan? Good. Convince me.
    You are bored? Go fuck the backlog tickets, but don't piss me off.

    @Hazarth

    The first part: Sounds jolly. Though I really hope you haven't started the conversation like in my rant (see above part).

    The second part: Now we're talking.

    You see the difference between your second part and the "it's slow"?

    You put some thought into it, you came up with something that shows you have thought about this for a few secs and you have some useable ideas.

    "It's slow. Throw threads at it" Isn't a usable idea.

    Just like "It's slow. Throw hardware at it".

    That's the part that I'm madly pissed about.

    Don't waste my fucking time if you have nothing concrete and useful to say. xD
  • 1
    @IntrusionCM aah that makes sense. Yeah no, when I had the conversation, I wasn't just throwing shade xD I was suggesting a better approach immediately from the start. Not to mention it was quite friendly cause it was with a colleague that was on the same level as me, so there was no power dynamic there, just a constructive talk and him being a bit stubborn because they did indeed put a lot of thought into optimizing it, just missed something that was obvious to me at that time

    Yeah, I just wouldn't start a conversation with "uuuuh, it's slow, do better!" without having anything to contribute xD
  • 0
    @Hazarth These are conversations I like.

    The "Oops, you're right. That could be done differently" type of conversation.

    Then I'd definitely try to make sure the dev gets their sweet time frame to optimize it.

    Just for explanation... The problem at hand is a fine grained analysis of cardinality - finding duplicate values, counting occurrence of properties, etc.

    So each value of each property of each document in a database cluster needs to be analyzed (rounded up, roughly 15 TiB of raw data).

    The 3 shared resources I mentioned were for tracking property names, property distinct values and uniqueness of the property across dimensions (document level, index level, cluster level).

    One "could" try to make e.g. some kind of deep queue that just gets the necessary data asynchronously and uses some kind of e.g. worker or event system. That should circumvent the problem and maybe make it easier to optimize (like your idea with caching).

    So gist is that "something" just gets the values, and evaluation is deferred.

    But I really think that this is a very touchy and hard to implement thing, as the whole thingamabob uses up to N processes to fetch and process the data on the index dimension.

    So each process is specific to an index - parsing N indices in parallel already made it good.

    If you now add a cache system in the background that defers the evaluation, the cache system needs to deal with several terabytes of incoming data from multiple processes.

    That's my primary concern... I'm not saying it's impossible, but it's definitely a tricky thing to build. Plus I'm not sure if there is a "good net improvement" after doing this... For the simple reason that the whole caching system still needs to maintain / produce the fine grained statistics like before, so one just defers the problem to a later stage imho.
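
The "something just gets the values, and evaluation is deferred" idea could look roughly like the sketch below. Everything in it is an assumption for illustration: the `fetch` / `evaluate` / `run` names, threads standing in for the per-index processes, and a nested dict standing in for the cluster. Fetchers only push raw tuples into a bounded queue, and a single evaluator owns the statistics, so nothing needs a lock.

```python
import queue
import threading
from collections import defaultdict


def fetch(index_name, docs, out):
    # Fetcher: only reads values and hands them off, no evaluation here.
    for doc_id, doc in docs.items():
        for prop, value in doc.items():
            out.put((index_name, doc_id, prop, value))
    out.put(None)  # sentinel: this fetcher is finished


def evaluate(inbox, n_producers):
    # Single consumer: the only code that touches the statistics.
    distinct = defaultdict(set)
    finished = 0
    while finished < n_producers:
        item = inbox.get()
        if item is None:
            finished += 1
            continue
        index_name, _doc_id, prop, value = item
        distinct[(index_name, prop)].add(value)
    return distinct


def run(cluster):
    # Bounded queue: fetchers block when the evaluator falls behind.
    inbox = queue.Queue(maxsize=1000)
    producers = [
        threading.Thread(target=fetch, args=(name, docs, inbox))
        for name, docs in cluster.items()
    ]
    for p in producers:
        p.start()
    result = evaluate(inbox, len(producers))
    for p in producers:
        p.join()
    return result
```

It also shows the concern from above: the single evaluator still has to chew through every value, so the work is moved to a later stage, not removed.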
  • 1
    @IntrusionCM Yeah, that's a lot of data for a local cache and too many hits for remote cache. I don't see any obvious way for threads to make this better. You'd probably lose any advantage due to communication overhead. Seems optimal to get all the analyzed properties for each property as you're going through it rather than splitting it any which way... hmm....

    Would it be possible to merge the collected properties from separate documents or clusters? So you could process multiple docs and/or clusters in parallel?

    But yeah, I'd need to see the actual setup and data to judge this better, but nothing obvious pops to mind :D
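
A rough sketch of that merge idea, assuming documents are flat dicts with hashable values (the function names and the data shape are made up for illustration). Each document or index produces its own partial counts without touching anything shared, and the partials are only combined at the end.

```python
from collections import Counter


def analyze_document(doc):
    # Partial result for one document: one count per (property, value) pair.
    return Counter((prop, value) for prop, value in doc.items())


def analyze_index(docs):
    # Index level: fold the per-document partials together.
    total = Counter()
    for doc in docs:
        total.update(analyze_document(doc))
    return total


def merge_indexes(partials):
    # Cluster level: merging is just adding up the counts.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def duplicates(stats):
    # Pairs that occur more than once in whatever scope was merged.
    return {pair for pair, count in stats.items() if count > 1}
```

Since the `analyze_index` calls are independent of each other, they could run in separate processes. The open question from the thread stays the same: how big the partial results get before they are merged.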
  • 0
    @Hazarth xD

    I always wanted to toy with e.g. Badger or anything similar that allows concurrent key-value access locally / embedded.

    I think your direction would be right - one needs to think a bit outside of the box.

    If one would e.g. use a prefixed database, one could simply build up the indexes / document dimension first and generate the cluster dimension last.

    Basically the cache idea on steroids.

    Prefixed database ala "indexName-indexWorker" and then store the distinct values of indexes per worker…

    That would speed up the document processing.

    Downside is that one has to be careful because of IO.

    Then one could stream the result of the generated databases and zip them.

    But yeah, it's a dangerous game. Hard to predict how much storage is needed and how IO performance is.

    E.g. with 32 processes, thus 32 indexes in parallel… and 2 workers per index... That makes 64 databases that get filled on NVMe and later read to generate the statistic.

    It's a nice idea, though I'm really terrified of implementing it, as it's one of those things where "looks" are deceiving.

    No one likes race conditions, transaction issues, etc. XD
  • 0
    Zip them as in creating an aggregation of distinct values for all created databases
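
A small sketch of the "one database per indexName-indexWorker, then zip" idea. Big assumption: Python's built-in `sqlite3` stands in for Badger here, and the table layout, file naming and function names are invented for illustration. Each worker writes only to its own file, so there is no shared state while filling, and the aggregation happens in a separate pass afterwards.

```python
import os
import sqlite3
import tempfile


def store_path(base_dir, index_name, worker_id):
    # One database file per "indexName-indexWorker" prefix.
    return os.path.join(base_dir, f"{index_name}-{worker_id}.db")


def write_worker_store(path, rows):
    # rows: iterable of (property, value) string pairs seen by one worker.
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS distinct_values ("
        "prop TEXT, value TEXT, PRIMARY KEY (prop, value))"
    )
    # The primary key plus INSERT OR IGNORE keeps only distinct pairs.
    con.executemany("INSERT OR IGNORE INTO distinct_values VALUES (?, ?)", rows)
    con.commit()
    con.close()


def zip_stores(paths):
    # "Zip": stream every worker database and union the distinct values.
    merged = {}
    for path in paths:
        con = sqlite3.connect(path)
        for prop, value in con.execute("SELECT prop, value FROM distinct_values"):
            merged.setdefault(prop, set()).add(value)
        con.close()
    return merged
```

Usage with two workers on one hypothetical index:

```python
with tempfile.TemporaryDirectory() as base:
    paths = [store_path(base, "users", w) for w in (0, 1)]
    write_worker_store(paths[0], [("color", "red"), ("color", "red")])
    write_worker_store(paths[1], [("color", "blue"), ("color", "red")])
    merged = zip_stores(paths)
```

The storage and IO worries from above apply in full: the sketch says nothing about how large those files get or how fast they can be read back.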
  • 1
    Arguing is my favorite part of the job ngl