what the fucks per minute

Ranter

atheist

10816

Comments

11

atheist

10816

4y

Looks like someone has just blindly re-implemented what a data scientist wrote, so........

That's great.
16

atheist

10816

4y

The hilarious thing is a code review comment is like "this loop is implementing parallel reduce, there's a library version that doesn't preserve order" and it's not saying "Why the fuck are we doing this stupid shit with temp files?"
6

netikras

34599

4y

Why not just do sequential_writes...
10

atheist

10816

4y

@netikras I think you're overestimating the ability of our dev team.
8

PonySlaystation

20942

4y

optimizedn't? 😅
4

Fast-Nop

36813

4y

@netikras Why not just use memory buffers?
1

Hazarth

9179

4y

Because that's how harddrives work, yes
4

hjk101

5550

4y

The answer is: write to disk twice while incurring massive file system operation overhead. Kills performance every time... We can just write it once, straight to the single file.

I mean, if you really need to have the small files in an archive, I can understand this approach, even though there are streaming solutions. But this is as stupid as it gets.
13

Lensflare

19726

4y

This is genius!
When you create as many threads as there are bytes in the file, you can write that file instantly, no matter its size!
2

vane

10560

4y

If those are small files, why not use SQLite instead of files?
They've already implemented most of the filesystem optimizations there.

https://sqlite.org/fasterthanfs.htm...
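
A minimal sketch of that idea in Python, using only the standard-library `sqlite3` module: many small named blobs go into one table in one database file, inside a single transaction, instead of one file each. The table name `chunks`, its columns, and the helper names `store_chunks` and `load_chunk` are made up for illustration and are not from the code being ranted about.

```python
import sqlite3


def store_chunks(conn, chunks):
    """Store a {name: bytes} mapping as rows in a single SQLite table."""
    # "with conn" wraps all statements in one transaction and commits at the
    # end, so there is one commit for the whole batch rather than one per blob.
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS chunks (name TEXT PRIMARY KEY, data BLOB)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO chunks (name, data) VALUES (?, ?)",
            list(chunks.items()),
        )


def load_chunk(conn, name):
    """Return the blob stored under name, or None if there is no such row."""
    row = conn.execute(
        "SELECT data FROM chunks WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row is not None else None
```

Usage would look something like `conn = sqlite3.connect("chunks.db")` followed by `store_chunks(conn, {"row-0": b"1,2,3"})`. Whether this beats plain files for a given workload is something to measure, not assume.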
4

atheist

10816

4y

Y'all are assuming it's some super complicated use case. It's literally writing out a 2D matrix to a file. The original implementation was written by someone who was good at maths, not at high-performance code, and then someone else "made it faster" by doing the same thing in C++ instead of Python.

The big difference between the C++ and the Python seems to be that the Python didn't delete these temp files and instead had a comment suggesting the user delete them. So they've automated the comment...
1

netikras

34599

4y

@Fast-Nop because memory buffers are not persistent?
You *may* use aio to flush them, but then you are never sure whether/when the flush is complete before doing another write. Also, additional hurdle of managing the buffer with aio... painful.
1

Fast-Nop

36813

4y

@netikras If the goal is to generate one final output file anyway, there's no point in persistent temporary files because they will be deleted anyway. You don't need aio for the buffers because you eliminate the io part from the equation.

Sure, you need the parallel threads to finish their work before you can batch up their individual result buffers into the result file, but that's just joining threads.
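
A rough sketch of that shape in Python, assuming the output is a comma-separated text dump of a matrix (the real format isn't stated in the thread). Worker threads format blocks of rows into in-memory strings, and the results are written to the one output in order once they are ready. The names `format_block` and `write_matrix_parallel`, and the `block_size` parameter, are hypothetical. Note that under CPython's GIL the threads won't speed up pure-Python formatting, so this only illustrates the structure: no temp files, one writer.

```python
import io
from concurrent.futures import ThreadPoolExecutor


def format_block(rows):
    """Turn a block of rows into one string, entirely in memory."""
    return "".join(",".join(str(v) for v in row) + "\n" for row in rows)


def write_matrix_parallel(matrix, out, workers=4, block_size=1000):
    """Format blocks on worker threads, then write them to `out` in order."""
    blocks = [matrix[i:i + block_size] for i in range(0, len(matrix), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of the inputs, so the row order
        # is preserved even if the blocks finish out of order.
        for text in pool.map(format_block, blocks):
            out.write(text)  # only this thread ever touches the output
```

`out` can be any text file object, for example one returned by `open("result.csv", "w")` or an `io.StringIO` for testing.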
2

hjk101

5550

4y

@atheist hahaha. The good old everything is faster in C approach.
Just to spite the assholes you should refactor the Python one (or insert favourite language here). It will be orders of magnitude faster than the C++ one.

Don't know what the output is used for or how it is built up, but a SQLite table or a CSV can be a 2D matrix. Unfortunately I need to build an Excel file that is built up horizontally as well.
1

netikras

34599

4y

@Fast-Nop missed the "lots of small files" part. My bad
3

atheist

10816

4y

@hjk101 I mean... It *is* faster in c++, the maths being done has gone from an hour to about 10 minutes, but apparently most of the remaining 10 minutes is file IO.

Like, my background is literally "make shit fast", I'm used to being pleased if I can save a few clock cycles by making something SIMD/vectorized, this stuff I'm like
* don't write this out to the HD twice. HDs are really really slow.
* pass vectors by reference, this is copying everything. Copies are slow.
* mutexes are slow, try to avoid them, here's a design pattern that makes it redundant.
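
The comment doesn't say which design pattern is meant. One common candidate, sketched here in Python as an assumption, is to give every worker its own private partial result and combine the partials after the workers finish, so there is no shared accumulator to lock. The function names and the sum-of-squares workload are purely illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work on one chunk using only local state, so no lock is needed."""
    return sum(x * x for x in chunk)


def sum_of_squares(values, workers=4):
    """Split the input, reduce each chunk independently, then combine."""
    if not values:
        return 0
    size = -(-len(values) // workers)  # ceiling division: items per chunk
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker returns its own partial result; the only "merge" step
        # happens here, on a single thread, after the work is done.
        return sum(pool.map(partial_sum, chunks))
```

The same structure carries over to C++ with one result slot per thread and a final combine after joining.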
0

hjk101

5550

4y

Wow the math in Python taking an hour vs 10 min. Either that is something that is super inefficient in python or it has a similar bad design.

It might surprise you that pointer dereferencing can be slower than copying the blasted thing. Removing the derefs is actually a compiler optimisation strategy.
1

Fast-Nop

36813

4y

@hjk101 There is a reason that heavy math lifting in Python usually goes via libs that are written in C, such as NumPy. You don't do that in Python directly. If the previous devs did, that explains why it was slow.
1

atheist

10816

4y

@hjk101 python is "not great" for threaded code (Python GIL). I'm also eyeballing the performance difference, it still takes 10 minutes to run in its current state, but it's still a lot faster than the previous implementation. We're talking 32 core machine.
1

atheist

10816

4y

@Fast-Nop It was some numpy stuff, but even then, threads.
2

atheist

10816

4y

@hjk101 re pointer dereferencing, not sure I follow. The copy would require a malloc which acquires a mutex. Mutex is slooowww.

I'm a very niche dev...
2

hjk101

5550

4y

@Fast-Nop yeah, that was indeed what I meant: in Python, crunching numbers works fine as long as you use optimised, precompiled stuff where it matters.

@atheist ah didn't know it was that heavy on the parallelization. Also didn't know python was that poor at it. I always go to Go as my go-to language for that kinda stuff.
0

NeatNerdPrime

4147

4y

@atheist perhaps worth a shot to look at https://nifi.apache.org
1

atheist

10816

4y

Follow up: I was told that testing showed multithreaded IO was faster.

I just rewrote the IO to be 5x faster (single threaded vs single threaded).
1

Fast-Nop

36813

4y

@atheist I'd still guess that "no temporary IO" is even faster.
2

atheist

10816

4y

@Fast-Nop yup, but basically "building the strings to put into a file" was so slow it was benefitting from being multithreaded, even with the temporary files. It's now so much faster that it actually is faster to do single threaded without the temp files.
2

PonySlaystation

20942

4y

@atheist By "tempfile" I assume a file on a storage device.
Wouldn't it have been waaaaayyy faster to store the strings on the heap / system memory and then write everything to the target file in one go?
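
For the single-threaded case, a minimal Python sketch of "build it in memory, write it once" might look like the following. It assumes a comma-separated text format, which the thread never specifies, and the names `matrix_to_text` and `write_matrix` are hypothetical.

```python
def matrix_to_text(matrix, sep=","):
    """Build the whole file content as one string in memory."""
    return "".join(sep.join(map(str, row)) + "\n" for row in matrix)


def write_matrix(matrix, path):
    """Write the matrix to `path` with a single write call, no temp files."""
    text = matrix_to_text(matrix)
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(text)
```

For a matrix too large to hold as one string, the same idea works per block of rows: format a block, write it, move on, still with one output file and no intermediates.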
2

atheist

10816

4y

@PonySlaystation yes.

But that also overestimates the ability of our dev team.
0

atheist

10816

4y

@Demolishun I have in the past for writing out a lot of data for inspection (image analysis), but this just doesn't need it. 😅


Manager: our file IO is slow, any suggestions to make it faster?

Code: multithread writing to a few hundred small (temp) files then single thread combine to one big file and delete the temp files.

Eyes: bleeding

rant
