17

Buffer usage for simple file operation in python.

What the code "should" do, was using I think open or write a stream with a specific buffer size.

Buffer size should be specific, as it was a stream of a multiple gigabyte file over a direct interlink network connection.

Which should have speed things up tremendously, due to fewer syscalls and the machine having beefy resources for a large buffer.

So far the theory.

In practical, the devs made one very very very very very very very very stupid error.

They used dicts for configurations... With extremely bad naming.

configuration = {}
buffer_size = configuration.get("buffering", int(DEFAULT_BUFFERING))

You might immediately guess what has happened here.

DEFAULT_BUFFERING was set to true, evaluating to 1.

Yeah. Writing in 1 byte size chunks results in enormous speed deficiency, as the system is basically bombing itself with syscalls per nanoseconds.

Kinda obvious when you look at it in the raw pure form.

But I guess you can imagine how configuration actually looked....

Wild. Pretty wild. It was the main dict, hard coded, I think 200 entries plus and of course it looked like my toilet after having an spicy food evening and eating too much....

What's even worse is that none made the connection to the buffer size.

This simple and trivial thing entertained us for 2-3 weeks because *drumrolls please* none of the devs tested with large files.

So as usual there was the deployment and then "the sudden miraculous it works totally slow, must be admin / it fault" game.

At some time it landed then on my desk as pretty much everyone who had to deal with it was confused and angry, for understandable reasons (blame game).

It took me and the admin / devs then a few days to track it down, as we really started at the entirely wrong end of the problem, the network...

So much joy for such a stupid thing.

Comments
Add Comment