quiz

Ranter

JS96

18259

Comments

10

CoreFusionX

3618

3y

Depends on the string implementation of the language.

If the string is immutable and size is known as constant time, doing string concatenation is slow cuz new objects are created (tho probably elided in any decent language).

Fastest way IMO is create a buffer, and memcpy two blocks, then convert back to string.
2

hitko

2995

3y

Idk, maybe the second one is faster because it doesn't have to check the string length every time? Anyway, both are kinda slow because "padChar + str" needs to allocate a new string of len(str) + 1 and copy the contents over on each iteration. It would be much faster to create a fixed-length buffer of the final length and then fill it with the data rather than concatenating strings in a loop.
1

hippolyte

188

3y

I feel like the second one would be less efficient because there is one more operation per round, but I guess the C# compiler prefers for loops for optimizations ?
1

i-hate-algebra

9

3y

guess that algo2. Maybe even compiler unrolls the loop a bit
1

Grumm

1899

3y

Maybe I am stupid, but who not just use the C# PadLeft method ?

it is just 1 line :

string str = "your text";

char pad = '^';

string result;

result = str.PadLeft(9, pad);
2

Hazarth

9197

3y

Hmm, usually whiles perform a bit faster than for loops, but then again your for loop is comparing against 0 while the while is comparing two numbers. afaik zero comparisons can be faster.

Then again, it's almost impossible for me to tell what sort of optimizations does the C# compiler use, especially since .NET is using a virtual machine on top of your physical machine..

I'm gonna ho with Algo2 being faster, since the low level optimizations are more obvious and a compiler should be able to apply those more often.
4

hitko

2995

3y

@Grumm How about var result = str.PadLeft(9, '^');

You know, because there's no need to maintain 10 lines of unnecessarily verbose code? Also you missed the point completely, everyone knows the built-in method is probably the fastest and the one you should use in most real-world scenarios.
3

netikras

34646

3y

Algo 3:

char[] buff = new char[Math.max(minLength, str.length)];

for (int i=str.length; i>=0; i--) buff[i]=str.charAt(i);

for (int i=minLength-str.length; i>=0; i--) buff[i]=padChr;

return new String(buff);

// pseudocode, I'm on my phone. Java has nice helper methods System.arrayCopy() hiding all this for-loops mess. So effectively it would be a 4 LoC function
1

happygimp0

927

3y

If compiled naively (without fancy optimization) they are both slow because you move the string in memory every time. Both are O(n+m) where n is the length of the string and m the length of the padding.

To think about how to archive a faster result, you have to first think about what you exactly need. You could write a class where the accessed index is checked and if the index is lower than the padding bits, a padding byte is returned. That would make the creation O(1), but accessing it could be a bit slower [1].

Otherwise, it would be faster to create the buffer for the complete string first, memset() it with the padding char (by the way, is the padding char just 1 byte or can it be more?) and then copy the rest to the end.

[1] This also depends on the average size of the string. If you have extremely large strings, it would be faster because in this case it would reduce memory usage and therefore probably reduce cache misses. Well, it really depends what you need.
0

JS96

18259

3y

Very interesting answers so far.
Thank you for your time.

---
@Grumm of course in almost every language there is already a PadLeft function implemented with probably the best optimization possible, the point of this experiment was to have fun and guess which operations are faster, it wasn't about real utility.
---

Since the two algos use same string concatenation and arguments, the comparison is between:
1) While loop with length check of the string at every iteration.
2) For loop with precalculated number of iterations, but two additional int variables (leftPad and i) allocated in memory, and still a check of the status of i at every iteration.

As you already said probably there isn't just one valid answer because it can depend on the programming language used, but it's still interesting, in my opinion, also to think about that and wonder if for each programming language ever created the "standard" implementation of the "Pad" function has been made different to be well optimized, or was made once in the past and they just converted it without changing the logic.

---

I will wait for a few other comments and then post the results in C#.
3

CoreFusionX

3618

3y

@JS96

In the case of C#, and probably most compiled languages, two int variables make absolutely no difference since they are stack allocated and always in a cache line if not directly in registers.

Same with the for vs while. Both are very likely to compile to the same machine code.

Also, bear in mind that ensuring fair comparison, especially when both functions to test are part of the same assembly, is hard, since you don't know the state of cache lines or registers, and those can have a massive impact on perceived performance.
2

Lensflare

21912

3y

No one has mentioned it yet so I’ll do:
StringBuilder
1

CoreFusionX

3618

3y

@Lensflare

Last time I checked, both Java and C++ would compile these to a string builder implementation. It's fairly common when languages implement immutable reference typed strings that try really hard to act as value typed. (Even if that last part is only C#)
1

happygimp0

927

3y

C because i don't know C# well enough and i can't test it.

Fast creation (9 instruction, no branches no loops), maybe accessing it isn't so fast.
0

CoreFusionX

3618

3y

@happygimp0

This solution has implicit loops too.

You are explicitly passing in the string length, which in a C-string can't be known without looping the array. It would only be loop free if you know the string beforehand.
0

happygimp0

927

3y

@CoreFusionX Yes, you have to know the string length before, in most normal languages there is a type for string that stores the length. In the real world, when i code for hosted environments in C, for most string handling i use a type which also stores the length (and i think most not too small programs do that).
0

CoreFusionX

3618

3y

@happygimp0

Which brings us to the immutable, constant time size string case anyways 😂
0

happygimp0

927

3y

@CoreFusionX Not sure if i understand you correctly.

Just because strings are immutable doesn't mean time would be constant. And you don't need immutable strings to be able to store the length.
0

CoreFusionX

3618

3y

@happygimp0

If strings are immutable, size can be accessed in constant time since it can be precomputed. Which is what your struct implicitly does.

Since you don't provide a mutation function, they are assumed to be immutable.

Which brings us to the initial situation.
0

happygimp0

927

3y

@CoreFusionX But you can also access the size in constant time when they are mutable, they don't have to be immutable to do that, assuming you, your library or the language you use use a smart way to store them. It is as simple as storing a integer with the size and if the string changes, this integer will be updated.

I didn't provide a string library, correct. But it was only about padding them. Again, the question is what you want to do with the string (and also, from where you get the input). You can adapt this padding function to existing string libraries, many of them already have something like this.
1

JS96

18259

3y

---
C# Test
---

TL;DR:
The difference is irrelevant... but algo 2 wins slightly (not with the standard PadLeft though).
The more you increase the "minLength" the more the gap between 1 and 2 grows, ofc, but we're already talking about impossible scenarios.

- Test condition:
It was very difficult to make a proper test due to various factors (e.g. can't have just one assembly, the difference is so small the results change based on the order of execution, etc.).
The test has been made using 3 separate assemblies.

str = "5"
minLength = 100.000
padChar = '0'
# of tests = 1.000 per algo

- Results:

1) Algo 1 (While loop)
Max 1 = 00:00:00.9270011
Avg 1 = 00:00:00.6304220
Min 1 = 00:00:00.5910015

2) Algo 2 (For loop)
Max 2 = 00:00:00.9000194
Avg 2 = 00:00:00.6228235
Min 2 = 00:00:00.5890271

3) Algo 3 (Use of C# String.PadLeft())
Max 3 = 00:00:00.0040007
Avg 3 = 00:00:00.0001000
Min 3 = 00:00:00

- Conclusions:

Use string.PadLeft()...
0

CoreFusionX

3618

3y

@happygimp0

Not really. A mutable string does not have an amortized size access constant cost.

It becomes O(n) because you can't know if the string had been mutated, and every mutation involves a recalculation of size, which is O(n)
1

happygimp0

927

3y

@CoreFusionX Well, no. This is just completely false.

The only thing you have to do is to have a variable where you store the size of the string. If the string gets changed (mutated), you have to update this variable. That isn't that hard.
0

i-hate-algebra

9

3y

regarding using builtins

i recommend watching this (or other vids of Andrei)

https://youtube.com/watch/...

built-in fuctions are great in general case, but if you know something more about nature of your data, you can try to write faster code (you know something about your data that the implementators of standard libraries do not know)
0

CoreFusionX

3618

3y

@happygimp0

And you update it how? With a loop, which is O(n), and the only way to know the size of a C string.
1

happygimp0

927

3y

@CoreFusionX Depending on the source.

If you read from a file, lets say you use read() syscall, you get the size returned.

If you do if from a hardcoded string, the compiler knows the length and can optimize strlen() away, or you use a language where the compiler stores the string length automatically.

If you have a string and append or remove a single character, you can just increment or decrement this variable (of course you have to do other housekeeping, like allocating more memory).

If you combine 2 strings, you just add the size of both together.

If you have a string and want to split it based on a character, you have to search for that character anyway and need a loop.

...

I don' use plain C strings for this example. Most of my real world C code that handles strings used some sort of class (struct and a bunch of functions that can access it) to handle strings, and this struct contains a length information.
0

lorentz

15394

3y

I would've guessed them to be exactly identical, I'm surprised that C# doesn't cache the length of strings.
1

Grumm

1899

3y

@JS96 Here is my 2 cents.

The first method is mine. Algo1 and Algo2 are both yours.
2

Grumm

1899

3y

I added the default PadLeft and that one is still 4 times faster.

They use a span and the fill function to create an object with the correct size.

Then dotnet do some magic with Memmove xD

My second plan was to do the same. Like create a string with padding chars. Then use Take and Union but those are heavier operations than a for loop or a while...
0

JS96

18259

3y

@Grumm nice one!
0

Voxera

10883

3y

I would fail both due to looping string concatenation, that is almost always a bad strategy.

For very short paddings it might be equal to other solutions but its still bad practice.
0

Voxera

10883

3y

@Grumm that was what I expected the built in would do and if I had to do one my self I would have used span.

The memcopy I think is a result of the span, or actually, its how most interactions with a span is performed under the hood.

Span is just a safe way to show your intent so the compiler can ensure that using an otherwise unsafe memcopy is still safe to do.
1

Grumm

1899

3y

@Voxera It can get complicated very fast once you start at bit level xD

My version is not perfect too. There needs to be a lot more checks on the string.

the build-in padLeft does it all.
1

Voxera

10883

3y

@Grumm yes direct memory access is fast but there is no safety net ;).

But span is supposed to give you the power while still provide a safe way.

My experience comes mainly from building a safe string library for C back in the late 80’s, early 90’s, before C++ really took over.

Related Rants

Add Comment

question

game

algo race

padleft