Comments
-
@jestdotty It's useful when you're running some wide algorithm over variable-length data, e.g. UTF-8 decoding, where instead of reading each byte on its own you read the entire (potential) 4 bytes as one uint. Obviously, for the last 3 bytes of the buffer this would require reading past the end of the allocation.
Though I think it's easier to just have both functions: one for the interior of the buffer and one that handles the tail...
-
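For illustration, here is a minimal sketch of the "two functions" approach in Rust. The names `load_fast`, `load_tail` and `windows_le` are made up for this example, and little-endian byte order is an assumption. The interior uses a plain 4-byte load, and the tail copies the leftover bytes into a zero-padded scratch array, so nothing is ever read outside the allocation.

```rust
/// Interior load: the caller guarantees that `i + 4 <= buf.len()`.
fn load_fast(buf: &[u8], i: usize) -> u32 {
    u32::from_le_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]])
}

/// Tail load: copies the (at most 4) remaining bytes into a zero-padded
/// scratch array instead of reading past the end of `buf`.
fn load_tail(buf: &[u8], i: usize) -> u32 {
    let rest = &buf[i..];
    let n = rest.len().min(4);
    let mut tmp = [0u8; 4];
    tmp[..n].copy_from_slice(&rest[..n]);
    u32::from_le_bytes(tmp)
}

/// Produces the 4-byte little-endian window starting at every offset.
/// One loop covers the interior, a second loop covers the last 3 offsets.
fn windows_le(buf: &[u8]) -> Vec<u32> {
    let mut out = Vec::with_capacity(buf.len());
    // Offsets below `interior` have a full 4 bytes available.
    let interior = buf.len().saturating_sub(3);
    for i in 0..interior {
        out.push(load_fast(buf, i));
    }
    for i in interior..buf.len() {
        out.push(load_tail(buf, i));
    }
    out
}
```

The tail loop runs at most 3 times, so its cost is negligible next to the interior loop.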
@12bitfloat If I understood that correctly, I think I typically make arrays into buffers, not the other way around.
-
@12bitfloat You would usually only call the wide algo on the well-aligned middle. In fact, some algos also have versions for inputs that are multiples of 8 bytes, just so they don't need to bounds check after each 64-bit item.
Any special reason you can't use SIMD?
-
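A rough Rust sketch of the "multiples of 8 bytes" idea, using an ASCII check as a stand-in workload. The function name `is_ascii_wide` is hypothetical, and this version splits by length only, it does not try to align the middle. `chunks_exact(8)` hands out full 8-byte blocks, so the hot loop has no per-item length check, and the leftover bytes are handled separately.

```rust
/// Returns true if every byte in `buf` is ASCII (high bit clear).
/// Full 8-byte blocks are tested as one u64; the remainder goes byte by byte.
fn is_ascii_wide(buf: &[u8]) -> bool {
    // A set high bit in any of the 8 lanes means a non-ASCII byte.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let mut chunks = buf.chunks_exact(8);
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        // Byte order does not matter for this mask, so native endian is fine.
        if (u64::from_ne_bytes(word) & HIGH_BITS) != 0 {
            return false;
        }
    }
    // Fewer than 8 bytes are left over here.
    chunks.remainder().iter().all(|&b| b < 0x80)
}
```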
@lorentz Yeah, I think that makes the most sense. For some reason I was worried about branch misprediction on the two consecutive loops, but the remaining size of your buffer will literally be in L1, if not in a register.
With regard to SIMD, I haven't had a good idea for an efficient implementation yet. One major hurdle in UTF-8 decoding is that you have to smash the lower 6 bits of each of the up to 3 continuation bytes together. The only good ways I've found are the pext instruction (which is microcoded and gigaslow on Zen 2, so I can't use it) and an unrolled loop.
I've used the latter and it's surprisingly fast with a few other tricks (a LUT for the first byte plus a fast path for 1-byte sequences).
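To make the tricks above concrete, here is a hedged Rust sketch, not the actual implementation discussed in this thread. `LEN_LUT`, `decode_one` and `decode_all` are names invented for the example, and the table layout (indexed by the top 5 bits of the lead byte) is an assumption. It checks only the lead byte and the available length. It does not validate continuation bytes, overlong encodings or surrogates.

```rust
/// Sequence length indexed by the top 5 bits of the lead byte.
/// 0 marks a byte that cannot start a sequence.
const LEN_LUT: [u8; 32] = [
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0xxxxxxx: ASCII
    0, 0, 0, 0, 0, 0, 0, 0, // 10xxxxxx: continuation byte
    2, 2, 2, 2, // 110xxxxx
    3, 3, // 1110xxxx
    4, // 11110xxx
    0, // 11111xxx: invalid
];

/// Decodes one code point from the start of `buf`.
/// Returns the code point and the number of bytes consumed.
fn decode_one(buf: &[u8]) -> Option<(u32, usize)> {
    let b0 = *buf.first()?;
    // Fast path for 1-byte sequences.
    if b0 < 0x80 {
        return Some((b0 as u32, 1));
    }
    let len = LEN_LUT[(b0 >> 3) as usize] as usize;
    if len == 0 || buf.len() < len {
        return None;
    }
    // Lower 6 payload bits of continuation byte `i`.
    let cont = |i: usize| (buf[i] & 0x3F) as u32;
    // Unrolled per length: shift the lead bits up and OR in each 6-bit group.
    let cp = match len {
        2 => ((b0 & 0x1F) as u32) << 6 | cont(1),
        3 => ((b0 & 0x0F) as u32) << 12 | cont(1) << 6 | cont(2),
        _ => ((b0 & 0x07) as u32) << 18 | cont(1) << 12 | cont(2) << 6 | cont(3),
    };
    Some((cp, len))
}

/// Decodes a whole buffer, stopping with `None` at the first bad lead byte
/// or truncated sequence.
fn decode_all(buf: &[u8]) -> Option<Vec<u32>> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < buf.len() {
        let (cp, len) = decode_one(&buf[i..])?;
        out.push(cp);
        i += len;
    }
    Some(out)
}
```

Because the length is checked up front, this version never needs to read past the end of the buffer. A wide-load variant would swap the three `cont` calls for shifts and masks on a single 4-byte word.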
Okay I'm about to do something diabolical
Does anybody know if reading beyond the bounds of an allocation is UB in LLVM when I'm doing the read via inline assembly? 👀
rant
llvm
low level fuckery