
Okay, I'm about to do something diabolical.

Does anybody know whether reading beyond the bounds of an allocation is UB in LLVM when I do the read via inline assembly? 👀

Comments
  • 3
    Stop molesting those poor allocations!
  • 2
    Why would you even need such magic?
  • 1
    @jestdotty It's useful when you're running a wide algorithm over variable-length data, e.g. UTF-8 decoding, where instead of reading each byte on its own you read the entire (potential) 4 bytes as one uint. Obviously, when the read starts within the last 3 bytes of the buffer, this requires reading past the end of the allocation.

    Though I think it's easier to just have two functions: one for the interior of the buffer and one that handles the tail...
  • 1
    @12bitfloat If I understood that correctly, I think I typically make arrays into buffers, not the other way around.
  • 2
    @12bitfloat You would usually only call the wide algo on the well-aligned middle. In fact, some algos have versions for multiples of 8 bytes just so they don't need a bounds check after each 64-bit item.

    Any special reason you can't use SIMD?
  • 1
    @lorentz Yeah, I think that makes the most sense. For some reason I was worried about branch misprediction on the two consecutive loops, but the remaining size of your buffer will literally be in L1, if not in a register.

    Regarding SIMD, I haven't had a good idea for an efficient implementation yet. One major hurdle in UTF-8 decoding is that you have to smash together the lower 6 bits of each of the up to 3 continuation bytes. The only good ways I've found are the PEXT instruction (which is microcoded and gigaslow on Zen 2, so I can't use it) and an unrolled loop.

    I've used the latter, and it's surprisingly fast with a few other tricks (a LUT for the first byte plus a fast path for 1-byte sequences).
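A minimal Rust sketch of the two-function idea from the thread (interior plus tail), assuming the goal is to sidestep the out-of-bounds question entirely rather than answer it. The interior loop loads 4 bytes as one `u32` only while at least 4 bytes remain; the tail copies the last 1 to 3 bytes into a zero-padded stack array so the same decoder can run on it. The names `decode_word` and `decode_all` are hypothetical. Validation covers lead bytes, continuation bytes, overlong forms, and (via `char::from_u32`) surrogates and values above U+10FFFF.

```rust
/// Decodes the first code point from 4 bytes packed little-endian into `word`
/// (byte 0 in the low 8 bits). Returns the char and how many bytes it used.
fn decode_word(word: u32) -> Option<(char, usize)> {
    let b0 = word & 0xFF;
    if b0 < 0x80 {
        return Some((b0 as u8 as char, 1)); // 1-byte (ASCII) sequence
    }
    // (sequence length, payload bits of the lead byte, smallest legal value)
    let (len, lead_bits, min) = if (b0 & 0xE0) == 0xC0 {
        (2, b0 & 0x1F, 0x80)
    } else if (b0 & 0xF0) == 0xE0 {
        (3, b0 & 0x0F, 0x800)
    } else if (b0 & 0xF8) == 0xF0 {
        (4, b0 & 0x07, 0x1_0000)
    } else {
        return None; // stray continuation byte or invalid lead byte
    };
    let mut cp = lead_bits;
    for k in 1..len {
        let b = (word >> ((8 * k) as u32)) & 0xFF;
        if (b & 0xC0) != 0x80 {
            return None; // not a continuation byte (10xxxxxx)
        }
        cp = (cp << 6) | (b & 0x3F);
    }
    if cp < min {
        return None; // overlong encoding
    }
    // Rejects surrogates and values above U+10FFFF.
    char::from_u32(cp).map(|c| (c, len))
}

/// Decodes all of `buf` without ever reading past its end.
fn decode_all(buf: &[u8]) -> Option<Vec<char>> {
    let mut out = Vec::new();
    let mut i = 0;
    // Interior: at least 4 bytes remain, so a full 4-byte load is in bounds.
    while i + 4 <= buf.len() {
        let word = u32::from_le_bytes([buf[i], buf[i + 1], buf[i + 2], buf[i + 3]]);
        let (c, n) = decode_word(word)?;
        out.push(c);
        i += n;
    }
    // Tail: copy the last 1..=3 bytes into a zero-padded array. A zero pad
    // byte is not a continuation byte, so a truncated sequence yields None.
    while i < buf.len() {
        let rem = buf.len() - i;
        let mut pad = [0u8; 4];
        pad[..rem].copy_from_slice(&buf[i..]);
        let (c, n) = decode_word(u32::from_le_bytes(pad))?;
        out.push(c);
        i += n;
    }
    Some(out)
}
```

The tail never touches memory past `buf.len()`; the price is one tiny copy per leftover code point, which is the trade-off against reading past the allocation.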
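To illustrate the "wide middle, scalar tail" pattern with 8-byte items mentioned in the thread, here is a hedged sketch of an all-ASCII check (a simpler problem than UTF-8 decoding, chosen only for illustration). `chunks_exact(8)` yields only full 8-byte chunks, so the main loop never has to test how many bytes are left, and `remainder()` hands back the at most 7 leftover bytes. `is_ascii_wide` is a hypothetical name. The sketch does not handle alignment explicitly, since `u64::from_le_bytes` works on unaligned data; in real code the standard library's `<[u8]>::is_ascii` already does this job.

```rust
/// Returns true if every byte in `buf` is ASCII (< 0x80).
fn is_ascii_wide(buf: &[u8]) -> bool {
    // Bit 7 of each of the 8 bytes in a u64.
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

    let chunks = buf.chunks_exact(8);
    // The at most 7 bytes that don't fill a whole chunk.
    let tail = chunks.remainder();

    // Middle: one 64-bit test per 8 bytes.
    for chunk in chunks {
        let mut bytes = [0u8; 8];
        bytes.copy_from_slice(chunk);
        if (u64::from_le_bytes(bytes) & HIGH_BITS) != 0 {
            return false;
        }
    }

    // Tail: plain byte-by-byte check.
    tail.iter().all(|&b| b < 0x80)
}
```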
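The "LUT for the first byte + fast path for 1-byte sequences + unrolled loop" combination described in the last comment might look roughly like the following Rust sketch; it is an illustration under assumptions, not the commenter's actual code. `LEN_LUT` and `decode_lut` are hypothetical names. The table is built at compile time with a `const fn`, and each `match` arm spells out the continuation-bit merge by hand instead of using PEXT: for the 4-byte case, the arm produces the same bits that PEXT with mask `0x073F3F3F` would extract from `u32::from_be_bytes(*buf)`. To keep it short, this sketch rejects bad lead and continuation bytes, surrogates, and out-of-range values, but does not reject overlong encodings.

```rust
/// Sequence length for each possible lead byte (0 = not a valid lead byte).
const LEN_LUT: [u8; 256] = build_len_lut();

const fn build_len_lut() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut b = 0;
    while b < 256 {
        table[b] = if b < 0x80 {
            1
        } else if (b & 0xE0) == 0xC0 {
            2
        } else if (b & 0xF0) == 0xE0 {
            3
        } else if (b & 0xF8) == 0xF0 {
            4
        } else {
            0 // continuation byte (10xxxxxx) or 0xF8..=0xFF
        };
        b += 1;
    }
    table
}

/// Decodes one code point from `buf` (lead byte first); returns (char, length).
/// Overlong encodings are NOT rejected in this sketch.
fn decode_lut(buf: &[u8; 4]) -> Option<(char, usize)> {
    // Fast path: 1-byte (ASCII) sequences skip the table entirely.
    if buf[0] < 0x80 {
        return Some((buf[0] as char, 1));
    }
    let len = LEN_LUT[buf[0] as usize] as usize;
    if len == 0 {
        return None; // invalid lead byte
    }
    // Every continuation byte must look like 10xxxxxx.
    if buf[1..len].iter().any(|&b| (b & 0xC0) != 0x80) {
        return None;
    }
    // Low 6 payload bits of continuation byte `i`.
    let c = |i: usize| (buf[i] & 0x3F) as u32;
    // Unrolled merge of the payload bits (the job PEXT would do in one instruction).
    let cp = match len {
        2 => (((buf[0] & 0x1F) as u32) << 6) | c(1),
        3 => (((buf[0] & 0x0F) as u32) << 12) | (c(1) << 6) | c(2),
        // len == 4 is the only remaining case.
        _ => (((buf[0] & 0x07) as u32) << 18) | (c(1) << 12) | (c(2) << 6) | c(3),
    };
    // Rejects surrogates and values above U+10FFFF.
    char::from_u32(cp).map(|ch| (ch, len))
}
```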