Hazarth
1y

@Wisecrack
Dude, it seems someone has actually done 1-bit quant for a transformer model:

https://arxiv.org/pdf/...
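
For anyone wondering what "1-bit" means here: as far as I understand it, each weight is reduced to just its sign, with one shared full-precision scale per matrix. A toy sketch of that general idea (my own rough reconstruction, not the paper's actual code):

```python
# Toy sketch of 1-bit weight quantization: each weight keeps only its
# sign, plus one shared full-precision scale per matrix. This is a rough
# reconstruction of the general idea, not the paper's exact recipe.
import numpy as np

def binarize(W: np.ndarray):
    """Return (signs, scale) so that scale * signs roughly approximates W."""
    Wc = W - W.mean()                       # center the weights
    scale = np.abs(Wc).mean()               # one scale value for the whole matrix
    signs = np.where(Wc >= 0, 1.0, -1.0)    # every weight is now a single bit
    return signs, scale

def binary_linear(x: np.ndarray, signs: np.ndarray, scale: float) -> np.ndarray:
    """Forward pass of a linear layer using only the binarized weights."""
    return scale * (x @ signs)

# Quick comparison against the full-precision layer on random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
x = rng.normal(size=(8, 256))
signs, scale = binarize(W)
full = x @ W
quant = binary_linear(x, signs, scale)
print("relative output error:", np.linalg.norm(full - quant) / np.linalg.norm(full))
```

The point is that a whole weight matrix collapses to a sign mask plus a single float, which is where the memory and energy savings come from: the matmul becomes additions and subtractions instead of floating-point multiplies.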

Comments
  • Wonder if this is academic or if it actually improves performance with only moderate sacrifices on loss. I read through it, but I don't know enough to say either way.

    What do you think?
  • @Wisecrack I just skimmed it, but as far as I can tell it doesn't improve performance. It's rather interesting, though, how *little* the performance suffers considering how little precision is left. Not to mention the energy requirements for training BitNet are significantly smaller than for a proper FP16 network.

    Based on this paper, it rather seems that parameter precision doesn't play a big role in LLMs. Most of the information seems to be encoded in the complex structure of the resulting function and the nonlinearities applied between the neurons.

    But I didn't look closely at the code, so I'm not sure if there are any hidden variables playing a part :D (my guess at the usual one is sketched below)
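
    The usual "hidden variable" in these setups is a full-precision master copy of the weights that sticks around during training: the forward pass uses the binarized weights, but the gradient updates go to the full-precision copy (the straight-through estimator trick). Here's a toy sketch of that standard recipe; I haven't checked whether the paper's code does exactly this:

    ```python
    # Toy sketch of training with 1-bit weights via the straight-through
    # estimator: keep a full-precision "latent" copy of the weights, binarize
    # it on every forward pass, and apply the gradient to the latent copy as
    # if the binarization were invisible. This is my assumption of the
    # standard recipe (BinaryConnect-style), not code from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 16))
    true_W = rng.normal(size=(16, 1))
    y = X @ true_W                              # toy regression target

    W_latent = 0.1 * rng.normal(size=(16, 1))   # full-precision master weights
    lr = 0.05

    for step in range(200):
        scale = np.abs(W_latent).mean()
        W_bin = scale * np.sign(W_latent)       # 1-bit weights used in the forward pass
        err = X @ W_bin - y
        loss = (err ** 2).mean()
        # Straight-through: the gradient w.r.t. W_bin is applied directly
        # to W_latent, ignoring sign() in the backward pass.
        grad = 2.0 * X.T @ err / len(X)
        W_latent -= lr * grad

    print("final MSE with binarized weights:", loss)
    ```

    So at inference time only the 1-bit weights are needed, but during training the optimizer still updates full-precision values, which is probably part of why training stays stable at all.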