Hazarth
1y

@Wisecrack
Dude, it seems someone has actually done 1-bit quant for a transformer model:

https://arxiv.org/pdf/...
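
For anyone wondering what "1-bit" means here: as far as I understand it, each weight is reduced to just its sign, with one shared full-precision scale per matrix. A toy sketch of that general idea (my own rough reconstruction, not the paper's actual code):

```python
# Toy sketch of 1-bit weight quantization: each weight keeps only its
# sign, plus one shared full-precision scale per matrix. This is a rough
# reconstruction of the general idea, not the paper's exact recipe.
import numpy as np

def binarize(W: np.ndarray):
    """Return (signs, scale) so that scale * signs roughly approximates W."""
    Wc = W - W.mean()                       # center the weights
    scale = np.abs(Wc).mean()               # one scale value for the whole matrix
    signs = np.where(Wc >= 0, 1.0, -1.0)    # every weight is now a single bit
    return signs, scale

def binary_linear(x: np.ndarray, signs: np.ndarray, scale: float) -> np.ndarray:
    """Forward pass of a linear layer using only the binarized weights."""
    return scale * (x @ signs)

# Quick comparison against the full-precision layer on random data.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))
x = rng.normal(size=(8, 256))
signs, scale = binarize(W)
full = x @ W
quant = binary_linear(x, signs, scale)
print("relative output error:", np.linalg.norm(full - quant) / np.linalg.norm(full))
```

The point is that a whole weight matrix collapses to a sign mask plus a single float, which is where the memory and energy savings come from: the matmul becomes additions and subtractions instead of floating-point multiplies.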

Comments
  • Wonder if this is academic or if it actually improves performance with only moderate sacrifices on loss. I read through it, but I don't know enough to say either way.

    What do you think?
  • @Wisecrack I just skimmed it, but as far as I can tell it doesn't improve performance. It's rather interesting, though, how *little* the performance suffers considering how little precision is left. Not to mention the energy requirements for training BitNet are significantly smaller than for a proper FP16 network.

    Based on this paper, it rather seems that parameter precision doesn't play a big role in LLMs. Most of the information seems to be encoded in the complex structure of the resulting function and the nonlinearities applied between the neurons.

    But I didn't look closely at the code, so I'm not sure if there are any hidden variables playing a part :D (my guess at the usual one is sketched below)
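
    The usual "hidden variable" in these setups is a full-precision master copy of the weights that sticks around during training: the forward pass uses the binarized weights, but the gradient updates go to the full-precision copy (the straight-through estimator trick). Here's a toy sketch of that standard recipe; I haven't checked whether the paper's code does exactly this:

    ```python
    # Toy sketch of training with 1-bit weights via the straight-through
    # estimator: keep a full-precision "latent" copy of the weights, binarize
    # it on every forward pass, and apply the gradient to the latent copy as
    # if the binarization were invisible. This is my assumption of the
    # standard recipe (BinaryConnect-style), not code from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(512, 16))
    true_W = rng.normal(size=(16, 1))
    y = X @ true_W                              # toy regression target

    W_latent = 0.1 * rng.normal(size=(16, 1))   # full-precision master weights
    lr = 0.05

    for step in range(200):
        scale = np.abs(W_latent).mean()
        W_bin = scale * np.sign(W_latent)       # 1-bit weights used in the forward pass
        err = X @ W_bin - y
        loss = (err ** 2).mean()
        # Straight-through: the gradient w.r.t. W_bin is applied directly
        # to W_latent, ignoring sign() in the backward pass.
        grad = 2.0 * X.T @ err / len(X)
        W_latent -= lr * grad

    print("final MSE with binarized weights:", loss)
    ```

    So at inference time only the 1-bit weights are needed, but during training the optimizer still updates full-precision values, which is probably part of why training stays stable at all.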