Comments
-
I think it has something to do with storage and the fact that there are billions of characters out there in all kinds of languages. Not sure though, but I've read somewhere that UTF-8 reserves 2 to 4 bytes for each character, and UTF-16 even more. Something with databases? But to be honest: I haven't got a clue.
But everything exists for a reason, and everything has pros and cons. Is there a charset expert in the house?
-
miska: @kanduvisla UTF-8 has variable character length. US-ASCII characters take one byte. Extended Latin characters like 'čřá' take two bytes. Some Asian characters take three or four bytes; the original spec allowed up to six, but the current standard caps it at four. So depending on what you write, there is some overhead. Processing is also a little harder because of the variable length, but it is nice from a backward-compatibility point of view. UTF-16 uses two bytes all the time, even for 7-bit ASCII (and four for characters outside the Basic Multilingual Plane), and UTF-32 is fixed at four bytes. So for some Asian languages it can be more efficient to use UTF-16, since the text takes less space and is easier to process. Maybe there is a case for UTF-32 as well. And old encodings like ISO8859-[2-...] managed to squeeze all the characters people cared about into one byte.
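A quick way to see the variable lengths described above, as a minimal Python sketch (only the standard str.encode is used; the sample strings are just illustrative):

```python
# Compare how many bytes the same text needs in different Unicode encodings.
samples = ["hello", "čřá", "日本語", "😁"]

for s in samples:
    utf8 = s.encode("utf-8")
    utf16 = s.encode("utf-16-le")  # -le variants skip the 2-byte BOM
    utf32 = s.encode("utf-32-le")
    print(f"{s!r}: {len(s)} code points -> "
          f"UTF-8 {len(utf8)} B, UTF-16 {len(utf16)} B, UTF-32 {len(utf32)} B")
```

For the East Asian sample, UTF-16 comes out smaller than UTF-8 (6 vs 9 bytes), which is exactly the trade-off described in the comment above.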
-
@miska Thanks! By the way, reading your comment, it looks like the devRant database has some encoding issues as well. 😁
-
miska: @kanduvisla I just put in examples from my native language :-) Those are real characters. I don't have a way to type kanji :-D I'd have to search and copy-paste.
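For contrast, an actual encoding issue usually shows up as mojibake: UTF-8 bytes decoded with the wrong charset. A minimal Python sketch, reusing 'čřá' from the comment above (the choice of Latin-1 as the "wrong" charset is just illustrative):

```python
# Mojibake: UTF-8 bytes misread as Latin-1 turn 'čřá' into garbage,
# while decoding them as UTF-8 gives back the real characters.
text = "čřá"
raw = text.encode("utf-8")

print(raw.decode("latin-1"))  # 'Ä\x8dÅ\x99Ã¡'-style garbage (typical mojibake)
print(raw.decode("utf-8"))    # 'čřá' as intended
```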
-
Why isn't UTF-16 the standard? Stop oppressing the Mandarin and Cyrillic typefaces, shit lord.
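For what it's worth, the usual argument for UTF-8 as the default is backward compatibility: pure-ASCII text encodes to exactly the same bytes, so ASCII-only tools keep working. A minimal Python sketch (the sample string is just illustrative):

```python
# UTF-8 is a superset of ASCII: ASCII-only text produces identical bytes.
ascii_text = "plain ASCII"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# UTF-16 prepends a BOM and spends two bytes on every ASCII character.
print(ascii_text.encode("utf-8"))   # b'plain ASCII'
print(ascii_text.encode("utf-16"))  # b'\xff\xfep\x00l\x00a\x00i\x00n\x00 \x00...'
```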
Related Rants
FUCKING ENCODING SHITHEADS!
WHY ISN'T UTF-8 THE FUCKING DEFAULT EVERYWHERE!
§¥~@#&•…≈!
encoding