Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "unicode support"
-
this.title = "gg Microsoft"
this.metadata = {
rant: true,
long: true,
super_long: true,
has_summary: true
}
// Also:
let microsoft = "dead" // please?
tl;dr: Windows' MAX_PATH is the devil, and it basically does not allow you to copy files with paths that exceed this length. No matter what. Even with official fixes and workarounds.
Long story:
So, I haven't had actual gainful employ in quite awhile. I've been earning just enough to get behind on bills and go without all but basic groceries. Because of this, our electronics have been ... in need of upgrading for quite awhile. In particular, we've needed new drives. (We've been down a server for two years now because its drive died!)
Anyway, I originally bought my external drive just for backup, but due to the above, I eventually began using it for everyday things. including Steam. over USB. Terrible, right? So, I decided to mount it as an internal drive to lower the read/write times. Finding SATA cables was difficult, the motherboard's SATA plugs are in a terrible spot, and my tiny case (and 2yo) made everything soo much worse. It was a miserable experience, but I finally got it installed.
However! It turns out the Seagate external drives use some custom drive header, or custom driver to access the drive, so Windows couldn't read the bare drive. ffs. So, I took it out again (joy) and put it back in the enclosure, and began copying the files off.
The drive I'm copying it to is smaller, so I enabled compression to allow storing a bit more of the data, and excluded a couple of directories so I could copy those elsewhere. I (barely) managed to fit everything with some pretty tight shuffling.
but. that external drive is connected via USB, remember? and for some reason, even over USB3, I was only getting ~20mb/s transfer rate, so the process took 20some hours! In the interim, I worked on some projects, watched netflix, etc., then locked my computer, and went to bed. (I also made sure to turn my monitors and keyboard light off so it wouldn't be enticing to my 2yo.) Cue dramatic music ~
Come morning, I go to check on the progress... and find that the computer is off! What the hell! I turn it on and check the logs... and found that it lost power around 9:16am. aslkjdfhaslkjashdasfjhasd. My 2yo had apparently been playing with the power strip and its enticing glowing red on/off switch. So. It didn't finish copying.
aslkjdfhaslkjashdasfjhasd x2
Anyway, finding the missing files was easy, but what about any that didn't finish? Filesizes don't match, so writing a script to check doesn't work. and using a visual utility like windirstat won't work either because of the excluded folders. Friggin' hell.
Also -- and rather the point of this rant:
It turns out that some of the files (70 in total, as I eventually found out) have paths exceeding Windows' MAX_PATH length (260 chars). So I couldn't copy those.
After some research, I learned that there's a Microsoft hotfix that patches this specific issue! for my specific version! woo! It's like. totally perfect. So, I installed that, restarted as per its wishes... tried again (via both drag and `copy`)... and Lo! It did not work.
After installing the hotfix. to fix this specific issue. on my specific os. the issue remained. gg Microsoft?
Further research.
I then learned (well, learned more about) the unicode path prefix `\\?\`, which bypasses Windows kernel's path parsing, and passes the path directly to ntfslib, thereby indirectly allowing ~32k path lengths. I tried this with the native `copy` command; no luck. I tried this with `robocopy` and cygwin's `cp`; they likewise failed. I tried it with cygwin's `rsync`, but it sees `\\?\` as denoting a remote path, and therefore fails.
However, `dir \\?\C:\` works just fine?
So, apparently, Microsoft's own workaround for long pathnames doesn't work with its own utilities. unless the paths are shorter than MAX_PATH? gg Microsoft.
At this point, I was sorely tempted to write my own copy utility that calls the internal Windows APIs that support unicode paths. but as I lack a C compiler, and haven't coded in C in like 15 years, I figured I'd try a few last desperate ideas first.
For the hell of it, I tried making an archive of the offending files with winRAR. Unsurprisingly, it failed to access the files.
... and for completeness's sake -- mostly to say I tried it -- I did the same with 7zip. I took one of the offending files and made a 7z archive of it in the destination folder -- and, much to my surprise, it worked perfectly! I could even extract the file! Hell, I could even work with paths >340 characters!
So... I'm going through all of the 70 missing files and copying them. with 7zip. because it's the only bloody thing that works. ffs
Third-party utilities work better than Microsoft's official fixes. gg.
...
On a related note, I totally feel like that person from http://xkcd.com/763 right now ;;21 -
Working on Unicode support for Linux Terminal apps, and I output an Emoji smiley face. The emulator I'm running (Termius SSH client) rendered it fine, but once the application exited, half the smiley face was left there as graphical garbage for some reason XD
Resetting my terminal did nothing, scrolling up and down did nothing... it was burned into my terminal for the rest of the session.
This is what I get for performing the unholy act of adding Unicode to terminals.6 -
Unicode support pl0x.
So I had an Windows account with AzureAD, and my real name has "ő" and "ó" in it, and software that did not support Unicde started flipping the fuck out.
I was intially going with junctioning every bullshit corrupted user folder name that showed up in the ENOENTs to my real user folder, but that didn't solve it for a couple of software.
I was trying to share my drives with Docker, but the same shit occurred. No error message, it just didn't work. I ended up creating a new user account for Docker to share the drive with.
I was trying to use the Travis CLI to set up releases, etc., but it replaced the "ő" with "?". Y U DO THAT?! Common knowledge is that "?" and other special characters cannot be in entity names. SO WHY DO YOU REPLACE THE UNKNOWN CHARACTER IN A PATH WITH THAT? And it wasn't a character not found character either! It was just a straight question mark.
I ended up creating a new user account because I couldn't change the name of the current one because fuck AzureAD, and Windows just decided to FUCKING TRASH MY ACCOUNT. I went over to the new one, copied over some files from the old one, tried to go back to the old one to copy env variables, but I noticed that the account has been purged from the registry... At least the files haven't been deleted.
I ended up reinstalling Windows.
After all my frustration, I recommend all companies with a CLI to visit the following website: http://uplz.skiilaa.me/
Thanks.1 -
Working on a project with third party web developers who don't know multiple language support, difference between UTF-8 and Unicode.
Other than that, life's good4 -
I hate it when apps I enjoy get no love or updates after being acquired by a bigger company. Why did you buy them in the first place? Skitch still cannot support Unicode cause Evernote doesn't care :-(3
-
!rant from a support guy
I was tasked to migrate an Exchange 2003 server (yes, those are still used) for an upcoming Office 365 deployment. There are no direct upgrade path from one another, as far as we know
My task was to export PSTs from mailboxes. Great, a native tool exist for that in 2003 (exmerge). But only for less than 2 GB mailboxes because ANSI/Unicode! Half of our mailbox busts that limit. Oh, it seems Exchange 2007 has a PowerShell command for exporting to PST as well! But pre-SP3, that command relies on a local installation of Outlook on the server (DAFUQ), and has been superseded by another "standalone" powershell command. So I install a bogus Windows 2012 server only for that purpose, with Exchange Management Tools (which, by the way, is bundled with the Exchange installation setup and REQUIRES to have IIS installed on the target machine. Also, if you install ONLY the Exchange 2007 Management Tools and wish to uninstall them afterwards, you can't because the uninstaller wants me to select an Exchange Role to remove, which are all unchecked in my tools-only setup). Never worked, and Google-fu says that the newer Exchange 2007 New-MailboxExportRequest command seems to have removed Exchange 2003 support.
So i'm back to installing a pre-SP3 Exchange 2007. Then the older Export-Mailbox powershell command whines about 64bits and 32bit incompatiblity-- actually I ***HAVE*** to have the whole OS/software stack 32bit ONLY. Don't ask me why!
Some article I found says I could fire up an XP virtual machine for that, I go for Win 7 x86. "Sorry, Microsoft Exchange won't be installed on a workstation environment because reasons." All right then, let's go for an old Windows Server 2003 x86. Have you tried to boot this up in an Hyper-V environment where mouse and keyboard support for Windows Server 2003 are apparently optional? No keyboard AND mouse events sent to the guest machine at all.
* Sigh *, let's use a Windows Server 2008, but WATCH OUT! Microsoft has discontinued x86 support on their W2008 R2 release, so non-R2 for me. Even then, mouse event wasn't sent until I installed guest additions.
After all, export-mailbox ended up working, but that costed me two days of banging my head against the wall. (Oh, and I take internal calls inbetween as well...)
And that's why I aspire to be a programmer. Thank you for nothing, Microsoft!4 -
FUCK YOU EMOJIS! FUCK YOU AND YOUR EVER FUCKING GOD DAMN SPECIAL WAY OF BEING HANDLED.
Now that I have that part out...
I really fucking hate emoji at this time. Currently I'm working on one of my projects that has markdown support. One of the things I'm extendending the parser with is github style emoji (eg. :smile:) now this part works great. The problem however is getting that short code into a unicode char for HTML. And at the same time I have to take any unicode emoji inserted into the text box by phones and stuff and convert them into the shortcode (My database does support emoji but it's much nicer to store all emoji with the same standard)
All of this has taken 5 hours of research (needed a database of unicode -> short names) and several hours of converting the data from someone elses json into something I can use. (AKA Shrinking the damn file to only what I need) and now I've spent 5 more hours working on the actual code. And I still don't have it working properly.3 -
TLDR;
Side project update.
Made simple nlp library in python and published it’s first version to open source.
Now I can feed it with parsed pdf text.
See rant https://devrant.com/rants/2192388/...
Why ?
Cause during reading book about nltk I couldn’t find simple extendible way to provide support for polish language and I wanted to abstract stemming, word normalization, tokenizer etc. so I can provide ex. different conditions for separate text files and don’t write much code what is an asset when you work solo.
It’s about 12GB of pdf public accessible law data I am trying to handle ( at first ) which is about 35000 files from last 90 years.
So far I automated downloading web pages and pdf documents from them. Extracting data from web pages and saving it to database. Extracting text from pdf files. I have about 5-6 projects to do all of it above maybe at the end I will put it to some workflow manager like Luigi or just run it by cronjob.
First thing for website version 1.0 part is find correlation between all documents inside law text using nlp library by building custom conditions. Then just generate directory structure and html files with links between documents.
Website version 2.0 is already in my mind but it will be creepy to make it and will take at least 1-2 months and I want to publish fast.
I have some pdfs with only images instead of text and tesseract worked quite good with them so maybe I will try to process them when everything go live.
Learned a lot about pdf as now I know that font in pdf is not always providing unicode characters ( stupid form of obfuscation) so when you extract text you need to build glyph vector to text map for every font.
Pdf is full vector representation - just like svg - what is logic if you think a bit and know that some printers are running using postscript.
Let’s hope next update will be about flutter mobile app which started all of shit above. It’s almost ready ( except getting data from api I am trying to do and logo for release version ). It’s last piece of puzzle.3 -
Before Unicode was ruined by millennials
Most systems didn't support it and
Old school expressions were in ascii
See
: - ) ())===========>. 0 - :
Oh the universal experience so many of you people are remembering right now
Merry Christmas5 -
i had a project in a networking class where the provided code was meant to act as a proxy (aka just passing bytes around), but because of the implementation, every byte had to be a valid unicode character
anyway lotta people were frustrated so we asked the course staff and their response was basically "we wanted to support python 2 and 3"
...1 -
Question for Droid gurus here.
Is there any way to use different fonts in android for different languages ?
I have changed ttf to add urdu fonts but then I'm seeing boxes instead of emoji and symbols. Can I add it to urdu only.
On web we have CSS unicode-range in @font-face which sets a boundary for different fonts to be used for different unicode characters/ranges.
Can this be done in android system some way ?
I'm not talking about using it in an app but in whole droid.1 -
I'm creating a bitmap font right now and wanted to automatically generate a image with some text so I can track my progress how it looks. gnome-font-viewer displays it fine, but it'd nothing compared to some real text. Well, how hard can it be?
First attempt: Use ImageMagick to create an image and draw some text. I found a forum post in the ImageMagick forums from 2017 claiming incorrect rendering of BDF fonts, which was promised to be fixed. Yet convert does exactly nothing besides saying “couldn't read font”.
Looking around, there is exactly one tool for the job I'm looking to get done: pbmtext. It works, but doesn't support Unicode. Egh.
Maybe I could write a short script to do it, then? Python's Pillow can import Bitmap fonts (cairo can't). Halfway done I notice it can't deal with anything outside of the character range 0..256.
Using FreeFont directly is out of the question as that seems to be equally much work as creating the font in the first place. I briefly tried SDL, but the font formats it understands are limited.
So how about converting the font then, you ask? Everyone seems to be only concerned about the other way (like OTF to BDF). I tried loading the font into FontForge and exporting an OTF or TTF but couldn't get anything out of it that ImageMagick recognizes as a font.
It seems fucking impossible to render text to an image with an Unicode BDF font in some automated way.
To add insult to injury, my searches containing “bdf” are always interpreted as with “pdf”. I'm not even a Franconian, I can distinguish B and P!4 -
Which keyboard do you use on phone? I got curious because lot is people are using Unicode character, so looking for some support other than copy paste from Unicode app or web page.4