Fuck my life! I have been given the task of extracting text (with proper formatting) from .docx files.
They look good on the outside, but parsing these files is absolute hell. Add human error on top of the shitty XML and you get a dev's worst nightmare.
I wrote a simple function to extract text written in the 'heading(0-9)' paragraph styles and got all sorts of shit.
One guy used a table with borders colored white to lay out his text so that he didn't have to use tabs. It is absolute bullshit.
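For anyone stuck on the same job, here is a rough sketch of what a heading extractor like the one described above might look like using only the Python standard library. It is not the original function. The name `extract_headings` and the regex are made up for illustration, and it assumes the paragraph style IDs in `word/document.xml` look like `Heading1` or `heading 1`, which is typical for English-language Word templates but not guaranteed (custom or localized styles will differ).

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"

# Assumption: heading style IDs look like "Heading1" or "heading 1"
HEADING_STYLE = re.compile(r"^heading\s*\d$", re.IGNORECASE)


def extract_headings(docx_file):
    """Return (style_id, text) pairs for paragraphs styled as headings.

    docx_file may be a path or a binary file-like object.
    """
    # A .docx file is a zip archive; the body lives in word/document.xml
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    results = []
    # iter() walks the whole tree, so paragraphs nested in table cells are included
    for para in root.iter(W + "p"):
        style = para.find(W + "pPr/" + W + "pStyle")
        if style is None:
            continue
        style_id = style.get(W + "val", "")
        if not HEADING_STYLE.match(style_id):
            continue
        # A paragraph's text is often split across several runs (w:r / w:t)
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if text.strip():
            results.append((style_id, text))
    return results
```

Calling `extract_headings("report.docx")` (hypothetical file name) returns a list of tuples in document order. Because it walks every `w:p` element, it also picks up headings hidden inside tables, which is exactly where the invisible-border trick puts them. It does nothing about text that merely looks like a heading through manual bold and font size, which is the human-error part.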
nlp
.docx
ooxml
microsoft
ms word