Search - "ocr"
-
These "math question" captchas are really stupid.
It's not even an image that has to be OCR-ed, it's just plaintext. Why can't these people understand a captcha is supposed to be something only a person can do? This is math. Computers are amazing at math.
-
I realize I've ranted about this before, but...
Fuck APIs.
First the fact that external services can throw back 500 errors or timeouts when their maintainer did a drunk deploy (but you properly handled that using caching, workers, retry handlers, etc, right? RIGHT?)...
Then the fact that they all speak a variety of languages and dialects (Oh fuck why does that endpoint return a JSON object with int keys instead of a simple array... wait the params are separated with pipe characters? And the other endpoint uses SOAP? Fuck I need to write another wrapper class around the client...)
But the worst thing: It makes developers live in this happy imaginary universe where "malicious" is not a word.
"I found this cloud service which checks our code style" - hmm ok, they seem trustworthy. Hope they don't sell our code, but whatever.
"And look at this thing, it automatically makes database backups, just have to connect it to DigitalOcean" - uhhh wait...
"And I just built this API client which sends these forms to be OCR processed" - Fuck... stop it... there are bank account numbers on those forms... Where's that API even located? What company?
* read their privacy policy *
"We can not guarantee the safety of your personal data, use at your own risk [...] we are located in Russia".
I fucking hate these millennial devs who literally fail to get their head out of the cloud.
Somehow they think it's easier to write all these NodeJS handlers and layers around some API, which probably just calls ImageMagick + Tesseract on the other side.
If I wasn't so fucking exhausted, I'd chop off their heads... but they're like hydra, you seal one privacy breach and another is waiting to be merged, these kids just keep spewing their crap into easy packages, they keep deploying shitty heroku apps... ugh.
😖
-
I wrote an app that tells me if a lottery ticket is winning. It takes a picture of the ticket, does OCR, finds the number lines and compares them with a remote json.
I live next door to a lottery shop.
-
OCR (The exam board for my course) are fucking thick in the head when it comes to anything computing.
- I get a mark or two for saying open source software is worse than its proprietary counterparts
- ALL open source software forks must also be made open source. They spend so much time going over the legal stuff BUT HAVE NEVER HEARD OF OPEN SOURCE LICENSING!
- One exam paper had a not gate picture with 2 inputs...
- I have to differentiate between portable and handheld! YOU MEAN HANDHELD DEVICES ARE NOT PORTABLE!?!!?!?
- In level 2 education, OCR say 1 MB = 1024 KB - In level 3, they say 1 MB = 1000 KB, and 1 MiB = 1024 KiB, and expect you to differentiate. Why do you expect the wrong answer in level 2!?
- INFORMATION FORMATS AND STYLES ARE COMPLETELY DIFFERENT THINGS! If you look up synonyms for "style", "form" is there, and if you look up synonyms for "format", "style" is there.
- When asked for storage devices, I have to say "smartphone", "tablet", "desktop PC" - I mean yeah they store data but when you ask me for storage devices I will say "hard disk drive", "solid state drive", "SD card", etc. >.>
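The MB versus MiB distinction from the list above comes down to powers of 1000 versus powers of 1024. A minimal Python sketch of the arithmetic (the constant and function names are made up for illustration):

```python
# Decimal (SI) prefixes step by 1000; binary (IEC) prefixes step by 1024.
KB, MB = 1000, 1000**2        # kilobyte, megabyte (in bytes)
KIB, MIB = 1024, 1024**2      # kibibyte, mebibyte (in bytes)


def mb_to_mib(megabytes: float) -> float:
    """Convert decimal megabytes to binary mebibytes."""
    return megabytes * MB / MIB


print(MB // KB)    # 1000 KB in 1 MB
print(MIB // KIB)  # 1024 KiB in 1 MiB
print(MIB - MB)    # 48576 bytes by which 1 MiB exceeds 1 MB
```

So "1 MB = 1024 KB" only holds if MB and KB are silently read as MiB and KiB, which is the mix-up the two course levels disagree on.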
I could probably go on and on about this...
I sure do love being asked to copy-paste existing HTML/JS/CSS and being asked to just tweak it here and there, and then wait for other people's incompetence in copy-pasting... I sure do love being stuck with this sort of "education" ._.
-
Won 2nd prize in a Microsoft-hosted hackathon. Not for Windows, but they really do have good cognitive services. Used the Azure Vision API, one of the good OCR services available.
-
I think the coolest all-nighter I can remember is when me and one of my best friends were still in school. We were up all night figuring out what to make. At the time we played a little browser click game, so we came up with the idea of creating a bot for it.
We're both PHP developers, but we figured that wouldn't be an appropriate language to write a bot in. So we went for C#. Neither of us had ever worked with it.
At the end of the night we had built a fully functioning bot that could continue playing the game while we were at school. It could do all our manual tasks and could even decode Captchas with the Google OCR package.
That night was productive.
-
Client wanted to send us e-mail addresses.
Client sent an image inside a Word document showing a list of e-mail addresses.
Luckily ShareX has an OCR feature.
-
Me: why are we paying for OCR when the API offers both json and pdf format for the data?
Manager: because we need to have the data in a PDF format for reporting to this 3rd party
Me: sure, but can we not just request both json and PDF from the vendor (it’s the same data). send the json for the automated workflow (save time, money and get better accuracy) and send the PDF to the 3rd party?
Manager: we made a commercial decision to use PDF, so we will use PDF as the format.
Me: but ...
-
My company just acquired another company from some losers.
Gotta load their pittance database onto our thing.
Their entire "Technology Department" is one old fart.
One even older fart runs their accounting.
I asked the IT boomer for their accounting data.
He tells me to get the head accountant.
The head accountant says they do not have any historical accounting data.
I threaten to call the (equivalent of the) IRS on them.
They give up, admit that they do have some historical data. But they attempt to pull a "malicious compliance" on me, send me a pallet full of old receipts, on paper.
I do what I have done one hundred times before, I go to the closest community college (equivalent) and ask/bribe a teacher to offer the most trustworthy kids some pretty pennies to scan all those files for me.
A dozen of them barely took a week to do it using their not-so-bad camera phones.
It all cost about the same as a couple of older-but-still-good iPhones.
Then it's on to some simple OCR and data normalization tasks.
This morning I had another meeting with the losers, the first since I told them their "data" had just arrived in the mail (but a couple weeks after that). They log in for the meeting all smug, thinking we would ask for more time to load their data, and it would be my team's fault for any delays.
Then the regional business evaluator logs in and says he reviewed their financials yesterday and we have a lot to talk about.
I will remember their "just got punched in the gut" faces forever :)
-
Day 2 in ComSci class (following my last rant)
"Okay, so! All of the schoolwork and homework will be done on paper and pen, submit and I will grade it. Only once, no second chance"
Okay. Okay. This went over my head. What are you gonna do? OCR the code into the compiler, compile it and run to see if we fucked up to give us an F? What are you, god? Here's a brilliant idea, teach them Assembly! Guaranteed error to give us Fs! FUCK YOU
-
Just another big rant story full of WTFs and completely true.
The company I work for atm is like the landlord for a big German city. We build houses and flats and rent them to normal people, just that we want to be very cheap and nearly all our tenants are jobless.
So the company hired a lot of software-dev-companies to manage everything.
The company I want to talk about is "ABI...", a 40-person software company. ABI sold us different software, e.g. a data warehouse for our ERP system that they "invented" for 300K, or the software we talk about today: a document management system. It has workflows, a 100-year-safe archive system, a history feature etc.
The software itself, called ELO (you can google it if you want) is a component based software in which every company that is a "partner" can develop things into, like ABI did for our company.
Since 2013 we have paid ABI 150€ / hour (most of the time it feels like 300€ / hour, because if you want something done by a dev from ABI you first have to talk to his project manager and of course pay him too). They did thousands of hours in all those years for my company.
In 2017 they started to talk about a module in ELO called Invoice-Module. With it you can manage all your paper invoices digitally: scan that piece of paper, then OCR it, then fill in the form data, add data, and at the end send it to the ERP system automatically so we can pay the invoice automatically. "Digitization" is the key word.
After 1.5 years of project planning and a 3-month test phase, we talked to them and decided to go live on 01.01.2019. We are talking about ~200 hours of planning and work already, just from ABI, for this (do the math. No. Please don't...).
I joined my current company in October 2018 and I was supposed to "just oversee" the project a bit. I mean, hey, they had been planning it for 1.5 years - how bad can it be, right?
In the first week of 2019 we found 25 bugs and users reported around 50 feature requests, around 30 of them so urgent that users couldn't do their daily work with the invoices the way they did before ELO.
In the first three weeks of 2019 we were around 70 bugs deep, 20 of them fixed, with nearly 70 feature requests, 5 done. Around 10 bugs were so severe that the complete system would stop working if they didn't get fixed.
Want examples?
- Delete an invoice (right click -> delete, no super deeply hidden menu), and the server crashes until someone restarts it.
- Missing dropdown for the tax rate, everything was 19% (in Germany 99,9% of all invoices are 19%, 7% or 0%).
But the biggest thing was that the complete web service for sending to the ERP wasn't even finished in the code.
So that means we had around 600 invoices to pay with nearly 300.000€ of cash in the first 3 weeks and we couldn't even pay 1 cent - as an urban company!
Shortly after receiving this high-prio request and starting to discuss it with ABI, the project manager of my assigned dev told me he would be gone the next day. He is getting married. And honeymoon. 1 week. So: wish him luck, when will his replacement be here?
Deep breath.
Deep breath.
There was no replacement. They just had 1 developer. As a 40-person software house they had exactly one developer who knows ELO, which they sold to A LOT of companies.
He came back after 1 week away, we asked for a meeting, and they told us "oh, he is now scheduled on other ELO projects, we can offer you time from him in 4 weeks at the earliest".
To cut a long story short (it's too late for that, right?) we fought with ABI for around 3 months to rescue this project in any conceivable way. The solution in mid February was that I (software dev) would attend crash courses in ELO to be the second developer ABI didn't have, even without working for ABI....
Now it's May and we decided to cut ties with ABI on ELO and switch to a new company that knows ELO. There were around 10 meetings on CEO level to make this a "good" cut and not a bad cut, because we can't afford to scare them (think about the 300K tool they sold us...).
On 01.06.2019 we were supposed to start with the new company. 2 days before, I found out, by accident, that there was a password on the project file on the server for one of the ELO services. I called my boss and my CEO. No one knew anything about it. I found out that ABI had sneaked into this folder a week earlier, while working on another thing, and set this password to lock us out. OF OUR OWN FCKING FILE.
Without this password we are not able to fix any bug, develop any feature or even change an image within ELO, regardless of the thousands of hours we paid for.
When we asked ABI about this, his CEO told us, it is "their property" and they will not remove it.
When I asked my CEO about it, they told me to do nothing, we can't scare them, we need them for the 300K tool.
No punchline.
No finish.
Just the project file with a password, still there today
-
I hate these idiots that post source code examples as an image just so they can keep their cool highlighting and style. How the fuck am I supposed to test that without re-typing the whole thing myself? Ever try OCR on source code? Not too great, is it.
-
Guys what I want to know is how do you secure your code so that they pay you after you deliver the code to them?
So recently I was in this internship that I secured with an over-the-phone interview and the guy who was contacting me was the CEO of the company (I'm going to refer to him as "the fucking cunt" from now on). He asked me to do some OCR and translations and I managed to write a few scripts that automate the entire process. The fucking cunt made me log in remotely to his desktop which was connected to the server (who the fuck does that) and I had to operate on the server from his system. I helped him with the installation and taught him how to use the scripts by altering the parameters and stuff, and you know what the fucking cunt did from the next day onward? Dropped contact. Like completely. I kept bombing emails upon emails and tried calling him day after day, the fucking cunt either picked up and cut the call immediately on recognising it's me or didn't pick up at all. And the reason he wasn't able to pay me was, and I quote, "I am in US right now, will pay you when I get back to India." I was like "The fuck was PayPal invented for?" Being the naive fool that I was, I believed him (it was my first time) and waited patiently till the date he mentioned and then lodged a complaint in the portal itself where he had posted the job initially. They raised a concern with the employer and you know what the fucking cunt replied? "He has not been able to achieve enough accuracy on the translations". Doesn't even know good translation systems don't exist to date (BTW I used a client for the Google Translate API). It has been weeks now and still the bitch has not yet resolved the issue. And the worst part of it was I got a signed contract and gave him a copy of my ID for verification purposes.
I'm thinking of making a mail bomb and nagging him every single day for the rest of his life. What do you guys think?
-
I gave my project manager a prototype of the OCR app we're developing to play with, just for fun. The next morning, I enter the office to see this along with a well-structured spreadsheet with some 40 columns.
Never underestimate the seriousness of a project manager.
-
Soo my dad has a food printer he uses to print edible images on cakes our customers order. The food printer needs to run at least once a week (regularly) to kinda guarantee not to get fucked up with its ink, as that can damage the printer when it's dry. My dad though doesn't have regular orders...
The printer has a standard function to test all colors.
My dad asked me how this task could be managed regularly, as I'm the IT guy 🙄. His idea was to log all the dates on paper.
Now I'm trying to automate this task via Windows so we don't have to care about papers to manually log when the next test must run. On Windows the printer settings can be accessed to run this color check.
... I've got a feeling this will be another one of those tasks that I will overengineer over the top😅. I've already done my research with automated batch jobs (never done batch before) but the normally proposed code for a "Düsentestmuster", so the color check, prints a different overview I was not expecting, which doesn't fit the purpose.
Now I'm here and, as I currently see no way of simplifying it, I have to kinda simulate a person that opens these settings and runs this check. With Python, pyautogui and Tesseract OCR, to prevent the program from clicking anything wrong. Although I'm sure there should be an easier way for this, I haven't found it, so I guess I have to proceed on this path and take the experience I gain as a bonus...
-
I used to think that I had matured. That I should stop letting my emotions get the better of me. Turns out there's only so much one can bottle up before it snaps.
Allow me to introduce you folks to this wonderful piece of software: PaddleOCR (https://github.com/PaddlePaddle/...). At this time I'll gladly take any free OCR library that isn't Tesseract. I saw the thing, thought: "Heh. 3 lines quick start. Cool.", and the accuracy is decent. I thought it was a treasure trove that I could shill to other people. That was before I found out how shit of a package it is.
First test, I found out that logging is enabled by default. Sure, logging is good. But I was already rocking my own logger, and I wanted it to shut the fuck up about its log because it was noise to the stuffs I actually wanted to log. Could not intercept its logging events, and somehow just importing it set the global logging level from INFO to DEBUG. Maybe it's Python's quirk, who knows. Check the source code, ah, the constructor has a `show_log` arg to control logging. The fuck? Why? Why not let the user opt into your logs? Why is the logging on by default?
But sure, it's just logging. Surely, no big deal. SURELY, it's got decent documentation that is easily searchable. Oh, oh sweet summer child, there ain't. Docs are just some loosely bundled together Markdowns chucked into /doc. Hey, docs at least. Surely, surely there's something somewhere about all the args to the OCRer constructor somewhere. NOPE! Turns out, all the args, you gotta reference its `--help` switch on the command line. And like all "good" software from academia, unless you're part of academia, it's obtuse as fuck. Fine, fuck it, back to /doc, and it took me 10 minutes of rummaging to find the correct Markdown file that describes the params. And good-fucking-luck to you trying to translate all them command line args into Python constructor params.
"But PTH, you're overreacting!". No, fuck you, I'm not. Guess whose code broke today because of a 4th number version bump. Yes, you are reading correctly: My code broke, because of a 4th number version bump, from 2.6.0.1, to 2.6.0.2, introducing a breaking change. Why? Because apparently, upstream decided to nest the OCR result in another layer. Fuck knows why. They did change the doc. Guess what they didn't do. PROVIDING, A DAMN, RELEASE NOTE. Checked their repo, checked their tags, nothing marking any releases from the 3rd number. All releases go straight to PyPI, quietly, silently, like a moron. And bless you if you tell me "Well you should have reviewed the docs". If you do that for your project, for all of your dependencies, my condolences.
Could I just fix it? Yes. Without ranting? Yes. But for fuck's sake if you're writing software for a wide audience you're kinda expected to be even more sane in your software's structure and release conventions. Not this. And note: The people writing this, aren't random people without coding expertise. But man they feel like they are.
-
I never felt this satisfied in my entire life,
So I was working on an open-source org where people can come and read books online for free. But they were facing the challenge of making books text selectable with the mouse pointer. But the problem was that their website renders scanned images of the books so it is impossible to select text from it.
So I solved this problem by building a small prototype that could do it. All of the books in their database have XML files associated with them which contain the coordinates of each word. So the logic was simple - select a rectangular region, pass its coordinates, check whether the coordinates of each word lie inside that rectangular region, and display the words that do. This trick is helpful because most OCR tools generate a similar XML file.
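The rectangle check described above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the `Box` class, `select_words` and the sample coordinates are hypothetical, and it assumes each word box has already been parsed out of the XML as left/top/right/bottom values.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle: (left, top) and (right, bottom) in page pixels."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, other: "Box") -> bool:
        # True only when `other` lies entirely inside this rectangle.
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def select_words(words: List[Tuple[str, Box]], selection: Box) -> List[str]:
    """Return the words whose boxes fall inside the selection, in input order."""
    return [text for text, box in words if selection.contains(box)]


# Hypothetical word boxes, standing in for what the OCR XML would provide.
words = [
    ("Hello", Box(10, 10, 60, 30)),
    ("world", Box(70, 10, 120, 30)),
    ("again", Box(10, 40, 60, 60)),
]
print(select_words(words, Box(0, 0, 130, 35)))  # ['Hello', 'world']
```

Requiring full containment keeps half-covered words out of the selection; testing for overlap instead would be the looser alternative.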
So if you wish to use this prototype for your own projects - you can check my GitHub repository https://github.com/ishank-dev/...
please star it if you like.
-
Who knew running OCR on a 70 page scanned document would take 42 minutes and eat nearly all your RAM? RIP server 🤣🤣🤣
-
iOS is rotting my soul.
I've been a user of iPhone for 6 years now. For the first couple years, I wasnt really mindful of software I use, or I guess I didnt really care. As long as it did the bare minimum, I.e. bank app, call, text, browse, watch youtube vids, I didnt really care. However, in the last couple years, ive become very interested in tech and have worked on small developer projects, spent a lot of time coding in my free time, found really inspiring software and apps on my regular computer that just blow my mind on how advanced they are, and how I, some dumb guy with internet access, can just download it on my PC and use it.
This led me into a kind of software honeymoon phase, where I created a shiny new Github account and started exploring what other cool tools are just out there, available to me for free. My software honeymoon was spent on the beaches and resorts of the open-source software ecosystem. Exploring the gem-bearing caves and beautiful forests of anything from free open-source OCR programs(I needed it to convert my dads manuscript from scanned PDF .jpeg's to actual UTF8 text) to open-source RGB lighting/keymapping software to escape the memory-and-CPU-hungry(and most likely advertising-ID-interested) proprietary software that comes with the brand of mouse/keyboard/controller/etc.
It was like I was a kid exploring Disneyland for the first time or something. But then... then... I got off my computer. Picked up my phone to check notifications. Ew, tinder is blowing up notification center with marketing shit. I go to settings. Notification settings. Tinder's at the bottom so I just want to use a search bar instead of scrolling. There's no search bar. Minor inconvenience. Dark mode isnt dark enough for me. I guess thats just too damn bad, because for the next two hours, I'll have to figure it out by messing with accessibility settings. Time for bed, and I'm just getting plum tired of having to turn on my alarms every night for work the next morning. So I used the 'Automations' app to do it for me. For the next two weeks, at the time specified, 'There was an error running your automation' until I just delete the automation. Browsing through the FaceID settings, I see 'Attention Aware Features'. Cool, maybe now my phone won't automatically dim the screen when im in the middle of reading notifications on my lock screen. Haha, nope still does it. After turning on my alarms, I go to sleep. I wake up an hour late for work because those handy 'Attention Aware Features' silenced my alarm immediately because I fell asleep watching a youtube video.
I could go on and on. It's actually making me feel depressed typing this on my phone, fighting with Apple's primitive autocorrect and annoying implementation of Swype to type.
-
Fave IDE: Rube-Goldberg Distributed Physical Editor (RGDPE)
- 3x5 note cards, rite aid brand
- pilot rolling ball gel pen
- white out
- a scanner with OCR, email
- a raspberry pi running a local email server and dns
- a raspberry pi running an SMTP receiver and language service and a handler to invoke the compiler
- a speak and spell to print out the language service results
Why: why not?
-
My new project: a camera sends an image of the electricity meter to a server that does OCR and submits the value to the electricity company on the 5th of every month
Current progress: spent 4 hours trying to get emails to work in Scala before I found on an obscure forum that you have to enable insecure app access in your Gmail to use SMTP
-
I'm mostly a .NET dev, working on an OCR thingy, but I started as a Java/Android dev. After my boss's crappy management burned out our two mobile devs, he assigned me to finish one app. For the past four days I've worked around the clock to finish as much functionality as I could, but it simply wasn't possible, especially because the project was still changing even though the deadline was around 15.12.17. Yesterday I did as much as I could and now we have to wait for the client to either accept it or break the contract.
To be frank, I think that losing money would be like a bucket of cold water for my boss. All of us, me and those two mobile devs I mentioned earlier, are students. We have exams right now. The "Senior" dev is only a year older and will soon be applying for his engineering degree. Year after year a situation like this occurs and the boss hasn't learned a thing.
-
I CAN'T TAKE IT ANYMORE. We brought on a vendor to provide us some fancy OCR tool. It barely works. It's been barely working. And the vendor is so adamant it works that it's so difficult to get them to send people on-site to work with us. We complained about it to the exec and we got "Oh they're a tech startup. You have to help them along with developing their process blah blah blah". Well they don't want the help we offered, they keep insisting their shit is top secret (and it works). When they make changes remotely it's like they blindly make a change and then throw it to us to test. When we can get them to come in they hang around till the problem is fixed (more than once we've had to tell them how to fix it) and then they fly as fast as they fucking can through the door. A guy on my team even built something similar backed by Azure but we were given a directive to work with them. And now we're getting pressure about delays in launch. But it's not our fault. The vendor's asshole lying CEO keeps making shit up and we're told to work with it. Yet it's our fault that we missed deadlines? fuck this place !!! fuck all of this !!!
-
Fuck Apache TIKA.
It's supposed to be a "universal file reader" or some shit. I'm trying to use it as a PDF/image parser that does OCR when needed and yields a full-file string. It does so, but the text ends up being IN THE WRONG FUCKING ORDER.
WTF would I want to parse the text out of a PDF in any order that is not the one the text is supposed to be read?!?!
"It is more efficient to work in random ordering", says the docs. No shit, really? Wouldn't it be even more efficient to just spit out random strings? Just as useful and 100% CPU-bound.
"You can add a property to forcefully put the text in the right order". THEN WHY THE FUCK IT IS NOT THE DEFAULT SETTING?
Srsly, what's the use case for a parser that yields scrambled text?!?
-
Didn't know Android had such a Feature.
I know it's possible using OCR tools, but still, it's amazing.
https://i.imgur.com/EwEQDnK.gif
-
Story of a first-time hackathon.
So, I took part in the COVID-19 Global Hackathon.
Long story short, I got excited at OCR and just went with the most challenging challenge - digitizing forms with handwritten text and checkboxes, ones which say whether you have been in contact with someone who could have Coronavirus.
And, unsurprisingly, it didn't work within 4 days. I joined up with 2 people, who both left halfway through - one announced, one silently - and another guy joined, said he had something working and then disappeared.
We never settled on a stack - we started with a local docker running Tesseract, then Google Cloud Vision, then we found Amazon Textract. None worked easily.
Timezone differences were annoying too. There was a 15-hour difference across our zones. I spent hours in the Slack channel waiting.
We didn't manage the deadline, and the people who set the challenge needed the solution within 10 days, a deadline we also missed. We ended up with a basic-bitch Vue app to take pictures with mock Amazon S3 functionality, empty TDD in Python and also some OCR work.
tbh, that stuff would've worked if we had 4 weeks. I understand why everyone left.
I guess the lesson from this is not to be over-ambitious with hackathons. And not to over-estimate computers' detection abilities.
-
I would have to say this online OCR software that I was forced into and expected to build for medical documents. The problem was the scanned documents were so unreadable, crooked, and in dot matrix style, so there was really no way to do this.
-
The OCR A level in the UK is properly messed up - it's beyond outdated and irrelevant, with very little programming involved. The GCSE is even worse - this year they literally removed all programming (coursework) - like how is that supposed to teach you anything relevant? The GCSE from the year before was much more relevant, though still not perfect, as it had much more of a focus on programming and development. But hey, what can you do? The education system will do what it wants. All we need is to get people from the industry to create exams and the syllabus, to help ensure they are more relevant. I ranted on a bit but hey, hopefully we can change it for the future generations, as I find there are very few kids interested in programming these days. Here's to change
-
Hey guys, I've a problem I've been trying to solve for a while. Also I'm a college student so my knowledge isn't going to be the greatest, so go easy on me if it's simple to solve 😂. So I'm creating a real-time licence plate detector using YOLO lite and my own deep learning OCR, and I plan to serve the model with FastAPI. As input to the REST API, the user will submit an IP camera link so OpenCV can grab individual frames for preprocessing before the YOLO predictions. The problem I have is: how do I handle multiple real-time IP camera feeds at once? I've been researching multithreading but read that it can cause issues with async definitions in FastAPI. Any advice will be greatly appreciated, and if more information is needed just shout!
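One possible shape for the multi-feed part, as a rough sketch rather than a tested answer: run one plain reader thread per camera and keep only the newest frame per feed, so whatever does the prediction always reads the latest picture. `FeedManager` and `fake_camera` are made-up names; the fake reader stands in for a real one, which could wrap OpenCV's `cv2.VideoCapture(url).read()`.

```python
import threading
from typing import Callable, Dict, List, Optional


class FeedManager:
    """Runs one reader thread per camera and keeps only the newest frame."""

    def __init__(self) -> None:
        self._latest: Dict[str, object] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._threads: List[threading.Thread] = []

    def add_feed(self, name: str, read_frame: Callable[[], Optional[object]]) -> None:
        # `read_frame` is a stand-in: it returns the next frame, or None when
        # the feed ends. A real one could wrap cv2.VideoCapture(url).read().
        thread = threading.Thread(target=self._run, args=(name, read_frame), daemon=True)
        self._threads.append(thread)
        thread.start()

    def _run(self, name: str, read_frame: Callable[[], Optional[object]]) -> None:
        while not self._stop.is_set():
            frame = read_frame()
            if frame is None:  # feed ended or dropped
                break
            with self._lock:
                self._latest[name] = frame  # overwrite: stale frames are discarded

    def latest(self, name: str) -> Optional[object]:
        with self._lock:
            return self._latest.get(name)

    def join(self) -> None:
        # Wait until every feed has ended on its own.
        for thread in self._threads:
            thread.join()

    def stop(self) -> None:
        # Ask all readers to exit after their current frame.
        self._stop.set()
        self.join()


def fake_camera(prefix: str, count: int) -> Callable[[], Optional[object]]:
    """Hypothetical frame source that yields `count` labelled frames, then None."""
    frames = iter(f"{prefix}-frame-{i}" for i in range(count))
    return lambda: next(frames, None)


manager = FeedManager()
manager.add_feed("gate", fake_camera("gate", 5))
manager.add_feed("yard", fake_camera("yard", 3))
manager.join()
print(manager.latest("gate"), manager.latest("yard"))
```

Because the readers are ordinary threads that never touch the event loop, an async handler would only need the short lock-protected lookup in `latest()`. Whether that is enough depends on how heavy the YOLO and OCR steps are, which this sketch does not cover.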
-
So, there is this one client, who wants a website to be made for his hardware shop, and wants the inventory display and has given me their brochure's PDF and that fucking PDF contains Images and no text and he fucking expects me to write that shit down >:(
Tried all techniques to get text from the brochure , parser , OCR , everything.
None worked.
And the PDF is 100 pages long and I'm in dire need of money.
FML :(
-
Me doing ops analysis...
Day1: No way! How the fu
Day2: Nah, someone else will do this...
Day3: Fuck! Why dont we have a Legal Dept
Day4: Okay Im just gonna run an ml to find irregularities
Day5: These scans cannot be ocr-ed...
Day6: br-brute force
2 months later: so there's a problem regarding the Express Contractual Remedies of Contract A and Amendment C...
-
People say using GPT4 as an OCR is not a good idea. But damn that formatting GPT4 vision does, is outstanding.. and I have realised proper formatting does well while prompting to get precise output.
I gotta say, test for ur usecases rather than relying on expert opinion blogs!
-
Oh man, why is there no good API for OCR in PDFs? Once you search for this kind of thing you only find some kind of Tesseract.
Why doesn't Amazon have an API for this???
-
I love the flexibility and power you get with Wordpress, WooCommerce and its entire plugin ecosystem...
BUT FUCK ME! PHP IS SHIT!!
It's like writing code by hand with pen and paper, putting it through an OCR and then compiling it. Sure, it might work if you're lucky and maybe even look cool, but good luck trying to develop a solution with any sort of speed!
-
I want to make a program that interfaces to a game window and plays the game for me :P
I could do it.
with OCR and object recognition!
-
I will build something that helps me digitize expensive out-of-print books. What 4K DSLR do you know that is programmer friendly? Asking because I would like to interface it with my PC via USB and use custom-made software to control it, take pictures and (maybe) send the pictures over the connection to my PC.
Maybe I could work better with an Android phone, tho, but I would prefer a DSLR.
-
Is there a (F)OSS solution for a self-hosted document management system that includes OCR, text-based search on all documents and a web UI? So far my research hasn't been very successful, maybe someone has a hint? I've thought about building one myself. Then again someone else must've already built something similar, right?
-
Need help with selecting proper backend and website frameworks. After trying out a couple of identity verification service providers we were disappointed with their lack of support (takes weeks to do minimal changes).
So now we are having discussions about building in-house id verification system. We already have libraries for ios/android apps (ZOOM lib for face recognition and another lib for data extraction via OCR from document picture). So what we need is a proper backend and then a decent web framework with proper ux/ui design for our web/ios/android apps.
Currently thinking what kind of backend framework should we choose? Backend's main responsibility is for each client registered from website to assign an api key and to create a database/storage where his users would authenticate via clients app and upload a picture and a video.
Also wondering what kind of framework to choose for the website apps (main web app, dashboard app where we display pending verifications, and of course verification app). Should we go for Angular?
-
Obviously ai and autodocument recognition and data extraction is not usable yet
Excepting when it's a pdf not a scanned document or image
OCR may be, but shift the whole image or bend it or remove a border from some white out
And then handwritten