Search - "stt"
-
Several minutes of waiting for the site to work after clicking on "required cookies only". Is this really what privacy laws were aiming for?
-
A QA personal voice assistant that runs locally, without the cloud. It's like a never-ending project: I look at it from time to time while time passes by. Chatbots arrived and some decent voice algorithms appeared. There is less and less left to code since people are making a lot of progress in that area.
I want to save notes using voice, search through them, hear them, find stuff in public data sources like Wikipedia and hear that too without using my hands, have news articles read to me, and things like that.
I want to spend more time on the math and core algorithms behind machine learning and deep learning.
The problem is that by the time I remember how basic network layers and error correction algorithms work, or how a particular deep learning algorithm is constructed and why, a week has already passed and I don't remember where I started.
I've done it a couple of times already and every time I remember more than before, but understanding the core requires sitting down with pen, paper and math problems, and I don't have time for that.
Now that I think about it, maybe I should write it down somewhere in an organized way: get back to blogging and write articles about what I learn. That would take twice the time, but maybe it would help me not forget.
I'm mostly interested in NLP, TTS and STT: WaveNet, Tacotron, BERT, RoBERTa, sentiment analysis, graphs and QA stuff. And now crystallography, because crystals are just organized graphs in 3D.
Well, maybe if I'm lucky I'll retire in the next decade, or at least take a year or two off to have plenty of time to finish this project.
-
School stuff. Teachers have loads of PDFs and they just hand them to us on Google Classroom to do in our books. Maybe a tool to transform those into proper forms, plus marking.
Also, a revision app that scans your notes and tests you (TTS and STT).
Also, my blog and DevRant2
-
Now I know why no one uses Google Cloud. Getting TTS and STT working cost me the whole night. Gemini was easy tho. But fuck Google, you cost me a lot of energy. You guys are crazy. Now my API connects in some magic way I don't even understand through the gcloud CLI app. The rest of my application is pure REST and doesn't use much of the Google library.
I implemented Google TTS and STT into ChatGPT. I use Google for some things because it's cheaper. It works using a JBL Go! speaker. I can just turn it on and start chatting with it. I implemented Google search and gave it a memory. It can remember numbers for me. It accepts Dutch and English. I can say 'google' and then Google is the main action: it fetches results from Google and uses GPT to summarize them. It works perfectly! BUT FUCKING AI. I want to know the color of Mona Lisa's hair. Not freaking Ona Isa! I sent it the literally correct text. The speech-to-text works great. But the fucking API with its reading. Pathetic. How far is AI? Barely usable as a home assistant. So far, besides autocompletion and giving code snippets / concepts, AI is freaking useless. You need more patience for AI than for a kid.
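A minimal sketch of the keyword routing and number memory described above. This is not the rant author's code: the `VoiceRouter` class, the "remember" / "recall" phrasing and the `search`, `summarize` and `chat` callables are all hypothetical stand-ins for whatever speech, search and GPT calls the real assistant makes.

```python
import re
from typing import Callable, Dict, List


class VoiceRouter:
    """Routes a transcribed utterance to an action (hypothetical design)."""

    def __init__(
        self,
        search: Callable[[str], List[str]],     # assumed: returns raw search results
        summarize: Callable[[List[str]], str],  # assumed: condenses the results
        chat: Callable[[str], str],             # assumed: plain chat fallback
    ) -> None:
        self.search = search
        self.summarize = summarize
        self.chat = chat
        # Numbers are kept as strings so leading zeros survive.
        self.numbers: Dict[str, str] = {}

    def handle(self, transcript: str) -> str:
        text = transcript.strip()
        lowered = text.lower()

        # "google <query>": search first, then summarize the results.
        if lowered.startswith("google "):
            query = text[len("google "):].strip()
            return self.summarize(self.search(query))

        # "remember <name> <number>": store a number under a name.
        remember = re.fullmatch(r"remember (\w+) (\d+)", lowered)
        if remember:
            name, number = remember.groups()
            self.numbers[name] = number
            return f"OK, {name} is {number}"

        # "recall <name>": read a stored number back.
        recall = re.fullmatch(r"recall (\w+)", lowered)
        if recall:
            name = recall.group(1)
            if name in self.numbers:
                return f"{name} is {self.numbers[name]}"
            return f"I have no number stored for {name}"

        # Anything else goes to the chat model.
        return self.chat(text)
```

Passing the three callables in keeps the routing testable without any cloud credentials: stubs can replace the real search and GPT calls.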
I hope the inventor of oauth2 dies alone. He should.
-
Several years ago I spent over two months working out how to integrate Text To Speech and Speech To Text (TTS/STT) into any Windows program I wrote in Delphi, originally for a powerful flat-file search engine. Does anyone know whether TTS/STT is still useful on Windows 10+, or whether it has any use at all?
I was thinking about redeveloping the search engine into a standalone program that can be used as a fast, light query tool with trigger functions. It could be made into a "reply bot" or used with a server like Apache, but without the old IBM mainframe mentality that is being readopted as "AI" and "social media" everywhere today. Low-level, independent and secure droid-like systems sound more fun to develop.
-
I have made an interactive talking AI, but it's not open source: it contains passwords/keys and tasks that are personal. Still, a lot was learned even though the code is nearly nothing. I spent many hours on research and didn't want to let it go to waste.
If you are interested in TTS / STT, this will be a nice resource: https://molodetz.nl/retoor/...
Side note: the built-in WebKit TTS/STT engine is maybe even better and has a great API! Amazed by the quality of that thing.
This is Python research. I hope that I can motivate someone, but devRant is always empty on Saturday.
If someone needs help with an implementation regarding this, you know where to find me.
-
Anybody know of a good open source speech-to-text engine?
I googled, but there are tons of them and I don't have much time right now to try each of them out.
What I actually want is just to convert the audio (in English) to text, and also to note the times at which those sentences were spoken in the audio, like a subtitle file.
-
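Whichever engine ends up doing the transcription, the subtitle part of this request can be handled separately. A rough sketch, assuming the engine can report each sentence as a `(start_seconds, end_seconds, text)` tuple; that tuple shape and the function names `srt_timestamp` and `to_srt` are made up for illustration and are not any engine's API. The output follows the common SubRip (.srt) layout.

```python
from typing import Iterable, Tuple

# Assumed segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as HH:MM:SS,mmm (the SubRip style)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Turn timed sentences into the text of an .srt subtitle file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello there."), (2.5, 4.0, "This is a test.")]))
```

Keeping the formatting apart from the recognizer means the engine can be swapped later without touching the subtitle code.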
I'm dreaming about an assistant system that is omnipresent.
Popping up on screens in public, sitting in my ear listening to my speech, with my personal feed of context-based information.
Like a juxtaposition of Wikipedia, Facebook, Twitter, etc...
I don't want to pull my device out of my pocket to type in a search, or to push a button and say "ok buddy".
It just has to be omnipresent and focused on my requests.