Join devRant
Do all the things like
++ or -- rants, post your own rants, comment on others' rants and build your customized dev avatar
Sign Up
Pipeless API
From the creators of devRant, Pipeless lets you power real-time personalized recommendations and activity feeds using a simple API
Learn More
Search - "image generation"
-
Wrote a image generation api which uses a headless browser within 2 days ^^ it generates an image like the one below in around 500ms while also loading external resources like that avatar and the icons. ^^2
-
My last post was a year ago. What brought me back here is the ability of AI to agree and apologize to anything and everything, while producing the worst hopeful code.
4 days I wasted, trying to make an android audio visualizer, but AI... sigh.
It gave me the wrong structure of FFT bytes emitted. I corrected it
It gave me the wrong logarithm calc, I corrected it
It gave me the wrong sampling rate, I corrected it.
It gave me the wrong texture order, I corrected it.
It gave me the wrong glsl sample2d, I corrected it.
It gave me the wrong textureID generation, I corrected it.
It gave me a render which was about 10 fps, I found out that instead of using native onDraw, I had a fcking delta time in my shader. I almost corrected it, I gave up
Lets go to code generators with Annotations.
Like always, starts very positive, until I start to correct it.
It gave me the wrong file locations, I corrected it.
It gave me the wrong order of find copy modify and write to .build, I didnt correct it.
It gave me regexes to find annotations. Im like So whats the use of an "ANNOTATION PROCESSOR"
It apologizes and used a fucking regex in the processor,..... I didnt correct it, in the end, I was left with a separate module, targetting iOS Android and JVM, with an annotation processor implemented in jvmMain, which tries to modify commonMain src by finding annotations with regexes, which wont run on app build or app sync project, but only on java -jre command pointing to that fucking .java class in that module, which takes at least 2 mins to run, and Finally generate 0 files.
I needed to rant, I understand LLMs are just models of words built and stolen from the most intelligent and dumbest people out there. But Im an idiot for getting my hopes high. I cant build anything new and unheard of. I used to do that. I once made a textView + image print util for a bluetooth printer just to say FU to libraries and heavy sdks. like literally rasterizing shit to bluetooth packets. I needed to let off some steam. I havent been here in a year so I dont know what reactions I can get from this rant. I bet someone will just say yeah we tired of 'Fuck AI' rants. but shit, it hurts. When I gave up on that visualizer, I downloaded an app, I think its called project M, like in reference to MilkDrop.. like the Winamp Milkdrop. I opened it, played something on spotify, and let my eyes go blind9 -
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the intersection of best overall performance with least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRa, but for inference, instead of fine-tuning.
Inference, after-all, is what matters. And there has to be some subset of prompt-based initializations of a network, that perform, regardless of the prompt, (generally) as well as a full inference step.
Likewise with diffusion, there likely exists some priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution, without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image, and ask, for all such patches that belong to an object, what best describes the object? if it were a dog, what patch of the image is "most dog-like" etc. I could see it as being much closer to how the human brain quickly identifies objects by short-cuts. The size of such patches could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches for example might lead to too high a training loss). Of course it might allow us to do a scattershot 'at a glance' type lookup of potential image contents, even if you get multiple categories for a single pixel, it greatly narrows the total span of categories you need to do subsequent searches for.
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.6 -
It's been a while since i stopped programming.....
It's been so busy with all the school work/assignments/ and the most important part is that school ends at 10pm, arrive home at 11pm, prepare for tomorrow school stuff, sleep at 2am, wake up at 7am next morning, and again ends at 10pm 5 days a week...
It is exhausting, but I am getting used to this routine.
Studying my own programming skills or working on a side project? Not sure when to do it... The only way to continue studying is at breaks at school, or sleep less and study....
But it is impossible....
I have some great projects that are waiting to go out to the world, to list a few:
- cloud gaming
- cloud storage with live streaming
- complete school schedule management
- home automation framework in dotnet
- deepfakes and ai image generation algorithm (~18 months of training till now)
- game cheat engine (20GB total omfg ^^)
- and more
and I don't have time to finish it. lol
I think it will see the bright world after 3 years of high school... By then, my projects will be ancient, probably....
TIme is really short.
24 hours equally, but feels like 8 hours a day....
Should I abandon the project rn and focus on studying? (probably should)
or should i sell the project or open source it?
Also, how do you manage your time between work(study) and side projects (especially big ones)?4 -
So I thought to myself.
Hey I'll go ahead and use python, it will make this easier than using c++.
So I start looking at python.
And I start looking at specific common functions that c/c++ and .net all offer.
Like writing a fucking png image.
And I start seeing 3rd party libs that are at version 0.2
And so I say, this is supposedly the language data people love. which would include searching gis data too right ?
Everybody touts this level for ai and machine learning and all this other bullshit but I can't even create a fucking image ? And every document points to this same lib where it comes to creating this image ? at version 0.2 ?? 20 years or more after PNG was created ?
So I look up geotiff, and see 0.4........ so..... what is this language good for again ? I can parse json in javascript and do the other things I want...
Oh scatterplot generation ? What is it being displayed in jpeg ? Maybe the jpeg implementation is good. because you know i just use scatterplots constantly. yup. most of the data I require to analyze uses scatterplots. not risk.
fun.
oh and look django.... who the fuck uses django ?
and omg it makes me format my text or the run bombs.....
jesus. rpg much ?
I'm just... I'm not seeing...
WHY ?????????
and then I have zimmermans voice buzzing in my head about just using goddamn .net26 -
Anyone tried converting speech waveforms to some type of image and then using those as training data for a stable diffusion model?
Hypothetically it should generate "ultrarealistic" waveforms for phonemes, for any given style of voice. The training labels are naturally the words or phonemes themselves, in text format (well, embedding vectors fwiw)
After that it's a matter of testing text-to-image, which should generate the relevant phonemes as images of waveforms (or your given visual representation, however you choose to pack it)
I would have tried this myself but I only have 3gb vram.
Even rudimentary voice generation that produces recognizable words from text input, would be interesting to see implemented and maybe a first for SD.
In other news:
Implementing SQL for an identity explorer. Basically the system generates sets of values for given known identities, and stores the formulas as strings, along with the values.
For any given value test set we can then cross reference to look up equivalent identities. And then we can test if these same identities hold for other test sets of actual variable values. If not, the identity string cam be removed, or gophered elsewhere in the database for further exploration and experimentation.
I'm hoping by doing this, I can somewhat automate the process of finding identities, instead of relying on logs and using the OS built-in text search for test value (which I can then look up in the files that show up, and cross reference the logged equations that produced those values), which I use to find new identities.
I was even considering processing the logs of equations and identities as some form of training data perhaps for a ML system that generates plausible new identities but that's a little outside my reach I think.
Finally, now that I know the new modular function converts semiprimes into numbers with larger factor trees, I'm thinking of writing a visual browser that maps the connections from factor tree to factor tree, making them expandable and collapsible, andallowong adjusting the formula and regenerating trees on the fly.7