Search - "lora"
-
A programmer's son asks his father:
Dad, why does the sun rise in the east and set in the west?
Father: It works! Don't touch it.
-
Once upon a time...
Three SQL databases walked into a NoSQL bar.
A little while later they walked out, because they couldn't find a table.
-
My girlfriend told me to take the spider out instead of killing it.
We went and had some drinks. Cool guy. Wants to be a web developer.
-
Client is a BIG corporation.
Client wants a large app and has its own UI guidelines.
Client sends design.
Client doesn't follow its own guidelines...
-
The newer LLMs show that you can cut bit width and still gain relative efficiency by increasing model size; it turns out that trade-off is actually worth it.
There's a caveat, though: below 4-bit quantization a model loses a *lot* of quality (high perplexity). Essentially, without new quantization techniques, they're out of runway. The only directions left from here are better LoRA implementations/architectures, better base models, and larger models themselves.
I do see one improvement though.
By taking the same underlying model and reducing it to 3, 2, or even 1 bit, assuming the distribution is bit-agnostic (even if the output isn't), the smaller network acts as an inverted supervisor.
In other words, the larger model is likely to be *more precise and accurate* than a bit-width-handicapped one of equivalent parameter count. Sufficient sampling would then let the 4-bit model train against a lower-bit quantization of itself, on the theory that it's hard to generate a correct (low-perplexity, low-loss) answer or sample, but *easy* to generate one that's wrong.
And if you have a model of higher accuracy, and a version with much lower accuracy relative to that baseline, you should be able to effectively bootstrap the better model.
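As a rough sketch of the bit-width gap this leans on (the quantize helper and the random stand-in weights below are just placeholders, not any real checkpoint):

import numpy as np

def quantize(weights, bits):
    # Symmetric uniform quantization to `bits` bits (1-bit is approximated as ternary here).
    levels = max(2 ** (bits - 1) - 1, 1)
    scale = np.abs(weights).max() / levels
    q = np.clip(np.round(weights / scale), -levels, levels)
    return q * scale  # dequantized copy, for comparing against the original

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # stand-in weight tensor

# Reconstruction error grows sharply below 4 bits -- that widening gap is what
# the "inverted supervisor" idea would sample against.
for bits in (4, 3, 2, 1):
    err = np.mean((w - quantize(w, bits)) ** 2)
    print(f"{bits}-bit reconstruction MSE: {err:.2e}")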
This is similar to the approach of AlphaGo playing against itself, or how certain drones auto-hover: they calculate the wrong flight path first (looking for high loss) because it's simpler, and then calculate relative to that to get the right answer.
If crashing is flying with style, failing at crashing is *flying* with style.
-
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the sweet spot between best overall performance and least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRA, but for inference instead of fine-tuning.
Inference, after all, is what matters. And there has to be some subset of prompt-based initializations of the network that perform (generally) as well as a full inference pass, regardless of the prompt.
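Roughly what I mean, as a sketch using GPT-2 via Hugging Face transformers (the prefix text and the prompt are just placeholders): reuse the cached state from a partial pass instead of recomputing it for every prompt.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Partial inference over a shared prefix, run once; its key/value cache is the
# saved "snapshot" that later inferences start from.
prefix_ids = tok("You are a terse coding assistant.", return_tensors="pt").input_ids
with torch.no_grad():
    snapshot = model(prefix_ids, use_cache=True).past_key_values

# A new prompt reuses the snapshot instead of recomputing the prefix.
prompt_ids = tok(" Explain LoRa in one line.", return_tensors="pt").input_ids
mask = torch.ones((1, prefix_ids.shape[1] + prompt_ids.shape[1]), dtype=torch.long)
with torch.no_grad():
    out = model(prompt_ids, past_key_values=snapshot, attention_mask=mask, use_cache=True)
print(tok.decode(out.logits[0, -1].argmax().item()))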
Likewise with diffusion, there likely exist priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image and ask, over all such patches belonging to the object, which one best describes it: if it were a dog, which patch of the image is "most dog-like", etc. I could see this being much closer to how the human brain identifies objects quickly via shortcuts. The patch size could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches, for example, might lead to too high a training loss). It would allow a scattershot, 'at a glance' lookup of potential image contents; even if you get multiple categories for a single pixel, it greatly narrows the total set of categories you need to search for afterwards.
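A rough sketch of the patch-scoring step (an off-the-shelf ImageNet classifier stands in for whatever backbone you'd actually use, and most_classlike_patch is a made-up helper name):

import torch
from PIL import Image
from torchvision import models, transforms

def most_classlike_patch(image, class_idx, patch=96, stride=48):
    # Slide a window over the image and return the patch the classifier is most
    # confident belongs to `class_idx` (e.g. the "most dog-like" patch).
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    prep = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    best_xy, best_score = None, -1.0
    w, h = image.size
    for x in range(0, w - patch + 1, stride):
        for y in range(0, h - patch + 1, stride):
            crop = image.crop((x, y, x + patch, y + patch))
            with torch.no_grad():
                score = net(prep(crop).unsqueeze(0)).softmax(-1)[0, class_idx].item()
            if score > best_score:
                best_xy, best_score = (x, y), score
    return best_xy, best_score

# e.g. most_classlike_patch(Image.open("dog.jpg").convert("RGB"), 207)  # 207 = golden retriever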
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.
-
In the programming world there is a lot to learn, and there are a lot of great developers out there. After seeing the code and projects of these developers I feel I am very weak at coding. My confidence is quite low right now because I cannot build a simple project by myself without following a project tutorial video, and I don't know how to improve my dev skills. I feel stuck. Any suggestions?
-
I wonder if anyone has considered building a large language model, trained on consuming and generating token sequences that are themselves the actual weights or matrix values of other large language models?
Run LoRA to tune it to find and generate plausible subgraphs for specific tasks (an optimal search for weights that are most likely to be initialized by chance to ideal values, i.e. the winning lottery ticket hypothesis).
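(For the LoRA part, the standard adapter setup via the peft library looks roughly like this; the rank, target modules, and base model are arbitrary placeholders here, and the weight-space training objective itself is the speculative part.)

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(base, config)  # only the low-rank adapter weights train
model.print_trainable_parameters()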
The entire thing could even be used to prune existing LLM weights, in a generative-adversarial model.
Shit, there's enough embedding and weight data to train a Meta-LLM from scratch at this point.
The sum total of trillions of parameters in models floating around the internet could be used as training data.
If the models and weights are designed to predict the next token, there shouldn't be anything to prevent another model trained on this sort of distribution, from generating new plausible models.
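A minimal sketch of the weight-to-token step that would sit underneath this (the bucketing scheme and vocab size are arbitrary assumptions):

import numpy as np

def weights_to_tokens(weights, vocab_size=256):
    # Bucket each float weight into one of `vocab_size` discrete tokens so a
    # next-token model can be trained on flattened checkpoints.
    lo, hi = float(weights.min()), float(weights.max())
    span = max(hi - lo, 1e-8)
    tokens = np.round((weights - lo) / span * (vocab_size - 1)).astype(np.int64)
    return tokens, (lo, hi)

def tokens_to_weights(tokens, lo, hi, vocab_size=256):
    # Invert the bucketing to recover approximate weights from generated tokens.
    return lo + tokens.astype(np.float32) / (vocab_size - 1) * (hi - lo)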
You could even do task-prompt-to-model-task embeddings by training on the weights of task-specific models, do vector searches to mix models, etc., and generate *new* models:
not new text, not new imagery, but new *models*.
It'd be a model for training/inferring/optimizing/generating other models.
-
I've been looking to build something new with a Raspberry Pi because I'm bored. I was thinking about trying out LoRa transceivers, but it would be a waste of money because I don't know what I would use them for. I have some knowledge of servers and such, but I don't have any ideas on what to use an RPi for. Any ideas?
-
I am aiming for Google and I have some questions regarding the interview process. I chose to make myself proficient in development and in data structures and algorithms, but I have almost zero competitive programming skills and no ranking on any competitive programming platform. I really want to work for Google. How do I reach that goal, and how do I clear Google's interview process?
-
Hey! I want to create a note app where multiple users can work on the same note in real time. Anyone who wants to can become my partner on this project; this is my GitHub link.
https://github.com/priyanshuSharma-...
Tech stack:
MERN
-
I have a question related to hardware.
I recently ordered an M.2 SSD on Amazon, but I don't know whether it is compatible with my system. Can anybody help me? I'd really appreciate it.
My SSD link on Amazon:
https://amazon.in/gp/product/...
My system link:
https://amazon.in/Lenovo-Ideapad-33...
-
C++, Python, JavaScript:
Which one is best for learning DSA?
I'm really confused about which one I should choose.
-
Does anyone know any high-power IoT LoRa boards? I have tried the Adafruit and Heltec ones, but they didn't do much. I need it for a low-permittivity environment.