Search - "lora"
-
A programmer's son asks his father:
Dad, why does the sun rise in the east and set in the west?
Father: It works! Don't touch it.
-
Once upon a time...
Three SQL databases walked into a NoSQL bar.
A little while later they walked out, because they couldn't find a table.
-
My girlfriend told me to take the spider out instead of killing it.
We went and had some drinks. Cool guy. Wants to be a web developer.
-
Client is a BIG corporation.
Client wants a large app and has its own UI guidelines.
Client sends design.
Client doesn't follow its own guidelines...
-
The newer LLMs show that you can cut bit width and still gain relative efficiency by increasing model size; it turns out that trade-off is actually worth it.
There's a caveat, though: below 4-bit quantization a model loses a *lot* of quality (high perplexity). Essentially, without new quantization techniques, they're out of runway. The only directions left from here are better LoRA implementations/architectures, better base models, and larger models themselves.
I do see one improvement though.
By taking the same underlying model and reducing it to 3, 2, or even 1 bit, assuming the distribution is bit-agnostic (even if the output isn't), the smaller network acts as an inverted supervisor.
In other words, the larger model is likely to be *more precise and accurate* than a bit-width-handicapped one of equivalent parameter count. Sufficient sampling would then let the 4-bit model train against a lower-bit quantization of itself, on the theory that it's hard to generate a correct (low-perplexity, low-loss) answer or sample, but *easy* to generate one that's wrong.
And if you have a model of higher accuracy, and a version with much lower accuracy relative to that baseline, you should be able to effectively bootstrap the better model.
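As a rough sketch of the bit-width gap this leans on (the quantize helper and the random stand-in weights below are just placeholders, not any real checkpoint):

import numpy as np

def quantize(weights, bits):
    # Symmetric uniform quantization to `bits` bits (1-bit is approximated as ternary here).
    levels = max(2 ** (bits - 1) - 1, 1)
    scale = np.abs(weights).max() / levels
    q = np.clip(np.round(weights / scale), -levels, levels)
    return q * scale  # dequantized copy, for comparing against the original

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)  # stand-in weight tensor

# Reconstruction error grows sharply below 4 bits -- that widening gap is what
# the "inverted supervisor" idea would sample against.
for bits in (4, 3, 2, 1):
    err = np.mean((w - quantize(w, bits)) ** 2)
    print(f"{bits}-bit reconstruction MSE: {err:.2e}")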
This is similar to the approach of AlphaGo playing against itself, or how certain drones auto-hover: they calculate the wrong flight path first (looking for high loss) because it's simpler, and then calculate relative to that to get the right answer.
If crashing is flying with style, failing at crashing is *flying* with style.
-
The next step for improving large language models (if not diffusion) is hot-encoding.
The idea is pretty straightforward:
Generate many prompts, or take many prompts as a training and validation set. Do partial inference, and find the sweet spot between best overall performance and least computation.
Then save the state of the network during partial inference, and use that for all subsequent inferences. Sort of like LoRA, but for inference instead of fine-tuning.
Inference, after all, is what matters. And there has to be some subset of prompt-based initializations of the network that perform (generally) as well as a full inference pass, regardless of the prompt.
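Roughly what I mean, as a sketch using GPT-2 via Hugging Face transformers (the prefix text and the prompt are just placeholders): reuse the cached state from a partial pass instead of recomputing it for every prompt.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

# Partial inference over a shared prefix, run once; its key/value cache is the
# saved "snapshot" that later inferences start from.
prefix_ids = tok("You are a terse coding assistant.", return_tensors="pt").input_ids
with torch.no_grad():
    snapshot = model(prefix_ids, use_cache=True).past_key_values

# A new prompt reuses the snapshot instead of recomputing the prefix.
prompt_ids = tok(" Explain LoRa in one line.", return_tensors="pt").input_ids
mask = torch.ones((1, prefix_ids.shape[1] + prompt_ids.shape[1]), dtype=torch.long)
with torch.no_grad():
    out = model(prompt_ids, past_key_values=snapshot, attention_mask=mask, use_cache=True)
print(tok.decode(out.logits[0, -1].argmax().item()))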
Likewise with diffusion, there likely exist priors (based on the training data) that speed up reconstruction or lower the network loss, allowing us to substitute a 'snapshot' that has the correct distribution without necessarily performing a full generation.
Another idea I had was 'semantic centering' instead of regional image labelling. The idea is to find some patch of an object within an image and ask, over all such patches belonging to the object, which one best describes it: if it were a dog, which patch of the image is "most dog-like", etc. I could see this being much closer to how the human brain identifies objects quickly via shortcuts. The patch size could be adjusted to minimize the cross-entropy of classification relative to the tested size of each patch (pixel-sized patches, for example, might lead to too high a training loss). It would allow a scattershot, 'at a glance' lookup of potential image contents; even if you get multiple categories for a single pixel, it greatly narrows the total set of categories you need to search for afterwards.
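A rough sketch of the patch-scoring step (an off-the-shelf ImageNet classifier stands in for whatever backbone you'd actually use, and most_classlike_patch is a made-up helper name):

import torch
from PIL import Image
from torchvision import models, transforms

def most_classlike_patch(image, class_idx, patch=96, stride=48):
    # Slide a window over the image and return the patch the classifier is most
    # confident belongs to `class_idx` (e.g. the "most dog-like" patch).
    net = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    prep = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    best_xy, best_score = None, -1.0
    w, h = image.size
    for x in range(0, w - patch + 1, stride):
        for y in range(0, h - patch + 1, stride):
            crop = image.crop((x, y, x + patch, y + patch))
            with torch.no_grad():
                score = net(prep(crop).unsqueeze(0)).softmax(-1)[0, class_idx].item()
            if score > best_score:
                best_xy, best_score = (x, y), score
    return best_xy, best_score

# e.g. most_classlike_patch(Image.open("dog.jpg").convert("RGB"), 207)  # 207 = golden retriever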
In other news I'm starting a new ML blackbook for various ideas. Old one is mostly outdated now, and I think I scanned it (and since buried it somewhere amongst my ten thousand other files like a digital hoarder) and lost it.
I have some other 'low-hanging fruit' type ideas for improving existing and emerging models but I'll save those for another time.
-
In the programming world there is a lot to learn, and there are a lot of great developers out there. After seeing the code and projects of these developers I feel I am very weak at coding. My confidence is quite low right now because I cannot build a simple project by myself without following a project tutorial video, and I don't know how to improve my dev skills. I feel stuck. Any suggestions?
-
I wonder if anyone has considered building a large language model, trained on consuming and generating token sequences that are themselves the actual weights or matrix values of other large language models?
Run LoRA to tune it to find and generate plausible subgraphs for specific tasks (an optimal search for weights that are most likely to be initialized by chance to ideal values, i.e. the winning lottery ticket hypothesis).
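(For the LoRA part, the standard adapter setup via the peft library looks roughly like this; the rank, target modules, and base model are arbitrary placeholders here, and the weight-space training objective itself is the speculative part.)

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")
config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                    target_modules=["c_attn"], task_type="CAUSAL_LM")
model = get_peft_model(base, config)  # only the low-rank adapter weights train
model.print_trainable_parameters()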
The entire thing could even be used to prune existing LLM weights, in a generative-adversarial model.
Shit, there's enough embedding and weight data to train a Meta-LLM from scratch at this point.
The sum total of trillions of parameters in models floating around the internet could be used as training data.
If the models and weights are designed to predict the next token, there shouldn't be anything to prevent another model trained on this sort of distribution, from generating new plausible models.
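A minimal sketch of the weight-to-token step that would sit underneath this (the bucketing scheme and vocab size are arbitrary assumptions):

import numpy as np

def weights_to_tokens(weights, vocab_size=256):
    # Bucket each float weight into one of `vocab_size` discrete tokens so a
    # next-token model can be trained on flattened checkpoints.
    lo, hi = float(weights.min()), float(weights.max())
    span = max(hi - lo, 1e-8)
    tokens = np.round((weights - lo) / span * (vocab_size - 1)).astype(np.int64)
    return tokens, (lo, hi)

def tokens_to_weights(tokens, lo, hi, vocab_size=256):
    # Invert the bucketing to recover approximate weights from generated tokens.
    return lo + tokens.astype(np.float32) / (vocab_size - 1) * (hi - lo)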
You could even do task-prompt-to-model-task embeddings by training on the weights of task-specific models, do vector searches to mix models, etc., and generate *new* models:
not new text, not new imagery, but new *models*.
It'd be a model for training/inferring/optimizing/generating other models.
-
I've been looking to build something new with a Raspberry Pi because I'm bored. I was thinking about trying out LoRa transceivers, but it would be a waste of money because I don't know what I would use them for. I have some knowledge of servers and such, but I don't have any ideas on what to use an RPi for. Any ideas?
-
I am aiming for Google and I have some questions regarding the interview process. I chose to make myself proficient in development and in data structures and algorithms, but I have almost zero competitive programming skills and no ranking on any competitive programming platform. I really want to work for Google. How do I reach that goal, and how do I clear Google's interview process?
-
Hey! I want to create a note app where multiple users can work on the same note in real time. Anyone who wants to can become my partner on this project; this is my GitHub link.
https://github.com/priyanshuSharma-...
Tech stack:
MERN
-
I have a question related to hardware.
I recently ordered an M.2 SSD on Amazon, but I don't know whether it is compatible with my system. Can anybody help me? I'd really appreciate it.
My SSD link on Amazon:
https://amazon.in/gp/product/...
My system link:
https://amazon.in/Lenovo-Ideapad-33...
-
C++, Python, JavaScript:
Which one is best for learning DSA?
I'm really confused about which one I should choose.
-
Does anyone know any high-power IoT LoRa boards? I have tried the Adafruit and Heltec ones, but they didn't do much. I need it for a low-permittivity environment.