1

So is anyone gonna comment on how ChatGPT doesn't continue when it's HTML? No one has fixed that yet? They should have.

Comments
  • 0
    What do you mean, HTML?
  • 1
    @asgs this
    Haven't tried it again yet, but as I added descriptions of how the app should be built, the HTML grew past the point where the chat would show it, and it wouldn't show me the rest when asked, like it does for other code types.
  • 0
    Now it's fine heh
  • 0
    I'm impressed it knows major application part names
  • 2
    It wasn't "fixed" because it's essentially unfixable. It's hitting its generation limit. You'd need to retrain the entire model with a larger output.

    Asking it to continue isn't a good fix, because that could be exploited to just generate endless HTML, in which case you could essentially DDoS the entire service. Unless they limit it, in which case you're back to problem #1 with limited output anyway.
  • 0
    It wasn't "fixed" because it's essentially unfixable. It's hitting its generation limit. You'd need to retrain the entire model with a larger output.

    Asking it to continue isn't a good fix, because that could be exploited to just generate endless HTML, in which case you could essentially DDoS the entire service. Unless they limit it, in which case you're back to problem #1 with limited output anyway.
  • 2
    @Hazarth I love how your duplicate posts illustrate your point. lol
  • 0
    @Hazarth no, it's fixed and I was able to add more lol
  • 0
    @Hazarth so is the token output some huge array of integers overtrained to specific real number values that have a corresponding code meaning? Because that wouldn't make sense lol
  • 0
    @AvatarOfKaine The output is indeed an array of integers; they're token values that index words in a dictionary, actually.

    The important part here is the last 2 layers of the model. I'm not sure what the exact limit is; some sources say it's about 500 words, but you can easily find out by reaching it and then counting the words.

    This means the second-to-last layer is a tensor of size N x M x 500, where each N x M slice is a 2D vector representation of a single word.

    The last layer then uses a word encoding/decoding layer to turn each of those 500 2D vectors into specific positive integer indices in a dictionary which contains all the words the AI "knows".

    This is why there's a limit to the output. The same is true for all the NN models we have so far except RNNs, but GPT is a Transformer, and Transformers are not RNNs, since RNNs are super slow because of their recurrent nature and have other issues too.
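
    A minimal sketch (not OpenAI's actual code) of what "an array of integers" means here, in Python. The tiny vocabulary, the decode helper, and the 500-token buffer are illustrative assumptions, not GPT's real values:

    vocab = ["<pad>", "public", "class", "MyClass", "{", "private", "int", "x", ";", "}"]
    MAX_TOKENS = 500  # assumed hard output limit being discussed

    def decode(token_ids):
        # Map each integer index back to the word it stands for, skipping padding.
        return " ".join(vocab[i] for i in token_ids if i != 0)

    # Pretend the model emitted these indices, padded out to the fixed buffer size:
    output_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9] + [0] * (MAX_TOKENS - 9)
    print(decode(output_ids))  # -> public class MyClass { private int x ; }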
  • 0
    @AvatarOfKaine

    Ofc the AI model doesn't know "code meaning" at all. It's a language model, so what it learns is how the words it knows go together. The genius of Transformers is the use of attention layers, which means that at every step of generation the AI knows what you said and what it has already said, and it knows which past words to look at so that the next predicted word makes the most sense. This is how it generates usable code: for example, after the words "public class MyClass {" the word "Apple" makes no sense in any context it learned, but statistically the word "private" is going to be there almost 100% of the time... and so on and so forth until it has filled its entire prediction buffer.

    It could continue forever predicting the next word, but it'll start forgetting things past the 500th word, since the buffer is limited, and it might start to ramble eventually, especially with long code segments that push out the original context.
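
    A toy sketch of that generation loop in Python. predict_next is a hypothetical stand-in for the real model (a real Transformer scores every word in its vocabulary and picks the most likely one), and the 500-token window is just the limit figure from above, not a confirmed GPT parameter:

    CONTEXT_LIMIT = 500  # assumed size of the prediction buffer

    def predict_next(context):
        # Hypothetical scoring: after "{" the word "private" is overwhelmingly likely.
        return "private" if context[-1] == "{" else "..."

    def generate(prompt_tokens, n_words):
        tokens = list(prompt_tokens)
        for _ in range(n_words):
            # Only the last CONTEXT_LIMIT tokens fit in the buffer, so anything
            # earlier is "forgotten" and stops influencing the next prediction.
            window = tokens[-CONTEXT_LIMIT:]
            tokens.append(predict_next(window))
        return tokens

    print(generate(["public", "class", "MyClass", "{"], 3))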
  • 0
    @Hazarth ohhh, so it's generating the probability that the next word will be a specific one and choosing the best one based on a hodgepodge of input tokens?
    Well.
    That's.
    Eee
  • 0
    @Hazarth then the translator chooses the one with the highest probability, and it does this somehow a word at a time?
  • 0
    @Hazarth is sentence position an input neuron?
  • 0
    @AvatarOfKaine indeed, the position of a word in the input sentence is part of the input. There's some clever vector math behind that, but the gist of it is that the network understands the order of input words pretty well. So it's not a hodgepodge like it would be with a naive model; it retains input structure during the whole generation process.

    And the word-at-a-time thing works by always predicting all 500 tokens, but 499 of them are a pass-through, shifted one to the left from the input; the network pretty much copies them using a mask and then adds its own predicted token at the end...

    So you're mapping 500 input word indices to 500 output word indices, with the last one being the prediction. That's what "one word at a time" means. You can then take the entire output, feed it back in, and repeat the process to get the next word, and so on.
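
    For the position question, here's a minimal sketch of the sinusoidal positional encoding from "Attention Is All You Need" (GPT itself reportedly uses learned position embeddings instead, but the idea is the same: order becomes part of the input). The dimension d_model=8 is illustrative, not a real model size:

    import math

    def positional_encoding(position, d_model=8):
        # Each position gets a unique vector of sines/cosines at different
        # frequencies; this vector is added to the word's embedding.
        vec = []
        for i in range(d_model):
            angle = position / (10000 ** (2 * (i // 2) / d_model))
            vec.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        return vec

    # So "sentence position" isn't a single input neuron: every position maps
    # to a whole vector that gets mixed into the corresponding word embedding.
    for pos in range(3):
        print(pos, [round(v, 3) for v in positional_encoding(pos)])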
  • 0
    No I mean the expected output word
  • 0
    @Hazarth is there any demo of this? Like a real demonstration of the architecture for a developer?
  • 0
    @Hazarth and no, not web service endpoints
  • 0
    @Hazarth because I find myself thinking it can't be that cut and dry
  • 1
    @AvatarOfKaine well, that's the fun part. No, there isn't, because OpenAI refuses to talk about it or make it open.

    What we have is this great description of GPT3 - https://jalammar.github.io/how-gpt3...

    And the paper "Attention Is All You Need", which introduces the Transformer architecture that GPT models are built on - https://arxiv.org/abs/1706.03762

    That's the thing about AI models... it *is* that cut and dry, it just looks like magic until you understand it.