How to mess with GitHub Copilot:
1. create an open source repo with an MIT license
2. create generic functions that solve specific problems
3. put violent, racist, sexist, phobic, political, etc. phrases in the code
4. get other people to fork the repo and make their own versions
5. watch as people get upset over Copilot being evil and putting shit into their code

Comments
  • 1
    Unless it's not actively trained with new codebases.
  • 4
    Unfortunately its inner workings are not open source. It could just be an annual / monthly index for all we know.

    Which is really a shame on MS / GitHub, considering they used open source codebases to train it before getting it to the point where they could profit from it.

    Amazon have released their own, trained on their own code, so I'm sure it'll end up in a subscription service too. That's not so bad when it's trained on their own code, and being Amazon - it's not a product if they can't profit from it.

    https://aws.amazon.com/codewhispere...
  • 1
    @C0D4 you pose a very interesting question - if FOSS code is treated as data to train a proprietary model, is it being used in violation of its license?

    I mean, I never heard of a license that says "even just by reading this code, you must make everything you ever code available on a *** license as well".
    They usually say something like "by using this software in its entirety or significant parts of it in another software you must make the latter software available through a *** license".

    And using software as a tool (like a compiler or IDE) doesn't count; I can use GCC to compile proprietary code.

    Maybe we need a clause like "by using this FOSS software as data indexed or stored in a knowledge or training base of a model, the model must also be FOSS". But it would be a mess, since it means that Google and other search engines would be obliged to open source their recommendation models, and that is just not happening. How would we use Stack Overflow if not by googling questions and snippets?
  • 0
    @JsonBoa The license of the code should apply if it is copied from a repo. The problem here is defining what "copying" means if it is not a direct verbatim copy.

    I think MIT-licensed projects are fair game for this. I generally slap an MIT license on code I don't care about. It gives me the "don't sue me" clause that I want. I would love to hear about people using this code, but I honestly don't care.

    The GPL could be problematic. The LGPL less so, but it could still cause issues with some projects.