4
any-mesh
286d

Why do I have to install github lfs for files just over 100 MB?

Comments
  • 4
    Because every time you change that file, you will keep both the old and the new version of the file in the repo forever.

    And it's Git LFS; GitHub is a different thing.
  • 1
    @electrineer my mistake.
  • 1
    LFS feels like a big afterthought in git.

    I don't think it's a great solution, but it's definitely better than having your .git grow to gigabytes within a handful of commits.
  • 4
    Let's go back to what Git actually is.

    The best introduction was by Linus Torvalds himself: a stupid content tracker.

    Internally it's more or less a content-addressed filesystem.

    A blob is a bunch of bytes representing the **content** of a file. The word content is important: a blob is not a representation of a file with a name, a mimetype and so on. It's really just the content of the file.

    A tree is then a collection of blobs (and other trees), so simply put, a kind of directory.

    If you ever wondered why you cannot store an empty directory in Git without adding some file to it, a .gitkeep for example: that's the reason. There are no files inside Git, only content. You need the content of the .gitkeep, because content is what gets tracked. No content, no Git entry. (There's a small plumbing demo at the end of this comment.)

    Now think about how this works when you start tracking, say, 100 MiB of text files.

    Git has to track all of it. Every operation, be it checkout, diffing, anything, will have to work on that 100 MiB of content...

    Not a great idea. Especially since we're talking about content: an MPEG file, for example, can't be meaningfully diffed, it's binary gibberish.

    That's where Git LFS comes in.

    It reduces the Git operations to a reference, kind of like a foreign key, so Git doesn't work on the content anymore. It now works, more or less (simplified for the sake of understanding), on the reference only.

    That turns Git, which was meant to track file contents, into a tracker of file contents and file *references*. Those references are themselves small text files with a fixed format, committed in place of the real content. (See the second sketch at the end of this comment.)

    It's actually pretty cool, given that it combines the strengths of Git, especially performance, with the ability to store large files, without turning Git into a crawling slowpoke.

    So it wasn't really an afterthought; Git was just never designed for this.

    But Git's simple design made it possible to adapt it into something that does the job far better.

    A similar approach is used, by the way, in basically every database for CLOB / BLOB data; TOAST in PostgreSQL, for example.

    https://postgresql.org/docs/...
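
    To make the "blob = content" part concrete, here's a tiny plumbing demo (just a sketch; the id you get back is simply a hash of that exact content):

    ```
    $ git init demo && cd demo
    $ echo "hello" | git hash-object -w --stdin   # store the *content*, get its id back
    ce013625030ba8dba906f756967f9e9ca394464a
    $ git cat-file -t ce01362                     # the object is just a blob...
    blob
    $ git cat-file -p ce01362                     # ...and its payload is purely the file content
    hello
    ```

    No filename, no mimetype, nothing else; a name only enters the picture when a tree points at this blob.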
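
    And roughly what the LFS side looks like (again a sketch; the file name, oid and size are made up):

    ```
    $ git lfs install                   # one-time: hooks the LFS filters into Git
    $ git lfs track "*.mp4"             # writes a rule into .gitattributes
    $ git add .gitattributes video.mp4
    $ git commit -m "add video via LFS"
    $ git cat-file -p HEAD:video.mp4    # the blob Git itself stores is only this tiny pointer
    version https://git-lfs.github.com/spec/v1
    oid sha256:4d7a...
    size 104857600
    ```

    The real content lives in LFS storage and gets swapped back in on checkout; every Git operation only ever has to deal with those three lines.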
  • 0
    In 95% of cases I've seen where large files are stored in Git... they shouldn't have been anyway.
  • 0
    @AlmondSauce But if you use Unity, a lot of files are over the limit.
  • 1
    @any-mesh Sure, and there are cases like that where it makes sense.

    A lot of the time, though, I've seen it used as a dumping ground for large dependency installer files, build artifacts, videos / media files that don't belong in source control but are there for convenience, etc.
  • 1
    @AlmondSauce I kinda get what you mean

    I'm kind of tired of one particular project at my workplace that has almost a gigabyte of compiled binaries of some proprietary library we need sitting right in the repo.

    And of course it's not just in there once: the repo wants to be cross-platform, so there are different builds for different platforms...
  • 2
    @LotsOfCaffeine Yeah exactly this. What starts as "ah let's just chuck this binary here as it's convenient" turns into "Ah but 10 more multigig files here can't hurt that much more, surely..."