Summary
    Git LFS

    Functional overview

    Add support for git lfs in Tuleap to allow better management of big files in git.

    From a user standpoint, it means being able to use git lfs transparently with a git repository hosted on a Tuleap server, over either ssh or https.

    Once git lfs is available, the limit on file size allowed in a git repository should be lowered and enforced at a more aggressive value (50MB proposed).

    As git lfs comes with the ability to easily store huge files, it's proposed to develop this feature with a per-project quota for file storage.

    From a system administration standpoint, it should also be possible to use a cheap file storage system to avoid storing huge files on high-end storage (SAN & co). One option would be to allow usage of minio or openio storage (via the AWS S3 compatibility layer) when available. Storage on a regular filesystem (for instance NFS) would also be supported.

    Technical overview

    API

    There is a list of end-points to implement to comply with the LFS server API. The implementation of those end-points should leverage the work being done on the git front router (request #11450). The end-points to implement:

    • https://tuleap.example.com/plugins/git/projectname/foo/bar.git/info/lfs/objects/batch [POST] + verify
    • https://tuleap.example.com/plugins/git/projectname/foo/bar.git/info/lfs/locks [POST|GET] + verify
    • https://tuleap.example.com/plugins/gitlfs [PUT|GET] (for storage & retrieval of files)

    Only "basic" transport would be supported (it's the only transport available for 100% of clients). Tus.io for chunked upload/download might come later.
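
    As an illustration of what the batch end-point has to answer with the "basic" transport, here is a minimal sketch in Python (not Tuleap code). The request and response shapes follow the public Git LFS batch API. The function name, the `known_oids` set, the per-object URL layout under /plugins/gitlfs and the `/verify` suffix are assumptions made for the example.

```python
# Assumption: objects are served from the storage end-point listed above,
# one URL per object id. The exact URL layout is hypothetical.
STORAGE_BASE = "https://tuleap.example.com/plugins/gitlfs"


def build_batch_response(request_body, known_oids):
    """Answer an LFS batch request using only the 'basic' transfer."""
    operation = request_body["operation"]
    if operation not in ("upload", "download"):
        raise ValueError("unsupported operation: " + operation)

    # A client that omits "transfers" implicitly asks for "basic".
    if "basic" not in request_body.get("transfers", ["basic"]):
        raise ValueError("only the basic transfer is supported")

    objects = []
    for obj in request_body["objects"]:
        oid, size = obj["oid"], obj["size"]
        entry = {"oid": oid, "size": size}
        href = f"{STORAGE_BASE}/{oid}"
        if operation == "upload":
            # An object that is already stored gets no actions:
            # the client then skips the upload.
            if oid not in known_oids:
                entry["actions"] = {
                    "upload": {"href": href},
                    "verify": {"href": href + "/verify"},  # hypothetical path
                }
        elif oid in known_oids:
            entry["actions"] = {"download": {"href": href}}
        else:
            entry["error"] = {"code": 404, "message": "Object does not exist"}
        objects.append(entry)

    return {"transfer": "basic", "objects": objects}
```

    Authentication headers and expiration of the returned links are left out on purpose, as they depend on the authentication spike.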

    SSH

    ssh/gitolite must be updated to advertise the lfs end-points and to manage the authentication transfer (from ssh to https).

    ssh/gitolite should be updated to refuse files larger than 50MB.
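
    The authentication transfer mentioned above is usually done through the `git-lfs-authenticate` command that git lfs clients run over ssh: it prints a JSON document telling the client which https end-point to use and which header to send. The sketch below, in Python, only shows the shape of that answer. The token scheme ("RemoteAuth"), the default lifetime and the function name are assumptions, and storing or validating the token on the server side is not covered.

```python
import secrets


def lfs_authenticate(repo_path, operation, ttl_seconds=600):
    """Build the JSON payload answered to `git-lfs-authenticate <path> <operation>`."""
    if operation not in ("upload", "download"):
        raise ValueError("unsupported operation: " + operation)

    # Assumption: a random short-lived token that the https end-point can check.
    token = secrets.token_urlsafe(32)
    return {
        # LFS end-point advertised to the client for this repository.
        "href": f"https://tuleap.example.com/plugins/git/{repo_path}/info/lfs",
        # Header the client sends back on every https LFS request.
        "header": {"Authorization": f"RemoteAuth {token}"},
        # Lifetime of the credentials, in seconds.
        "expires_in": ttl_seconds,
    }
```

    Called with `"projectname/foo/bar.git"` and `"download"`, this returns a payload pointing at the `info/lfs` end-point listed in the API section.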

    HTTPS

    There is nothing special to change to support lfs for people accessing repositories over HTTPS.

    Impact on existing features

    We have to take into account:

    • fork of lfs based repositories
    • pull requests on lfs objects
    • gitphp / git repository browsing of lfs objects (should be part of the work on Modern Git view epics #10400)

    PHP & nginx

    There are no technical constraints preventing php from handling very large files efficiently, since they are transferred with PUT and GET. These 2 verbs, combined with nginx + fpm, make it possible to handle arbitrary file sizes (tested up to 1GB with 256MB of RAM allowed to php-fpm). Nginx must be configured to accept big client requests.
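
    The reason file size is not bound by memory is that a PUT body can be read and written chunk by chunk. The sketch below shows the idea in Python rather than php; the function name, the chunk size and the optional size cap are assumptions made for the example.

```python
import io


def copy_stream(source, destination, chunk_size=1024 * 1024, max_size=None):
    """Copy source to destination chunk by chunk and return the bytes written.

    Memory usage stays around chunk_size whatever the size of the object.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if max_size is not None and total > max_size:
            raise ValueError("object exceeds the maximum allowed size")
        destination.write(chunk)
    return total


# Usage: copy a 5 MiB in-memory payload in 1 MiB chunks.
payload = io.BytesIO(b"x" * (5 * 1024 * 1024))
sink = io.BytesIO()
copied = copy_stream(payload, sink)
```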

    Storage

    There are 2 approaches for storage management:

    • one central "git lfs store" common to the platform
    • one "git lfs store" per repository

    The first one is more space-efficient, as the same file is stored only once (given that the file is stored based on the sha256 of its content). If the same 1GB video is shared across 100 repositories in various projects & forks, only 1GB will be used on the filesystem.

    It's also very simple to manage fork of repositories (nothing to do as the reference to the file doesn't change).

    However, it means that we need to keep a "reference counter" of the files used (which repository uses which file) so we know which files can be garbage collected when repositories and projects are deleted.

    The "per repo" strategy gets rid of the "reference counter" but takes more space (no de-duplication), and forking a repository requires some extra work.
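
    To make the trade-off of the central store concrete, here is a minimal in-memory sketch in Python. The class and method names are hypothetical, and a real store would write to disk or object storage and keep the references in a database. It assumes that a fork only adds bookkeeping entries and never copies file content.

```python
import hashlib


class CentralLfsStore:
    """Content-addressed store shared by all repositories of the platform."""

    def __init__(self):
        self.blobs = {}       # oid (sha256 hex) -> content
        self.references = {}  # oid -> set of repository ids using it

    def put(self, repo_id, content):
        oid = hashlib.sha256(content).hexdigest()
        # The same content is stored only once, whoever pushes it.
        self.blobs.setdefault(oid, content)
        self.references.setdefault(oid, set()).add(repo_id)
        return oid

    def fork(self, source_repo, new_repo):
        # No file is copied: the fork only registers itself as a user.
        for repos in self.references.values():
            if source_repo in repos:
                repos.add(new_repo)

    def delete_repository(self, repo_id):
        """Drop the references of a repository and garbage collect orphans."""
        removed = []
        for oid in list(self.references):
            repos = self.references[oid]
            repos.discard(repo_id)
            if not repos:
                del self.references[oid]
                del self.blobs[oid]
                removed.append(oid)
        return removed
```

    With the "per repo" strategy, `references` disappears but `put` and `fork` each have to store their own copy of the content.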

    Quota & limits management

    We should implement and enforce a quota per project and per platform:

    • To limit the amount of storage consumed by a project (with defaults & exceptions). The current "informative only" quota feature for site admins could be re-used.
    • To limit the max size of objects stored in LFS (even if we can push a 10GB video, do we really want it on our server?).
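
    The two limits above could be checked before an upload is accepted, for instance when answering the batch request. The following Python sketch is only an illustration: the `QuotaPolicy` structure, its field names and the returned messages are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class QuotaPolicy:
    default_project_quota: int  # bytes allowed per project by default
    max_object_size: int        # bytes allowed for a single LFS object
    exceptions: dict = field(default_factory=dict)  # project -> custom quota


def check_upload(policy, project, used_bytes, object_size):
    """Return (accepted, reason) for an object a project wants to upload."""
    if object_size > policy.max_object_size:
        return (False, "object too large")
    # A per-project exception overrides the platform default.
    quota = policy.exceptions.get(project, policy.default_project_quota)
    if used_bytes + object_size > quota:
        return (False, "project quota exceeded")
    return (True, "ok")
```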

    3rd party storage

    It's a bit out of scope for this feature, but if there is an object storage solution (like AWS S3), it should be possible to use it instead of storing on the filesystem.

    Remaining spikes:

    • ref management
    • authentication
    • verify upload sha256
    • flysystem to abstract storage
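
    For the "verify upload sha256" spike, the check itself is simple: an LFS object id is the sha256 of the file content, so the server can hash what it received and compare it with the announced oid and size. A minimal Python sketch, with hypothetical function and parameter names:

```python
import hashlib
import io


def verify_upload(stream, expected_oid, expected_size, chunk_size=1024 * 1024):
    """Check that a stored object matches the oid and size announced by the client."""
    digest = hashlib.sha256()
    size = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        size += len(chunk)
    return size == expected_size and digest.hexdigest() == expected_oid


# Usage: the oid is computed here only to keep the example self-contained.
content = b"hello lfs"
oid = hashlib.sha256(content).hexdigest()
is_valid = verify_upload(io.BytesIO(content), oid, len(content))
```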

    Progress
    Closed
    Details
    #11511
    Manuel Vacelet (vaceletm)
    2019-03-12 13:44
    2018-05-24 14:42

    Follow-ups

    Hello Manuel,

    I hope you're doing well!

    We're currently working with git-lfs and have faced some issues regarding the migration of repos from Tuleap to Gerrit. The problem is that once a repo is migrated to Gerrit, it loses track of all the files tracked by git-lfs, so if you run git clone you only get the file pointers and not the real content of the files.
    If there is a solution to this issue, it would be perfect if you could let me know.

    Thank you Manuel.