git operations with lots of branches

sysadmin's Avatar

sysadmin

25 Sep, 2013 10:39 AM

I was just about to upgrade to Enterprise, when we started experiencing problems with git repos that have many branches; we get the following message:
2013-09-25 12:10:04.809 INFO [rhodecode.lib.middleware.simplegit] pull action on GIT repo "Mirror/test" by "someone" from my.ip.addr.ess
Exception in thread Thread-10:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/mnt/virtualenv/rhodecode/local/lib/python2.7/site-packages/rhodecode/lib/vcs/subprocessio.py", line 68, in run
    b = s.read(4096)
  File "/usr/lib/python2.7/gzip.py", line 256, in read
    self._read(readsize)
  File "/usr/lib/python2.7/gzip.py", line 283, in _read
    pos = self.fileobj.tell() # Save current position
AttributeError: 'uwsgi._Input' object has no attribute 'tell'

I tracked the issue down to the point where self.source suddenly becomes a 'GzipFile' instead of a '_LengthLimitedFile'.

Our setup is quite simple: we use uwsgi serving to a socket and nginx as a frontend.
uwsgi config (rhodecode.yaml):

uwsgi:
  plugins: python
  chdir: /mnt/virtualenv/rhodecode
  virtualenv: /mnt/virtualenv/rhodecode
  pythonpath: = /mnt/virtualenv/rhodecode
  mount: /=/mnt/virtualenv/rhodecode/dispatch.wsgi
  uid: rhodecode
  gid: rhodecode
  pid: /var/run/uwsgi/app/rhodecode/pid
  memory-report: 1
  reload-on-rss: 256
  #default socket is /run/uwsgi/app/redmine-default/socket
  chown-socket: www-data
  socket-timeout: 3600

and nginx-config is (stripped down):
(...all the ssl stuff and redirects and so on...)
    # deliver static file from nginx
    root /mnt/virtualenv/rhodecode/public;
     location / {

        try_files $uri @uwsgi;

    }
    location @uwsgi {
        # rhodecode is just upstream rhodecode { server unix:///run/uwsgi/app/rhodecode/socket; }
        uwsgi_pass rhodecode;
        uwsgi_param HTTPS on;
        uwsgi_param SCRIPT_NAME "";
        include uwsgi_params;
        uwsgi_read_timeout 3600;
        # buffering on/off has no effect on the issue
        #uwsgi_buffering off;
    }

The system is a Debian/Wheezy with nginx/1.2.1-2.2+wheezy1 (tried backported 1.4.1-3~bpo70+1 too with no change) and uwsgi/1.2.3+dfsg-5.

Is there anything I can do to quickly fix that error?

  1. Support Staff 1 Posted by Marcin Kuzminsk... on 25 Sep, 2013 10:46 AM

    Marcin Kuzminski's Avatar

    That's interesting issue, we're wrapping git requests in UnGzip middleware, but that's just for git operations like pull/push.

    1) during what operation that errors happens
    2) anyway we can get a copy of your repo(if it's public of course) to test
    3) if not can i get detailed specs about the repo, or a way to reproduce it (like creating many many branches, how many etc ?)

    Thanks

  2. 2 Posted by sysadmin on 25 Sep, 2013 11:20 AM

    sysadmin's Avatar

    ad 1) during 'git clone' in the first place.
    ad 2) I just noticed this happens with all git repos that have more than 10, 12, 15 (?) branches; steps to reproduce in our setup: create new repo, clone from http://vtk.org/VTK.git and then try to do a 'git clone' from the newly created repository.
    ad 3) I have no detailed numbers, but I guess it starts at about 10-15 branches.

    At first I thought this might be related to issues nginx had with chunked encoding (there are some postings from 2011 about that on nginx forum and uwsgi forum/mailinglist); therefor I did an upgrade of nginx to 1.4.1 from backports; but that didn't change a thing.

  3. 3 Posted by sysadmin on 25 Sep, 2013 11:23 AM

    sysadmin's Avatar

    ah, and what I forgot to mention: there are some git processes hanging around until I manually kill them after that error:
    /bin/sh -c git upload-pack --stateless-rpc "/path/to/repos/test"
    git upload-pack --stateless-rpc /path/to/repos/test
    git-upload-pack --stateless-rpc /path/to/repos/test

    btw. git version on the server is 1:1.7.10.4-1+wheezy1

  4. Support Staff 4 Posted by Marcin Kuzminsk... on 25 Sep, 2013 11:32 AM

    Marcin Kuzminski's Avatar

    Thanks, for all the info. It's now on my top priority to check this.

  5. Support Staff 5 Posted by Marcin Kuzminsk... on 25 Sep, 2013 11:54 AM

    Marcin Kuzminski's Avatar

    One note, did you try on pure paster with waitress as wsgi server ? It would be good to be sure it's not still NGINX or Uwsgi fault

  6. Support Staff 6 Posted by Marcin Kuzminsk... on 25 Sep, 2013 12:02 PM

    Marcin Kuzminski's Avatar

    As a test (while i download the VTK repo) i did android kernel clone with 22 branches and 277 tags and it worked fine, both with nginx and with pure paster+waitress

    I use GIT 1.8.4, nginx 1.5.0

    i use nginx conf as we have in the documentation.

  7. 7 Posted by sysadmin on 25 Sep, 2013 12:11 PM

    sysadmin's Avatar

    hmmm... it works with 'paster serve production.ini'. Those are the good news, thanks for pointing that out... ;-)
    I also logged a 'dir(self.source) in rhodecode/lib/vcs/subprocessio.py (line 68; where the trace points to) and there is a 'tell' listed:
    dir(self.source)
    ['__abstractmethods__', '__class__', '__delattr__', '__doc__', '__enter__', '__exit__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__metaclass__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '_abc_cache', '_abc_negative_cache', '_abc_negative_cache_version', '_abc_registry', '_add_read_data', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', '_check_closed', '_init_read', '_init_write', '_read', '_read_eof', '_read_gzip_header', '_unread', '_write_gzip_header', 'close', 'closed', 'detach', 'filename', 'fileno', 'flush', 'isatty', 'max_read_chunk', 'myfileobj', 'next', 'read', 'read1', 'readable', 'readinto', 'readline', 'readlines', 'rewind', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
    self.source.__doc__:
    The GzipFile class simulates most of the methods of a file object with
        the exception of the readinto() and truncate() methods.

    Ok, so what can I do next to get it all working?

  8. Support Staff 8 Posted by Marcin Kuzminsk... on 25 Sep, 2013 12:18 PM

    Marcin Kuzminski's Avatar
    # this is here -> https://secure.rhodecode.org/rhodecode-enterprise/files/fb563b8162ecfd82333b36db1544d91eb6f3c7bb/rhodecode/lib/middleware/pygrack.py#L200-203
    
    def make_wsgi_app(repo_name, repo_root, extras):  
        from dulwich.web import LimitedInputFilter, GunzipFilter
        app = GitDirectory(repo_root, repo_name, extras)
        return GunzipFilter(LimitedInputFilter(app))
    
    
    #GunzipFilter is implemented in dulwich.
    
    class GunzipFilter(object):  
        """WSGI middleware that unzips gzip-encoded requests before
        passing on to the underlying application.
        """
    
        def __init__(self, application):
            self.app = application
    
        def __call__(self, environ, start_response):
            if environ.get('HTTP_CONTENT_ENCODING', '') == 'gzip':
                environ.pop('HTTP_CONTENT_ENCODING')
                if 'CONTENT_LENGTH' in environ:
                    del environ['CONTENT_LENGTH']
                environ['wsgi.input'] = gzip.GzipFile(filename=None,
                    fileobj=environ['wsgi.input'], mode='r')
            return self.app(environ, start_response)
    
    #it uses the standard lib gzip.GzipFile
    

    We should maybe create new class that inherits from that and implement those methods ?

  9. 9 Posted by sysadmin on 25 Sep, 2013 12:31 PM

    sysadmin's Avatar

    hmmm... what I don't get: 'self.source' is of type gzip.GzipFile which implements a method 'tell'. Why in the world do we get that exception, then?
    Do you have some introspection functions at hand to overwrite the exception handler in a meaningful way?

  10. Support Staff 10 Posted by Marcin Kuzminsk... on 25 Sep, 2013 12:41 PM

    Marcin Kuzminski's Avatar

    I don't understand one thing you said that self.source is the GzipFile

    but from the exception i see:
    pos = self.fileobj.tell() # Save current position AttributeError: 'uwsgi._Input' object has no attribute 'tell'

    The interesting for me is what type if the self.fileobj, and why it representation says it's 'uwsgi._input'. I would simply copy create my own classes to handle gzip files and inject 'ipdb' (interactive python debugger)

    import ipdb;ipdb.set_trace()
    

    But it must be runned with stdout to console or something so you can get to the prompt.

  11. 11 Posted by sysadmin on 25 Sep, 2013 01:17 PM

    sysadmin's Avatar

    oh noooh! :-) running uwsgi interactive isn't that easy and needs for sure quite some changes to my configs, I guess.
    What I did for now is add some writes to a file to get some more debug information:

    ['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__iter__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'fileno', 'next', 'read', 'readline', 'readlines']
    HTTP_AUTHORIZATIONBasic CENSORED
    AUTH_TYPE basic
    wsgi.multiprocess True
    pylons.status_code_redirect True
    SCRIPT_NAME
    REQUEST_METHOD POST
    PATH_INFO /test/git-upload-pack
    SERVER_PROTOCOL HTTP/1.1
    QUERY_STRING
    UWSGI_SCHEME https
    HTTP_USER_AGENT git/1.7.10.4
    SERVER_NAME rhodecode
    REMOTE_ADDR my.ip.add.ress
    wsgi.url_scheme http
    SERVER_PORT 443
    uwsgi.node rhodecode
    DOCUMENT_ROOT /mnt/virtualenv/rhodecode/public
    HTTP_CONTENT_LENGTH 1286
    wsgi._org_proto http
    wsgi.input
    REMOTE_USER test
    HTTP_HOST rhodecode
    HTTPS on
    wsgi.multithread False
    HTTP_CONTENT_TYPE application/x-git-upload-pack-request
    REQUEST_URI /test/git-upload-pack
    HTTP_ACCEPT application/x-git-upload-pack-result
    wsgi.file_wrapper
    wsgi.version (1, 0)
    paste.registry
    wsgi.run_once False
    wsgi.errors
    REMOTE_PORT 43269
    uwsgi.version 1.2.3-debian
    CONTENT_TYPE application/x-git-upload-pack-request
    HTTP_ACCEPT_ENCODING deflate, gzip

    I inserted the debug code in 'class GunzipFilter(object)' in dulwich/web.py into the '__call__' function and noticed that this function only gets called on repos with many branches; those with just a few seem to use a different code path?!
    Shall I try to set the wsgi._input environment to a '_LengthLimitedFile' object there?

  12. Support Staff 12 Posted by Marcin Kuzminsk... on 25 Sep, 2013 02:15 PM

    Marcin Kuzminski's Avatar

    GIT for some operations(involving a lot fo changesets or it seems also sending branch info) gzips the data sent over the wire. Dulwich detects that by checking the
    if environ.get('HTTP_CONTENT_ENCODING', '') == 'gzip': Headers, that's why sometimes it's a gzip file sometimes not.

    But very odd that you said it get's only called sometimes ? the call should always be called since it's in the wsgi middleware chain: wrapped_app = GunzipFilter(LimitedInputFilter(app))

    I suspect UWSGI does something to the headers or the 'input' We shoudl g-unzip what git sends before processing it, so changing to _LengthLimitedFile shouln't help IMHO

  13. 13 Posted by sysadmin on 25 Sep, 2013 02:49 PM

    sysadmin's Avatar
    • LengthLimitedFile: ah, yeah, I missed that somehow.
    • "sometimes called": only called when there are many branches, which corresponds to what you said about how git handles things.
    • uwsgi: I am about to backport a newer version of uwsgi and will try with that again.

    I will try to do some further debugging during the night when the server isn't so requently used. In case you have any more ideas for more conclusive tests, just tell me! ;-)

    Thanks for your help!

    PS: Quoted replys would be great...

  14. 14 Posted by sysadmin on 26 Sep, 2013 11:38 AM

    sysadmin's Avatar

    long story short: thank you very much for your help; it works just fine with uwsgi >= 1.9.X
    the reason for that is that they implemented 'seek' and 'tell' in 1.9 in April 2013. :-/

    https://github.com/unbit/uwsgi/commit/cf6a232aef22d386d46b14d691979... (and seek, which is required as well, dates back to february).

    Anyways, for the sake of completeness, a working rhodecode.yaml looks like this:

    uwsgi:
      plugins: python
      chdir: /mnt/virtualenv/rhodecode
      virtualenv: /mnt/virtualenv/rhodecode
      mount: /=/mnt/virtualenv/rhodecode/dispatch.wsgi
      uid: rhodecode
      gid: rhodecode
      pid: /run/uwsgi/app/rhodecode/pid
      log: /var/log/rhodecode/rhodecode.log
      memory-report: 1
      reload-on-rss: 256
      post-buffering: 4096
      #default socket is /run/uwsgi/app/redmine-default/socket
      chown-socket: www-data
      socket-timeout: 3600
    

    Thanks for your help!

  15. Support Staff 15 Posted by Marcin Kuzminsk... on 26 Sep, 2013 11:40 AM

    Marcin Kuzminski's Avatar

    Great to hear that ! I'm closing the ticket now. Enjoy RhodeCode ;)

  16. Marcin Kuzminski closed this discussion on 26 Sep, 2013 11:40 AM.

Comments are currently closed for this discussion. You can start a new one.

Keyboard shortcuts

Generic

? Show this help
ESC Blurs the current field

Comment Form

r Focus the comment reply box
^ + ↩ Submit the comment

You can use Command ⌘ instead of Control ^ on Mac