Re: celery connection problems to RabbitMQ
From: Ben Sturmfels
Subject: Re: celery connection problems to RabbitMQ
Date: Tue, 12 May 2020 13:12:42 +1000
User-agent: mu4e 1.4.4; emacs 26.3
Hi Andrew,
That sounds like a real pain - fixing this issue and getting you running
smoothly again should be a priority for us.
I can certainly offer help with uWSGI logging. My preferred approach is
to use uWSGI Emperor (uwsgi-emperor on Debian) because then you just
stick a config file in /etc/uwsgi-emperor/vassals/myapp.ini and you get
logging out of the box (`tail -f /var/log/uwsgi/emperor.log`). No systemd
or other process managers are required, except for Celery. Running `touch
/etc/uwsgi-emperor/vassals/myapp.ini` will trigger the Emperor to
restart that application.
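If you'd rather trigger that restart from a deploy script than from a
shell, a minimal Python sketch could look like the following. The
function name `reload_vassal` is made up for illustration, and it assumes
the script runs as a user allowed to update the vassal file:

```python
from pathlib import Path


def reload_vassal(config_path):
    """Bump the vassal config's modification time, like `touch` does.

    `reload_vassal` is a hypothetical helper name. The Emperor watches
    the vassals directory, so a newer mtime should prompt it to restart
    that application.
    """
    path = Path(config_path)
    if not path.is_file():
        # Refuse to create a new, empty vassal config by accident.
        raise FileNotFoundError(f"No vassal config at {path}")
    path.touch()
    return path.stat().st_mtime


# Example call (path taken from the paragraph above):
# reload_vassal("/etc/uwsgi-emperor/vassals/myapp.ini")
```

Unlike plain `touch`, this refuses to create the file if it's missing,
which avoids accidentally registering an empty vassal.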
Here's the config I use. It's mostly all boilerplate that I use on all
work projects:
[uwsgi]
plugins = python37
chdir = /srv/mediagoblin/sturm
home = /opt/mediagoblin
ini-paste = /srv/mediagoblin/sturm/paste.ini
env = PYTHON_EGG_CACHE=/srv/.python-eggs
env = CELERY_ALWAYS_EAGER=false
master = true
socket = 127.0.0.1:6544
cheaper = 1
processes = 2
harakiri = 20
max-requests = 5000
vacuum = true
# For sentry, see https://docs.sentry.io/clients/python/advanced/#a-note-on-uwsgi.
enable-threads = true
# Handle Unicode chars in uploaded filenames.
env = LANG=en_AU.UTF-8
# Haven't decided how to securely handle code being able to write __pycache__
# directories and bytecode into read-only directories.
env = PYTHONDONTWRITEBYTECODE=true
# Per Django deployment checklist.
env = PYTHONHASHSEED=random
log-prefix = mediagoblin-sturm
log-format = [pid: %(pid)|app: ??|req: ??/??] %(addr) (%(user)) {%(vars) vars in %(pktsize) bytes} [%(ctime)] %(method) %(host) %(uri) => generated %(rsize) bytes in %(msecs) msecs (%(proto) %(status)) %(headers) headers in %(hsize) bytes (%(switches) switches on core %(core))
Regards,
Ben
On Tue, 12 May 2020, ayleph wrote:
> Hi Fernando,
>
> I'm so glad you brought this up. I think I've had the same issues with Celery
> for months. It's been so bad that I had to disable uploads on
> goblinrefuge.com because every time I restarted Celery it would mark a lot of
> uploads as failed, which required too much manual intervention to fix.
>
> I don't have any useful debug information to contribute. When I switched from
> flup/fcgi to uwsgi, I lost a lot of my error logging and haven't been able to
> figure out how to get it back.
>
> May 11, 2020 03:44:42 Ben Sturmfels <address@hidden>:
>
>> Hi Fernando,
>>
>> Please post a patch or a link to a remote branch to an issue on the
>> issue tracker - ideally a separate change for celery and spectrograms.
>>
>> Regarding PySoundFile, the file you're after is setup.py.
>>
>> For what it's worth, I see that the Python SoundFile library is
>> available in Debian, but PySoundFile doesn't appear to be. This isn't a
>> complete showstopper, but it would help us when tackling distro
>> packaging in the near future.
>>
>> Regards,
>> Ben
>>
>> On Mon, 11 May 2020, Fernando Gutierrez wrote:
>>
>> > Hi Ben
>> >
>> > Sorry I think I didn't explain clearly. I only fixed the connection
>> > reset exceptions in celery but the bug with media changed to failed
>> > state is not fixed.
>> >
>> > I will continue debugging but it may take some time. I don't know
>> > why celery thinks a completed task needs to be run again.
>> >
>> > In the meantime I will submit a patch for the systemd file, the
>> > BROKER_HEARTBEAT issue and also a fix for the audio spectrogram
>> > code as I mentioned in the IRC channel.
>> >
>> > I have a couple of questions:
>> >
>> > 1) I'm not familiar with the development process, I already created
>> > an account in savannah.gnu.org but I don't see how to submit a patch
>> > for review.
>> >
>> > 2) For the spectrogram I used the PySoundFile package. What file do
>> > I need to modify so it gets pulled during setup? In my setup I
>> > manually called ./bin/pip install PySoundFile
>> >
>> > Thanks
>> > Fernando
>> >
>> > On Sun, May 10, 2020 at 6:39 AM Ben Sturmfels <address@hidden> wrote:
>> >
>> > Hi Fernando,
>> >
>> > On Sun, 10 May 2020, Fernando Gutierrez wrote:
>> >
>> > > I recently asked in the IRC channel about RabbitMQ connection reset
>> > > errors in celeryd logs.
>> > >
>> > > I think there are two issues:
>> > >
>> > > 1) The example systemd file (mediagoblin-celeryd.service) from
>> > > https://mediagoblin.readthedocs.io/en/stable/siteadmin/deploying.html
>> > > does not specify that celeryd must be started after RabbitMQ, so it is
>> > > sometimes started before and fails because RabbitMQ is not running
>> > > yet.
>> > >
>> > > 2) In mediagoblin/mediagoblin/init/celery/__init__.py, it sets
>> > > celery_settings['BROKER_HEARTBEAT'] = 1. In slower systems or
>> > > under heavy load if the worker is too slow to respond in < 1 second it
>> > > will miss the heartbeat and after a few missed heartbeats the
>> > > connection is considered dead and reset.
>> > > I'm not sure what is the purpose of changing BROKER_HEARTBEAT to
>> > > 1 but the celery docs recommend not using such a small value. In my
>> > > install I changed it to 20 and I no longer see any connection
>> > > problems.
>> > >
>> > > Are you willing to accept a patch for
>> > > mediagoblin/docs/source/siteadmin/deployment.rst and
>> > > mediagoblin/mediagoblin/init/celery/__init__.py to fix those two
>> > > problems?
>> >
>> > Thank you very much for diving in and investigating the issue. We'd be
>> > happy to take a patch on this. If you can add a comment to explain the
>> > new BROKER_HEARTBEAT value in the code, that would be great.
>> >
>> > I wonder if there might still be a problem lurking here though,
>> > even if your system is now working properly. Not being able to connect to
>> > RabbitMQ or an unresponsive celery worker probably shouldn't change
>> > existing processed media items to failed.
>> >
>> > Thanks for your work on this!
>> >
>> > Regards,
>> > Ben
>> >
>>
>>
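
For reference, the BROKER_HEARTBEAT change discussed in the quoted thread
might look roughly like the sketch below. This is not the actual
MediaGoblin code: the helper name `apply_broker_heartbeat` is made up for
illustration, and the default of 20 is simply the value Fernando reported
working on his install.

```python
def apply_broker_heartbeat(celery_settings, seconds=20):
    """Return a copy of the settings with the given heartbeat interval.

    Hypothetical helper for illustration only. Per the thread, a value
    of 1 means a slow or heavily loaded worker can miss heartbeats, after
    which the connection is considered dead and reset.
    """
    if seconds <= 0:
        raise ValueError("heartbeat interval must be positive")
    updated = dict(celery_settings)
    # 20 is the value reported in the thread to stop the connection
    # resets; treat it as a starting point rather than a recommendation.
    updated['BROKER_HEARTBEAT'] = seconds
    return updated


settings = apply_broker_heartbeat({'BROKER_HEARTBEAT': 1})
```

Returning a copy rather than mutating the dict in place just makes the
before/after values easy to compare while debugging.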