I had a situation come up today at Experiment Engine that took a surprising amount of time to debug, so I’ll leave a note here in case it helps someone else — even if that “someone else” is me in the future.
We run our Celery workers with preforked child processes (using `-c $WORKER_CONCURRENCY` since we store our config in the environment). However, it looked like only one child process (say, Process A) was executing the tasks:

    -> send T1 to Process A
    # A executes T1
    -> send T2 to Process A
    # A's buffer can still hold tasks
    -> send T3 to Process A
    # A's buffer can still hold tasks
    <- T1 complete
    # A begins processing T2
    # Processes B, C, D sit around looking for stuff to do

Since I could see that the parent process had 4 child processes configured, this was pretty confusing. But I could also see that the worker had reserved several more tasks to run; it just wasn’t sending them out to the other child processes. Instead, a queued task would just get run by Process A when it was finished with the current one. Frustrating.
It turns out that Celery’s prefetch behavior isn’t just on a parent process level. Prefetching actually occurs for the child processes as well. This is documented, but the example sketched out there isn’t the problem I was seeing, so I skipped over it. Our case looks more like what I showed above.
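To make the difference concrete, here is a toy model of the two dispatch styles. It is only a sketch of the behavior I observed, not Celery's actual implementation: the function names, the `inbox_size` parameter, and the task durations are all made up for illustration, and "first child with room" is a simplification of how the parent really picks a pipe to write to.

```python
import heapq


def eager_dispatch(durations, n_children, inbox_size):
    """Toy model: write each task to the first child whose inbox has room."""
    inboxes = [[] for _ in range(n_children)]
    for task_id, duration in enumerate(durations):
        for inbox in inboxes:
            if len(inbox) < inbox_size:
                inbox.append((task_id, duration))
                break
        else:
            raise RuntimeError("all inboxes full; a real parent would wait here")
    return inboxes


def idle_only_dispatch(durations, n_children):
    """Toy model: hand each task to a child only once that child is idle."""
    # Heap of (time the child becomes idle, child index).
    idle_at = [(0, child) for child in range(n_children)]
    heapq.heapify(idle_at)
    inboxes = [[] for _ in range(n_children)]
    for task_id, duration in enumerate(durations):
        free_time, child = heapq.heappop(idle_at)
        inboxes[child].append((task_id, duration))
        heapq.heappush(idle_at, (free_time + duration, child))
    return inboxes


def makespan(inboxes):
    """Total wall-clock time if every child runs its inbox sequentially."""
    return max(sum(duration for _, duration in inbox) for inbox in inboxes)


# Three 10-second tasks, four children (hypothetical numbers).
durations = [10, 10, 10]
eager = eager_dispatch(durations, n_children=4, inbox_size=3)
idle_only = idle_only_dispatch(durations, n_children=4)

print([len(inbox) for inbox in eager], makespan(eager))
print([len(inbox) for inbox in idle_only], makespan(idle_only))
```

In the eager model, all three tasks land in the first child's inbox and run back to back, which is the pattern in the trace above. In the idle-only model, each task goes to a different child and they run in parallel.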
Anyway, the solution in the documentation is right: pass the `-Ofair` option when you start your Celery worker. With it, the parent only hands a task to a child process that is free to run it, so reserved tasks no longer queue up behind a busy child while the others sit idle.
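For reference, here is roughly what the worker invocation looks like when the concurrency comes from the environment. This is a sketch, not our actual startup script: the app name `proj`, the helper `worker_command`, and the fallback concurrency of `4` are placeholders I picked for the example.

```python
import os


def worker_command(app_name, env=os.environ):
    """Build the argv for a prefork Celery worker with -Ofair enabled."""
    # WORKER_CONCURRENCY comes from the environment; "4" is an assumed fallback.
    concurrency = env.get("WORKER_CONCURRENCY", "4")
    return ["celery", "-A", app_name, "worker", "-c", concurrency, "-Ofair"]


# "proj" is a hypothetical app name.
command = worker_command("proj", {"WORKER_CONCURRENCY": "4"})
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` or joined into a line for a process manager config.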
Thanks to Michael Linder for helping me figure this out.