Message-ID: <20180730191423.GN1206094@devbig004.ftw2.facebook.com>
Date: Mon, 30 Jul 2018 12:14:23 -0700
From: Tejun Heo <tj@...nel.org>
To: Michal Hocko <mhocko@...nel.org>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Roman Gushchin <guro@...com>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
David Rientjes <rientjes@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
should_reclaim_retry().
Hello, Michal.
On Mon, Jul 30, 2018 at 08:51:10PM +0200, Michal Hocko wrote:
> > Yeah, workqueue can choke on things like that and kthread indefinitely
> > busy looping doesn't do anybody any good.
>
> Yeah, I do agree. But this is much easier said than done ;) Sure
> we have that hack that does sleep rather than cond_resched in the
> page allocator. We can and will "fix" it to be unconditional in the
> should_reclaim_retry [1] but this whole thing is really subtle. It just
> takes one misbehaving worker and something which is really important to
> run will get stuck.
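A rough illustration of why the page allocator sleeps instead of calling cond_resched() for workqueue workers: per-CPU workqueue pools track how many of their workers are running and, as the concurrency-management design is generally described, only wake another worker when the running one blocks. cond_resched() leaves the task runnable, so a work item that loops on it can starve everything queued behind it on the same pool. The sketch below is a hypothetical toy model, not kernel code; `simulate`, `Yield`, and the "one pending item runs per sleep" rule are simplifying assumptions made for illustration.

```python
from enum import Enum


class Yield(Enum):
    COND_RESCHED = "cond_resched"  # task yields the CPU but stays runnable
    SLEEP = "sleep"                # task blocks briefly (e.g. a short timeout)


def simulate(kind: Yield, pending: int, iterations: int) -> int:
    """Toy model of per-CPU workqueue concurrency management (hypothetical).

    One worker is busy-looping (for example, retrying reclaim) while
    `pending` other work items wait on the same pool. Returns how many of
    those waiting items got to run.
    """
    nr_running = 1  # the looping worker counts as running
    completed = 0
    for _ in range(iterations):
        if kind is Yield.SLEEP:
            # Blocking drops nr_running to zero; the pool then wakes another
            # worker, which (in this simplified model) runs one pending item
            # before the sleeper wakes up again.
            nr_running -= 1
            if nr_running == 0 and pending > 0:
                pending -= 1
                completed += 1
            nr_running += 1
        # Yield.COND_RESCHED: the task never leaves the runnable state, so
        # nr_running stays at 1 and the pool never starts another worker.
    return completed


if __name__ == "__main__":
    for kind in Yield:
        ran = simulate(kind, pending=3, iterations=10)
        print(f"{kind.value}: {ran} of 3 pending items ran")
```

Real pools can run many items per wakeup; the model only captures the on/off distinction between a worker that blocks and one that merely reschedules.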
Oh yeah, I'm not saying the current behavior is ideal or anything, but
since the behavior was put in many years ago, it has become a problem
only a couple of times, and all cases were rather easy and obvious
fixes on the wq user side. It shouldn't be difficult to add a timer
mechanism on top. We might be able to simply extend the hang
detection mechanism to kick off all pending rescuers after detecting a
wq stall. I'm wary about making it a part of normal operation
(i.e., a silent timeout). Per-cpu kworkers really shouldn't busy loop for
an extended period of time.
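The watchdog idea in the reply above can be sketched as another toy model, again not kernel code. It assumes a per-pool timestamp of last forward progress and a stall threshold, in the spirit of the existing workqueue lockup detector; `ToyPool`, `watchdog_check`, and handing every pending item to a rescuer in a single step are hypothetical names and simplifications. What it tries to show is the ordering preferred in the reply: report the stall loudly first, then recover, rather than silently timing out during normal operation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPool:
    """Hypothetical per-CPU pool state as seen by a stall watchdog."""
    last_progress: float                               # last time a work item finished
    pending: List[str] = field(default_factory=list)   # queued work items
    rescued: List[str] = field(default_factory=list)   # items handed to rescuers


def watchdog_check(pool: ToyPool, now: float, thresh: float) -> bool:
    """Kick rescuers for all pending items if the pool has stalled.

    Returns True if a stall was detected. The warning is printed before any
    recovery, because a stall means some work item is misbehaving.
    """
    stalled_for = now - pool.last_progress
    if stalled_for < thresh or not pool.pending:
        return False
    print(f"workqueue stall: no progress for {stalled_for:.0f}s, "
          f"{len(pool.pending)} item(s) pending; kicking rescuers")
    pool.rescued.extend(pool.pending)
    pool.pending.clear()
    return True
```

For example, a pool that last made progress at t=0 with two queued items is left alone at t=10 with a 30-second threshold, but at t=31 the check prints a stall warning and moves both items to `rescued`.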
Thanks.
--
tejun