[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20180730185110.GB24267@dhcp22.suse.cz>
Date: Mon, 30 Jul 2018 20:51:10 +0200
From: Michal Hocko <mhocko@...nel.org>
To: Tejun Heo <tj@...nel.org>
Cc: Tetsuo Handa <penguin-kernel@...ove.sakura.ne.jp>,
Roman Gushchin <guro@...com>,
Johannes Weiner <hannes@...xchg.org>,
Vladimir Davydov <vdavydov.dev@...il.com>,
David Rientjes <rientjes@...gle.com>,
Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-mm <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] mm,page_alloc: PF_WQ_WORKER threads must sleep at
should_reclaim_retry().
On Mon 30-07-18 08:44:24, Tejun Heo wrote:
> Hello,
>
> On Tue, Jul 31, 2018 at 12:25:04AM +0900, Tetsuo Handa wrote:
> > WQ_MEM_RECLAIM guarantees that "struct task_struct" is preallocated. But
> > WQ_MEM_RECLAIM does not guarantee that the pending work is started as soon
> > as an item was queued. Same rule applies to both WQ_MEM_RECLAIM workqueues
> > and !WQ_MEM_RECLAIM workqueues regarding when to start a pending work (i.e.
> > when schedule_timeout_*() is called).
> >
> > Is this correct?
>
> WQ_MEM_RECLAIM guarantees that there's always gonna exist at least one
> kworker running the workqueue. But all per-cpu kworkers are subject
> to concurrency limiting execution - ie. if there are any per-cpu
> actively running on a cpu, no futher kworkers will be scheduled.
Well, in the ideal world we would _use_ that pre-allocated kworker if
there are no other available because they are doing something that takes
a long time to accomplish. Page allocator can spend a lot of time if we
are struggling to death to get some memory.
> > > We can add timeout mechanism to workqueue so that it
> > > kicks off other kworkers if one of them is in running state for too
> > > long, but idk, if there's an indefinite busy loop condition in kernel
> > > threads, we really should get rid of them and hung task watchdog is
> > > pretty effective at finding these cases (at least with preemption
> > > disabled).
> >
> > Currently the page allocator has a path which can loop forever with
> > only cond_resched().
>
> Yeah, workqueue can choke on things like that and kthread indefinitely
> busy looping doesn't do anybody any good.
Yeah, I do agree. But this is much easier said than done ;) Sure
we have that hack that does sleep rather than cond_resched in the
page allocator. We can and will "fix" it to be unconditional in the
should_reclaim_retry [1] but this whole thing is really subtle. It just
take one misbehaving worker and something which is really important to
run will get stuck.
That being said I will post the patch with updated changelog recording
this.
[1] http://lkml.kernel.org/r/ca3da8b8-1bb5-c302-b190-fa6cebab58ca@I-love.SAKURA.ne.jp
--
Michal Hocko
SUSE Labs
Powered by blists - more mailing lists