Message-ID: <20180213122940.GS3443@dhcp22.suse.cz>
Date: Tue, 13 Feb 2018 13:29:40 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Chris Wilson <chris@...is-wilson.co.uk>
Cc: linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...nel.org>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] khungtaskd: Kick stuck processes

On Tue 13-02-18 12:08:12, Chris Wilson wrote:
> Quoting Michal Hocko (2018-02-13 11:56:42)
> > On Thu 08-02-18 19:07:53, Chris Wilson wrote:
> > > After spotting a stuck process, and having decided not to panic, give
> > > the task a kick to see if that helps it to recover (e.g. to paper over a
> > > missed wake up).
> >
> > huh, this is just a no-no. The watchdog is there to report problems, not
> > interfere. You can never know whether the sleeper is prepared for
> > spurious wakeups. Do not paper over bugs...
>
> Aside from khungtaskd being a debug feature, we want to identify the bug
> by kicking the stuck process and seeing what squeals. Being told that
> khugepaged is stuck over and over again doesn't help resolve who is
> holding onto that lock_page, or if it was just a missed wakeup as all
> other processes are asleep.

And how exactly does kicking help here? If the waiter uses lock_page
then it would go to sleep again because of PG_locked. If the page is not
locked and this is a missed wake up then either unlock_page is wrong
(which doesn't seem to be the case AFAICS) or somebody is messing with
the page locking and this patch doesn't achieve anything.
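
To illustrate the first point, here is a minimal single-threaded model, not
the kernel implementation: model_page, wake_kind and model_lock_page are
hypothetical names invented for this sketch. It only shows that a waiter
which rechecks the lock bit after every wake-up gains nothing from a kick
while the bit is still set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page whose PG_locked bit may be held. */
struct model_page {
    bool locked;
};

/* One wake-up delivered to the sleeping waiter. */
enum wake_kind {
    WAKE_KICK,   /* artificial kick: nobody released the page */
    WAKE_UNLOCK  /* genuine wake-up: the holder cleared the bit first */
};

/*
 * Model of a lock_page-style waiter. After every wake-up it rechecks the
 * lock bit and goes back to sleep while the bit is still set.
 *
 * Returns the number of wake-ups consumed before the lock was acquired
 * (0 if the page was free on entry), or -1 if the events ran out with
 * the waiter still asleep.
 */
int model_lock_page(struct model_page *page,
                    const enum wake_kind *events, size_t n)
{
    size_t i;

    if (!page->locked) {            /* uncontended: no sleep needed */
        page->locked = true;
        return 0;
    }

    for (i = 0; i < n; i++) {
        if (events[i] == WAKE_UNLOCK)
            page->locked = false;   /* holder drops the lock, then wakes us */

        if (!page->locked) {        /* recheck the condition after waking */
            page->locked = true;    /* the waiter now owns the page lock */
            return (int)(i + 1);
        }
        /* still locked: the wake-up changed nothing, sleep again */
    }
    return -1;
}
```

Feeding this model only WAKE_KICK events leaves the waiter asleep no matter
how many kicks arrive; only a WAKE_UNLOCK lets it proceed.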

> We are trying to paper over other bugs so that we can fix ours.

But you do not want to break existing code which might be sensitive to
spurious wakeups. You could argue that such code is broken already and
I would tend to agree, but an artificial wake up is just a no-go.
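
As a rough sketch of what "sensitive to spurious wakeups" means, consider
the two hypothetical consumers below. model_msg, fragile_on_wake and
robust_on_wake are made-up names for illustration, not taken from any
kernel code. The fragile one treats the wake-up itself as proof that the
data is ready; the robust one rechecks the condition.

```c
#include <stdbool.h>

/* Hypothetical message slot shared between a producer and a consumer. */
struct model_msg {
    bool ready;   /* set by the producer just before it wakes the consumer */
    int  payload; /* only meaningful once ready is true */
};

/* Fragile consumer: assumes that being woken means the data is ready. */
int fragile_on_wake(const struct model_msg *m)
{
    return m->payload;  /* no recheck: a kick makes it read stale data */
}

/*
 * Robust consumer: rechecks the condition after waking.
 * Returns false to mean "nothing to do, go back to sleep".
 */
bool robust_on_wake(const struct model_msg *m, int *out)
{
    if (!m->ready)
        return false;
    *out = m->payload;
    return true;
}
```

An artificial kick delivered while ready is still false makes the fragile
consumer act on a payload nobody has written yet, whereas the robust one
simply goes back to sleep.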
--
Michal Hocko
SUSE Labs