Message-ID: <20151027092231.GC9891@dhcp22.suse.cz>
Date:	Tue, 27 Oct 2015 10:22:31 +0100
From:	Michal Hocko <mhocko@...nel.org>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	cl@...ux.com, htejun@...il.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org,
	rientjes@...gle.com, oleg@...hat.com, kwalker@...hat.com,
	akpm@...ux-foundation.org, hannes@...xchg.org,
	vdavydov@...allels.com, skozina@...hat.com, mgorman@...e.de,
	riel@...hat.com
Subject: Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable()
 checks

On Sun 25-10-15 19:52:59, Tetsuo Handa wrote:
[...]
> Three approaches are proposed for fixing this silent livelock problem.
> 
>  (1) Use zone_page_state_snapshot() instead of zone_page_state()
>      when doing zone_reclaimable() checks. This approach is clear,
>      straightforward and easy to backport. So far I have not been able
>      to reproduce this livelock with this change applied. But there
>      might be more locations which should use zone_page_state_snapshot().
> 
>  (2) Use a dedicated workqueue for the vmstat_update item so that it is
>      guaranteed to be processed immediately. So far I have not been able
>      to reproduce this livelock using a dedicated workqueue created with
>      WQ_MEM_RECLAIM|WQ_HIGHPRI (patch proposed by Christoph Lameter).
>      But according to Tejun Heo, if we want to guarantee that nobody can
>      reproduce this livelock, we need to modify the workqueue API,
>      because commit 3270476a6c0c ("workqueue: reimplement WQ_HIGHPRI
>      using a separate worker_pool"), which went into Linux 3.6, lost
>      that guarantee.
> 
>  (3) Use a !TASK_RUNNING sleep on the page allocator side. This approach
>      is easy to backport. So far I have not been able to reproduce this
>      livelock using this approach, and I think nobody can reproduce it,
>      because the change makes the page allocator obey the workqueue's
>      expectations. Even leaving this livelock problem aside, not entering
>      a !TASK_RUNNING state for too long monopolizes a workqueue worker
>      and needlessly defers the other items on that workqueue. There is
>      no need to defer other items which do not invoke a __GFP_WAIT
>      allocation.
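
For reference, approach (1) above amounts to roughly the following change
in mm/vmscan.c (an illustrative sketch against the zone_reclaimable() of
kernels of that era, not the patch posted in this thread):
zone_page_state_snapshot() also folds in the per-cpu deltas that have not
yet been flushed into the global counter, so the check cannot get stuck on
a stale value.

/*
 * Illustrative sketch of approach (1): read NR_PAGES_SCANNED via
 * zone_page_state_snapshot(), which includes the not-yet-folded
 * per-cpu vmstat deltas, instead of the possibly stale value
 * returned by zone_page_state().
 */
bool zone_reclaimable(struct zone *zone)
{
	return zone_page_state_snapshot(zone, NR_PAGES_SCANNED) <
		zone_reclaimable_pages(zone) * 6;
}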
> 
> This patch implements approach (3) by inserting an uninterruptible sleep
> on the page allocator side before retrying, in order to make sure that
> other workqueue items (especially the vmstat_update item) are given a
> chance to be processed.
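
The posted diff itself is elided below; conceptually the change boils down
to something like the following in the __alloc_pages_slowpath() retry path
(an illustrative sketch assuming the retry: label of kernels of that era,
not the actual posted patch):

	/*
	 * Illustrative sketch of approach (3): before looping back to
	 * retry the allocation, sleep in an uninterruptible state for at
	 * least one tick so that pending workqueue items on this CPU
	 * (in particular vmstat_update) get a chance to run.
	 */
	schedule_timeout_uninterruptible(1);
	goto retry;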
> 
> Although it is a different problem, approach (3) also alleviates the
> needless burning of CPU cycles when we hit the OOM-killer livelock
> problem (a hang after the OOM-killer messages are printed, because the
> OOM victim cannot terminate due to a dependency).

I really dislike this approach. Waiting without an event to wait for is
just too ugly. I think (1) is the easiest to backport to stable kernels
without causing any other regressions. (2) is the way to move forward
for future kernels, and we should really consider whether WQ_MEM_RECLAIM
should also imply WQ_HIGHPRI by default. If there is a general consensus
that there are legitimate WQ_MEM_RECLAIM users which can do without the
other flag, then I am perfectly OK with using it for the vmstat and
oom sysrq dedicated workqueues.
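
For comparison, a dedicated workqueue along the lines of approach (2)
could look roughly like the sketch below; the init function and the
queueing call shown in the trailing comment are assumptions based on the
mm/vmstat.c of that era, not the patch Christoph posted.

/*
 * Illustrative sketch of approach (2): give vmstat_update its own
 * workqueue instead of relying on the system workqueue, so that a
 * worker is available even under memory pressure.
 */
static struct workqueue_struct *vmstat_wq;

static int __init vmstat_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM provides a rescuer thread. Note that WQ_HIGHPRI
	 * alone no longer guarantees immediate execution since commit
	 * 3270476a6c0c ("workqueue: reimplement WQ_HIGHPRI using a
	 * separate worker_pool").
	 */
	vmstat_wq = alloc_workqueue("vmstat", WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	return vmstat_wq ? 0 : -ENOMEM;
}
core_initcall(vmstat_wq_init);

/*
 * The per-cpu vmstat_update work would then be queued on vmstat_wq,
 * e.g.:
 *
 *	queue_delayed_work_on(cpu, vmstat_wq, &per_cpu(vmstat_work, cpu),
 *			      round_jiffies_relative(sysctl_stat_interval));
 */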

> Signed-off-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
[...]
-- 
Michal Hocko
SUSE Labs
