Message-ID: <20111101122801.GB25123@suse.de>
Date:	Tue, 1 Nov 2011 12:28:01 +0000
From:	Mel Gorman <mgorman@...e.de>
To:	Colin Cross <ccross@...roid.com>
Cc:	linux-kernel@...r.kernel.org,
	Andrew Morton <akpm@...ux-foundation.org>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@...fujitsu.com>,
	Andrea Arcangeli <aarcange@...hat.com>,
	David Rientjes <rientjes@...gle.com>, linux-mm@...ck.org
Subject: Re: [PATCH] mm: avoid livelock on !__GFP_FS allocations

On Tue, Oct 25, 2011 at 10:08:58AM -0700, Colin Cross wrote:
> On Tue, Oct 25, 2011 at 4:23 AM, Mel Gorman <mgorman@...e.de> wrote:
> > On Tue, Oct 25, 2011 at 02:26:56AM -0700, Colin Cross wrote:
> >> On Tue, Oct 25, 2011 at 2:09 AM, Mel Gorman <mgorman@...e.de> wrote:
> >> > On Mon, Oct 24, 2011 at 11:39:49PM -0700, Colin Cross wrote:
> >> >> Under the following conditions, __alloc_pages_slowpath can loop
> >> >> forever:
> >> >> gfp_mask & __GFP_WAIT is true
> >> >> gfp_mask & __GFP_FS is false
> >> >> reclaim and compaction make no progress
> >> >> order <= PAGE_ALLOC_COSTLY_ORDER
> >> >>
> >> >> These conditions happen very often during suspend and resume,
> >> >> when pm_restrict_gfp_mask() effectively converts all GFP_KERNEL
> >> >> allocations into __GFP_WAIT.
> >> >>
> >> >> The oom killer is not run because gfp_mask & __GFP_FS is false,
> >> >> but should_alloc_retry will always return true when order is less
> >> >> than or equal to PAGE_ALLOC_COSTLY_ORDER.
> >> >>
> >> >> Fix __alloc_pages_slowpath to skip retrying when oom killer is
> >> >> not allowed by the GFP flags, the same way it would skip if the
> >> >> oom killer was allowed but disabled.
> >> >>
> >> >> Signed-off-by: Colin Cross <ccross@...roid.com>
> >> >
> >> > Hi Colin,
> >> >
> >> > Your patch functionally seems fine. I see the problem and we certainly
> >> > do not want to have the OOM killer firing during suspend. I would prefer
> >> > that the IO devices would not be suspended until reclaim was completed
> >> > but I imagine that would be a lot harder.
> >> >
> >> > That said, it will be difficult to remember why checking __GFP_NOFAIL in
> >> > this case is necessary and someone might "optimise" it away later. It
> >> > would be preferable if it was self-documenting. Maybe something like
> >> > this? (This is totally untested)
> >>
> >> This issue is not limited to suspend, any GFP_NOIO allocation could
> >> end up in the same loop.  Suspend is the most likely case, because it
> >> effectively converts all GFP_KERNEL allocations into GFP_NOIO.
> >>
> >
> > I see what you mean with GFP_NOIO but there is an important difference
> > between GFP_NOIO and suspend.  A GFP_NOIO low-order allocation currently
> > implies __GFP_NOFAIL as commented on in should_alloc_retry(). If no progress
> > is made, we call wait_iff_congested() and sleep for a bit. As the system
> > is running, kswapd and other process activity will proceed and eventually
> > reclaim enough pages for the GFP_NOIO allocation to succeed. In a running
> > system, GFP_NOIO can stall for a period of time but your patch will cause
> > the allocation to fail. While I expect callers return ENOMEM or handle
> > the situation properly with a wait-and-retry loop, there will be
> > operations that fail that used to succeed. This is why I'd prefer it was
> > a suspend-specific fix unless we know there is a case where a machine
> > livelocks due to a GFP_NOIO allocation looping forever and even then I'd
> > wonder why kswapd was not helping.
> 
> OK, I see the change in behavior you are trying to avoid.  With your
> patch GFP_NOIO allocations can still fail during suspend, is that OK?

What is the alternative? We are not making forward progress. This might
mean that the suspend operation fails but I'm not seeing a better
alternative.

> I'm also worried about GFP_NOIO allocations looping forever when swap
> is not enabled, but I've never seen it happen, and it would probably
> recover eventually when another thread tried a GFP_KERNEL allocation
> and oom killed something.

Or when dirty pages backed by the filesystem are cleaned and reclaimed.
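
To make the trade-off discussed above concrete, here is a rough userspace
model of the retry decision. It is only a sketch, not kernel code: the
MODEL_* flag values, the model_policy enum, and the model_slowpath() helper
are all invented for illustration, and reclaim is assumed to never make
progress. It compares the current behaviour, a fix that fails whenever the
OOM killer is not allowed by the GFP flags, and a suspend-specific fix.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model only; not the kernel's. */
#define MODEL_GFP_WAIT     0x1u
#define MODEL_GFP_FS       0x2u
#define MODEL_GFP_NOFAIL   0x4u
#define MODEL_COSTLY_ORDER 3u

/* The three behaviours being compared (names are assumptions). */
enum model_policy {
    MODEL_ORIGINAL,          /* always retry low-order allocations */
    MODEL_FAIL_WITHOUT_OOM,  /* give up whenever the OOM killer may not run */
    MODEL_FAIL_ON_SUSPEND    /* give up only while storage is suspended */
};

/* Low-order allocations implicitly behave as if they must not fail. */
static bool model_should_retry(unsigned gfp, unsigned order)
{
    if (gfp & MODEL_GFP_NOFAIL)
        return true;
    return order <= MODEL_COSTLY_ORDER;
}

/*
 * Model of the slow path when reclaim and compaction make no progress.
 * Returns the pass on which the loop exits, or -1 if it is still
 * retrying after max_passes (standing in for "loops forever").
 */
static int model_slowpath(enum model_policy policy, unsigned gfp,
                          unsigned order, bool suspended, int max_passes)
{
    for (int pass = 1; pass <= max_passes; pass++) {
        /* With __GFP_FS the OOM killer can run; treat that as an exit. */
        if (gfp & MODEL_GFP_FS)
            return pass;

        /* Explicit __GFP_NOFAIL callers are never failed by either fix. */
        if (!(gfp & MODEL_GFP_NOFAIL)) {
            if (policy == MODEL_FAIL_WITHOUT_OOM)
                return pass;
            if (policy == MODEL_FAIL_ON_SUSPEND && suspended)
                return pass;
        }

        if (!model_should_retry(gfp, order))
            return pass;
    }
    return -1;
}
```

In this model, a GFP_NOIO-like mask (MODEL_GFP_WAIT alone) at order 0 never
exits under MODEL_ORIGINAL. Under MODEL_FAIL_WITHOUT_OOM it fails on the
first pass even on a running system, which is the behaviour change I would
like to avoid. Under MODEL_FAIL_ON_SUSPEND it fails only when suspended is
true and otherwise keeps retrying, where on a real system kswapd or
writeback would be expected to free pages eventually.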

-- 
Mel Gorman
SUSE Labs