Date:	Mon, 23 Feb 2015 11:48:10 +0100
From:	Michal Hocko <mhocko@...e.cz>
To:	Johannes Weiner <hannes@...xchg.org>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Theodore Ts'o <tytso@....edu>,
	Dave Chinner <david@...morbit.com>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	dchinner@...hat.com, linux-mm@...ck.org, rientjes@...gle.com,
	oleg@...hat.com, mgorman@...e.de, torvalds@...ux-foundation.org,
	xfs@....sgi.com, linux-ext4@...r.kernel.org
Subject: Re: How to handle TIF_MEMDIE stalls?

On Sat 21-02-15 19:20:58, Johannes Weiner wrote:
> On Sat, Feb 21, 2015 at 01:19:07AM -0800, Andrew Morton wrote:
> > Short term, we need to fix 3.19.x and 3.20 and that appears to be by
> > applying Johannes's akpm-doesnt-know-why-it-works patch:
> > 
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2382,8 +2382,15 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  		if (high_zoneidx < ZONE_NORMAL)
> >  			goto out;
> >  		/* The OOM killer does not compensate for light reclaim */
> > -		if (!(gfp_mask & __GFP_FS))
> > +		if (!(gfp_mask & __GFP_FS)) {
> > +			/*
> > +			 * XXX: Page reclaim didn't yield anything,
> > +			 * and the OOM killer can't be invoked, but
> > +			 * keep looping as per should_alloc_retry().
> > +			 */
> > +			*did_some_progress = 1;
> >  			goto out;
> > +		}
> >  		/*
> >  		 * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
> >  		 * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
> > 
> > Have people adequately confirmed that this gets us out of trouble?
> 
> I'd be interested in this too.  Who is seeing these failures?
> 
> Andrew, can you please use the following changelog for this patch?
> 
> ---
> From: Johannes Weiner <hannes@...xchg.org>
> 
> mm: page_alloc: revert inadvertent !__GFP_FS retry behavior change
> 
> Historically, !__GFP_FS allocations were not allowed to invoke the OOM
> killer once reclaim had failed, but nevertheless kept looping in the
> allocator.  9879de7373fc ("mm: page_alloc: embed OOM killing naturally
> into allocation slowpath"), which should have been a simple cleanup
> patch, accidentally changed the behavior to aborting the allocation at
> that point.  This creates problems with filesystem callers (?) that
> currently rely on the allocator waiting for other tasks to intervene.
> 
> Revert the behavior as it shouldn't have been changed as part of a
> cleanup patch.

OK, if this is a _short term_ change. I really think that all requests
except __GFP_NOFAIL should be able to fail. I would argue that it is the
callers that should be fixed, but it is true that the patch was
introduced too late (rc7) and caught other subsystems unprepared, so
backporting to stable makes sense to me. But can we please move on and
stop pretending that allocations do not fail for the upcoming release?
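
To make the behavior under discussion concrete, here is a rough userspace
model of the decision, not kernel code. The MODEL_* flag values and the
model_* function names are made up for illustration, and the "patched"
switch simply stands for Johannes's hunk quoted above being applied. It
only mirrors the did_some_progress logic: without the hunk, a !__GFP_FS
request stops retrying once reclaim has failed; with it, the request keeps
looping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model only; not the kernel's value. */
#define MODEL_GFP_FS 0x1u

/*
 * Model of the OOM step that runs after reclaim has made no progress.
 * It never produces a page; it only reports, via *did_some_progress,
 * whether the caller should go around the loop again.
 */
static void model_may_oom(unsigned int gfp_mask, bool patched,
                          int *did_some_progress)
{
	assert(did_some_progress != NULL);
	*did_some_progress = 0;

	if (!(gfp_mask & MODEL_GFP_FS)) {
		/* The OOM killer cannot be invoked for !FS requests. */
		if (patched)
			*did_some_progress = 1;	/* keep looping anyway */
		return;
	}

	/* Assumption: the OOM killer ran, so a retry is worthwhile. */
	*did_some_progress = 1;
}

/* True if the modeled slowpath would retry, false if it would give up. */
static bool model_should_retry(unsigned int gfp_mask, bool patched)
{
	int did_some_progress;

	model_may_oom(gfp_mask, patched, &did_some_progress);
	return did_some_progress != 0;
}
```

In this model, model_should_retry(0, false) is false (the allocation fails
back to the caller), while model_should_retry(0, true) is true (the
allocator keeps looping and waits for other tasks to intervene).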

> Fixes: 9879de7373fc ("mm: page_alloc: embed OOM killing naturally into allocation slowpath")
> Signed-off-by: Johannes Weiner <hannes@...xchg.org>

Acked-by: Michal Hocko <mhocko@...e.cz>

-- 
Michal Hocko
SUSE Labs