linux-kernel - Re: [PATCH v6] mm: Add memory allocation watchdog kernel thread.

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <20170126075753.GD8456@dhcp22.suse.cz>
Date:   Thu, 26 Jan 2017 08:57:53 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Johannes Weiner <hannes@...xchg.org>
Cc:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v6] mm: Add memory allocation watchdog kernel thread.

On Wed 25-01-17 14:22:45, Johannes Weiner wrote:
> On Wed, Jan 25, 2017 at 07:45:49PM +0100, Michal Hocko wrote:
> > On Wed 25-01-17 13:11:50, Johannes Weiner wrote:
> > [...]
> > > From 6420cae52cac8167bd5fb19f45feed2d540bc11d Mon Sep 17 00:00:00 2001
> > > From: Johannes Weiner <hannes@...xchg.org>
> > > Date: Wed, 25 Jan 2017 12:57:20 -0500
> > > Subject: [PATCH] mm: page_alloc: __GFP_NOWARN shouldn't suppress stall
> > >  warnings
> > > 
> > > __GFP_NOWARN, which is usually added to avoid warnings from callsites
> > > that expect to fail and have fallbacks, currently also suppresses
> > > allocation stall warnings. These trigger when an allocation is stuck
> > > inside the allocator for 10 seconds or longer.
> > > 
> > > But there is no class of allocations that can get legitimately stuck
> > > in the allocator for this long. This always indicates a problem.
> > > 
> > > Always emit stall warnings. Restrict __GFP_NOWARN to alloc failures.
> > 
> > Tetsuo has already suggested something like this and I didn't really
> > like it because it makes the semantics of the flag confusing. The mask
> > says not to warn, yet the kernel log might still contain an allocation
> > splat. You are right that stalling for 10 seconds is a problem on its
> > own, but on the other hand I can imagine somebody might really want to
> > have clean logs, and the last thing we want is another gfp flag for
> > that purpose.
> 
> I don't think it's confusing. __GFP_NOWARN tells the allocator whether
> an allocation failure can be handled or whether it constitutes a bug.
> 
> If we agree that stalling for 10s is a bug, then we should emit the
> warnings.

Yes, in many cases it would be a bug in the MM. Some of them would be
inherent, because the allocator doesn't implement any fairness and
starvation cannot be ruled out (would that be a bug?). In general,
looping or spending a lot of time in the kernel can be seen as a bug. We
have watchdogs to report those cases, and time has shown that we had to
develop ways to silence those lockups because in some cases we couldn't
handle them. I am worried we will eventually find cases like that for
allocation stalls as well. I might be oversensitive because we have
already made some mistakes in gfp flag land, and I would like to
prevent more.

That being said, I will not stand in the way...
-- 
Michal Hocko
SUSE Labs