Message-ID: <20151027091603.GB9891@dhcp22.suse.cz>
Date: Tue, 27 Oct 2015 10:16:03 +0100
From: Michal Hocko <mhocko@...nel.org>
To: Tejun Heo <htejun@...il.com>
Cc: Christoph Lameter <cl@...ux.com>,
Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
linux-mm@...ck.org, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org,
David Rientjes <rientjes@...gle.com>, oleg@...hat.com,
kwalker@...hat.com, akpm@...ux-foundation.org, hannes@...xchg.org,
vdavydov@...allels.com, skozina@...hat.com, mgorman@...e.de,
riel@...hat.com
Subject: Re: [PATCH] mm,vmscan: Use accurate values for zone_reclaimable() checks
On Sat 24-10-15 03:21:09, Tejun Heo wrote:
> Hello,
>
> On Fri, Oct 23, 2015 at 01:11:45PM +0200, Michal Hocko wrote:
> > > The problem here is not lack
> > > of execution resource but concurrency management misunderstanding the
> > > situation.
> >
> > And this sounds like a bug to me.
>
> I don't know. I can be argued either way, the other direction being a
> kernel thread going RUNNING non-stop is buggy. Given how this has
> been a complete non-issue for all the years, I'm not sure how useful
> plugging this is.
Well, I guess we haven't noticed because this is a pathological case. It
also triggers OOM livelocks which were not reported in the past either.
You do not reach this state normally unless you really _want_ to kill
your machine.
And vmstat is not the only instance. E.g. the sysrq OOM trigger is known
to get stuck behind other work items in similar cases. It should be
changed to use a dedicated WQ_MEM_RECLAIM wq, and it would require the
runnable-item guarantee as well.
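To be explicit about what I mean, something along these lines (a sketch
only; the names are hypothetical, not actual sysrq code):

```c
/* Sketch: give the sysrq OOM trigger its own WQ_MEM_RECLAIM workqueue
 * instead of queueing on the shared system wq. Hypothetical names. */
static struct workqueue_struct *sysrq_oom_wq;
static DECLARE_WORK(moom_work, moom_callback);

static int __init sysrq_oom_wq_init(void)
{
	/*
	 * WQ_MEM_RECLAIM guarantees a rescuer thread, so a worker can
	 * always be found for this wq even when worker creation would
	 * need memory that is not there.
	 */
	sysrq_oom_wq = alloc_workqueue("sysrq_oom", WQ_MEM_RECLAIM, 1);
	return sysrq_oom_wq ? 0 : -ENOMEM;
}

static void handle_sysrq_oom(void)
{
	queue_work(sysrq_oom_wq, &moom_work);
}
```

Note this alone only guarantees a worker exists; it does not help if
concurrency management keeps the item parked behind a work item that
stays RUNNING forever, which is the point of this thread.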
> > Don't we have some IO related paths which would suffer from the same
> > problem. I haven't checked all the WQ_MEM_RECLAIM users but from the
> > name I would expect they _do_ participate in the reclaim and so they
> > should be able to make a progress. Now if your new IMMEDIATE flag will
>
> Seriously, nobody goes full-on RUNNING.
Looping with cond_resched() seems like a general pattern in the kernel
when there is no clear event to wait for. We have io_schedule() when we
know we should wait for IO (in case of congestion) but that is not
necessarily the case, as you can see here. What should we wait for? A
short nap without actually waiting on anything sounds like a dirty
workaround to me.
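The pattern I have in mind is roughly this (a sketch, not the actual
allocator reclaim loop):

```c
/* Sketch of the general "loop with cond_resched()" pattern: keep
 * attempting to make progress and yield the CPU politely when there is
 * no concrete event to wait on. */
for (;;) {
	bool progress = try_to_make_progress();	/* hypothetical helper */

	if (progress || fatal_signal_pending(current))
		break;
	/*
	 * Nothing specific to wait for, so just let other runnable
	 * tasks be scheduled. The task stays TASK_RUNNING the whole
	 * time, which is exactly what the WQ concurrency management
	 * misreads as "this worker is making progress".
	 */
	cond_resched();
}
```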
> > guarantee that then I would argue that it should be implicit for
> > WQ_MEM_RECLAIM otherwise we always risk a similar situation. What would
> > be a counter argument for doing that?
>
> Not serving any actual purpose and degrading execution behavior.
I dunno, I am not familiar enough with WQ internals to see the risks,
but to me it sounds like WQ_MEM_RECLAIM gives an incorrect impression of
safety wrt. memory pressure and, as demonstrated, it doesn't actually
provide that. Even if you consider the cond_resched() behavior of the
page allocator a bug, we should be able to handle it gracefully.
--
Michal Hocko
SUSE Labs