linux-kernel - Re: Silent hang up caused by pages being not scanned?

Message-ID: <20151014145938.GI28333@dhcp22.suse.cz>
Date:	Wed, 14 Oct 2015 16:59:38 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Cc:	rientjes@...gle.com, oleg@...hat.com,
	torvalds@...ux-foundation.org, kwalker@...hat.com, cl@...ux.com,
	akpm@...ux-foundation.org, hannes@...xchg.org,
	vdavydov@...allels.com, linux-mm@...ck.org,
	linux-kernel@...r.kernel.org, skozina@...hat.com
Subject: Re: Silent hang up caused by pages being not scanned?

On Wed 14-10-15 23:38:00, Tetsuo Handa wrote:
> Michal Hocko wrote:
[...]
> > Why hasn't balance_dirty_pages throttled the writers instead of allowing
> > them to make the whole LRU dirty? What is your
> > dirty{_background}_{ratio,bytes} configuration on that system?
> 
> All values are the defaults of a plain CentOS 7 installation.

So this is 3.10 kernel, right?

> # sysctl -a | grep ^vm.
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 30
[...]

OK, this is nothing unusual. And I _suspect_ that the throttling simply
couldn't keep up with the writer speed and a large anon memory consumer.
Dirtyable memory was quite high until your anon hammer kicked in and
reduced it, so the file LRU was full of dirty pages by the time we got
under serious memory pressure. Anonymous pages are not reclaimable, so
the whole memory pressure goes to the file LRUs and bang.

> > Also, why hasn't throttle_vm_writeout slowed the reclaim down?
> 
> That question is too difficult for me.
> 
> > 
> > Anyway this is exactly the case where zone_reclaimable helps us to
> > prevent OOM because we are looping over the remaining LRU pages without
> > making progress... This just shows how subtle all this is :/
> > 
> > I have to think about this much more...
> 
> I'm skeptical about tweaking the current reclaim logic.
> Could you please respond to Linus's comments?

Yes, I plan to; I just haven't finished that email yet.
 
> There are more moles than kernel developers can find. I think that
> what we can do for short term is to prepare for moles that kernel
> developers could not find, and for long term is to reform page
> allocator for preventing moles from living.

This is much easier said than done :/ The current code is full of
heuristics grown over time based on very different requirements from
different kernel subsystems. There is no simple solution for this
problem I am afraid.
-- 
Michal Hocko
SUSE Labs