linux-kernel - Re: [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS


Message-ID: <20160726072530.GC32462@dhcp22.suse.cz>
Date:	Tue, 26 Jul 2016 09:25:30 +0200
From:	Michal Hocko <mhocko@...nel.org>
To:	Mikulas Patocka <mpatocka@...hat.com>
Cc:	NeilBrown <neilb@...e.com>, linux-mm@...ck.org,
	Ondrej Kozina <okozina@...hat.com>,
	David Rientjes <rientjes@...gle.com>,
	Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>,
	Mel Gorman <mgorman@...e.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	LKML <linux-kernel@...r.kernel.org>, dm-devel@...hat.com
Subject: Re: [RFC PATCH 2/2] mm, mempool: do not throttle PF_LESS_THROTTLE
 tasks

On Mon 25-07-16 17:52:17, Mikulas Patocka wrote:
> 
> 
> On Sat, 23 Jul 2016, NeilBrown wrote:
> 
> > "dirtying ... from the reclaim context" ??? What does that mean?
> > According to
> >   Commit: 26eecbf3543b ("[PATCH] vm: pageout throttling")
> > From the history tree, the purpose of throttle_vm_writeout() is to
> > limit the amount of memory that is concurrently under I/O.
> > That seems strange to me because I thought it was the responsibility of
> > each backing device to impose a limit - a maximum queue size of some
> > sort.
> 
> Device mapper doesn't impose any limit for in-flight bios.
> 
> Some simple device mapper targets (such as linear or stripe) pass bio 
> directly to the underlying device with generic_make_request, so if the 
> underlying device's request limit is reached, the target's request routine 
> waits.
> 
> However, complex dm targets (such as dm-crypt, dm-mirror, dm-thin) pass 
> bios to a workqueue that processes them. And since there is no limit on 
> the number of workqueue entries, there is no limit on the number of 
> in-flight bios.
> 
> I've seen a case where I had an HPFS filesystem on dm-crypt. I wrote to
> the filesystem and there was about 2GB of dirty data. The HPFS filesystem
> used 512-byte bios. dm-crypt allocates one temporary page for each
> incoming bio. So there were 4M bios in flight, and each bio allocated a
> 4k temporary page: an attempted 16GB allocation. It didn't trigger an OOM
> condition (because mempool allocations never trigger it), but it
> temporarily exhausted all of the computer's memory.
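
For reference, the arithmetic in the report quoted above can be reproduced
with a small Python sketch. The figures are the ones from the report;
reading 2GB as 2 GiB and 4k as 4096 bytes is an assumption, and the
function names are made up for illustration only.

```python
GIB = 1 << 30  # assumption: binary units, as is usual for memory sizes


def bios_in_flight(dirty_bytes, bio_bytes):
    """Number of bios needed to write out dirty_bytes at bio_bytes per bio."""
    return dirty_bytes // bio_bytes


def temp_page_bytes(nr_bios, page_bytes=4096):
    """Memory pinned if every in-flight bio holds one temporary page."""
    return nr_bios * page_bytes


if __name__ == "__main__":
    nr = bios_in_flight(2 * GIB, 512)       # 2 GiB of dirty data, 512-byte bios
    print(nr, temp_page_bytes(nr) // GIB)   # bio count, temporary memory in GiB
```

With those inputs the bio count comes out at 4194304 (the "4M" in the
report) and the temporary pages add up to 16 GiB, eight times the amount
of dirty data actually being written.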

OK, that is certainly not good, and it is something throttle_vm_writeout
was meant to protect against. It is rather poor protection, though, because
it might kick in much earlier than necessary. Shouldn't those workers
simply back off when the underlying bdi is congested? It wouldn't help
to queue more IO when the bdi is already hammered.
 
> I've made some patches that limit in-flight bios for device mapper in the
> past, but they were not integrated into upstream.

Care to revive them? I am not an expert in dm, but an unbounded amount of
in-flight IO doesn't really sound good.
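
To make "bounded" concrete, here is a toy userspace model of capping the
number of items in flight. It is only a sketch: it is not dm code and it
is not the patches mentioned above, which are not shown in this thread.
The class name, the limit value and the method names are all hypothetical.

```python
class InflightLimiter:
    """Toy admission control: at most `limit` items may be in flight."""

    def __init__(self, limit):
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.limit = limit
        self.in_flight = 0

    def try_submit(self):
        """Admit one item if below the limit, else tell the caller to back off."""
        if self.in_flight >= self.limit:
            return False
        self.in_flight += 1
        return True

    def complete(self):
        """Mark one in-flight item as finished, which frees a slot."""
        if self.in_flight == 0:
            raise RuntimeError("complete() without a matching submit")
        self.in_flight -= 1


if __name__ == "__main__":
    limiter = InflightLimiter(limit=2)          # hypothetical limit
    print([limiter.try_submit() for _ in range(3)])
    limiter.complete()
    print(limiter.try_submit())
```

The point of the model is that a submitter which gets False has to wait
or back off rather than queue more work, so the memory pinned by
in-flight items stays proportional to the limit and not to the amount of
dirty data.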

[...]
-- 
Michal Hocko
SUSE Labs