linux-ext4 - Re: [PATCH 1/8] ext4 crypto: fix memory leaks in ext4_encrypted

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150531142601.GA11642@thunk.org>
Date:	Sun, 31 May 2015 10:26:01 -0400
From:	Theodore Ts'o <tytso@....edu>
To:	Jaegeuk Kim <jaegeuk@...nel.org>
Cc:	Ext4 Developers List <linux-ext4@...r.kernel.org>,
	mhalcrow@...gle.com
Subject: Re: [PATCH 1/8] ext4 crypto: fix memory leaks in
 ext4_encrypted_zeroout

On Fri, May 29, 2015 at 11:08:34AM -0700, Jaegeuk Kim wrote:
> Not sure why __GFP_WAIT is defined here.
> Even GFP_NOFS has __GFP_WAIT.
> 
> #define GFP_NOFS	(__GFP_WAIT | __GFP_IO)
> 
> IMO, __GFP_NOFAIL should be used here?
> Otherwise, we seem to handle ENOMEM instead.

You're right, thanks for pointing that out.  I've since changed the
patch so that we just return ENOMEM, and then made sure that we could
correctly handle ENOMEM.

On a different front, I've been thinking more about your proposal to
swap alloc_pages() and mempool_alloc(), since you were apparently
seeing OOM killers with a sufficiently aggressive fio workload on a
memory constrained device.

I took a closer look at how mempool_alloc() works, and in fact it will
try alloc_pages() first; but with GFP flags which causes it to not try
very hard.  (__GFP_NOMEMALLOC, __GFP_NORETRY, __GFP_NOWARN,
!__GFP_WAIT, !__GFP_IO).

If this fails, it tries to use the memory pool, and then assuming the
original request includes __GFP_WAIT, it will retry the alloc_pages()
using the original gfp mask, if this *still* fails, it will wait for a
page to be released to the mempool.  As a result, mempool_alloc() will
never fail when called with __GFP_WAIT, which means my original
concerns over mempool_alloc() failing were misplaced.

This has lead me to the following thoughts/conclusions:

1) We can just drop the alloc_page() call from alloc_bounce_page(),
since mempool_alloc() will call alloc_page() first.  This also allows
us to remove all of the complexity relating to the
EXT4_BOUNCE_PAGE_REQUIRES_FREE_ENCRYPT_FL flag.

2) Since mempool_alloc() will end up calling alloc_pages() --- and in
fact will try to alloc_pages() *twice* --- once with a "don't try
hard", and then a second time the way we would have alwyas called it
--- in practice this won't reduce the number of calls to allocate and
free pages assuming that there is enough memory in the system.

3) I don't think increasing num_prealloc_crypto_pages() to BIO_MAX is
the best idea; (a) it sequesters a large amount of memory, and (b)
even 256 pages is guaranteed to be enough.  On a Samsung 840 EVO (for
example), you can have up to 32767k in a single request, and up to 31
requests in its NCQ.  So in the worst case, you can have close to a
gigabyte of outstanding writes, and thus with a sufficiently evil set
of fio options, we could end up needing that many bounce pages.  And
remember, the mempool is a shared resource; if we have multiple
devices with encrypted file systems, the memory pressure could be even
greater.

4) I need to run the experiment, but what might make sense is to just
call mempool_alloc() with GFP_NOWAIT.  This will allow us to use
memory if it is available, but if not, we will simply retry the page
for writeback.  The mempool is there simply to provide a bit of extra
spare capacity so that we send out reasonably sized I/O's when under
very high memory pressure.  We can make the size of the mempool a
tunable which is dynamically adjustable using a sysfs file, and we can
see what seems to be the best tradeoff between having a fixed memory
allocation and performance under high memory pressure and heavy write
loads.

What do you think?  Does that make sense?  If so, you might want to
try a similar experiment; try removing the alloc_page() fallback
altogether, try calling mempool_alloc() with GFP_NOWAIT, and then try
different settings for num_prealloc_crypto_pages.

						- Ted

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html