lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 25 Jan 2017 11:46:05 +0100
From:   Michal Hocko <mhocko@...nel.org>
To:     Christoph Hellwig <hch@....de>
Cc:     Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>, mgorman@...e.de,
        viro@...IV.linux.org.uk, linux-mm@...ck.org, hannes@...xchg.org,
        linux-kernel@...r.kernel.org
Subject: Re: [RFC PATCH 1/2] mm, vmscan: account the number of isolated pages
 per zone

On Wed 25-01-17 11:19:57, Christoph Hellwig wrote:
> On Wed, Jan 25, 2017 at 11:15:17AM +0100, Michal Hocko wrote:
> > I think we are missing a check for fatal_signal_pending in
> > iomap_file_buffered_write. This means that an oom victim can consume the
> > full memory reserves. What do you think about the following? I haven't
> > tested this but it mimics generic_perform_write so I guess it should
> > work.
> 
> Hi Michal,
> 
> this looks reasonable to me.  But we have a few more such loops,
> maybe it makes sense to move the check into iomap_apply?

I wasn't sure about the expected semantic of iomap_apply but now that
I've actually checked all the callers I believe all of them should be
able to handle EINTR just fine. Well iomap_file_dirty, iomap_zero_range,
iomap_fiemap and iomap_page_mkwriteseem do not follow the standard
pattern to return the number of written pages or an error but it rather
propagates the error out. From my limited understanding of those code
paths that should just be ok. I was not all that sure about iomap_dio_rw
that is just too convoluted for me. If that one is OK as well then
the following patch should be indeed better.
---
>From d99c9d4115bed69a5d71281f59c190b0b26627cf Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@...e.com>
Date: Wed, 25 Jan 2017 11:06:37 +0100
Subject: [PATCH] fs: break out of iomap_file_buffered_write on fatal signals

Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion. He has tracked
this down to the following path
	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

the oom victim has access to all memory reserves to make a forward
progress to exit easier. But iomap_file_buffered_write and other callers
of iomap_apply loop to complete the full request. We need to check for
fatal signals and back off with a short write instead.

Fixes: 68a9f5e7007c ("xfs: implement iomap based buffered write path")
Cc: stable # 4.8+
Reported-by: Tetsuo Handa <penguin-kernel@...ove.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@...e.com>
---
 fs/iomap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/iomap.c b/fs/iomap.c
index e57b90b5ff37..a58190f7a3e4 100644
--- a/fs/iomap.c
+++ b/fs/iomap.c
@@ -46,6 +46,9 @@ iomap_apply(struct inode *inode, loff_t pos, loff_t length, unsigned flags,
 	struct iomap iomap = { 0 };
 	loff_t written = 0, ret;
 
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
 	/*
 	 * Need to map a range from start position for length bytes. This can
 	 * span multiple pages - it is only guaranteed to return a range of a
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ