linux-kernel - Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150921092429.GB9028@quack.suse.cz>
Date:	Mon, 21 Sep 2015 11:24:29 +0200
From:	Jan Kara <jack@...e.cz>
To:	Dave Chinner <david@...morbit.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jens Axboe <jaxboe@...ionio.com>, Chris Mason <clm@...com>,
	Jan Kara <jack@...e.cz>, Josef Bacik <jbacik@...com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Neil Brown <neilb@...e.de>, Christoph Hellwig <hch@....de>,
	Tejun Heo <tj@...nel.org>, linux-mm@...ck.org,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()

On Sat 19-09-15 08:17:14, Dave Chinner wrote:
> On Thu, Sep 17, 2015 at 11:04:03PM -0700, Linus Torvalds wrote:
> > On Thu, Sep 17, 2015 at 10:40 PM, Dave Chinner <david@...morbit.com> wrote:
> > > PS: just hit another "did this just get broken in 4.3-rc1" issue - I
> > > can't run blktrace while there's a IO load because:
> > >
> > > $ sudo blktrace -d /dev/vdc
> > > BLKTRACESETUP(2) /dev/vdc failed: 5/Input/output error
> > > Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
> > > ....
> > >
> > > [  641.424618] blktrace: page allocation failure: order:5, mode:0x2040d0
> > > [  641.438933]  [<ffffffff811c1569>] kmem_cache_alloc_trace+0x129/0x400
> > > [  641.440240]  [<ffffffff811424f8>] relay_open+0x68/0x2c0
> > > [  641.441299]  [<ffffffff8115deb1>] do_blk_trace_setup+0x191/0x2d0
> > >
> > > gdb) l *(relay_open+0x68)
> > > 0xffffffff811424f8 is in relay_open (kernel/relay.c:582).
> > > 577                     return NULL;
> > > 578             if (subbuf_size > UINT_MAX / n_subbufs)
> > > 579                     return NULL;
> > > 580
> > > 581             chan = kzalloc(sizeof(struct rchan), GFP_KERNEL);
> > > 582             if (!chan)
> > > 583                     return NULL;
> > > 584
> > > 585             chan->version = RELAYFS_CHANNEL_VERSION;
> > > 586             chan->n_subbufs = n_subbufs;
> > >
> > > and struct rchan has a member struct rchan_buf *buf[NR_CPUS];
> > > and CONFIG_NR_CPUS=8192, hence the attempt at an order 5 allocation
> > > that fails here....
> > 
> > Hm. Have you always had MAX_SMP (and the NR_CPU==8192 that it causes)?
> > From a quick check, none of this code seems to be new.
> 
> Yes, I always build MAX_SMP kernels for testing, because XFS is
> often used on such machines and so I want to find issues exactly
> like this in my testing rather than on customer machines... :/
> 
> > That said, having that
> > 
> >         struct rchan_buf *buf[NR_CPUS];
> > 
> > in "struct rchan" really is something we should fix. We really should
> > strive to not allocate things by CONFIG_NR_CPU's, but by the actual
> > real CPU count.
> 
> *nod*. But it doesn't fix the problem of the memory allocation
> failing when there's still gigabytes of immediately reclaimable
> memory available in the page cache. If this is failing under page
> cache memory pressure, then we're going to be doing an awful lot
> more falling back to vmalloc in the filesystem code where large
> allocations like this are done e.g. extended attribute buffers are
> order-5, and used a lot when doing things like backups which tend to
> also produce significant page cache memory pressure.
> 
> Hence I'm tending towards there being a memory reclaim behaviour
> regression, not so much worrying about whether this specific
> allocation is optimal or not.

Yup, looks like a regression in reclaim. Added linux-mm folks to CC.

								Honza
-- 
Jan Kara <jack@...e.com>
SUSE Labs, CR
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/