linux-kernel - Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish


Message-ID: <20150918003735.GR3902@dastard>
Date:	Fri, 18 Sep 2015 10:37:35 +1000
From:	Dave Chinner <david@...morbit.com>
To:	Chris Mason <clm@...com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Jan Kara <jack@...e.cz>, Josef Bacik <jbacik@...com>,
	LKML <linux-kernel@...r.kernel.org>,
	linux-fsdevel <linux-fsdevel@...r.kernel.org>,
	Neil Brown <neilb@...e.de>, Christoph Hellwig <hch@....de>,
	Tejun Heo <tj@...nel.org>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()

On Thu, Sep 17, 2015 at 07:56:47PM -0400, Chris Mason wrote:
> On Thu, Sep 17, 2015 at 04:08:19PM -0700, Linus Torvalds wrote:
> > On Thu, Sep 17, 2015 at 3:42 PM, Chris Mason <clm@...com> wrote:
> > >
> > > Playing around with the plug a little, most of the unplugs are coming
> > > from the cond_resched_lock().  Not really sure why we are doing the
> > > cond_resched() there, we should be doing it before we retake the lock
> > > instead.
> > >
> > > This patch takes my box (with dirty thresholds at 1.5GB/3GB) from 195K
> > > files/sec up to 213K.  Average IO size is the same as 4.3-rc1.
> > 
> > Ok, so at least for you, part of the problem really ends up being that
> > there's a mix of the "synchronous" unplugging (by the actual explicit
> > "blk_finish_plug(&plug);") and the writeback that is handed off to
> > kblockd_workqueue.
> >
> > I'm not seeing why that should be an issue. Sure, there's some CPU
> > overhead to context switching, but I don't see that it should be that
> > big of a deal.

It may well change the dispatch order of enough IOs for it to be
significant on an IO bound device.
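The two unplug paths being contrasted here can be illustrated with a toy userspace model. It is not kernel code and not Chris's patch: `struct plug`, `flush_plug()` and `writeback()` are hypothetical names, and the model assumes, as Linus's description suggests, that a plug flushed because the task yields the CPU is handed off to a worker (kblockd in the kernel), while an explicit finish submits the batch in the caller's own context. It compares yielding while still plugged against flushing explicitly before yielding, and it treats every reschedule point as an actual yield, which is a simplification.

```c
#include <stdbool.h>

/* Toy model of a per-task plug: IOs are batched, then dispatched
 * either by their owner or by a deferred worker. Not kernel code;
 * all names here are hypothetical. */
struct plug {
    int count; /* IOs batched but not yet dispatched */
};

struct stats {
    int sync_ios;     /* dispatched in the writer's own context */
    int deferred_ios; /* handed to a worker (kblockd-like) */
    int flushes;      /* number of non-empty flushes */
};

void flush_plug(struct plug *p, struct stats *s, bool from_schedule)
{
    if (p->count == 0)
        return;
    if (from_schedule)
        s->deferred_ios += p->count; /* assumed async handoff */
    else
        s->sync_ios += p->count;     /* caller submits directly */
    s->flushes++;
    p->count = 0;
}

/* Queue pages_each IOs per inode and hit a reschedule point after
 * every resched_every inodes. Every reschedule point is modelled as
 * a real yield (a simplification: cond_resched() only yields when a
 * reschedule is actually pending). */
struct stats writeback(int n_inodes, int pages_each,
                       int resched_every, bool flush_first)
{
    struct plug plug = { 0 };
    struct stats s = { 0, 0, 0 };

    for (int i = 1; i <= n_inodes; i++) {
        plug.count += pages_each;
        if (i % resched_every != 0)
            continue;
        if (flush_first) {
            /* Submit our own batch, then yield with nothing plugged. */
            flush_plug(&plug, &s, false);
        } else {
            /* Yield while still plugged: the scheduling point flushes
             * the batch and (by assumption) defers it to a worker. */
            flush_plug(&plug, &s, true);
        }
    }
    flush_plug(&plug, &s, false); /* final explicit finish */
    return s;
}
```

With 10 inodes of 4 pages each and a reschedule point every 3 inodes, `writeback(10, 4, 3, false)` hands 36 of the 40 IOs to the worker and submits only the final 4 directly, whereas `writeback(10, 4, 3, true)` submits all 40 from the writer's context. The model only counts where IO is dispatched, not when; whether the handoff reorders dispatch enough to matter is the open question in this thread.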

> > I wonder if there is something more serious wrong with the kblockd_workqueue.
> 
> I'm driving the box pretty hard, it's right on the line between CPU
> bound and IO bound.  So I've got 32 fs_mark processes banging away and
> 32 CPUs (16 really, with hyperthreading).

I'm only using 8 threads right now, so I have ~6-7 idle CPUs on this
workload. Hence if it's CPU load related, I probably won't see any
change in behaviour.

> They are popping in and out of balance_dirty_pages() so I have high CPU
> utilization alternating with high IO wait times.  There are no reads at all,
> so all of these waits are for buffered writes.
> 
> People in balance_dirty_pages are indirectly waiting on the unplug, so
> maybe the context switch overhead on a loaded box is enough to explain
> it.  We've definitely gotten more than 9% by inlining small synchronous
> items in btrfs in the past, but those were more explicitly synchronous.
> 
> I know it's painfully hand wavy.  I don't see any other users of the
> kblockd workqueues, and the perf profiles don't jump out at me.  I'll
> feel better about the patch if Dave confirms any gains.

In outright performance on my test machine, the difference in
files/s is noise. However, the consistency looks to be substantially
improved and the context switch rate is now running at under
3,000/sec. Numbers, including the std deviation of the files/s
number output during the fsmark run (averaged across 3 separate
benchmark runs):

kernel            files/s   std-dev   wall time
4.3-rc1-noplug    34400     2.0e04    5m25s
4.3-rc1           56600     2.3e04    3m23s
4.3-rc1-flush     56079     1.4e04    3m14s

std-dev is well down, and the improvement in wall time is large
enough to be significant.
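The relative changes behind "well down" and "large enough to be significant" can be computed directly from the averages in the table. This is plain arithmetic on the reported numbers, not additional measurements, and `pct_change()` and `to_seconds()` are illustrative helpers.

```c
/* Relative change, in percent, from a baseline to a new value. */
double pct_change(double baseline, double value)
{
    return (value - baseline) / baseline * 100.0;
}

/* Convert a wall time reported as "XmYs" into seconds. */
int to_seconds(int minutes, int seconds)
{
    return minutes * 60 + seconds;
}
```

Comparing 4.3-rc1-flush against 4.3-rc1, `pct_change(to_seconds(3, 23), to_seconds(3, 14))` gives a wall-time drop from 203 s to 194 s of about 4.4%, `pct_change(2.3e4, 1.4e4)` shows the std-dev falling by about 39%, and `pct_change(56600, 56079)` puts the files/s change at about -0.9%, consistent with calling the throughput difference noise.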

Looks good to me.

Cheers,

Dave.
-- 
Dave Chinner
david@...morbit.com