Message-ID: <20150912230027.GE4150@ret.masoncoding.com>
Date: Sat, 12 Sep 2015 19:00:27 -0400
From: Chris Mason <clm@...com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Josef Bacik <jbacik@...com>, LKML <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Dave Chinner <david@...morbit.com>, Neil Brown <neilb@...e.de>,
Jan Kara <jack@...e.cz>, Christoph Hellwig <hch@....de>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
On Fri, Sep 11, 2015 at 04:36:39PM -0700, Linus Torvalds wrote:
> On Fri, Sep 11, 2015 at 4:16 PM, Chris Mason <clm@...com> wrote:
> >
> > For 4.3 timeframes, what runs do you want to see numbers for:
> >
> > 1) revert
> > 2) my hack
> > 3) plug over multiple sbs (on different devices)
> > 4) ?
>
> Just 2 or 3.
>
> I don't think the plain revert is all that interesting, and I think
> the "anything else" is far too late for this merge window.
I did the plain revert as well, just to have a baseline. This box is a
little different from Dave's: bare metal, two sockets (E5-2660 v2 @
2.20GHz) with 144GB of RAM. I have two PCIe flash devices, one NVMe and
one Fusion-io, and I put one FS on each device (two mounts total).
The test created 1.6M files, 4K each. I used Dave's fs_mark command
line, spread out over 16 directories on each mounted filesystem. In
btrfs the directories are spread over subvolumes to cut down on lock
contention.
I need to play with the dirty ratios more to smooth out the IO, and on
both XFS and btrfs I had trouble getting runs that were not CPU bound.
I included the time to run sync at the end of each run because the
results were not very consistent without it.
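Dave's exact fs_mark command line isn't quoted here, so the sketch below
is a guess reconstructed from the stated parameters (1.6M files of 4K
each, 16 directories per mounted filesystem, final sync timed into the
run). The mount points (/mnt/nvme, /mnt/fio) and the per-directory file
count are assumptions, not the real setup:

```shell
# Hypothetical reconstruction of the run.  One -d per directory spreads
# the load over 16 dirs on each filesystem; -s 4096 gives the 4K files,
# -S 0 skips per-file syncs, and 32 dirs x 50000 files = 1.6M total.
DIRS=""
for fs in /mnt/nvme /mnt/fio; do
    for i in $(seq 0 15); do
        DIRS="$DIRS -d $fs/$i"
    done
done

# Time the run with the final sync included, as in the numbers below.
echo time sh -c "\"fs_mark -S 0 -s 4096 -n 50000 $DIRS; sync\""
```

(The echo just prints the command so the sketch is harmless to run; drop
it to actually kick off the benchmark.)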
The XFS runs generally had one CPU pegged at 100%, and I think this is
throwing off the results. On Monday I'll redo them with two (four?)
filesystems per flash device, hopefully that'll break things up.
The btrfs runs generally had all the CPUs pegged at 100%. I switched to
mount -o nodatasum and squeezed out an extra 50K files/sec at much lower
CPU utilization.
                    wall time   fs_mark files/sec   bytes written/sec

XFS:
baseline v4.2:        5m6s         118,578             272MB/s
Dave's patch:         4m46s        151,421             294MB/s
my hack:              5m5s         150,714             275MB/s
Linus plug:           5m15s        147,735             266MB/s

Btrfs (nodatasum):
baseline v4.2:        4m39s        242,643             313MB/s
Dave's patch:         3m46s        252,452             389MB/s
my hack:              3m48s        257,924             379MB/s
Linus plug:           3m58s        247,528             369MB/s
Bottom line, not as conclusive as I'd like. My hack doesn't seem to
hurt, but FS internals are consuming enough CPU that this lock just
isn't showing up.
Linus' plug patch is consistently slower, and I don't have a great
explanation. My guesses: we're not keeping the flash pipelines full, the
imbalance between the different-speed flash devices is averaging the
overall result down, or it's my kblockd vs explicit unplug handwaving
from yesterday.
So, the next step is either more runs on flash or grabbing a box with a
bunch of spindles. I'd rather do the spindle runs; I agree with Dave
that his patch should help much more on actual drives.
-chris