Message-ID: <20150916151621.GA8624@ret.masoncoding.com>
Date: Wed, 16 Sep 2015 11:16:21 -0400
From: Chris Mason <clm@...com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
CC: Dave Chinner <david@...morbit.com>, Josef Bacik <jbacik@...com>,
LKML <linux-kernel@...r.kernel.org>,
linux-fsdevel <linux-fsdevel@...r.kernel.org>,
Neil Brown <neilb@...e.de>, Jan Kara <jack@...e.cz>,
Christoph Hellwig <hch@....de>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
On Mon, Sep 14, 2015 at 01:06:25PM -0700, Linus Torvalds wrote:
> On Sun, Sep 13, 2015 at 4:12 PM, Dave Chinner <david@...morbit.com> wrote:
> >
> > Really need to run these numbers on slower disks where block layer
> > merging makes a difference to performance.
>
> Yeah. We've seen plugging and io schedulers not make much difference
> for high-performance flash (although I think the people who argued
> that noop should generally be used for non-rotating media were wrong
> - the elevator ends up still being critical to merging, and
> while merging isn't a life-or-death situation, it tends to still
> help).
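(For anyone skimming the thread, the plugging pattern we keep referring
to looks roughly like the sketch below. This is only an illustration,
not code from fs/fs-writeback.c; submit_one_chunk() is a made-up
placeholder for whatever builds and submits the bios.)

#include <linux/blkdev.h>	/* struct blk_plug, blk_start_plug(), blk_finish_plug() */
#include <linux/fs.h>		/* struct inode */

/* placeholder: build a bio for chunk 'idx' of 'inode' and submit it */
static void submit_one_chunk(struct inode *inode, int idx)
{
}

static void writeback_some_chunks(struct inode *inode, int nr_chunks)
{
	struct blk_plug plug;
	int i;

	blk_start_plug(&plug);		/* start batching I/O on this task */

	for (i = 0; i < nr_chunks; i++)
		submit_one_chunk(inode, i);	/* queued, not yet issued */

	/*
	 * Everything queued above is handed to the block layer here,
	 * so the elevator gets a chance to merge adjacent requests
	 * before they reach the device.
	 */
	blk_finish_plug(&plug);
}

The point is that requests queued between blk_start_plug() and
blk_finish_plug() sit on a per-task list until the plug is flushed,
which is what gives the merge logic something to work with.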
Yeah, my big concern was that holding the plug longer would result in
lower overall perf because we weren't keeping the flash busy. So I
started with the flash boxes to make sure we at least weren't regressing
from the 4.2 levels.
I'm still worried about that, but this probably isn't the right
benchmark to show it. And if it's really a problem, it'll happen
everywhere we plug and not just here.
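(For reference, the shape of the change in the subject line is roughly
the sketch below: flush the plug only after wb->list_lock has been
dropped, so the request submission and merging work doesn't happen
while other writeback paths are spinning on the lock. Again, this is
just an illustration with a made-up function name, not the actual
diff.)

#include <linux/backing-dev.h>	/* struct bdi_writeback */
#include <linux/blkdev.h>	/* struct blk_plug, blk_finish_plug() */
#include <linux/spinlock.h>

static void finish_writeback_pass(struct bdi_writeback *wb,
				  struct blk_plug *plug)
{
	spin_lock(&wb->list_lock);
	/* ... requeue/redirty inodes, update wb state, etc ... */
	spin_unlock(&wb->list_lock);

	/* flush the queued IO outside the lock */
	blk_finish_plug(plug);
}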
>
> For rotating rust with nasty seek times, the plugging is likely to
> make the biggest difference.
For rotating storage, I grabbed a big box and did the fs_mark run
against 8 spindles. These are all behind a megaraid card as jbods, so I
flipped the card's cache to write-through.
I changed around the run a bit, making enough files for fs_mark to run
for ~10 minutes, and I took out the sync. I ran only xfs to cut down on
the iterations, and after the fs_mark run, I did a short 30-second run
with blktrace in the background to capture the IO sizes.
v4.2: 178K files/sec
Chinner: 192K files/sec
Mason: 192K files/sec
Linus: 193K files/sec
I added support to iowatcher to graph IO size, and attached the graph.
Short version: Linus' patch still gives bigger IOs and similar perf to
Dave's original. I should have done the blktrace runs for 60 seconds
instead of 30; I suspect that would even out the average sizes between
the three patches.
-chris
[Attachment: fs_mark.png, 34363 bytes]