Message-ID: <20120130222643.GH30245@redhat.com>
Date: Mon, 30 Jan 2012 17:26:43 -0500
From: Vivek Goyal <vgoyal@...hat.com>
To: Eric Dumazet <eric.dumazet@...il.com>
Cc: Wu Fengguang <wfg@...ux.intel.com>,
Shaohua Li <shaohua.li@...el.com>,
Herbert Poetzl <herbert@...hfloor.at>,
Andrew Morton <akpm@...ux-foundation.org>,
LKML <linux-kernel@...r.kernel.org>,
Jens Axboe <axboe@...nel.dk>, Tejun Heo <tj@...nel.org>
Subject: Re: Bad SSD performance with recent kernels
On Mon, Jan 30, 2012 at 03:51:49PM +0100, Eric Dumazet wrote:
> On Mon, Jan 30, 2012 at 22:28 +0800, Wu Fengguang wrote:
> > On Mon, Jan 30, 2012 at 06:31:34PM +0800, Li, Shaohua wrote:
> >
> > > Looks like the 2.6.39 block plug introduces some latency here. Deleting
> > > blk_start_plug/blk_finish_plug in generic_file_aio_read seems to work
> > > around the issue. The plug seems not good for sequential IO, because the
> > > readahead code already has a plug and fine-grained control.
> >
> > Why not remove the generic_file_aio_read() plug completely? It
> > actually prevents unplugging immediately after the readahead IO is
> > submitted and in turn stalls the IO pipeline, as shown by Eric's
> > blktrace data.
> >
> > Eric, will you test this patch? Thank you.
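Just so we are looking at the same thing: the change under discussion
drops the per-call plug from generic_file_aio_read(). A minimal sketch
of the pattern being removed is below (illustration only, not the
actual patch posted in this thread; do_readahead_and_copy() is a
hypothetical stand-in for the readahead/copy loop in mm/filemap.c):

  #include <linux/blkdev.h>   /* struct blk_plug, blk_start_plug(), blk_finish_plug() */
  #include <linux/aio.h>      /* struct kiocb */
  #include <linux/uio.h>      /* struct iovec */

  /* Illustration of the plug pattern the patch removes from
   * generic_file_aio_read(); do_readahead_and_copy() is hypothetical. */
  static ssize_t plugged_read(struct kiocb *iocb, const struct iovec *iov,
                              unsigned long nr_segs, loff_t pos)
  {
          struct blk_plug plug;
          ssize_t ret;

          blk_start_plug(&plug);  /* batch block IO on the per-task plug list */
          ret = do_readahead_and_copy(iocb, iov, nr_segs, pos);
          blk_finish_plug(&plug); /* the queued IO only hits the device here */
          return ret;
  }

With the plug gone, the readahead code can unplug as soon as it has
submitted its window instead of waiting for the whole read call to
return.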
Can you please run blktrace again with this patch applied? I am curious
to see what the traffic pattern looks like now.
In your previous trace there were many small 8-sector requests which
were merged into 512-sector requests before being dispatched to the
disk. (I am not sure why those requests are not bigger; shouldn't the
readahead logic submit a bigger request?) Now, with the plug/unplug
logic removed, I assume we will do less merging and dispatch more,
smaller requests. Maybe that is what helps and cuts down on the disk
idle time.
In the previous logs a 512-sector request seems to take around 1 ms to
complete after dispatch. Between requests the disk seems to sit idle
for around 0.5 to 0.6 ms. Of this, about 0.3 ms seems to go into just
coming up with a new request after the previous one completes, and
another 0.3 ms seems to be consumed merging the smaller IOs. So if we
don't wait for merging, we keep the disk busy for an extra 0.3 ms per
request, which is 30% of the time it takes to complete a 512-sector
request. Theoretically that could give a 30% boost for this workload
(assuming the smaller request size does not hurt disk throughput too
severely).
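As a quick sanity check on that 30% figure (the 1 ms and 0.3 ms inputs
are just my reading of the earlier trace, so treat them as rough
assumptions), a trivial userspace calculation:

  #include <stdio.h>

  int main(void)
  {
          double service_ms = 1.0;  /* ~1 ms to complete a 512-sector request */
          double merge_ms   = 0.3;  /* ~0.3 ms of idle time spent merging small IOs */

          /* extra disk-busy time regained per request, relative to service time */
          printf("estimated boost: ~%.0f%%\n", 100.0 * merge_ms / service_ms);
          return 0;
  }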
Anyway, some fresh blktrace data should shed more light on this.
Thanks
Vivek