Date:	Tue, 19 Jan 2010 16:40:46 -0500
From:	Vivek Goyal <vgoyal@...hat.com>
To:	Corrado Zoccolo <czoccolo@...il.com>
Cc:	"jmoyer@...hat.com" <jmoyer@...hat.com>,
	"Zhang, Yanmin" <yanmin_zhang@...ux.intel.com>,
	Jens Axboe <jens.axboe@...cle.com>,
	Shaohua Li <shaohua.li@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: fio mmap randread 64k more than 40% regression with 2.6.33-rc1

On Tue, Jan 19, 2010 at 09:10:33PM +0100, Corrado Zoccolo wrote:
> On Mon, Jan 18, 2010 at 4:06 AM, Zhang, Yanmin
> <yanmin_zhang@...ux.intel.com> wrote:
> > On Sat, 2010-01-16 at 17:27 +0100, Corrado Zoccolo wrote:
> >> Hi Yanmin
> >> On Mon, Jan 4, 2010 at 7:28 PM, Corrado Zoccolo <czoccolo@...il.com> wrote:
> >> > Hi Yanmin,
> >> >> When low_latency=1, we get the biggest number with kernel 2.6.32.
> >> >> Compared with the low_latency=0 result, the former is about 4% better.
> >> > Ok, so 2.6.33 + corrado (with low_latency=0) is comparable with the
> >> > fastest 2.6.32, so we can consider the first part of the problem
> >> > solved.
> >> >
> >> I think we can now return to your full script with queue merging.
> >> I'm wondering if (in arm_slice_timer):
> >> -       if (cfqq->dispatched)
> >> +      if (cfqq->dispatched || (cfqq->new_cfqq && rq_in_driver(cfqd)))
> >>                return;
> >> gives the same improvement you were experiencing by just reverting to rq_in_driver.
> > I did a quick test against 2.6.33-rc1. With the new method, fio mmap randread 64k
> > has about a 20% improvement. With just checking rq_in_driver(cfqd), it has
> > about a 33% improvement.
> >
> Jeff, do you have an idea why, in arm_slice_timer, checking
> rq_in_driver instead of cfqq->dispatched gives so much improvement in
> the presence of queue merging, while it has no noticeable effect
> when there are no merges?

The performance improvement from replacing cfqq->dispatched with
rq_in_driver() is really strange. It means we will do even less idling on
the cfqq. That in turn means faster cfqq switching, which should mean more
seeks (for this test case) and reduced throughput. This is just the opposite
of your approach of treating a random-read mmap queue as sync, where we
idle on the queue.
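
For reference, here is a minimal, standalone sketch of the two checks being
compared. The struct fields are illustrative stand-ins for the relevant
cfq_queue/cfq_data members (dispatched, new_cfqq, rq_in_driver), not the
actual definitions in block/cfq-iosched.c:

/* Toy model of the idling decision in cfq_arm_slice_timer(); the structs
 * below only mirror the fields used by the two checks being discussed. */
#include <stdbool.h>
#include <stdio.h>

struct cfqq_model {
    int dispatched;       /* requests from this queue still in flight */
    bool has_new_cfqq;    /* a merge partner has been set up (new_cfqq) */
};

struct cfqd_model {
    int rq_in_driver;     /* requests from any queue still in the driver */
};

/* Current check: skip arming the idle timer only when this queue itself
 * still has dispatched requests outstanding. */
static bool skip_idle_dispatched(const struct cfqq_model *cfqq)
{
    return cfqq->dispatched > 0;
}

/* Proposed check: also skip arming when a merge is pending and any
 * request (possibly from the merge partner) is still in the driver. */
static bool skip_idle_merge_aware(const struct cfqq_model *cfqq,
                                  const struct cfqd_model *cfqd)
{
    return cfqq->dispatched > 0 ||
           (cfqq->has_new_cfqq && cfqd->rq_in_driver > 0);
}

int main(void)
{
    /* Merge pending, nothing in flight from this queue, but the merge
     * partner still has a request in the driver. */
    struct cfqq_model cfqq = { .dispatched = 0, .has_new_cfqq = true };
    struct cfqd_model cfqd = { .rq_in_driver = 1 };

    printf("dispatched-only check: idle timer %s\n",
           skip_idle_dispatched(&cfqq) ? "skipped" : "armed");
    printf("merge-aware check:     idle timer %s\n",
           skip_idle_merge_aware(&cfqq, &cfqd) ? "skipped" : "armed");
    return 0;
}

In that scenario the dispatched-only check still arms the idle timer, while
the merge-aware check (like a plain rq_in_driver check) skips it, i.e. less
idling and a faster switch to the next queue, which is the behaviour the
numbers above are measuring.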

Thanks
Vivek

> 
> Thanks,
> Corrado
> 
> >
> >>
> >> We saw that cfqq->dispatched worked fine when there was no queue
> >> merging happening, so it must be something concerning merging;
> >> probably dispatched is not accurate when we have set up for a merge
> >> but the merge has not yet been done.
> >
> >
> >
