Message-ID: <1305245503.21534.2090.camel@debian>
Date: Fri, 13 May 2011 08:11:43 +0800
From: "Alex,Shi" <alex.shi@...el.com>
To: Jens Axboe <jaxboe@...ionio.com>
Cc: "James.Bottomley@...senpartnership.com"
<James.Bottomley@...senpartnership.com>,
"Li, Shaohua" <shaohua.li@...el.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Perfromance drop on SCSI hard disk
On Fri, 2011-05-13 at 04:29 +0800, Jens Axboe wrote:
> On 2011-05-10 08:40, Alex,Shi wrote:
> > commit c21e6beba8835d09bb80e34961 removed the REENTER flag and changed
> > scsi_run_queue() to punt all requests on starved_list devices to
> > kblockd. Yes, as Jens mentioned, performance on slow SCSI disks was
> > hurt here. :) (Intel SSDs aren't affected.)
> >
> > In our testing on a 12-disk SAS JBD, fio write with the sync ioengine
> > drops about 30~40% in throughput, and fio randread/randwrite with the
> > aio ioengine drop about 20%/50%. fio mmap testing was also hurt.
> >
> > With the following debug patch, the performance is fully recovered
> > in our testing. But without the REENTER flag, in some corner cases, such
> > as a device being blocked and unblocked repeatedly,
> > __blk_run_queue() may recursively call scsi_run_queue() and overflow
> > the kernel stack.
> > I don't know the details of the block device driver; I'm just wondering
> > why SCSI needs the REENTER flag here. :)
>
> This is a problem and we should do something about it for 2.6.39. I knew
> that there would be cases where the async offload would cause a
> performance degradation, but not to the extent that you are reporting.
> Must be hitting the pathological case.
>
> I can think of two scenarios where it could potentially recurse:
>
> - request_fn enter, end up requeuing IO. Run queue at the end. Rinse,
>   repeat.
> - Running starved list from request_fn, two (or more) devices could
>   alternately recurse.
>
> The first case should be fairly easy to handle. The second one is
> already handled by the local list splice.
>
> Looking at the code, is this a real scenario? Only potential recurse I
> see is:
>
> scsi_request_fn()
>   scsi_dispatch_cmd()
>     scsi_queue_insert()
>       __scsi_queue_insert()
>         scsi_run_queue()
>
> Why are we even re-running the queue immediately on a BUSY condition?
> Should only be needed if we have zero pending commands from this
> particular queue, and for that particular case async run is just fine
> since it's a rare condition (or performance would suck already).
Yeah, this is the correct way to fix it. Let me try the patch on our
machine.
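
To make the first recursion scenario concrete (request_fn requeues on BUSY, then the queue is run again), here is a rough user-space model. It is only a sketch, not kernel code: every name in it (model_queue, model_request_fn, model_run_queue, model_drive) is hypothetical, the "device" simply rejects a fixed number of dispatches as BUSY, and a boolean flag stands in for punting the re-run to kblockd. It compares how deep the calls nest when the queue is re-run directly versus when the re-run is deferred.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_queue {
    int  busy_left;   /* dispatches the modeled device will still reject as BUSY */
    int  depth;       /* current nesting depth of model_request_fn() */
    int  max_depth;   /* deepest nesting observed so far */
    bool run_pending; /* stands in for a queue run punted to a worker */
    bool defer;       /* true: defer the re-run; false: re-run directly */
};

static void model_run_queue(struct model_queue *q);

/* Models a request function that requeues on BUSY and then re-runs the queue. */
static void model_request_fn(struct model_queue *q)
{
    q->depth++;
    if (q->depth > q->max_depth)
        q->max_depth = q->depth;

    if (q->busy_left > 0) {
        q->busy_left--;              /* device said BUSY: request is requeued */
        if (q->defer)
            q->run_pending = true;   /* async: the worker runs the queue later */
        else
            model_run_queue(q);      /* direct: nests one level per BUSY */
    }

    q->depth--;
}

static void model_run_queue(struct model_queue *q)
{
    model_request_fn(q);
}

/* Drives the queue until the device stops returning BUSY; returns max nesting. */
static int model_drive(int busy_count, bool defer)
{
    struct model_queue q = { .busy_left = busy_count, .defer = defer };

    model_run_queue(&q);
    while (q.run_pending) {          /* the "worker": each run starts from depth 0 */
        q.run_pending = false;
        model_run_queue(&q);
    }

    assert(q.depth == 0 && q.busy_left == 0);
    return q.max_depth;
}
```

Calling model_drive(n, false) nests once per BUSY response, so the depth grows with n, while model_drive(n, true) never nests deeper than one level. The cost of the deferred variant in this model is the extra trip through the worker loop, which is the latency trade-off being discussed for the async offload.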