Message-ID: <1305247704.2373.32.camel@sli10-conroe>
Date: Fri, 13 May 2011 08:48:24 +0800
From: Shaohua Li <shaohua.li@...el.com>
To: Jens Axboe <jaxboe@...ionio.com>
Cc: "Shi, Alex" <alex.shi@...el.com>,
"James.Bottomley@...senpartnership.com"
<James.Bottomley@...senpartnership.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: Performance drop on SCSI hard disk
On Fri, 2011-05-13 at 04:29 +0800, Jens Axboe wrote:
> On 2011-05-10 08:40, Alex,Shi wrote:
> > commit c21e6beba8835d09bb80e34961 removed the REENTER flag and changed
> > scsi_run_queue() to punt all requests on starved_list devices to
> > kblockd. Yes, like Jens mentioned, the performance on slow SCSI disk was
> > hurt here. :) (Intel SSD isn't affected here)
> >
> > In our testing on a 12-disk SAS JBD, fio write with the sync ioengine drops
> > about 30~40% in throughput, and fio randread/randwrite with the aio ioengine
> > drop about 20%/50% in throughput. fio mmap testing was hurt also.
> >
> > With the following debug patch, the performance can be totally recovered
> > in our testing. But without the REENTER flag, in some corner cases, such as
> > a device being blocked and then unblocked repeatedly,
> > __blk_run_queue() may recursively call scsi_run_queue() and then cause
> > a kernel stack overflow.
> > I don't know the details of the block device driver; I'm just wondering why
> > SCSI needs the REENTER flag here. :)
>
> This is a problem and we should do something about it for 2.6.39. I knew
> that there would be cases where the async offload would cause a
> performance degradation, but not to the extent that you are reporting.
> Must be hitting the pathological case.
Async offload is expected to increase context switches, but the real root
cause of the issue is a fairness problem. Please see my previous email.
> I can think of two scenarios where it could potentially recurse:
>
> - request_fn enter, end up requeuing IO. Run queue at the end. Rinse,
> repeat.
> - Running starved list from request_fn, two (or more) devices could
> alternately recurse.
>
> The first case should be fairly easy to handle. The second one is
> already handled by the local list splice.
I don't think this is true. If you drop host_lock in scsi_run_queue(), other
CPUs can add an sdev to the starved device list again. In the recursive
call of scsi_run_queue(), the starved device list might therefore not be
empty, so the local list splice doesn't help.
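
To make the concern concrete, here is a minimal user-space sketch. It is a model, not the scsi_lib.c code: the counter stands in for the host's starved list, other_cpu_adds_starved_device() is a hypothetical stand-in for a second CPU racing in while host_lock is dropped, and MAX_DEPTH is an artificial cap so the model terminates.

```c
#include <assert.h>

#define MAX_DEPTH 8          /* hypothetical cap so the model terminates */

static int starved_count;    /* models the length of the shared starved list */
static int max_depth_seen;

/* Models another CPU marking a device starved while host_lock is dropped. */
static void other_cpu_adds_starved_device(void)
{
	starved_count++;
}

static void model_run_queue(int depth)
{
	int local;

	if (depth > max_depth_seen)
		max_depth_seen = depth;
	if (depth >= MAX_DEPTH)
		return;

	/* Models the splice: move everything currently starved to a local list. */
	local = starved_count;
	starved_count = 0;

	while (local-- > 0) {
		/* host_lock is dropped here, so the shared list can be refilled. */
		other_cpu_adds_starved_device();
		/* Running the device queue re-enters the queue runner. */
		model_run_queue(depth + 1);
	}
}

/* Returns the deepest nesting reached for a given initial starved count. */
int model_max_depth(int initially_starved)
{
	assert(initially_starved >= 0);
	starved_count = initially_starved;
	max_depth_seen = 0;
	model_run_queue(0);
	return max_depth_seen;
}
```

In this model, model_max_depth(1) returns MAX_DEPTH: every nested call finds the shared list refilled, so the splice to a local list only empties the shared list for an instant and does not bound the recursion.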
>
> Looking at the code, is this a real scenario? Only potential recurse I
> see is:
>
> scsi_request_fn()
>   scsi_dispatch_cmd()
>     scsi_queue_insert()
>       __scsi_queue_insert()
>         scsi_run_queue()
>
> Why are we even re-running the queue immediately on a BUSY condition?
> Should only be needed if we have zero pending commands from this
> particular queue, and for that particular case async run is just fine
> since it's a rare condition (or performance would suck already).
>
> And it should only really be needed for the 'q' being passed in, not the
> others. Something like the below.
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0bac91e..0b01c1f 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -74,7 +74,7 @@ struct kmem_cache *scsi_sdb_cache;
> */
> #define SCSI_QUEUE_DELAY 3
>
> -static void scsi_run_queue(struct request_queue *q);
> +static void scsi_run_queue_async(struct request_queue *q);
>
> /*
> * Function: scsi_unprep_request()
> @@ -161,7 +161,7 @@ static int __scsi_queue_insert(struct scsi_cmnd *cmd, int reason, int unbusy)
> blk_requeue_request(q, cmd->request);
> spin_unlock_irqrestore(q->queue_lock, flags);
>
> - scsi_run_queue(q);
> + scsi_run_queue_async(q);
So you could still recurse through the starved list. Do you want to put
the whole __scsi_run_queue into a workqueue?
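
As a rough illustration of what deferring the whole run would buy, here is a user-space sketch. It uses no kernel APIs: the work_pending flag and the loop in model_worker_max_depth() are hypothetical stand-ins for a work item and the worker that runs it, and every model_* name is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

static bool work_pending;    /* models a single queued work item */
static int requeues_left;    /* how many more times the device reports BUSY */
static int depth, max_depth; /* nesting of the queue runner */

/* Models scheduling the queue run on a worker instead of calling it. */
static void model_run_queue_async(void)
{
	work_pending = true;
}

/* Models one queue run that ends with the command being requeued. */
static void model_run_queue(void)
{
	depth++;
	if (depth > max_depth)
		max_depth = depth;

	if (requeues_left > 0) {
		requeues_left--;
		model_run_queue_async();   /* deferred, so no nested call */
	}
	depth--;
}

/* Models the worker: drains pending work iteratively, not recursively. */
int model_worker_max_depth(int requeues)
{
	requeues_left = requeues;
	depth = max_depth = 0;
	work_pending = true;

	while (work_pending) {
		work_pending = false;
		model_run_queue();
	}
	assert(depth == 0);
	return max_depth;
}
```

Here model_worker_max_depth(1000) returns 1: however many times the run is rescheduled, each one starts from the worker loop, so the nesting stays constant. The price is a trip through the worker for every run, which is the context switch overhead mentioned above.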
Thanks,
Shaohua