Message-ID: <alpine.LRH.2.03.1405291601150.25112@AMR>
Date: Thu, 29 May 2014 16:34:00 -0600 (MDT)
From: Keith Busch <keith.busch@...el.com>
To: Jens Axboe <axboe@...nel.dk>
cc: Keith Busch <keith.busch@...el.com>,
Matias Bjørling <m@...rling.me>,
willy@...ux.intel.com, sbradshaw@...ron.com,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3] NVMe: basic conversion to blk-mq
On Thu, 29 May 2014, Jens Axboe wrote:
> On 2014-05-28 21:07, Keith Busch wrote:
> Barring any bugs in the code, then yes, this should work. On the scsi-mq
> side, extensive error injection and pulling has been done, and it seems to
> hold up fine now. The ioctl path would need to be audited.
It's a little different from scsi: this would be like pulling both the drive
and the HBA. In any case, it still looks like it works as expected.
>>> +static void req_completion(struct nvme_queue *nvmeq, void *ctx,
>>> struct nvme_completion *cqe)
>>> {
>>> struct nvme_iod *iod = ctx;
>>> - struct bio *bio = iod->private;
>>> + struct request *req = iod->private;
>>> +
>>> u16 status = le16_to_cpup(&cqe->status) >> 1;
>>>
>>> - if (unlikely(status)) {
>>> - if (!(status & NVME_SC_DNR ||
>>> - bio->bi_rw & REQ_FAILFAST_MASK) &&
>>> - (jiffies - iod->start_time) < IOD_TIMEOUT) {
>>> - if (!waitqueue_active(&nvmeq->sq_full))
>>> - add_wait_queue(&nvmeq->sq_full,
>>> - &nvmeq->sq_cong_wait);
>>> - list_add_tail(&iod->node, &nvmeq->iod_bio);
>>> - wake_up(&nvmeq->sq_full);
>>> - return;
>>> - }
>>> - }
>>
>> Is blk-mq going to retry intermittently failed commands for me? It
>> doesn't look like it will.
>
> Not sure what kind of behavior you are looking for here. If you can expand on
> the above a bit, I'll gladly help sort it out. Only the driver really knows
> if a particular request should be failed hard or retried. So you'd probably
> have to track retry counts in the request and reinsert/end as appropriate.
Some vendors' drives return a failure status for a command but fully
expect a retry to be successful. It'd be addressing this bug:
bugzilla.kernel.org/show_bug.cgi?id=61061
The code being removed at the top of this function in the latest patch
was taking care of the requeuing. I wasn't sure how many retries would be
necessary, so I capped it at a total elapsed time instead of a total try
count. I'm told by third parties that what we're doing is successful in
their tests.