Date:	Thu, 29 May 2014 17:06:27 -0600
From:	Jens Axboe <axboe@...nel.dk>
To:	Keith Busch <keith.busch@...el.com>
CC:	Matias Bjørling <m@...rling.me>,
	willy@...ux.intel.com, sbradshaw@...ron.com,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH V3] NVMe: basic conversion to blk-mq

On 05/29/2014 04:34 PM, Keith Busch wrote:
> On Thu, 29 May 2014, Jens Axboe wrote:
>> On 2014-05-28 21:07, Keith Busch wrote:
>> Barring any bugs in the code, yes, this should work. On the
>> scsi-mq side, extensive error injection and device pulling have been
>> done, and it seems to hold up fine now. The ioctl path would need to
>> be audited.
> 
> It's a little different from scsi. This would be like pulling both the
> drive and the HBA. In any case, it still looks like it works as expected.

That is true, but block/blk-mq generally only cares about whether the
device goes away or not. So it should be pretty much the same for this
case.

>>>> +static void req_completion(struct nvme_queue *nvmeq, void *ctx,
>>>>                         struct nvme_completion *cqe)
>>>> {
>>>>     struct nvme_iod *iod = ctx;
>>>> -    struct bio *bio = iod->private;
>>>> +    struct request *req = iod->private;
>>>> +
>>>>     u16 status = le16_to_cpup(&cqe->status) >> 1;
>>>>
>>>> -    if (unlikely(status)) {
>>>> -        if (!(status & NVME_SC_DNR ||
>>>> -                bio->bi_rw & REQ_FAILFAST_MASK) &&
>>>> -                (jiffies - iod->start_time) < IOD_TIMEOUT) {
>>>> -            if (!waitqueue_active(&nvmeq->sq_full))
>>>> -                add_wait_queue(&nvmeq->sq_full,
>>>> -                            &nvmeq->sq_cong_wait);
>>>> -            list_add_tail(&iod->node, &nvmeq->iod_bio);
>>>> -            wake_up(&nvmeq->sq_full);
>>>> -            return;
>>>> -        }
>>>> -    }
>>>
>>> Is blk-mq going to retry intermittently failed commands for me? It
>>> doesn't look like it will.
>>
>> Not sure what kind of behavior you are looking for here. If you can
>> expand on the above a bit, I'll gladly help sort it out. Only the
>> driver really knows if a particular request should be failed hard or
>> retried. So you'd probably have to track retry counts in the request
>> and reinsert/end as appropriate.
> 
> Some vendors' drives return a failure status for a command but fully
> expect a retry to be successful. It'd be addressing this bug:
> 
> bugzilla.kernel.org/show_bug.cgi?id=61061
> 
> The code being removed at the top of this function in the latest patch was
> taking care of the requeuing. I wasn't sure how many retries would be
> necessary, so I capped it at a total time instead of a total number of
> tries. I'm told by 3rd parties that what we're doing is successful in
> their tests.

Ah, I see. Yes, that code apparently got axed. The attached patch brings
it back. It's totally untested; I'll try to hit it synthetically to ensure
that it does work. Note that it currently does the unmap and iod free, so
the request comes back pristine. We could preserve those if we really
wanted to; I'm guessing it's not a big deal.

-- 
Jens Axboe


View attachment "nvme-retry.patch" of type "text/x-patch" (605 bytes)
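
For readers without the attachment, here is a minimal sketch of the retry
test being discussed above. It is not the attached nvme-retry.patch; the
helper name nvme_req_should_retry is made up, and the field and macro
names (iod->start_time, IOD_TIMEOUT, NVME_SC_DNR, REQ_FAILFAST_MASK)
follow the removed hunk quoted earlier. The requeue itself would go
through whatever blk-mq requeue helper the target kernel provides, so only
the decision is shown:

/*
 * Sketch only: decide whether a failed NVMe command is worth requeueing.
 * Capped by total elapsed time, as in the removed bio-based code, rather
 * than by a per-request retry count.
 */
static bool nvme_req_should_retry(struct request *req, struct nvme_iod *iod,
				  u16 status)
{
	/* Controller set Do Not Retry, or the submitter asked to fail fast. */
	if ((status & NVME_SC_DNR) || (req->cmd_flags & REQ_FAILFAST_MASK))
		return false;

	/* Retry only while the total elapsed time stays under the cap. */
	return time_before(jiffies, iod->start_time + IOD_TIMEOUT);
}

The completion handler would call this on a non-zero status word and
either requeue the request for reissue or complete it with an error,
mirroring the behaviour the old code implemented by putting the iod back
on nvmeq->iod_bio.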
