linux-kernel - RE: [PATCH] nvme: replace meaningless judgement by checking whether req is null

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <be133ab59334475dacd4a52a2834fe71@h3c.com>
Date:   Mon, 21 Sep 2020 15:49:09 +0000
From:   Tianxianting <tian.xianting@....com>
To:     Keith Busch <kbusch@...nel.org>
CC:     "axboe@...com" <axboe@...com>, "hch@....de" <hch@....de>,
        "sagi@...mberg.me" <sagi@...mberg.me>,
        "linux-nvme@...ts.infradead.org" <linux-nvme@...ts.infradead.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH] nvme: replace meaningless judgement by checking whether
 req is null

HI Keith,
Thanks for your comments,
I will submit a new patch of version 2 for the further reviewing,  v2 patch will contains:
1, retain existing judgement and dev_warn;
2, add the check whether req is null(already did in this patch)
3, simplify and make the changelog succinct according to you said " This is what I'm thinking:".
Is it right?
Should I retain the nvme_irq crash log in changelog, mention the difference between nvmeq->q_depth and tagset queue_depth? 

Thanks

-----Original Message-----
From: Keith Busch [mailto:kbusch@...nel.org] 
Sent: Monday, September 21, 2020 11:08 PM
To: tianxianting (RD) <tian.xianting@....com>
Cc: axboe@...com; hch@....de; sagi@...mberg.me; linux-nvme@...ts.infradead.org; linux-kernel@...r.kernel.org
Subject: Re: [PATCH] nvme: replace meaningless judgement by checking whether req is null

On Mon, Sep 21, 2020 at 10:10:52AM +0800, Xianting Tian wrote:
> @@ -940,13 +940,6 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
>  	struct nvme_completion *cqe = &nvmeq->cqes[idx];
>  	struct request *req;
>  
> -	if (unlikely(cqe->command_id >= nvmeq->q_depth)) {
> -		dev_warn(nvmeq->dev->ctrl.device,
> -			"invalid id %d completed on queue %d\n",
> -			cqe->command_id, le16_to_cpu(cqe->sq_id));
> -		return;
> -	}
> -
>  	/*
>  	 * AEN requests are special as they don't time out and can
>  	 * survive any kind of queue freeze and often don't respond to @@ 
> -960,6 +953,13 @@ static inline void nvme_handle_cqe(struct nvme_queue *nvmeq, u16 idx)
>  	}
>  
>  	req = blk_mq_tag_to_rq(nvme_queue_tagset(nvmeq), cqe->command_id);
> +	if (unlikely(!req)) {
> +		dev_warn(nvmeq->dev->ctrl.device,
> +			"req is null for tag %d completed on queue %d\n",
> +			cqe->command_id, le16_to_cpu(cqe->sq_id));
> +		return;
> +	}

This is making sense now, though I think we should retain the existing
dev_warn() since it's still accurate and provides continuity for people who are used to looking for these sorts of messages.

Your changelog is a bit much though. I think we can say it a bit more succinctly. This is what I'm thinking:

  The driver registers interrupts for queues before initializing the
  tagset because it uses the number of successful request_irq() calls
  to configure the tagset parameters. This allows a race condition with
  the current tag validity check if the controller happens to produce
  an interrupt with a corrupted CQE before the tagset is initialized.

  Replace the driver's indirect tag check with the one already provided
  by the block layer.