linux-kernel - Re: 6.2 nvme-pci: something wrong

Message-ID: <6715d48b-7670-527-38ab-42f865fe3c10@google.com>
Date:   Sun, 25 Dec 2022 00:33:13 -0800 (PST)
From:   Hugh Dickins <hughd@...gle.com>
To:     Christoph Hellwig <hch@...radead.org>
cc:     Keith Busch <kbusch@...nel.org>, Hugh Dickins <hughd@...gle.com>,
        Jens Axboe <axboe@...nel.dk>, Sagi Grimberg <sagi@...mberg.me>,
        Chaitanya Kulkarni <kch@...dia.com>,
        Linus Torvalds <torvalds@...ux-foundation.org>,
        Thorsten Leemhuis <regressions@...mhuis.info>,
        linux-block@...r.kernel.org, linux-nvme@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: 6.2 nvme-pci: something wrong

On Sat, 24 Dec 2022, Christoph Hellwig wrote:
> On Sat, Dec 24, 2022 at 03:06:38PM -0700, Keith Busch wrote:
> > Your observation is a queue-wrap condition that makes it impossible for
> > the controller to know there are new commands.
> > 
> > Your patch does look like the correct thing to do. The "zero means one"
> > thing is a confusing distraction, I think. It makes more sense if you
> > consider sqsize as the maximum number of tags we can have outstanding at
> > one time and it looks like all the drivers set it that way. We're
> > supposed to leave one slot empty for a full NVMe queue, so adding one
> > here to report the total number of slots isn't right since that would allow
> > us to fill all slots.
> 
> Yes, and pcie did actually do the - 1 from q_depth, so we should
> drop the +1 for sqsize.  And add back the missing BLK_MQ_MAX_DEPTH.
> But we still need to keep sqsize updated as well.
> 
> > Fabrics drivers have been using this method for a while, though, so
> > interesting they haven't had a similar problem.
> 
> Fabrics doesn't have a real queue and thus no actual wrap, so
> I don't think they will be hit as bad by this.
> 
> So we'll probably need something like this, split into two patches.
> And then for 6.2 clean up the sqsize vs q_depth mess for real.

This patch is working fine for me; and, in the light of Keith's
explanation, so far as I can tell, seems the right thing to do.

Thanks!
Hugh

> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 95c488ea91c303..5b723c65fbeab5 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -4926,7 +4926,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
>  
>  	memset(set, 0, sizeof(*set));
>  	set->ops = ops;
> -	set->queue_depth = ctrl->sqsize + 1;
> +	set->queue_depth = min_t(unsigned, ctrl->sqsize, BLK_MQ_MAX_DEPTH - 1);
>  	/*
>  	 * Some Apple controllers requires tags to be unique across admin and
>  	 * the (only) I/O queue, so reserve the first 32 tags of the I/O queue.
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index f0f8027644bbf8..ec5e1c578a710b 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -2332,10 +2332,12 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
>  	if (dev->cmb_use_sqes) {
>  		result = nvme_cmb_qdepth(dev, nr_io_queues,
>  				sizeof(struct nvme_command));
> -		if (result > 0)
> +		if (result > 0) {
>  			dev->q_depth = result;
> -		else
> +			dev->ctrl.sqsize = dev->q_depth - 1;
> +		} else {
>  			dev->cmb_use_sqes = false;
> +		}
>  	}
>  
>  	do {
> @@ -2536,7 +2538,6 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>  
>  	dev->q_depth = min_t(u32, NVME_CAP_MQES(dev->ctrl.cap) + 1,
>  				io_queue_depth);
> -	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>  	dev->db_stride = 1 << NVME_CAP_STRIDE(dev->ctrl.cap);
>  	dev->dbs = dev->bar + 4096;
>  
> @@ -2577,7 +2578,7 @@ static int nvme_pci_enable(struct nvme_dev *dev)
>  		dev_warn(dev->ctrl.device, "IO queue depth clamped to %d\n",
>  			 dev->q_depth);
>  	}
> -
> +	dev->ctrl.sqsize = dev->q_depth - 1; /* 0's based queue depth */
>  
>  	nvme_map_cmb(dev);
>  
>
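
For readers following the thread, the queue-wrap condition Keith describes can be illustrated with a small userspace model. This is only a sketch under stated assumptions: struct toy_sq and the toy_* helpers are hypothetical names invented for illustration, not kernel code, and the model assumes the controller detects new work solely by comparing head and tail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of a submission queue ring; not kernel code. */
struct toy_sq {
	unsigned int q_depth;	/* total number of slots in the ring */
	unsigned int head;	/* next slot the controller will consume */
	unsigned int tail;	/* next slot the host will fill */
};

/* Assumption: the controller only compares head and tail; equal means idle. */
static bool toy_sq_has_work(const struct toy_sq *sq)
{
	return sq->head != sq->tail;
}

/* Host side: queue nr commands, advancing tail with wrap-around. */
static void toy_sq_submit(struct toy_sq *sq, unsigned int nr)
{
	assert(nr <= sq->q_depth);
	sq->tail = (sq->tail + nr) % sq->q_depth;
}

/*
 * Start from an empty ring of q_depth slots, submit nr commands before the
 * controller consumes anything, and report whether it can see any work.
 */
static bool toy_visible_after(unsigned int q_depth, unsigned int nr)
{
	struct toy_sq sq = { .q_depth = q_depth, .head = 0, .tail = 0 };

	toy_sq_submit(&sq, nr);
	return toy_sq_has_work(&sq);
}
```

With q_depth = 8 and sqsize = q_depth - 1 = 7, toy_visible_after(8, 7) is true because the tail stops one slot short of the head. toy_visible_after(8, 8), which corresponds to a tag depth of sqsize + 1, is false: the tail wraps around onto the head and the full ring looks the same as an empty one. That is why the tag set depth has to leave one slot unused.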