Message-ID: <5a883e670b6f38fc7c3edde7343a205fbc56474b.camel@linux.ibm.com>
Date:   Tue, 08 Nov 2022 18:11:01 +0100
From:   Gerd Bayer <gbayer@...ux.ibm.com>
To:     Christoph Hellwig <hch@....de>
Cc:     Jens Axboe <axboe@...com>, Sagi Grimberg <sagi@...mberg.me>,
        Niklas Schnelle <schnelle@...ux.ibm.com>,
        linux-kernel@...r.kernel.org, linux-next@...r.kernel.org,
        linux-nvme@...ts.infradead.org
Subject: Re: nvme-pci: NULL pointer dereference in nvme_dev_disable() on
 linux-next

Hi Christoph,

with your minimal fix

On Tue, 2022-11-08 at 08:48 +0100, Christoph Hellwig wrote:
> Below is the minimal fix.  I'll see if I sort out the mess that is
> probe/reset failure vs ->remove a bit better, though.
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index f94b05c585cbc..577bacdcfee08 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5160,6 +5160,8 @@ EXPORT_SYMBOL_GPL(nvme_start_freeze);
>  
>  void nvme_stop_queues(struct nvme_ctrl *ctrl)
>  {
> +	if (!ctrl->tagset)
> +		return;
>  	if (!test_and_set_bit(NVME_CTRL_STOPPED, &ctrl->flags))
>  		blk_mq_quiesce_tagset(ctrl->tagset);
>  	else
> @@ -5169,6 +5171,8 @@ EXPORT_SYMBOL_GPL(nvme_stop_queues);
>  
>  void nvme_start_queues(struct nvme_ctrl *ctrl)
>  {
> +	if (!ctrl->tagset)
> +		return;
>  	if (test_and_clear_bit(NVME_CTRL_STOPPED, &ctrl->flags))
>  		blk_mq_unquiesce_tagset(ctrl->tagset);
>  }

on next-20221108, the kernel no longer crashes when I run the short
test script.
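
To make the effect of the two added checks concrete, here is a minimal user-space sketch of the guard. It is not the kernel code: `struct mock_ctrl`, `struct mock_tagset`, `mock_stop_queues()` and `mock_start_queues()` are hypothetical stand-ins, the `stopped` flag only models the NVME_CTRL_STOPPED bit, and the `else` branch that the diff hunk cuts off is left out. The assumption is that a controller whose probe failed early never had its tag set allocated, so `tagset` is still NULL when the queues are stopped.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, not the kernel definitions. */
struct mock_tagset {
	int quiesce_depth;	/* counts outstanding quiesce requests */
};

struct mock_ctrl {
	struct mock_tagset *tagset;	/* NULL until a tag set is allocated */
	bool stopped;			/* models the NVME_CTRL_STOPPED bit */
};

/* Mirrors the shape of the patched nvme_stop_queues(): bail out early
 * when there is no tag set, so the pointer is never dereferenced. */
static void mock_stop_queues(struct mock_ctrl *ctrl)
{
	if (!ctrl->tagset)
		return;
	if (!ctrl->stopped) {
		ctrl->stopped = true;
		ctrl->tagset->quiesce_depth++;
	}
}

/* Mirrors the shape of the patched nvme_start_queues(). */
static void mock_start_queues(struct mock_ctrl *ctrl)
{
	if (!ctrl->tagset)
		return;
	if (ctrl->stopped) {
		ctrl->stopped = false;
		ctrl->tagset->quiesce_depth--;
	}
}

/* Models the assumed failure path: stop and start are both called on a
 * controller that never got a tag set. Returns 0 if the state is untouched. */
static int demo_probe_failure(void)
{
	struct mock_ctrl ctrl = { .tagset = NULL, .stopped = false };

	mock_stop_queues(&ctrl);
	mock_start_queues(&ctrl);
	return ctrl.stopped ? 1 : 0;
}
```

Without the two early returns, the first call in `demo_probe_failure()` would dereference a NULL `tagset`, which is the user-space analogue of the crash reported in this thread.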

dmesg shows:
Nov 08 17:38:51 a46lp24.lnxne.boe kernel: nvme nvme0: pci function 0004:00:00.0
Nov 08 17:38:51 a46lp24.lnxne.boe kernel: nvme nvme0: failed to mark controller CONNECTING
Nov 08 17:38:51 a46lp24.lnxne.boe kernel: nvme nvme0: Removing after probe failure status: -16
Nov 08 17:38:52 a46lp24.lnxne.boe kernel: pci 0004:00:00.0: Removing from iommu group 0

while the kernel remains up.
I can even
- rescan the PCI bus (to bring back the NVMe drive), and
- run the test script
multiple times.

So from my point of view, this band-aid is worth merging while the larger
overhaul in

https://lore.kernel.org/linux-nvme/20221108150252.2123727-1-hch@lst.de/

is out for review and testing.

Feel free to add my
Tested-by: Gerd Bayer <gbayer@...ux.ibm.com>

Thank you,
Gerd Bayer
