Date:   Tue, 29 Aug 2023 11:13:45 +0200
From:   Daniel Wagner <dwagner@...e.de>
To:     linux-nvme@...ts.infradead.org
Cc:     linux-kernel@...r.kernel.org, Hannes Reinecke <hare@...e.de>,
        Sagi Grimberg <sagi@...mberg.me>,
        Jason Gunthorpe <jgg@...pe.ca>,
        James Smart <james.smart@...adcom.com>,
        Chaitanya Kulkarni <kch@...dia.com>,
        Christoph Hellwig <hch@....de>, Daniel Wagner <dwagner@...e.de>
Subject: [RFC v1 0/4] nvmet-fc blktests & autoconnect fixes

Currently, blktests will pass with the patches [1] and the revert of
[2]. This is possible because blktests still disables the
nvmf-autoconnect service [3].

As I previously reported, blktests is able to trigger various kernel
panics with the system auto-connect running in the background. Let's try
to fix these problems.

The first two patches fix the nvmet ftrace infrastructure. I think
they could go in right now.

The third patch changes the way the refcounting for associations and
queues is done. There is a cyclic dependency between these two objects,
which makes the shutdown path very complex and error prone. As the
lifetime of the queues is coupled to the association, I decided to drop
the refcounting of the queues and rely only on the refcounts of the
association. This makes the code a bit simpler to follow and also allows
the cleanup path to be split into two halves. The first half removes the
association from the association RCU list and waits for a grace period,
so we know that no new I/Os will enter any queues. The second half drops
the refcounts and actually removes any resources once the refcount drops
to 0 (all in-flight I/O has been processed). nvme/003 is particularly
good at triggering crashes in this path.
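
To illustrate the two-half teardown described above, here is a minimal
userspace sketch. It is not the code from the patch: all names (struct
assoc, assoc_tryget(), assoc_delete(), wait_for_grace_period()) are
hypothetical, the grace-period wait is a no-op stand-in for an RCU wait
such as synchronize_rcu(), and the unlocked check in assoc_tryget() is
only safe because the sketch is single threaded.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_QUEUES 4

/* Queues carry no refcount of their own; they live and die with the
 * association that embeds them. */
struct queue { int id; };

struct assoc {
	atomic_int ref;			/* the only refcount */
	atomic_bool listed;		/* still on the lookup list? */
	struct queue queues[MAX_QUEUES];
};

/* Counts completed releases, purely for demonstration. */
static int assoc_released;

static struct assoc *assoc_alloc(void)
{
	struct assoc *a = calloc(1, sizeof(*a));

	if (!a)
		return NULL;
	atomic_init(&a->ref, 1);	/* reference held by the list */
	atomic_init(&a->listed, true);
	return a;
}

/* New I/O may only take a reference while the association is listed. */
static bool assoc_tryget(struct assoc *a)
{
	if (!atomic_load(&a->listed))
		return false;
	atomic_fetch_add(&a->ref, 1);
	return true;
}

/* Dropping the last reference frees the association and its queues. */
static void assoc_put(struct assoc *a)
{
	if (atomic_fetch_sub(&a->ref, 1) == 1) {
		assoc_released++;
		free(a);
	}
}

/* Assumption: stand-in for waiting out an RCU grace period. */
static void wait_for_grace_period(void)
{
}

static void assoc_delete(struct assoc *a)
{
	/* First half: unlink so that no new I/O can find the queues. */
	atomic_store(&a->listed, false);
	wait_for_grace_period();

	/* Second half: drop the list's reference. The resources go away
	 * once all in-flight I/O has dropped its references too. */
	assoc_put(a);
}
```

In this sketch, an I/O that took its reference before assoc_delete()
keeps the association alive until its own assoc_put(), while any
assoc_tryget() after the unlink fails.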

nvme/005 is triggering crashes in get discovery log page. The req->port
pointer was never assigned a valid pointer. It looks like there is a way
to end up with no port entry binding (remember we have the external
autoconnect running in the background).
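
The kind of guard this calls for can be sketched as follows. This is an
assumption about the general shape of the fix, not the patch itself:
struct request, struct port, get_disc_log_page() and the status values
are hypothetical userspace stand-ins, not the nvmet definitions.

```c
#include <stddef.h>

/* Hypothetical status codes, not the NVMe status values. */
#define STATUS_OK		0
#define STATUS_INTERNAL_ERROR	1

struct port { int id; };

struct request {
	struct port *port;	/* may be NULL if no port was ever bound */
};

/* Fail the command instead of dereferencing an unset port pointer. */
static int get_disc_log_page(const struct request *req, int *port_id_out)
{
	if (!req->port)
		return STATUS_INTERNAL_ERROR;

	*port_id_out = req->port->id;
	return STATUS_OK;
}
```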

Unfortunately, there is still some more fallout, but I thought I'd post
these patches now, while my memory is fresh, in case there are any
questions.

[1] https://lore.kernel.org/linux-nvme/sgoyzwj6ckrdrpq22u6fhtcemul5rqj6de4l5gw73vz77o3ils@vmv3jue4rom7/
[2] linux: ee6fdc5055e9 ("nvme-fc: fix race between error recovery and creating association")
[3] blktests: 0478dce70696 ("nvme/rc: Avoid triggering host nvme-cli autoconnect")

Daniel Wagner (4):
  nvmet-trace: avoid dereferencing pointer too early
  nvmet-trace: null terminate device name string correctly
  nvmet-fc: untangle cross refcounting objects
  nvmet-discovery: do not use invalid port

 drivers/nvme/target/discovery.c |  9 +++++
 drivers/nvme/target/fc.c        | 67 ++++++++++++++++-----------------
 drivers/nvme/target/trace.c     |  6 +--
 drivers/nvme/target/trace.h     | 28 +++++++-------
 4 files changed, 60 insertions(+), 50 deletions(-)

-- 
2.41.0
