linux-kernel - [PATCH v3 0/6] Handle update hardware queues and queue freeze more carefully

Open Source and information security mailing list archives



Message-Id: <20210720124353.127959-1-dwagner@suse.de>
Date:   Tue, 20 Jul 2021 14:43:47 +0200
From:   Daniel Wagner <dwagner@...e.de>
To:     linux-nvme@...ts.infradead.org
Cc:     linux-kernel@...r.kernel.org,
        James Smart <james.smart@...adcom.com>,
        Keith Busch <kbusch@...nel.org>,
        Ming Lei <ming.lei@...hat.com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Daniel Wagner <dwagner@...e.de>
Subject: [PATCH v3 0/6] Handle update hardware queues and queue freeze more carefully

Hi,

I've replaced my 'nvme_start_freeze' patch with the two patches from
James and given it another test run on top of Ming's 'v2 fix
blk_mq_alloc_request_hctx' series. All looks good.

Thanks,
Daniel

v1:
 - https://lore.kernel.org/linux-nvme/20210625101649.49296-1-dwagner@suse.de/
v2:
 - https://lore.kernel.org/linux-nvme/20210708092755.15660-1-dwagner@suse.de/
 - reviewed tags collected
 - added 'update hardware queues' for all transports
 - added fix for FC hang in nvme_wait_freeze_timeout
v3:
 - dropped 'nvme-fc: Freeze queues before destroying them'
 - added James' two patches

Initial cover letter:

This is a follow-up to the crash I reported in

  https://lore.kernel.org/linux-block/20210608183339.70609-1-dwagner@suse.de/

By moving the hardware queue update (blk_mq_update_nr_hw_queues())
up, the crash went away. Unfortunately, I don't understand why this
fixes the crash. The crash happens in a per-CPU access, but I can't
see why calling blk_mq_update_nr_hw_queues() earlier fixes it.

Even though I can't explain why it fixes the crash, I think it makes
sense to update the hardware queue mapping before we recreate the IO
queues. That's why I avoided claiming in the commit message that it
fixes anything.

Also, during testing I observed that we hang indefinitely in
blk_mq_freeze_queue_wait(). Again, I can't explain why we get stuck
there, but given that nvme_wait_freeze() is commonly used with a
timeout, I think the timeout should be used here too :)

Anyway, hopefully someone with a better understanding of the stack
can explain these problems.

Daniel Wagner (3):
  nvme-fc: Update hardware queues before using them
  nvme-rdma: Update number of hardware queues before using them
  nvme-fc: Wait with a timeout for queue to freeze

Hannes Reinecke (1):
  nvme-tcp: Update number of hardware queues before using them

James Smart (2):
  nvme-fc: avoid race between time out and tear down
  nvme-fc: fix controller reset hang during traffic

 drivers/nvme/host/fc.c   | 28 +++++++++++++++++++---------
 drivers/nvme/host/rdma.c | 13 ++++++-------
 drivers/nvme/host/tcp.c  | 14 ++++++--------
 3 files changed, 31 insertions(+), 24 deletions(-)

-- 
2.29.2