Message-ID: <20210820084832.nlsbiztn26fv3b73@carbon.lan>
Date:   Fri, 20 Aug 2021 10:48:32 +0200
From:   Daniel Wagner <dwagner@...e.de>
To:     linux-nvme@...ts.infradead.org
Cc:     linux-kernel@...r.kernel.org,
        James Smart <james.smart@...adcom.com>,
        Keith Busch <kbusch@...nel.org>,
        Ming Lei <ming.lei@...hat.com>,
        Sagi Grimberg <sagi@...mberg.me>,
        Hannes Reinecke <hare@...e.de>,
        Wen Xiong <wenxiong@...ibm.com>,
        Himanshu Madhani <himanshu.madhani@...cle.com>
Subject: Re: [PATCH v5 0/3] Handle update hardware queues and queue freeze
 more carefully

On Wed, Aug 18, 2021 at 02:05:27PM +0200, Daniel Wagner wrote:
> I've dropped all non FC patches as they were bogus. I've retested this
> version with all combinations and all looks good now. Also I gave
> nvme-tcp a spin and again all is good.

I forgot to mention that I also dropped the first three patches from
v4, which seems to be what breaks Wendy's testing again.

Wendy reported that all her tests pass with Ming's v7 of 'blk-mq: fix
blk_mq_alloc_request_hctx' plus this series, but *only* if 'nvme-fc:
Update hardware queues before using them' from the previous version is
also applied.

After staring at it once more, I think I finally understand the
problem. So when we do

        ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
        if (ret)
                goto out_free_io_queues;

        ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
        if (ret)
                goto out_delete_hw_queues;

and the number of queues has changed since the last association, the connect call will fail:

 nvme2: NVME-FC{2}: create association : host wwpn 0x100000109b5a4dfa rport wwpn 0x50050768101935e5: NQN "nqn.1986-03.com.ibm:nvme:2145.0000020420006CEA"
 nvme2: Connect command failed, error wo/DNR bit: -16389

and we abort the current reconnect attempt and schedule a new one:

 nvme2: NVME-FC{2}: reset: Reconnect attempt failed (-5)
 nvme2: NVME-FC{2}: Reconnect attempt in 2 seconds

The next attempt then runs into exactly the same failure, so we never
make progress.
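
To make the failure mode concrete, here is a toy model (plain
userspace C, not kernel code; all names are made up) of the loop we
end up in when queue_count is never refreshed:

#include <stdio.h>
#include <stdbool.h>

struct toy_ctrl {
        int queue_count;        /* stale count from the old association */
        int target_queues;      /* I/O queues the target now grants */
};

/* Stand-in for the connect step: the Connect command fails for any
 * qid beyond what the target grants. */
static bool toy_connect_io_queues(struct toy_ctrl *ctrl)
{
        for (int qid = 1; qid < ctrl->queue_count; qid++) {
                if (qid > ctrl->target_queues) {
                        printf("Connect qid %d failed, target grants only %d queues\n",
                               qid, ctrl->target_queues);
                        return false;
                }
        }
        return true;
}

int main(void)
{
        struct toy_ctrl ctrl = { .queue_count = 8, .target_queues = 4 };

        for (int attempt = 1; attempt <= 3; attempt++) {
                printf("reconnect attempt %d\n", attempt);
                if (toy_connect_io_queues(&ctrl))
                        return 0;
                /* ctrl.queue_count is never refreshed, so the next
                 * attempt fails exactly the same way. */
        }
        printf("no forward progress\n");
        return 1;
}

Every pass fails at the same qid, which matches the endless
"Reconnect attempt in 2 seconds" above.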

So clearly we need to update the number of queues at some point. What
would be the right thing to do here? As I understand it, we need to be
careful with frozen requests. Can we abort them (is that even possible
in this state?) and requeue them before we update the queue count?
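
For the record, a sketch of how the refresh could look. This is
written from memory and only illustrates the idea behind 'nvme-fc:
Update hardware queues before using them', it is not that patch;
nvme_set_queue_count() is the existing core helper, min() and
num_online_cpus() the usual kernel ones:

        unsigned int nr_io_queues;
        int ret;

        /* re-negotiate; nr_io_queues comes back clamped to what the
         * target actually grants */
        nr_io_queues = min(ctrl->ctrl.opts->nr_io_queues, num_online_cpus());
        ret = nvme_set_queue_count(&ctrl->ctrl, &nr_io_queues);
        if (ret)
                return ret;

        /* refresh the count *before* creating/connecting the queues,
         * so both calls below operate on the granted value */
        ctrl->ctrl.queue_count = nr_io_queues + 1;

        ret = nvme_fc_create_hw_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
        if (ret)
                goto out_free_io_queues;

        ret = nvme_fc_connect_io_queues(ctrl, ctrl->ctrl.sqsize + 1);
        if (ret)
                goto out_delete_hw_queues;

That still leaves the question above open, i.e. whether the frozen
requests have to be aborted/requeued before queue_count may change.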

Daniel
