linux-kernel - Re: [PATCH v3 0/2] nvme-fabrics: short-circuit connect retries

Message-ID: <08f3d804-f94b-4a2f-897b-7fee3411e6fc@suse.de>
Date: Thu, 7 Mar 2024 12:45:17 +0100
From: Hannes Reinecke <hare@...e.de>
To: Sagi Grimberg <sagi@...mberg.me>, Daniel Wagner <dwagner@...e.de>,
 James Smart <james.smart@...adcom.com>
Cc: Keith Busch <kbusch@...nel.org>, Christoph Hellwig <hch@....de>,
 linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v3 0/2] nvme-fabrics: short-circuit connect retries

On 3/7/24 12:30, Sagi Grimberg wrote:
> 
> 
> On 07/03/2024 12:37, Hannes Reinecke wrote:
>> On 3/7/24 09:00, Sagi Grimberg wrote:
>>>
>>> On 05/03/2024 10:00, Daniel Wagner wrote:
>>>> I've picked up Hannes' DNR patches. In short, they make the transports
>>>> behave the same way when the DNR bit is set on a re-connect attempt. We
>>>> had a discussion on this topic in the past and, if I got this right, we
>>>> all agreed that the host should honor the DNR bit on a connect
>>>> attempt [1]
>>> Umm, I don't recall this being conclusive though. The spec ought to 
>>> be clearer here I think.
>>
>> I've asked the NVMexpress fmds group, and the response was pretty 
>> unanimous that the DNR bit on connect should be evaluated.
> 
> OK.
> 
>>
>>>>
>>>> The nvme/045 test case (authentication tests) in blktests is a good
>>>> test case for this after extending it slightly. TCP and RDMA try to
>>>> reconnect with an invalid key over and over again, while loop and FC
>>>> stop after the first failure.
>>>
>>> Who says that invalid key is a permanent failure though?
>>>
>> See the response to the other patchset.
>> 'Invalid key' in this context means that the _client_ evaluated the 
>> key as invalid, i.e. the key is unusable for the client.
>> As the key is passed in via the command line, there is no way the client
>> can ever change the value, and no amount of retrying will change 
>> the outcome. That's what we are trying to fix.
> 
> Where is this retried today? I don't see where a connect failure is 
> retried, outside of a periodic reconnect.
> Maybe I'm missing what the actual failure is here.

static void nvme_tcp_reconnect_ctrl_work(struct work_struct *work)
{
         struct nvme_tcp_ctrl *tcp_ctrl =
                         container_of(to_delayed_work(work),
                         struct nvme_tcp_ctrl, connect_work);
         struct nvme_ctrl *ctrl = &tcp_ctrl->ctrl;

         ++ctrl->nr_reconnects;

         if (nvme_tcp_setup_ctrl(ctrl, false))
                 goto requeue;

         dev_info(ctrl->device, "Successfully reconnected (%d attempt)\n",
                         ctrl->nr_reconnects);

         ctrl->nr_reconnects = 0;

         return;

requeue:
         dev_info(ctrl->device, "Failed reconnect attempt %d\n",

and nvme_tcp_setup_ctrl() returns either a negative errno or an NVMe 
status code (which might include the DNR bit).
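
To make the distinction concrete, here is a minimal userspace sketch of how a caller could tell the three kinds of return value apart before deciding to requeue. It is not the code from the patches. The names `EX_NVME_SC_DNR` and `ex_should_requeue()` are hypothetical, the constant is assumed to match the kernel's `NVME_SC_DNR` (0x4000), and always retrying on a negative errno is an assumption made for illustration only.

```c
#include <stdbool.h>

/* Assumption: same bit value as the kernel's NVME_SC_DNR (Do Not Retry). */
#define EX_NVME_SC_DNR 0x4000

/*
 * Hypothetical helper: decide whether a failed (re)connect is worth
 * requeueing. 'ret' follows the convention described above:
 *   0   success
 *   < 0 negative errno from the transport
 *   > 0 NVMe status code, possibly with the DNR bit set
 */
static bool ex_should_requeue(int ret)
{
	if (ret == 0)
		return false;	/* connected, nothing to retry */
	if (ret < 0)
		return true;	/* errno: assumed transient in this sketch */
	/* NVMe status: retry only if the controller did not set DNR */
	return !(ret & EX_NVME_SC_DNR);
}
```

With a check like this, a status carrying DNR would end the reconnect loop on the first attempt, whereas the unconditional `goto requeue` in the quoted function treats every non-zero return the same way.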

Cheers,

Hannes