lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <7dbdce29-621a-4286-a832-2918ec1794e4@grimberg.me>
Date: Thu, 11 Apr 2024 11:37:59 +0300
From: Sagi Grimberg <sagi@...mberg.me>
To: Hannes Reinecke <hare@...e.de>, Daniel Wagner <dwagner@...e.de>
Cc: Christoph Hellwig <hch@....de>, Keith Busch <kbusch@...nel.org>,
 James Smart <james.smart@...adcom.com>, linux-nvme@...ts.infradead.org,
 linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 1/6] nvme: authentication error are always
 non-retryable



On 11/04/2024 10:11, Hannes Reinecke wrote:
> On 4/10/24 15:50, Sagi Grimberg wrote:
>>
>>
>> On 10/04/2024 15:05, Hannes Reinecke wrote:
>>> On 4/10/24 12:21, Sagi Grimberg wrote:
>>>>
>>>>
>>>> On 10/04/2024 9:52, Daniel Wagner wrote:
>>>>> On Tue, Apr 09, 2024 at 11:26:00PM +0300, Sagi Grimberg wrote:
>>>>>>
>>>>>> On 09/04/2024 12:35, Daniel Wagner wrote:
>>>>>>> From: Hannes Reinecke <hare@...e.de>
>>>>>>>
>>>>>>> Any authentication errors which are generated internally are always
>>>>>>> non-retryable, so use negative error codes to ensure they are not
>>>>>>> retried.
>>>>>> The patch title says that any authentication error is not 
>>>>>> retryable, and
>>>>>> the patch body says "authentication errors which are generated 
>>>>>> locally
>>>>>> are non-retryable" so which one is it?
>>>>> Forgot to update the commit message. What about:
>>>>>
>>>>>    All authentication errors are non-retryable, so use negative error
>>>>>    codes to ensure they are not retried.
>>>>>
>>>>> ?
>>>>
>>>> I have a question, what happens if nvmet updated its credentials 
>>>> (by the admin) and in the period until the host got his credentials 
>>>> updated, it
>>>> happens to disconnect/reconnect. It will see an authentication
>>>> error, so it will not retry and remove the controller altogether?
>>>>
>>>> Sounds like an issue to me.
>>>
>>> Usual thing: we cannot differentiate (on the host side) whether the
>>> current PSK is _about_ to be replaced; how should the kernel
>>> know that the admin will replace the PSK in the next minutes?
>>>
>>> But that really is an issue with the standard. Currently there is no
>>> way how a target could inform the initiator that the credentials have
>>> been updated.
>>
>> I'd say that the sane thing for the host to do in this case is to 
>> reconnect
>> until giving up with hope that it may work. This seems like a better 
>> approach
>> than to abruptly remove the controller no?
>>
>>>
>>> We would need to define a new status code for this.
>>> In the meantime the safe operations model is to set a lifetime
>>> for each PSK, and ensure that the PSK is updated on both sides
>>> during the lifetime. With that there is a timeframe during which
>>> both PSKs are available (on the target), and the older will expire
>>> automatically once the lifetime limit is reached.
>>
>> That is a good solution, and will also prevent a loss of service until
>> the host credentials are updated as well.
>>
>> But regardless I have a feeling that simply removing the controller upon
>> an authentication error is not the right thing to do here.
>
> Guess what; that's what I tried to do initially. But then Christoph 
> objected that we shouldn't generate NVMe status codes internally.
> But if we can't do that then we'll have to invent yet another way to 
> return a retryable error, leading to even more band-aid.
> So I am not quite sure how we could achieve that, short of making 
> _every_ error retryable...

So this whole thing is that you want to make the host to not reconnect 
if the controller
sent a DNR and reconnect otherwise?

What are you returning today if the authentication failed? Am I reading 
it right that you are
returning -ECONNREFUSED? I think that for the specific case of 
credentials mismatch (that and only
that) you may want to return -EKEYREJECTED. That according to the 
documentation (/* Key was rejected by service */)
is specific enough that perhaps we can treat it specially when asking 
"should I reconnect?"

Thoughts?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ