netdev - Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests


Date:   Tue, 28 Feb 2023 16:01:32 +0000
From:   Chuck Lever III <chuck.lever@...cle.com>
To:     Hannes Reinecke <hare@...e.de>
CC:     Chuck Lever <cel@...nel.org>, "kuba@...nel.org" <kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kernel-tls-handshake@...ts.linux.dev" 
        <kernel-tls-handshake@...ts.linux.dev>
Subject: Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for
 handling handshake requests



> On Feb 28, 2023, at 10:48 AM, Hannes Reinecke <hare@...e.de> wrote:
> 
> On 2/28/23 15:28, Chuck Lever III wrote:
>>> On Feb 28, 2023, at 1:58 AM, Hannes Reinecke <hare@...e.de> wrote:
>>> 
>>> On 2/27/23 19:10, Chuck Lever III wrote:
>>> 
>> What about the narrow set of DONE status values? You've
>> recently wanted to add ENOMEM, ENOKEY, and EINVAL to
>> this set. My experience is that these status values are
>> nearly always obscured before they can get back to the
>> requesting user.
>> Can the kernel make use of ENOMEM, for example? It might
>> be able to retry, I suppose... retrying is not sensible
>> for the server side.
> The usual problem: Retry or no retry.
> Sadly error numbers are no good indicator to that.
> Maybe we should take the NVMe approach and add a _different_
> attribute indicating whether this particular error status
> should be retried.

ENOMEM is obviously temporary. The others are permanent
errors. This is handled simply via a tiny protocol
specification, which I can add near tls_handshake_done().
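
As a rough illustration only, the rule could be written down as
a small classifier. The helper name below is hypothetical and is
not part of the posted patch, and it assumes the DONE status is
carried as zero or a negative errno:

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not in the patch): decide whether a
 * handshake DONE status is worth retrying.
 * Assumption: status is 0 on success or a negative errno.
 */
static bool handshake_status_is_transient(int status)
{
	switch (status) {
	case -ENOMEM:
		return true;	/* resource shortage may clear up */
	case -ENOKEY:
	case -EINVAL:
	default:
		return false;	/* success or permanent failure */
	}
}
```

With a rule like this, no separate "retry" attribute is needed;
the status value alone tells the requester what to do.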


>>> So the only bone of contention is the timeout; as we won't
>>> be implementing signals I still think that we should have
>>> a 'timeout' attribute. And if only to feed the TLS timeout
>>> parameter for gnutls ...
>> I'm still not seeing the case for making it an individual
>> parameter for each handshake request. Maybe a config
>> parameter, if a short timeout is actually needed... even
>> then, maybe a built-in timeout is preferable to yet another
>> tuning knob that can be abused.
> The problem I see is that the kernel-side needs to make forward
> progress eventually, and calling into userspace is a good recipe
> for violating that principle.

That's why RPC-with-TLS uses wait-interruptible-timeout.


> Sending a timeout value as a netlink parameter has the advantage
> that both sides are aware that there _is_ a timeout.
> The alternative would be an unconditional wait in the kernel,
> and a very real possibility of a stuck process.

I'm not following you. Why isn't wait-interruptible-timeout
in the kernel adequate?


>> I'd like to see some testing results to determine that a
>> short timeout is the only way to handle corner cases.
> Short timeouts are especially useful for testing and debugging;
> timeout handlers are prone to issues, and hence need a really good
> bashing to hash out issues.
> And not having a timeout is also not a good idea, see above.

RPC-with-TLS has a timeout. The kernel is in complete control
of it. After a few seconds, the kernel abandons the handshake
attempt and closes the socket. It doesn't care what the handler
agent does at that point.
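
The pattern can be modeled in user space as below. This is an
analogy under stated assumptions, not the kernel code: in the
kernel the wait would be something like
wait_for_completion_interruptible_timeout(). Here a file
descriptor (done_fd) stands in for the completion, and both
function names are made up for the example:

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * User-space model only. done_fd stands in for the kernel's
 * completion; the handler agent signals DONE by making it readable.
 * Returns 0 when poll() reports an event on done_fd, -ETIMEDOUT if
 * the timeout expires first, or another negative errno (-EINTR if
 * a signal interrupted the wait).
 */
static int wait_for_handshake_done(int done_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = done_fd, .events = POLLIN };
	int rc = poll(&pfd, 1, timeout_ms);

	if (rc < 0)
		return -errno;
	if (rc == 0)
		return -ETIMEDOUT;
	return 0;
}

/*
 * The waiter owns the timeout. On any failure it abandons the
 * handshake and closes the socket, regardless of what the handler
 * agent does afterwards.
 */
static int handshake_wait_or_abandon(int done_fd, int sock_fd,
				     int timeout_ms)
{
	int rc = wait_for_handshake_done(done_fd, timeout_ms);

	if (rc < 0)
		close(sock_fd);
	return rc;
}
```

The point of the model is that only the waiting side needs to
know the timeout value; the handler never has to agree on it.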


> But yeah, in theory we could use a configuration timeout in tlshd.
> 
> In the end, it's _just_ another netlink attribute, which might
> (or might not) be present. Which replaces a built-in value.
> I hadn't thought this to be such an issue ...

It's an issue because you have not identified a particular
corner case (via reproducer) where user and kernel have to
agree on exactly the same timeout value, and it might be
different per-request.

Show me one, and I will agree to add it. So far, I haven't
seen sufficient justification.


--
Chuck Lever