netdev - Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for handling handshake requests

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <006c4e44-572b-a6f8-9af0-5f568913e7a0@suse.de>
Date:   Mon, 27 Feb 2023 16:14:56 +0100
From:   Hannes Reinecke <hare@...e.de>
To:     Chuck Lever III <chuck.lever@...cle.com>
Cc:     Chuck Lever <cel@...nel.org>, "kuba@...nel.org" <kuba@...nel.org>,
        "pabeni@...hat.com" <pabeni@...hat.com>,
        "edumazet@...gle.com" <edumazet@...gle.com>,
        "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
        "kernel-tls-handshake@...ts.linux.dev" 
        <kernel-tls-handshake@...ts.linux.dev>
Subject: Re: [PATCH v5 1/2] net/handshake: Create a NETLINK service for
 handling handshake requests

On 2/27/23 15:59, Chuck Lever III wrote:
> 
> 
>> On Feb 27, 2023, at 4:24 AM, Hannes Reinecke <hare@...e.de> wrote:
>>
>> On 2/24/23 20:19, Chuck Lever wrote:
[ .. ]
>>> +	req = sock->sk->sk_handshake_req;
>>> +	if (!req) {
>>> +		err = -EBUSY;
>>> +		goto out_status;
>>> +	}
>>> +
>>> +	trace_handshake_cmd_done(net, req, sock, fd);
>>> +
>>> +	status = -EIO;
>>> +	if (tb[HANDSHAKE_A_DONE_STATUS])
>>> +		status = nla_get_u32(tb[HANDSHAKE_A_DONE_STATUS]);
>>> +
>> And this makes me ever so slightly uneasy.
>>
>> As 'status' is a netlink attribute it's inevitably defined as 'unsigned'.
>> Yet we assume that 'status' is a negative number, leaving us _technically_ in unchartered territory.
> 
> Ah, that's an oversight.
> 
> 
>> And that is notwithstanding the problem that we haven't even defined _what_ should be in the status attribute.
> 
> It's now an errno value.
> 
> 
>> Reading the code I assume that it's either '0' for success or a negative number (ie the error code) on failure.
>> Which implicitely means that we _never_ set a positive number here.
>> So what would we lose if we declare 'status' to carry the _positive_ error number instead?
>> It would bring us in-line with the actual netlink attribute definition, we wouldn't need
>> to worry about possible integer overflows, yadda yadda...
>>
>> Hmm?
> 
> It can also be argued that errnos in user space are positive-valued,
> therefore, this user space visible protocol should use a positive
> errno.
> 
> 
Thanks.

[ .. ]
>>> +
>>> +/**
>>> + * handshake_req_cancel - consumer API to cancel an in-progress handshake
>>> + * @sock: socket on which there is an ongoing handshake
>>> + *
>>> + * XXX: Perhaps killing the user space agent might also be necessary?
>>
>> I thought we had agreed that we would be sending a signal to the userspace process?
> 
> We had discussed killing the handler, but I don't think it's necessary.
> I'd rather not do something that drastic unless we have no other choice.
> So far my testing hasn't shown a need for killing the child process.
> 
> I'm also concerned that the kernel could reuse the handler's process ID.
> handshake_req_cancel would kill something that is not a handshake agent.
> 
Hmm? If that were the case, wouldn't we be sending the netlink message 
to the
wrong process, to?

And in the absence of any timeout handler: what do we do if userspace is 
stuck / doesn't make forward progress?
At one point TCP will timeout, and the client will close the connection.
Leaving us with (potentially) broken / stuck processes. Sure we would 
need to initiate some cleanup here, no?

>> Ideally we would be sending a SIGHUP, wait for some time on the userspace
>> process to respond with a 'done' message, and send a 'KILL' signal if we
>> haven't received one.
>>
>> Obs: Sending a KILL signal would imply that userspace is able to cope with
>> children dying. Which pretty much excludes pthreads, I would think.
>>
>> Guess I'll have to consult Stevens :-)
> 
> Basically what cancel does is atomically disarm the "done" callback.
> 
> The socket belongs to the kernel, so it will live until the kernel is
> good and through with it.
> 
Oh, the socket does. But the process handling the socket is not.
So even if we close the socket from the kernel there's no guarantee that 
userspace will react to it.

Problem here is with using different key materials.
As the current handshake can only deal with one key at a time the only 
chance we have for several possible keys is to retry the handshake with 
the next key.
But out of necessity we have to use the _same_ connection (as tlshd 
doesn't control the socket). So we cannot close the socket, and hence we 
can't notify userspace to give up the handshake attempt.
Being able to send a signal would be simple; sending SIGHUP to 
userspace, and wait for the 'done' call.
If it doesn't come we can terminate all attempts.
But if we get the 'done' call we know it's safe to start with the next 
attempt.

> 
>>> + *
>>> + * Request cancellation races with request completion. To determine
>>> + * who won, callers examine the return value from this function.
>>> + *
>>> + * Return values:
>>> + *   %0 - Uncompleted handshake request was canceled or not found
>>> + *   %-EBUSY - Handshake request already completed
>>
>> EBUSY? Wouldn't be EAGAIN more approriate?
> 
> I don't think EAGAIN would be appropriate at all. The situation
> is that the handshake completed, so there's no need to call cancel
> again. It's synonym, EWOULDBLOCK, is also not a good semantic fit.
> 
> 
>> After all, the request is everything _but_ busy...
> 
> I'm open to suggestion.
> 
> One option is to use a boolean return value instead of an errno.
> 
> 
Yeah, that's probably better.

BTW: thanks for the tracepoints!

Cheers,

Hannes