Message-ID: <18955196-3b95-b223-ad12-4db1534786a1@suse.de>
Date: Tue, 14 Feb 2023 12:09:08 +0100
From: Hannes Reinecke <hare@...e.de>
To: Chuck Lever <chuck.lever@...cle.com>, kuba@...nel.org,
pabeni@...hat.com, edumazet@...gle.com
Cc: netdev@...r.kernel.org, hare@...e.com, dhowells@...hat.com,
bcodding@...hat.com, kolga@...app.com, jmeneghi@...hat.com
Subject: Re: [PATCH v3 0/2] Another crack at a handshake upcall mechanism
On 2/14/23 10:44, Hannes Reinecke wrote:
> On 2/7/23 22:41, Chuck Lever wrote:
>> Hi-
>>
>> Here is v3 of a series to add generic support for transport layer
>> security handshake on behalf of kernel consumers (user space
>> consumers use a security library directly, of course).
>>
>> This version of the series does away with the listen/poll/accept/
>> close design and replaces it with a full netlink implementation
>> that handles much of the same function.
>>
>> The first patch in the series adds a new netlink family to handle
>> the kernel-user space interaction to request a handshake. The second
>> patch demonstrates how to extend this new mechanism to support a
>> particular transport layer security protocol (in this case,
>> TLSv1.3).
>>
>> Of particular interest is that the user space handshake agent now
>> must perform a second downcall when the handshake is complete,
>> rather than simply closing the socket descriptor. This enables the
>> user space agent to pass down a session status, whether the session
>> was mutually authenticated, and the identity of the remote peer.
>> (Although these facilities are plumbed into the netlink protocol,
>> they have yet to be fully implemented by the kernel or the sample
>> user space agent below).
>>
>> Certificates and pre-shared keys are made available to the user
>> space agent via keyrings, or the agent can use authentication
>> materials residing in the local filesystem.
>>
>> The full patch set to support SunRPC with TLSv1.3 is available in
>> the topic-rpc-with-tls-upcall branch here, based on v6.1.10:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>
>> A sample user space handshake agent with netlink support is
>> available in the "netlink" branch here:
>>
>> https://github.com/oracle/ktls-utils
>>
>> ---
>>
>> Changes since v2:
>> - PF_HANDSHAKE replaced with NETLINK_HANDSHAKE
>> - Replaced listen(2) / poll(2) with a multicast notification service
>> - Replaced accept(2) with a netlink operation that can return an
>> open fd and handshake parameters
>> - Replaced close(2) with a netlink operation that can take arguments
>>
>> Changes since RFC:
>> - Generic upcall support split away from kTLS
>> - Added support for TLS ServerHello
>> - Documentation has been temporarily removed while API churns
>>
>> Chuck Lever (2):
>>   net/handshake: Create a NETLINK service for handling handshake requests
>>   net/tls: Support AF_HANDSHAKE in kTLS
>>
>> The use of AF_HANDSHAKE in the short description here is stale. I'll
>> fix that in a subsequent posting.
>>
> Have been playing around with this patchset, and for some reason I get a
> weird crash:
>
> [ 5101.640941] nvme nvme0: queue 0: start TLS with key 15982809
> [ 5111.769538] nvme nvme0: queue 0: TLS handshake complete, tmo 2500, error -110
> [ 5111.769545] BUG: kernel NULL pointer dereference, address: 0000000000000068
> [ 5111.770089] #PF: supervisor read access in kernel mode
> [ 5111.770460] #PF: error_code(0x0000) - not-present page
> [ 5111.770828] PGD 0 P4D 0
> [ 5111.771019] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 5111.771344] CPU: 0 PID: 8611 Comm: nvme Kdump: loaded Tainted: G
> [ 5111.772193] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> [ 5111.772864] RIP: 0010:kernel_sock_shutdown+0x9/0x20
>
> which looks to me as if the socket had been deallocated once the netlink
> handshake has completed.
> And indeed, handshake_accept() has the 'CLOEXEC' flag set.
> So if the user process exits it'll close the socket, and we're hosed.
> Which seems to be what is happening here.
>
> Let's see if things work out better without the CLOEXEC flag.
>
Nope, that doesn't work.
Turns out to be an issue with netlink timeout handling.

In my code I've added a 'wait_for_completion' loop, since I need the
result from the upcall before I can continue.
But as I'm triggering the infamous 'assert' in gnutls (regarding PSK
identity length), userspace does _not_ return, but rather waits
indefinitely. Or, rather, longer than I'm prepared to wait.
Once the timeout triggers I find that the socket has been released,
causing _quite_ some friction with the code :-)

Looks like I'll have to add timeout handling to the netlink handshake;
the plan is to transmit the timeout parameter from the kernel to
userspace, and set the timeout via gnutls_handshake_set_timeout().
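
To make the ordering concrete, here is a minimal userspace model of that
plan, using pthreads only. It is a sketch under stated assumptions, not
the patch: struct hs_req, hs_put(), agent_thread() and run_handshake()
are hypothetical names, the "handshake" is just a sleep, and the one
second of slack on the requester side is an arbitrary choice. It shows
two things: the agent is handed the timeout and gives up on its own, and
the request is reference counted so whichever side finishes last
releases it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for one handshake request; not a kernel structure. */
struct hs_req {
	pthread_mutex_t lock;
	pthread_cond_t done_cv;
	bool done;
	int status;              /* 0 on success, negative errno on failure */
	int refs;                /* one reference per holder */
	unsigned int timeout_ms; /* handed to the agent with the request */
	unsigned int work_ms;    /* how long the simulated handshake would take */
};

static int hs_freed; /* counts releases, only so the sketch can be checked */

static void sleep_ms(unsigned int ms)
{
	struct timespec ts = { ms / 1000, (long)(ms % 1000) * 1000000L };

	nanosleep(&ts, NULL);
}

/* Drop one reference; the last holder frees the request. */
static void hs_put(struct hs_req *req)
{
	bool last;

	pthread_mutex_lock(&req->lock);
	assert(req->refs > 0);
	last = (--req->refs == 0);
	pthread_mutex_unlock(&req->lock);
	if (last) {
		pthread_cond_destroy(&req->done_cv);
		pthread_mutex_destroy(&req->lock);
		free(req);
		hs_freed++;
	}
}

/* Simulated agent: gives up by itself once the timeout it was handed expires. */
static void *agent_thread(void *arg)
{
	struct hs_req *req = arg;
	bool expired = req->work_ms > req->timeout_ms;

	sleep_ms(expired ? req->timeout_ms : req->work_ms);

	pthread_mutex_lock(&req->lock);
	req->status = expired ? -ETIMEDOUT : 0;
	req->done = true;
	pthread_cond_signal(&req->done_cv);
	pthread_mutex_unlock(&req->lock);
	hs_put(req); /* the agent is finished with the request */
	return NULL;
}

/* Requester: wait for the agent's answer, with slack past the agent's deadline. */
static int run_handshake(unsigned int timeout_ms, unsigned int work_ms)
{
	struct hs_req *req = calloc(1, sizeof(*req));
	unsigned int wait_ms = timeout_ms + 1000; /* arbitrary slack */
	struct timespec deadline;
	pthread_t agent;
	int status = -ETIMEDOUT;
	int rc = 0;

	if (!req)
		return -ENOMEM;
	pthread_mutex_init(&req->lock, NULL);
	pthread_cond_init(&req->done_cv, NULL);
	req->refs = 2; /* requester + agent */
	req->timeout_ms = timeout_ms;
	req->work_ms = work_ms;
	if (pthread_create(&agent, NULL, agent_thread, req)) {
		req->refs = 1; /* the agent never took its reference */
		hs_put(req);
		return -EAGAIN;
	}

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += wait_ms / 1000;
	deadline.tv_nsec += (long)(wait_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&req->lock);
	while (!req->done && rc == 0)
		rc = pthread_cond_timedwait(&req->done_cv, &req->lock, &deadline);
	if (req->done)
		status = req->status;
	pthread_mutex_unlock(&req->lock);

	hs_put(req);               /* safe even if the agent were still running */
	pthread_join(agent, NULL); /* only so the sketch exits cleanly */
	return status;
}
```

Calling run_handshake(50, 5000) models the stuck gnutls case: the agent
reports -ETIMEDOUT itself after its 50 ms budget instead of leaving the
requester to time out and tear things down underneath it. In the real
agent the budget would go to gnutls_handshake_set_timeout() rather than
a sleep.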
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@...e.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer