lists.openwall.net
Open Source and information security mailing list archives
 
Message-ID: <18955196-3b95-b223-ad12-4db1534786a1@suse.de>
Date:   Tue, 14 Feb 2023 12:09:08 +0100
From:   Hannes Reinecke <hare@...e.de>
To:     Chuck Lever <chuck.lever@...cle.com>, kuba@...nel.org,
        pabeni@...hat.com, edumazet@...gle.com
Cc:     netdev@...r.kernel.org, hare@...e.com, dhowells@...hat.com,
        bcodding@...hat.com, kolga@...app.com, jmeneghi@...hat.com
Subject: Re: [PATCH v3 0/2] Another crack at a handshake upcall mechanism

On 2/14/23 10:44, Hannes Reinecke wrote:
> On 2/7/23 22:41, Chuck Lever wrote:
>> Hi-
>>
>> Here is v3 of a series to add generic support for transport layer
>> security handshake on behalf of kernel consumers (user space
>> consumers use a security library directly, of course).
>>
>> This version of the series does away with the listen/poll/accept/
>> close design and replaces it with a full netlink implementation
>> that handles much of the same function.
>>
>> The first patch in the series adds a new netlink family to handle
>> the kernel-user space interaction to request a handshake. The second
>> patch demonstrates how to extend this new mechanism to support a
>> particular transport layer security protocol (in this case,
>> TLSv1.3).
>>
>> Of particular interest is that the user space handshake agent now
>> must perform a second downcall when the handshake is complete,
>> rather than simply closing the socket descriptor. This enables the
>> user space agent to pass down a session status, whether the session
>> was mutually authenticated, and the identity of the remote peer.
>> (Although these facilities are plumbed into the netlink protocol,
>> they have yet to be fully implemented by the kernel or the sample
>> user space agent below).
>>
>> Certificates and pre-shared keys are made available to the user
>> space agent via keyrings, or the agent can use authentication
>> materials residing in the local filesystem.
>>
>> The full patch set to support SunRPC with TLSv1.3 is available in
>> the topic-rpc-with-tls-upcall branch here, based on v6.1.10:
>>
>>     https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git
>>
>> A sample user space handshake agent with netlink support is
>> available in the "netlink" branch here:
>>
>>     https://github.com/oracle/ktls-utils
>>
>> ---
>>
>> Changes since v2:
>> - PF_HANDSHAKE replaced with NETLINK_HANDSHAKE
>> - Replaced listen(2) / poll(2) with a multicast notification service
>> - Replaced accept(2) with a netlink operation that can return an
>>    open fd and handshake parameters
>> - Replaced close(2) with a netlink operation that can take arguments
>>
>> Changes since RFC:
>> - Generic upcall support split away from kTLS
>> - Added support for TLS ServerHello
>> - Documentation has been temporarily removed while API churns
>>
>> Chuck Lever (2):
>>        net/handshake: Create a NETLINK service for handling handshake requests
>>        net/tls: Support AF_HANDSHAKE in kTLS
>>
>> The use of AF_HANDSHAKE in the short description here is stale. I'll
>> fix that in a subsequent posting.
>>
> I have been playing around with this patchset, and for some reason I get a
> weird crash:
> 
> [ 5101.640941] nvme nvme0: queue 0: start TLS with key 15982809
> [ 5111.769538] nvme nvme0: queue 0: TLS handshake complete, tmo 2500, error -110
> [ 5111.769545] BUG: kernel NULL pointer dereference, address: 0000000000000068
> [ 5111.770089] #PF: supervisor read access in kernel mode
> [ 5111.770460] #PF: error_code(0x0000) - not-present page
> [ 5111.770828] PGD 0 P4D 0
> [ 5111.771019] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 5111.771344] CPU: 0 PID: 8611 Comm: nvme Kdump: loaded Tainted: G
> [ 5111.772193] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> [ 5111.772864] RIP: 0010:kernel_sock_shutdown+0x9/0x20
> 
> which looks to me as if the socket had been deallocated once the netlink
> handshake completed.
> And indeed, handshake_accept() has the 'CLOEXEC' flag set.
> So if the user process exits, it'll close the socket, and we're hosed.
> That seems to be what is happening here.
> 
> Let's see if things work out better without the CLOEXEC flag.
> 
Nope, that doesn't work.
It turns out to be an issue with netlink timeout handling.
In my code I've added a 'wait_for_completion' loop, since I need the
result of the upcall before I can continue.

But as I'm triggering the infamous assert in GnuTLS (regarding the PSK
identity length), user space does _not_ return but rather waits
indefinitely. Or rather, longer than I'm prepared to wait.

Once the timeout triggers, I find that the socket has been released,
causing _quite_ some friction with the code :-)

It looks like I'll have to add timeout handling to the netlink handshake;
the plan is to transmit the timeout parameter from the kernel to user space
and set the timeout via gnutls_handshake_set_timeout().

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@...e.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
