linux-kernel - Re: [PATCH v2 6/7] nvme-tcp: Support KeyUpdate

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <f1a7b0b5-65e3-4cd0-9c62-50bbb554e589@suse.de>
Date: Tue, 16 Sep 2025 15:04:13 +0200
From: Hannes Reinecke <hare@...e.de>
To: alistair23@...il.com, chuck.lever@...cle.com, hare@...nel.org,
 kernel-tls-handshake@...ts.linux.dev, netdev@...r.kernel.org,
 linux-kernel@...r.kernel.org, linux-doc@...r.kernel.org,
 linux-nvme@...ts.infradead.org, linux-nfs@...r.kernel.org
Cc: kbusch@...nel.org, axboe@...nel.dk, hch@....de, sagi@...mberg.me,
 kch@...dia.com, Alistair Francis <alistair.francis@....com>
Subject: Re: [PATCH v2 6/7] nvme-tcp: Support KeyUpdate

On 9/5/25 04:46, alistair23@...il.com wrote:
> From: Alistair Francis <alistair.francis@....com>
> 
> If the nvme_tcp_try_send() or nvme_tcp_try_recv() functions return
> EKEYEXPIRED then the underlying TLS keys need to be updated. This occurs
> on an KeyUpdate event.
> 
> If the NVMe Target (TLS server) initiates a KeyUpdate this patch will
> allow the NVMe layer to process the KeyUpdate request and forward the
> request to userspace. Userspace must then update the key to keep the
> connection alive.
> 
> This patch allows us to handle the NVMe target sending a KeyUpdate
> request without aborting the connection. At this time we don't support
> initiating a KeyUpdate.
> 
> Link: https://datatracker.ietf.org/doc/html/rfc8446#section-4.6.3
> Signed-off-by: Alistair Francis <alistair.francis@....com>
> ---
> v2:
>   - Don't change the state
>   - Use a helper function for KeyUpdates
>   - Continue sending in nvme_tcp_send_all() after a KeyUpdate
>   - Remove command message using recvmsg
>   
>   drivers/nvme/host/tcp.c | 73 +++++++++++++++++++++++++++++++++++++++--
>   1 file changed, 70 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/nvme/host/tcp.c b/drivers/nvme/host/tcp.c
> index 776047a71436..b6449effc2ac 100644
> --- a/drivers/nvme/host/tcp.c
> +++ b/drivers/nvme/host/tcp.c
> @@ -171,6 +171,7 @@ struct nvme_tcp_queue {
>   	bool			tls_enabled;
>   	u32			rcv_crc;
>   	u32			snd_crc;
> +	key_serial_t		user_session_id;
>   	__le32			exp_ddgst;
>   	__le32			recv_ddgst;
>   	struct completion       tls_complete;
> @@ -210,6 +211,7 @@ static int nvme_tcp_start_tls(struct nvme_ctrl *nctrl,
>   			      struct nvme_tcp_queue *queue,
>   			      key_serial_t pskid,
>   			      handshake_key_update_type keyupdate);
> +static void update_tls_keys(struct nvme_tcp_queue *queue);
>   
>   static inline struct nvme_tcp_ctrl *to_tcp_ctrl(struct nvme_ctrl *ctrl)
>   {
> @@ -393,6 +395,14 @@ static inline void nvme_tcp_send_all(struct nvme_tcp_queue *queue)
>   	do {
>   		ret = nvme_tcp_try_send(queue);
>   	} while (ret > 0);
> +
> +	if (ret == -EKEYEXPIRED) {
> +		update_tls_keys(queue);
> +
> +		do {
> +			ret = nvme_tcp_try_send(queue);
> +		} while (ret > 0);
> +	}
>   }
>   
>   static inline bool nvme_tcp_queue_has_pending(struct nvme_tcp_queue *queue)
> @@ -1347,6 +1357,8 @@ static int nvme_tcp_try_send(struct nvme_tcp_queue *queue)
>   done:
>   	if (ret == -EAGAIN) {
>   		ret = 0;
> +	} else if (ret == -EKEYEXPIRED) {
> +		goto out;
>   	} else if (ret < 0) {
>   		dev_err(queue->ctrl->ctrl.device,
>   			"failed to send request %d\n", ret);
> @@ -1371,9 +1383,56 @@ static int nvme_tcp_try_recv(struct nvme_tcp_queue *queue)
>   	queue->nr_cqe = 0;
>   	consumed = sock->ops->read_sock(sk, &rd_desc, nvme_tcp_recv_skb);
>   	release_sock(sk);
> +
> +	/* If we received EINVAL from read_sock then it generally means the
> +	 * other side sent a command message. So let's try to clear it from
> +	 * our queue with a recvmsg, otherwise we get stuck in an infinite
> +	 * loop.
> +	 */
> +	if (consumed == -EINVAL) {
> +		char cbuf[CMSG_LEN(sizeof(char))] = {};
> +		struct msghdr msg = { .msg_flags = MSG_DONTWAIT };
> +		struct bio_vec bvec;
> +
> +		bvec_set_virt(&bvec, (void *)cbuf, sizeof(cbuf));
> +		iov_iter_bvec(&msg.msg_iter, ITER_DEST, &bvec, 1, sizeof(cbuf));
> +
> +		msg.msg_control = cbuf;
> +		msg.msg_controllen = sizeof(cbuf);
> +
> +		consumed = sock_recvmsg(sock, &msg, msg.msg_flags);
> +	}
> +
>   	return consumed == -EAGAIN ? 0 : consumed;
>   }
>   
> +static void update_tls_keys(struct nvme_tcp_queue *queue)
> +{
> +	int qid = nvme_tcp_queue_id(queue);
> +	int ret;
> +
> +	dev_dbg(queue->ctrl->ctrl.device,
> +		"updating key for queue %d\n", qid);
> +
> +	cancel_work(&queue->io_work);
> +	handshake_req_cancel(queue->sock->sk);
> +	handshake_sk_destruct_req(queue->sock->sk);
> +
Careful here. The RFC fully expects to have several KeyUpdate requests
pending (eg if both sides decide so initiate a KeyUpdate at the same
time). And cancelling a handshake request would cause tlshd/gnutls
to lose track of the generation counter and generate an invalid
traffic secret.
I would just let it rip and don't bother with other handshake
requests.

> +	nvme_stop_keep_alive(&(queue->ctrl->ctrl));
> +	flush_work(&(queue->ctrl->ctrl).async_event_work);
> +
Oh bugger. Seems like gnutls is generating the KeyUpdate message
itself, and we have to wait for that.
So much for KeyUpdate being transparent without having to stop I/O...

Can't we fix gnutls to make sending the KeyUpdate message and changing
the IV parameters an atomic operation? That would be a far better 
interface, as then we would not need to stop I/O and the handshake
process could run fully asynchronous to normal I/O...

> +	ret = nvme_tcp_start_tls(&(queue->ctrl->ctrl),
> +				 queue, queue->ctrl->ctrl.tls_pskid,
> +				 HANDSHAKE_KEY_UPDATE_TYPE_RECEIVED);
> +
> +	if (ret < 0) {
> +		dev_err(queue->ctrl->ctrl.device,
> +			"failed to update the keys %d\n", ret);
> +		nvme_tcp_fail_request(queue->request);
> +		nvme_tcp_done_send_req(queue);
> +	}
> +}
> +
>   static void nvme_tcp_io_work(struct work_struct *w)
>   {
>   	struct nvme_tcp_queue *queue =
> @@ -1389,15 +1448,21 @@ static void nvme_tcp_io_work(struct work_struct *w)
>   			mutex_unlock(&queue->send_mutex);
>   			if (result > 0)
>   				pending = true;
> -			else if (unlikely(result < 0))
> +			else if (unlikely(result < 0)) {
> +				if (result == -EKEYEXPIRED)
> +					update_tls_keys(queue);

How exactly can we get -EKEYEXPIRED when _sending_?
To my understanding that would have required userspace to intercept
here trying (or even sending) a KeyUpdate message, right?
So really not something we should see during normal operation.
As mentioned in my previous mail we should rather code the
KeyUpdate process itself here, too.
Namely: Trigger the KeyUpdate via userspace (eg by writing into the 
tls_key attribute for the controller), and then have the kernel side
to call out into tlshd to initiate the KeyUpdate 'handshake'.
That way we have identical flow of control for both the sending
and receiving side.

Incidentally: the RFC has some notion about 'request_update' setting
in the KeyUpdate message. Is that something we have to care about at
this level?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                  Kernel Storage Architect
hare@...e.de                                +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich