netdev - Re: [PATCH net 5/5] tls: don't decrypt the next record if it's of a different type

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <ZPm__x5TcsmqagBH@hog>
Date: Thu, 7 Sep 2023 14:21:59 +0200
From: Sabrina Dubroca <sd@...asysnail.net>
To: Jakub Kicinski <kuba@...nel.org>,
	Herbert Xu <herbert@...dor.apana.org.au>
Cc: netdev@...r.kernel.org, Dave Watson <davejwatson@...com>,
	Vakul Garg <vakul.garg@....com>, Boris Pismenny <borisp@...dia.com>,
	John Fastabend <john.fastabend@...il.com>
Subject: Re: [PATCH net 5/5] tls: don't decrypt the next record if it's of a
 different type

2023-09-06, 20:47:27 -0700, Jakub Kicinski wrote:
> On Wed,  6 Sep 2023 19:08:35 +0200 Sabrina Dubroca wrote:
> > If the next record is of a different type, we won't copy it to
> > userspace in this round, tls_record_content_type will stop us just
> > after decryption. Skip decryption until the next recvmsg() call.
> > 
> > This fixes a use-after-free when a data record is decrypted
> > asynchronously but doesn't fill the userspace buffer, and the next
> > record is non-data, for example in the bad_cmsg selftest.
> 
> What's the UAF on?

It doesn't always happen unless I set cryptd_delay_ms from my debug
patch (10 is enough):
https://patchwork.kernel.org/project/linux-crypto/patch/9d664093b1bf7f47497b2c40b3a085b45f3274a2.1694021240.git.sd@queasysnail.net/

UAF is on the crypto_async_request (part of the aead_request,
allocated in the big kmalloc in tls_decrypt_sg), mostly caught when
cryptd_queue_worker calls crypto_request_complete, but sometimes a bit
earlier (in crypto_dequeue_request).

I'll admit this patch is papering over the issue a bit, but decrypting
a record we know we won't read within this recv() call seems a bit
pointless.

I wonder if the way we're using ctx->async_wait here is correct. I'm
observing crypto_wait_req return 0 even though the decryption hasn't
run yet (and it should return -EBADMSG, not 0). I guess
tls_decrypt_done calls the completion (since we only had one
decrypt_pending), and then crypto_wait_req thinks everything is
already done.

Adding a fresh crypto_wait in tls_do_decryption (DECLARE_CRYPTO_WAIT)
and using it in the !darg->async case also seems to fix the UAF (but
makes the bad_cmsg test case fail in the same way as what I wrote in
the cover letter for bad_in_large_read -- not decrypting the next
message at all makes the selftest pass).

Herbert, WDYT? We're calling tls_do_decryption twice from the same
tls_sw_recvmsg invocation, first with darg->async = true, then with
darg->async = false. Is it ok to use ctx->async_wait for both, or do
we need a fresh one as in this patch?

-------- 8< --------
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 86b835b15872..ad51960f2864 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -246,6 +246,7 @@ static int tls_do_decryption(struct sock *sk,
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_prot_info *prot = &tls_ctx->prot_info;
 	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	DECLARE_CRYPTO_WAIT(wait);
 	int ret;

 	aead_request_set_tfm(aead_req, ctx->aead_recv);
@@ -262,7 +263,7 @@ static int tls_do_decryption(struct sock *sk,
 	} else {
 		aead_request_set_callback(aead_req,
 					  CRYPTO_TFM_REQ_MAY_BACKLOG,
-					  crypto_req_done, &ctx->async_wait);
+					  crypto_req_done, &wait);
 	}

 	ret = crypto_aead_decrypt(aead_req);
@@ -270,7 +271,7 @@ static int tls_do_decryption(struct sock *sk,
 		if (darg->async)
 			return 0;

-		ret = crypto_wait_req(ret, &ctx->async_wait);
+		ret = crypto_wait_req(ret, &wait);
 	}
 	darg->async = false;

-- 
Sabrina