netdev - Re: [PATCH] net: tls: fix possible race condition between do_tls_getsockopt_conf() and do_tls_setsockopt

Message-ID: <52faaa10-f3e4-bca9-4bff-6f1ea7d26593@gmail.com>
Date:   Mon, 27 Feb 2023 11:26:18 +0800
From:   Hangyu Hua <hbh25y@...il.com>
To:     Jakub Kicinski <kuba@...nel.org>,
        Sabrina Dubroca <sd@...asysnail.net>
Cc:     Florian Westphal <fw@...len.de>, borisp@...dia.com,
        john.fastabend@...il.com, davem@...emloft.net, edumazet@...gle.com,
        pabeni@...hat.com, davejwatson@...com, aviadye@...lanox.com,
        ilyal@...lanox.com, netdev@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] net: tls: fix possible race condition between
 do_tls_getsockopt_conf() and do_tls_setsockopt_conf()

On 25/2/2023 06:17, Jakub Kicinski wrote:
> On Fri, 24 Feb 2023 22:48:57 +0100 Sabrina Dubroca wrote:
>> 2023-02-24, 13:06:25 -0800, Jakub Kicinski wrote:
>>> On Fri, 24 Feb 2023 21:22:43 +0100 Sabrina Dubroca wrote:
>>   [...]
>>>>
>>>> I suggested a change of locking in do_tls_getsockopt_conf this
>>>> morning [1]. The issue reported last seemed valid, but this patch is not
>>>> at all what I had in mind.
>>>> [1] https://lore.kernel.org/all/Y/ht6gQL+u6fj3dG@hog/
>>>
>>> Ack, I read the messages out of order, sorry.
>>>    
>>>> do_tls_setsockopt_conf fills crypto_info immediately from what
>>>> userspace gives us (and clears it on exit in case of failure), which
>>>> getsockopt could see since it's not locking the socket when it checks
>>>> TLS_CRYPTO_INFO_READY. So getsockopt would progress up to the point it
>>>> finally locks the socket, but if setsockopt failed, we could have
>>>> cleared TLS_CRYPTO_INFO_READY and freed iv/rec_seq.
>>>
>>> Makes sense. We should just take the socket lock around all of
>>> do_tls_getsockopt(), then?
>>
>> That would make things simple and consistent. My idea was just taking
>> the existing lock_sock in do_tls_getsockopt_conf out of the switch and
>> put it just above TLS_CRYPTO_INFO_READY.

I see what you mean. I just think that taking the lock around the
crypto_info access is a simpler way to fix this.

The original situation is:

thread1				thread2(do_tls_getsockopt_conf)

lock_sock(sk)
do_tls_setsockopt_conf(crypto_info->cipher_type set)

				crypto_info = xxx
				cctx = &ctx->tx
				if(!TLS_CRYPTO_INFO_READY(crypto_info))
				
tls_set_device_offload(kmalloc cctx->iv)
tls_set_sw_offload(fails; cctx->iv may not be set to NULL)
do_tls_setsockopt_conf(clears crypto_info->cipher_type)
release_sock(sk)

				lock_sock(sk)
				memcpy(xxx, cctx->iv, xxx)
				release_sock(sk)
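
To make the ordering above concrete, here is a small userspace sketch that
replays it step by step in a single thread. It is only a model under stated
assumptions: struct fake_ctx and every function name below are hypothetical
stand-ins, not the kernel's structures or API, and the sketch sets the freed
pointer to NULL so the stale state can be detected instead of dereferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define FAKE_IV_LEN 8

/* Hypothetical stand-in for the TLS context; not the kernel structures. */
struct fake_ctx {
	bool ready;		/* models TLS_CRYPTO_INFO_READY(crypto_info) */
	unsigned char *iv;	/* models cctx->iv */
};

/* Writer, first half: crypto_info is filled in and the IV is allocated. */
static void setsockopt_begin(struct fake_ctx *ctx)
{
	ctx->ready = true;
	ctx->iv = calloc(1, FAKE_IV_LEN);
	assert(ctx->iv != NULL);	/* allocation failure is not modeled */
}

/* Writer, second half: setup failed, so everything is undone. */
static void setsockopt_fail(struct fake_ctx *ctx)
{
	free(ctx->iv);
	ctx->iv = NULL;		/* cleared here so the sketch can detect it */
	ctx->ready = false;
}

/* Reader, first half: the readiness check done without the lock. */
static bool getsockopt_check(const struct fake_ctx *ctx)
{
	return ctx->ready;
}

/* Reader, second half: the copy. Returns false if the state is gone. */
static bool getsockopt_copy(const struct fake_ctx *ctx, unsigned char *out)
{
	if (!ctx->ready || ctx->iv == NULL)
		return false;
	memcpy(out, ctx->iv, FAKE_IV_LEN);
	return true;
}

/*
 * Replays the racy order: the reader passes the check, the writer fails
 * and cleans up, then the reader copies. Returns true if the reader got
 * past the check but found the state gone at copy time.
 */
bool replay_race(void)
{
	struct fake_ctx ctx = { 0 };
	unsigned char out[FAKE_IV_LEN];
	bool passed_check;

	setsockopt_begin(&ctx);
	passed_check = getsockopt_check(&ctx);
	setsockopt_fail(&ctx);
	return passed_check && !getsockopt_copy(&ctx, out);
}

/*
 * Replays the serialized order: the writer runs to completion before the
 * reader starts. Same return convention as replay_race().
 */
bool replay_serialized(void)
{
	struct fake_ctx ctx = { 0 };
	unsigned char out[FAKE_IV_LEN];
	bool passed_check;

	setsockopt_begin(&ctx);
	setsockopt_fail(&ctx);
	passed_check = getsockopt_check(&ctx);
	return passed_check && !getsockopt_copy(&ctx, out);
}
```

In the sketch, only the first replay lets the reader get past the check and
then find the state gone. Once the writer is serialized against the whole
reader, the reader bails out at the check.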

If we lock crypto_info:

thread1				thread2(do_tls_getsockopt_conf)

lock_sock(sk)
do_tls_setsockopt_conf(crypto_info->cipher_type set)			
tls_set_device_offload(kmalloc cctx->iv)
tls_set_sw_offload(fails; cctx->iv may not be set to NULL)
do_tls_setsockopt_conf(clears crypto_info->cipher_type)
release_sock(sk)

				lock_sock(sk)
				crypto_info = xxx
				cctx = &ctx->tx
				release_sock(sk)
				if(!TLS_CRYPTO_INFO_READY(crypto_info))
				lock_sock(sk)
				memcpy(xxx, cctx->iv, xxx)
				release_sock(sk)

>>
>> While we're at it, should we move the
>>
>>      ctx->prot_info.version != TLS_1_3_VERSION
>>
>> check in do_tls_setsockopt_no_pad under lock_sock?
> 
> Yes, or READ_ONCE(), same for do_tls_getsockopt_tx_zc() and its access
> on ctx->zerocopy_sendfile.
> 
>>   I don't think that
>> can do anything wrong (we'd have to get past this check just before a
>> failing setsockopt clears crypto_info, and even then we're just
>> reading a bit from the context), it just looks a bit strange. Or just
>> lock the socket around all of do_tls_setsockopt_no_pad, like the other
>> options we have.
> 
> The delayed locking feels like a premature optimization, we'll keep
> having such issues with new options. Hence my vote to lock all of
> do_tls_getsockopt().

To reduce ambiguity, I think it may be a good idea to take the lock only
around do_tls_getsockopt_conf(), as we already do in do_tls_setsockopt().

It will look like:

static int do_tls_getsockopt(struct sock *sk, int optname,
			     char __user *optval, int __user *optlen)
{
	int rc = 0;

	switch (optname) {
	case TLS_TX:
	case TLS_RX:
+		lock_sock(sk);
		rc = do_tls_getsockopt_conf(sk, optval, optlen,
					    optname == TLS_TX);
+		release_sock(sk);
		break;
	case TLS_TX_ZEROCOPY_RO:
		rc = do_tls_getsockopt_tx_zc(sk, optval, optlen);
		break;
	case TLS_RX_EXPECT_NO_PAD:
		rc = do_tls_getsockopt_no_pad(sk, optval, optlen);
		break;
	default:
		rc = -ENOPROTOOPT;
		break;
	}
	return rc;
}
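
The same shape, as a userspace sketch only: the dispatcher takes the lock
and the helper assumes it is already held, so the readiness check and the
copy cannot be separated by a concurrent writer. Here struct fake_sock and
the fake_*() names are hypothetical, a pthread mutex stands in for
lock_sock()/release_sock(), and returning -EBUSY for "not ready" is this
sketch's own choice.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>

#define FAKE_IV_LEN 8

/* Hypothetical stand-in for the socket plus its TLS state. */
struct fake_sock {
	pthread_mutex_t lock;		/* models lock_sock()/release_sock() */
	bool ready;			/* models TLS_CRYPTO_INFO_READY() */
	unsigned char iv[FAKE_IV_LEN];	/* models cctx->iv */
};

/* Helper: the caller must hold sk->lock for the whole call. */
static int fake_getsockopt_conf(struct fake_sock *sk, unsigned char *out)
{
	if (!sk->ready)
		return -EBUSY;
	memcpy(out, sk->iv, FAKE_IV_LEN);
	return 0;
}

/* Dispatcher: the check and the copy both happen under one lock hold. */
int fake_getsockopt(struct fake_sock *sk, unsigned char *out)
{
	int rc;

	pthread_mutex_lock(&sk->lock);
	rc = fake_getsockopt_conf(sk, out);
	pthread_mutex_unlock(&sk->lock);
	return rc;
}

/* Writer that succeeds: publishes the IV under the same lock. */
void fake_setsockopt_ok(struct fake_sock *sk, const unsigned char *iv)
{
	pthread_mutex_lock(&sk->lock);
	memcpy(sk->iv, iv, FAKE_IV_LEN);
	sk->ready = true;
	pthread_mutex_unlock(&sk->lock);
}
```

With this layout the helper no longer needs any locking of its own.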

Of course, I will also remove the now-redundant locking inside
do_tls_getsockopt_conf(). What do you think?