netdev - Re: [PATCH v3] net/tls: support maximum record size limit

Message-ID: <05dc7efdb45363358825ff3782d3006ef9c6cea4.camel@gmail.com>
Date: Thu, 18 Sep 2025 10:52:14 +1000
From: Wilfred Mallawa <wilfred.opensource@...il.com>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
 pabeni@...hat.com, 	horms@...nel.org, corbet@....net,
 john.fastabend@...il.com, netdev@...r.kernel.org, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, 
	alistair.francis@....com, dlemoal@...nel.org
Subject: Re: [PATCH v3] net/tls: support maximum record size limit

Hey Sabrina,

Sorry for the delay in getting back to this! Responded inline.

On Wed, 2025-09-03 at 12:14 +0200, Sabrina Dubroca wrote:
> note: since this is a new feature, the subject prefix should be
> "[PATCH net-next vN]" (ie add "net-next", the target tree for "new
> feature" changes)
> 
> 2025-09-03, 11:47:57 +1000, Wilfred Mallawa wrote:
> > diff --git a/Documentation/networking/tls.rst
> > b/Documentation/networking/tls.rst
> > index 36cc7afc2527..0232df902320 100644
> > --- a/Documentation/networking/tls.rst
> > +++ b/Documentation/networking/tls.rst
> > @@ -280,6 +280,13 @@ If the record decrypted turns out to had been
> > padded or is not a data
> >  record it will be decrypted again into a kernel buffer without
> > zero copy.
> >  Such events are counted in the ``TlsDecryptRetry`` statistic.
> >  
> > +TLS_TX_RECORD_SIZE_LIM
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +During a TLS handshake, an endpoint may use the record size limit
> > extension
> > +to specify a maximum record size. This allows enforcing the
> > specified record
> > +size limit, such that outgoing records do not exceed the limit
> > specified.
> 
> Maybe worth adding a reference to the RFC that defines this
> extension?
> I'm not sure if that would be helpful to readers of this doc or not.
Good idea, I'll add a reference to the RFC (RFC 8449).
> 
> 
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index a3ccb3135e51..94237c97f062 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> [...]
> > @@ -1022,6 +1075,7 @@ static int tls_init(struct sock *sk)
> >  
> >  	ctx->tx_conf = TLS_BASE;
> >  	ctx->rx_conf = TLS_BASE;
> > +	ctx->tx_record_size_limit = TLS_MAX_PAYLOAD_SIZE;
> >  	update_sk_prot(sk, ctx);
> >  out:
> >  	write_unlock_bh(&sk->sk_callback_lock);
> > @@ -1065,7 +1119,7 @@ static u16 tls_user_config(struct tls_context
> > *ctx, bool tx)
> >  
> >  static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool
> > net_admin)
> >  {
> > -	u16 version, cipher_type;
> > +	u16 version, cipher_type, tx_record_size_limit;
> >  	struct tls_context *ctx;
> >  	struct nlattr *start;
> >  	int err;
> > @@ -1110,7 +1164,13 @@ static int tls_get_info(struct sock *sk,
> > struct sk_buff *skb, bool net_admin)
> >  		if (err)
> >  			goto nla_failure;
> >  	}
> > -
> > +	tx_record_size_limit = ctx->tx_record_size_limit;
> > +	if (tx_record_size_limit) {
> 
> You probably meant to update that to:
> 
>     tx_record_size_limit != TLS_MAX_PAYLOAD_SIZE
> 
> Otherwise, now that the default is TLS_MAX_PAYLOAD_SIZE, it will
> always be exported - which is not wrong either. So I'd either update
> the conditional so that the attribute is only exported for non-
> default
> sizes (like in v2), or drop the if() and always export it.
> 
Yeah, that makes sense. I'll drop the if() so that it's always
exported.
> > +		err = nla_put_u16(skb,
> > TLS_INFO_TX_RECORD_SIZE_LIM,
> > +				  tx_record_size_limit);
> > +		if (err)
> > +			goto nla_failure;
> > +	}
> 
> [...]
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index bac65d0d4e3e..28fb796573d1 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock
> > *sk, struct msghdr *msg,
> >  		orig_size = msg_pl->sg.size;
> >  		full_record = false;
> >  		try_to_copy = msg_data_left(msg);
> > -		record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl-
> > >sg.size;
> > +		record_room = tls_ctx->tx_record_size_limit -
> > msg_pl->sg.size;
> 
> If we entered tls_sw_sendmsg_locked with an existing open record,
> this
> could end up being negative and confuse the rest of the code.
> 
>     send(MSG_MORE) returns with an open record of length len1
>     setsockopt(TLS_INFO_TX_RECORD_SIZE_LIM, limit < len1)
>     send() -> record_room < 0
> 
> 
> Possibly not a problem with a "well-behaved" userspace, but we can't
> rely on that.
Good catch! What if we don't allow tx_record_size_limit to be set while
there's a pending open record? That should at least prevent userspace
from causing record_room < 0 if we somehow end up there.

So for example:

diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7c0367dc5d40..34bb6690016c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -841,20 +841,27 @@ static int
do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
                                            unsigned int optlen)
 {
        struct tls_context *ctx = tls_get_ctx(sk);
+       struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
        u16 value;
 
+       if (sw_ctx->open_rec)
+               return -EBUSY;
...
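
To illustrate the idea outside the kernel, here is a minimal user-space
sketch (not the patch itself). The struct, the model_* names and
MODEL_TLS_MAX_PAYLOAD_SIZE are hypothetical stand-ins for the TLS
context, and has_open_rec stands in for sw_ctx->open_rec being
non-NULL. It only models the arithmetic and the proposed -EBUSY guard.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space model, not kernel code. 2^14 bytes. */
#define MODEL_TLS_MAX_PAYLOAD_SIZE 16384

struct model_tls_tx {
	uint16_t tx_record_size_limit;	/* current TX record size limit */
	size_t open_rec_len;		/* bytes already in the open record */
	bool has_open_rec;		/* models sw_ctx->open_rec != NULL */
};

/* Models "record_room = limit - msg_pl->sg.size" with a signed result. */
static long model_record_room(const struct model_tls_tx *tx)
{
	return (long)tx->tx_record_size_limit - (long)tx->open_rec_len;
}

/* Models the proposed guard: refuse a new limit while a record is open. */
static int model_set_limit(struct model_tls_tx *tx, uint16_t limit)
{
	if (tx->has_open_rec)
		return -EBUSY;
	tx->tx_record_size_limit = limit;
	return 0;
}

static void model_demo(void)
{
	struct model_tls_tx tx = {
		.tx_record_size_limit = MODEL_TLS_MAX_PAYLOAD_SIZE,
		.open_rec_len = 4000,
		.has_open_rec = true,	/* send(MSG_MORE) left a record open */
	};

	/* Without the guard, shrinking the limit below 4000 goes negative. */
	tx.tx_record_size_limit = 1000;
	assert(model_record_room(&tx) == -3000);

	/* With the guard, the change is refused and the old limit stays. */
	tx.tx_record_size_limit = MODEL_TLS_MAX_PAYLOAD_SIZE;
	assert(model_set_limit(&tx, 1000) == -EBUSY);
	assert(model_record_room(&tx) == 16384 - 4000);
}
```

With the guard in place, the only way for the limit to drop below the
open record's length is removed, so record_room stays non-negative in
this model.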

And to your follow up response:

```
> I suspect it's not a problem in practice because of what the TLS
> exchange between the peers setting up this extension looks like? (ie,
> there should never be an open record at this stage - unless userspace
> delays doing this setsockopt after getting the message from the peer,
> but then maybe we can call that a buggy userspace)
```

Yeah, the record size limit extension is negotiated during the
handshake (Client/ServerHello). AFAIK, all of that is handled in
tlshd/GnuTLS. We shouldn't have any open records at this point. For
user-space context, this is what support for the record size limit
looks like in tlshd [1].

If, for whatever reason, userspace decides to set it later as you
mentioned, the change above could mitigate the open record case? I
don't think we need to fix things in the kernel beyond that (and we
can't anyway for records already submitted to TCP).

[1] WIP: https://github.com/twilfredo/ktls-utils/commit/73cb755acb4589ba31e4c42ef6b16cf5efdf3892
> 
> 
> Pushing out the pending "too big" record at the time we set
> tx_record_size_limit would likely make the peer close the connection
> (because it's already told us to limit our TX size), so I guess we'd
> have to split the pending record into tx_record_size_limit chunks
> before we start processing the new message (either directly at
> setsockopt(TLS_INFO_TX_RECORD_SIZE_LIM) time, or the next send/etc
> call). The final push during socket closing, and maybe some more
> codepaths that deal with ctx->open_rec, would also have to do that.
> 
> I think additional selftests for
>     send(MSG_MORE), TLS_INFO_TX_RECORD_SIZE_LIM, send
> and
>     send(MSG_MORE), TLS_INFO_TX_RECORD_SIZE_LIM, close
> verifying the received record sizes would make sense, since it's a
> bit
> tricky to get that right.
Yeah, I agree. I will work on that. Thanks for the feedback!
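
As a rough starting point for the record size checks, something like
the helpers below could be used on the receive side. This is only a
sketch: expected_record_count() and records_within_limit() are
hypothetical names, not existing selftest helpers, and they assume the
test has already collected the payload length of each received record.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of records needed to carry 'total' payload
 * bytes when each record holds at most 'limit' bytes (ceiling division).
 */
static size_t expected_record_count(size_t total, size_t limit)
{
	assert(limit > 0);
	return (total + limit - 1) / limit;
}

/* Hypothetical helper: 1 if every observed record length is <= limit. */
static int records_within_limit(const size_t *lens, size_t n, size_t limit)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (lens[i] > limit)
			return 0;
	}
	return 1;
}
```

For example, 4001 bytes of payload with a limit of 1000 would need 5
records, none of them longer than 1000 bytes.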

Regards,
Wilfred