Message-ID: <05dc7efdb45363358825ff3782d3006ef9c6cea4.camel@gmail.com>
Date: Thu, 18 Sep 2025 10:52:14 +1000
From: Wilfred Mallawa <wilfred.opensource@...il.com>
To: Sabrina Dubroca <sd@...asysnail.net>
Cc: davem@...emloft.net, edumazet@...gle.com, kuba@...nel.org,
pabeni@...hat.com, horms@...nel.org, corbet@....net,
john.fastabend@...il.com, netdev@...r.kernel.org,
linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org,
alistair.francis@....com, dlemoal@...nel.org
Subject: Re: [PATCH v3] net/tls: support maximum record size limit
Hey Sabrina,
Sorry for the delay in getting back to this! Responded inline.
On Wed, 2025-09-03 at 12:14 +0200, Sabrina Dubroca wrote:
> note: since this is a new feature, the subject prefix should be
> "[PATCH net-next vN]" (ie add "net-next", the target tree for "new
> feature" changes)
>
> 2025-09-03, 11:47:57 +1000, Wilfred Mallawa wrote:
> > diff --git a/Documentation/networking/tls.rst
> > b/Documentation/networking/tls.rst
> > index 36cc7afc2527..0232df902320 100644
> > --- a/Documentation/networking/tls.rst
> > +++ b/Documentation/networking/tls.rst
> > @@ -280,6 +280,13 @@ If the record decrypted turns out to had been
> > padded or is not a data
> > record it will be decrypted again into a kernel buffer without
> > zero copy.
> > Such events are counted in the ``TlsDecryptRetry`` statistic.
> >
> > +TLS_TX_RECORD_SIZE_LIM
> > +~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +During a TLS handshake, an endpoint may use the record size limit
> > extension
> > +to specify a maximum record size. This allows enforcing the
> > specified record
> > +size limit, such that outgoing records do not exceed the limit
> > specified.
>
> Maybe worth adding a reference to the RFC that defines this
> extension?
> I'm not sure if that would be helpful to readers of this doc or not.
Good idea, I'll add that in.
>
>
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index a3ccb3135e51..94237c97f062 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> [...]
> > @@ -1022,6 +1075,7 @@ static int tls_init(struct sock *sk)
> >
> > ctx->tx_conf = TLS_BASE;
> > ctx->rx_conf = TLS_BASE;
> > + ctx->tx_record_size_limit = TLS_MAX_PAYLOAD_SIZE;
> > update_sk_prot(sk, ctx);
> > out:
> > write_unlock_bh(&sk->sk_callback_lock);
> > @@ -1065,7 +1119,7 @@ static u16 tls_user_config(struct tls_context
> > *ctx, bool tx)
> >
> > static int tls_get_info(struct sock *sk, struct sk_buff *skb, bool
> > net_admin)
> > {
> > - u16 version, cipher_type;
> > + u16 version, cipher_type, tx_record_size_limit;
> > struct tls_context *ctx;
> > struct nlattr *start;
> > int err;
> > @@ -1110,7 +1164,13 @@ static int tls_get_info(struct sock *sk,
> > struct sk_buff *skb, bool net_admin)
> > if (err)
> > goto nla_failure;
> > }
> > -
> > + tx_record_size_limit = ctx->tx_record_size_limit;
> > + if (tx_record_size_limit) {
>
> You probably meant to update that to:
>
> tx_record_size_limit != TLS_MAX_PAYLOAD_SIZE
>
> Otherwise, now that the default is TLS_MAX_PAYLOAD_SIZE, it will
> always be exported - which is not wrong either. So I'd either update
> the conditional so that the attribute is only exported for non-
> default
> sizes (like in v2), or drop the if() and always export it.
>
Yeah, that makes sense. I'll drop the if() so that it's always
exported.
> > + err = nla_put_u16(skb,
> > TLS_INFO_TX_RECORD_SIZE_LIM,
> > + tx_record_size_limit);
> > + if (err)
> > + goto nla_failure;
> > + }
>
> [...]
> > diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> > index bac65d0d4e3e..28fb796573d1 100644
> > --- a/net/tls/tls_sw.c
> > +++ b/net/tls/tls_sw.c
> > @@ -1079,7 +1079,7 @@ static int tls_sw_sendmsg_locked(struct sock
> > *sk, struct msghdr *msg,
> > orig_size = msg_pl->sg.size;
> > full_record = false;
> > try_to_copy = msg_data_left(msg);
> > - record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl-
> > >sg.size;
> > + record_room = tls_ctx->tx_record_size_limit -
> > msg_pl->sg.size;
>
> If we entered tls_sw_sendmsg_locked with an existing open record,
> this
> could end up being negative and confuse the rest of the code.
>
> send(MSG_MORE) returns with an open record of length len1
> setsockopt(TLS_INFO_TX_RECORD_SIZE_LIM, limit < len1)
> send() -> record_room < 0
>
>
> Possibly not a problem with a "well-behaved" userspace, but we can't
> rely on that.
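
The arithmetic behind that sequence can be modelled in userspace. This is only a sketch: `record_room_model` is a hypothetical helper, not kernel code, and it assumes both the limit and the open record length fit in 16 bits.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model of the record_room computation in the
 * hunk above; not kernel code. "limit" stands in for
 * tls_ctx->tx_record_size_limit, "open_len" for msg_pl->sg.size.
 */
int record_room_model(uint16_t limit, uint16_t open_len)
{
	/* Both operands are widened to int, so the result can be negative. */
	return (int)limit - (int)open_len;
}
```

With a 1000-byte open record and a new limit of 512, the model returns -488, which is the negative record_room described above.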
Good catch! What if we don't allow tx_record_size_limit to be set while
there's a pending open record? That should at least prevent userspace
from causing record_room < 0 if we somehow end up there.
For example:
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7c0367dc5d40..34bb6690016c 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -841,20 +841,27 @@ static int
do_tls_setsockopt_tx_record_size(struct sock *sk, sockptr_t optval,
unsigned int optlen)
{
struct tls_context *ctx = tls_get_ctx(sk);
+ struct tls_sw_context_tx *sw_ctx = tls_sw_ctx_tx(ctx);
u16 value;
+ if (sw_ctx->open_rec)
+ return -EBUSY;
...
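
The intent of that check can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the patch itself: `struct tx_state_model` and `set_record_size_limit_model` are hypothetical names, and the boolean stands in for `sw_ctx->open_rec` being non-NULL.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the TX state; not the kernel's structs. */
struct tx_state_model {
	bool has_open_rec;          /* models sw_ctx->open_rec != NULL */
	uint16_t record_size_limit; /* models ctx->tx_record_size_limit */
};

/* Refuse to change the limit while a record is still open. */
int set_record_size_limit_model(struct tx_state_model *st, uint16_t value)
{
	if (st->has_open_rec)
		return -EBUSY;

	st->record_size_limit = value;
	return 0;
}
```

In this model the limit is left untouched when the call fails, so a later send cannot see a limit smaller than the record that is already open.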
And regarding your follow-up response:
```
> I suspect it's not a problem in practice because of what the TLS
> exchange between the peers setting up this extension looks like? (ie,
> there should never be an open record at this stage - unless userspace
> delays doing this setsockopt after getting the message from the peer,
> but then maybe we can call that a buggy userspace)
```
Yeah, the record size limit extension is negotiated during the
handshake (Client/ServerHello). AFAIK, all of that is handled in
tlshd/GnuTLS, so we shouldn't have any open records at this point. For
userspace context, this is what support for the record size limit looks
like in tlshd [1].
If, for whatever reason, userspace decides to set it later, as you
mentioned, the change above could mitigate the open record case? I
don't think we need to try to fix things in the kernel (and we can't
anyway for records already submitted to TCP).
[1] WIP: https://github.com/twilfredo/ktls-utils/commit/73cb755acb4589ba31e4c42ef6b16cf5efdf3892
>
>
> Pushing out the pending "too big" record at the time we set
> tx_record_size_limit would likely make the peer close the connection
> (because it's already told us to limit our TX size), so I guess we'd
> have to split the pending record into tx_record_size_limit chunks
> before we start processing the new message (either directly at
> setsockopt(TLS_INFO_TX_RECORD_SIZE_LIM) time, or the next send/etc
> call). The final push during socket closing, and maybe some more
> codepaths that deal with ctx->open_rec, would also have to do that.
>
> I think additional selftests for
> send(MSG_MORE), TLS_INFO_TX_RECORD_SIZE_LIM, send
> and
> send(MSG_MORE), TLS_INFO_TX_RECORD_SIZE_LIM, close
> verifying the received record sizes would make sense, since it's a
> bit
> tricky to get that right.
Yeah, I agree. I will work on that. Thanks for the feedback!
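
As a rough idea of what checking the received record sizes could involve, here is a userspace sketch that walks a buffer of raw TLS records and reports the largest length field. `max_tls_record_len` is a hypothetical helper, not part of the existing selftests, and it assumes the test can read the raw TCP byte stream on the receiving side. The length field covers the encrypted payload, so any comparison against the limit would have to account for the cipher's per-record overhead.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_HDR_LEN 5 /* type (1) + legacy version (2) + length (2) */

/*
 * Walk a buffer of complete TLS records and return the largest length
 * field seen, or -1 if the buffer ends in the middle of a record.
 * Hypothetical helper for illustration only.
 */
long max_tls_record_len(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	long max = 0;

	while (off < len) {
		size_t rec_len;

		if (len - off < TLS_HDR_LEN)
			return -1;
		/* The length field is big-endian, at header bytes 3 and 4. */
		rec_len = ((size_t)buf[off + 3] << 8) | buf[off + 4];
		if (len - off - TLS_HDR_LEN < rec_len)
			return -1;
		if ((long)rec_len > max)
			max = (long)rec_len;
		off += TLS_HDR_LEN + rec_len;
	}
	return max;
}
```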
Regards,
Wilfred