[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID:
<PAXPR07MB7984F66EB2AD576D2385C351A357A@PAXPR07MB7984.eurprd07.prod.outlook.com>
Date: Tue, 15 Jul 2025 14:49:34 +0000
From: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>
To: Paolo Abeni <pabeni@...hat.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "linux-doc@...r.kernel.org"
<linux-doc@...r.kernel.org>, "corbet@....net" <corbet@....net>,
"horms@...nel.org" <horms@...nel.org>, "dsahern@...nel.org"
<dsahern@...nel.org>, "kuniyu@...zon.com" <kuniyu@...zon.com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>, "netdev@...r.kernel.org"
<netdev@...r.kernel.org>, "dave.taht@...il.com" <dave.taht@...il.com>,
"jhs@...atatu.com" <jhs@...atatu.com>, "kuba@...nel.org" <kuba@...nel.org>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>, "jiri@...nulli.us"
<jiri@...nulli.us>, "davem@...emloft.net" <davem@...emloft.net>,
"andrew+netdev@...n.ch" <andrew+netdev@...n.ch>, "donald.hunter@...il.com"
<donald.hunter@...il.com>, "ast@...erby.net" <ast@...erby.net>,
"liuhangbin@...il.com" <liuhangbin@...il.com>, "shuah@...nel.org"
<shuah@...nel.org>, "linux-kselftest@...r.kernel.org"
<linux-kselftest@...r.kernel.org>, "ij@...nel.org" <ij@...nel.org>,
"ncardwell@...gle.com" <ncardwell@...gle.com>, "Koen De Schepper (Nokia)"
<koen.de_schepper@...ia-bell-labs.com>, "g.white@...lelabs.com"
<g.white@...lelabs.com>, "ingemar.s.johansson@...csson.com"
<ingemar.s.johansson@...csson.com>, "mirja.kuehlewind@...csson.com"
<mirja.kuehlewind@...csson.com>, "cheshire@...le.com" <cheshire@...le.com>,
"rs.ietf@....at" <rs.ietf@....at>, "Jason_Livingood@...cast.com"
<Jason_Livingood@...cast.com>, "vidhi_goel@...le.com" <vidhi_goel@...le.com>
Subject: RE: [PATCH v12 net-next 11/15] tcp: accecn: AccECN option
> -----Original Message-----
> From: Paolo Abeni <pabeni@...hat.com>
> Sent: Monday, July 14, 2025 4:32 PM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; edumazet@...gle.com; linux-doc@...r.kernel.org; corbet@....net; horms@...nel.org; dsahern@...nel.org; kuniyu@...zon.com; bpf@...r.kernel.org; netdev@...r.kernel.org; dave.taht@...il.com; jhs@...atatu.com; kuba@...nel.org; stephen@...workplumber.org; xiyou.wangcong@...il.com; jiri@...nulli.us; davem@...emloft.net; andrew+netdev@...n.ch; donald.hunter@...il.com; ast@...erby.net; liuhangbin@...il.com; shuah@...nel.org; linux-kselftest@...r.kernel.org; ij@...nel.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white@...lelabs.com; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel@...le.com
> Subject: Re: [PATCH v12 net-next 11/15] tcp: accecn: AccECN option
>
>
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
> On 7/4/25 10:53 AM, chia-yu.chang@...ia-bell-labs.com wrote:
[...]
> > +}
> > +
> > +/* Handles AccECN option ECT and CE 24-bit byte counters update into
> > + * the u32 value in tcp_sock. As we're processing TCP options, it is
> > + * safe to access from - 1.
> > + */
> > +static inline s32 tcp_update_ecn_bytes(u32 *cnt, const char *from,
> > + u32 init_offset) {
> > + u32 truncated = (get_unaligned_be32(from - 1) - init_offset) &
> > + 0xFFFFFFU;
> > + u32 delta = (truncated - *cnt) & 0xFFFFFFU;
> > +
> > + /* If delta has the highest bit set (24th bit) indicating
> > + * negative, sign extend to correct an estimation using
> > + * sign_extend32(delta, 24 - 1)
> > + */
> > + delta = sign_extend32(delta, 23);
>
> I'm under the impression that delta could be simply:
>
> delta = (truncated - *cnt)
>
> What am I missing?
Hi Paolo,
I think this code is necessary to ensure delta will not a super large value in case of wrap adound.
For instance, if truncated = 0x0000001F and *cnt = 0x00FFFFFF, then (truncated - *cnt) = 0xFF000020
But sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0x00000020, which shall be corrrect.
Another example, if truncated = 0x0000001F and *cnt = 0x0000003F, then (truncated - *cnt) = 0xFFFFFFE0
And sign_extend32(((truncated - *cnt) & 0xFFFFFFU, 23) = 0xFFFFFFE0.
In this latter example, both are correct.
[...]
> > a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> > d98a1a17eb52..2169fd28594e 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -385,6 +385,7 @@ static inline bool tcp_urg_mode(const struct tcp_sock *tp)
> > #define OPTION_SMC BIT(9)
> > #define OPTION_MPTCP BIT(10)
> > #define OPTION_AO BIT(11)
> > +#define OPTION_ACCECN BIT(12)
> >
> > static void smc_options_write(__be32 *ptr, u16 *options) { @@ -406,6
> > +407,8 @@ struct tcp_out_options {
> > u16 mss; /* 0 to disable */
> > u8 ws; /* window scale, 0 to disable */
> > u8 num_sack_blocks; /* number of SACK blocks to include */
> > + u8 num_accecn_fields:7, /* number of AccECN fields needed */
> > + use_synack_ecn_bytes:1; /* Use synack_ecn_bytes or not */
> > u8 hash_size; /* bytes in hash_location */
> > u8 bpf_opt_len; /* length of BPF hdr option */
> > __u8 *hash_location; /* temporary pointer, overloaded */
> > @@ -621,6 +624,8 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
> > struct tcp_out_options *opts,
> > struct tcp_key *key) {
> > + u8 leftover_highbyte = TCPOPT_NOP; /* replace 1st NOP if avail */
> > + u8 leftover_lowbyte = TCPOPT_NOP; /* replace 2nd NOP in
> > + succession */
> > __be32 *ptr = (__be32 *)(th + 1);
> > u16 options = opts->options; /* mungable copy */
> >
> > @@ -656,15 +661,79 @@ static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
> > *ptr++ = htonl(opts->tsecr);
> > }
> >
> > + if (OPTION_ACCECN & options) {
> > + /* Initial values for AccECN option, ordered is based on ECN field bits
> > + * similar to received_ecn_bytes. Used for SYN/ACK AccECN option.
> > + */
> > + static u32 synack_ecn_bytes[3] = { 0, 0, 0 };
>
> I think this does not address Eric's concern on v9 WRT global variable, as every CPU will still touch the same memory while accessing the above array.
>
> > + const u8 ect0_idx = INET_ECN_ECT_0 - 1;
> > + const u8 ect1_idx = INET_ECN_ECT_1 - 1;
> > + const u8 ce_idx = INET_ECN_CE - 1;
> > + u32 e0b;
> > + u32 e1b;
> > + u32 ceb;
> > + u8 len;
> > +
> > + if (opts->use_synack_ecn_bytes) {
> > + e0b = synack_ecn_bytes[ect0_idx] + TCP_ACCECN_E0B_INIT_OFFSET;
> > + e1b = synack_ecn_bytes[ect1_idx] + TCP_ACCECN_E1B_INIT_OFFSET;
> > + ceb = synack_ecn_bytes[ce_idx] +
> > + TCP_ACCECN_CEB_INIT_OFFSET;
>
> On the flip side I don't see such array modified here, not in later patches?!? If so you could make it const and a global variable would be ok.
Sure, I will make it as static const global variable, which I hope this is ok for you.
> > +/* Calculates how long AccECN option will fit to @remaining option space.
> > + *
> > + * AccECN option can sometimes replace NOPs used for alignment of
> > +other
> > + * TCP options (up to @max_combine_saving available).
> > + *
> > + * Only solutions with at least @required AccECN fields are accepted.
> > + *
> > + * Returns: The size of the AccECN option excluding space repurposed
> > +from
> > + * the alignment of the other options.
> > + */
> > +static int tcp_options_fit_accecn(struct tcp_out_options *opts, int required,
> > + int remaining) {
> > + int size = TCP_ACCECN_MAXSIZE;
> > + int max_combine_saving;
> > +
> > + if (opts->use_synack_ecn_bytes)
> > + max_combine_saving = tcp_synack_options_combine_saving(opts);
> > + else
> > + max_combine_saving = opts->num_sack_blocks > 0 ? 2 : 0;
> > + opts->num_accecn_fields = TCP_ACCECN_NUMFIELDS;
> > + while (opts->num_accecn_fields >= required) {
> > + int leftover_size = size & 0x3;
> > + /* Pad to dword if cannot combine */
> > + if (leftover_size > max_combine_saving)
> > + leftover_size = -((4 - leftover_size) & 0x3);
>
> I *think* that with the above you mean something alike:
>
> size = ALIGN(size, 4);
> leftover_size = 0
>
> ?
>
> The used code looks quite obscure to me.
>
> /P
Indeed, I will make below changes in the next version by using ALIGN() and ALIGN_DOWN()
Here the aim is to pad up (if max_combine_saving is not enough) or trim down (if max_combine saving is enough) to DWORD.
And the final return size will be the the a multiple of DWORD.
Would it be more readable?
/* Pad to DWORD if cannot combine. Align_size represents
* the final size to be used by AccECN options.
* +======+=============+====================+============+
* | size | size exceed | max_combine_saving | align_size |
* | | DWORD | | |
* +======+=============+====================+============+
* | 2 | 2 | < 2 | 4 |
* | 2 | 2 | >=2 | 0 |
* | 5 | 1 | < 1 | 8 |
* | 5 | 1 | >=1 | 4 |
* | 8 | 0 | Any | 8 |
* | 11 | 3 | < 3 | 12 |
* | 11 | 3 | >=3 | 8 |
* +======+=============+====================+============+
*/
if ((size & 0x3) > max_combine_saving)
align_size = ALIGN(size, 4);
else
align_size = ALIGN_DOWN(size, 4);
if (remaining >= align_size) {
size = align_size;
break;
}
Powered by blists - more mailing lists