[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1564983d-5bbc-c28c-351e-fc6ab8d4d4eb@kernel.org>
Date: Tue, 6 May 2025 20:40:30 +0300 (EEST)
From: Ilpo Järvinen <ij@...nel.org>
To: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>
cc: Paolo Abeni <pabeni@...hat.com>, "horms@...nel.org" <horms@...nel.org>,
"dsahern@...nel.org" <dsahern@...nel.org>,
"kuniyu@...zon.com" <kuniyu@...zon.com>,
"bpf@...r.kernel.org" <bpf@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"dave.taht@...il.com" <dave.taht@...il.com>,
"jhs@...atatu.com" <jhs@...atatu.com>, "kuba@...nel.org" <kuba@...nel.org>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>,
"jiri@...nulli.us" <jiri@...nulli.us>,
"davem@...emloft.net" <davem@...emloft.net>,
"edumazet@...gle.com" <edumazet@...gle.com>,
"andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
"donald.hunter@...il.com" <donald.hunter@...il.com>,
"ast@...erby.net" <ast@...erby.net>,
"liuhangbin@...il.com" <liuhangbin@...il.com>,
"shuah@...nel.org" <shuah@...nel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"ncardwell@...gle.com" <ncardwell@...gle.com>,
"Koen De Schepper (Nokia)" <koen.de_schepper@...ia-bell-labs.com>,
"g.white" <g.white@...lelabs.com>,
"ingemar.s.johansson@...csson.com" <ingemar.s.johansson@...csson.com>,
"mirja.kuehlewind@...csson.com" <mirja.kuehlewind@...csson.com>,
"cheshire@...le.com" <cheshire@...le.com>,
"rs.ietf@....at" <rs.ietf@....at>,
"Jason_Livingood@...cast.com" <Jason_Livingood@...cast.com>,
vidhi_goel <vidhi_goel@...le.com>
Subject: RE: [PATCH v5 net-next 09/15] tcp: accecn: AccECN option
On Tue, 6 May 2025, Chia-Yu Chang (Nokia) wrote:
> > -----Original Message-----
> > From: Ilpo Järvinen <ij@...nel.org>
> > Sent: Tuesday, May 6, 2025 12:54 AM
> > To: Paolo Abeni <pabeni@...hat.com>
> > Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; horms@...nel.org; dsahern@...nel.org; kuniyu@...zon.com; bpf@...r.kernel.org; netdev@...r.kernel.org; dave.taht@...il.com; jhs@...atatu.com; kuba@...nel.org; stephen@...workplumber.org; xiyou.wangcong@...il.com; jiri@...nulli.us; davem@...emloft.net; edumazet@...gle.com; andrew+netdev@...n.ch; donald.hunter@...il.com; ast@...erby.net; liuhangbin@...il.com; shuah@...nel.org; linux-kselftest@...r.kernel.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white <g.white@...lelabs.com>; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel <vidhi_goel@...le.com>
> > Subject: Re: [PATCH v5 net-next 09/15] tcp: accecn: AccECN option
> >
> >
> > CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> >
> >
> >
> > On Tue, 29 Apr 2025, Paolo Abeni wrote:
> >
> > > On 4/22/25 5:35 PM, chia-yu.chang@...ia-bell-labs.com wrote:
> > > > @@ -302,10 +303,13 @@ struct tcp_sock {
> > > > u32 snd_up; /* Urgent pointer */
> > > > u32 delivered; /* Total data packets delivered incl. rexmits */
> > > > u32 delivered_ce; /* Like the above but only ECE marked packets */
> > > > + u32 delivered_ecn_bytes[3];
> > >
> > > This new fields do not belong to this cacheline group. I'm unsure they
> > > belong to fast-path at all. Also u32 will wrap-around very soon.
> > >
> > > [...]
> > > > diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
> > > > index dc8fdc80e16b..74ac8a5d2e00 100644
> > > > --- a/include/uapi/linux/tcp.h
> > > > +++ b/include/uapi/linux/tcp.h
> > > > @@ -298,6 +298,13 @@ struct tcp_info {
> > > > __u32 tcpi_snd_wnd; /* peer's advertised receive window after
> > > > * scaling (bytes)
> > > > */
> > > > + __u32 tcpi_received_ce; /* # of CE marks received */
> > > > + __u32 tcpi_delivered_e1_bytes; /* Accurate ECN byte counters */
> > > > + __u32 tcpi_delivered_e0_bytes;
> > > > + __u32 tcpi_delivered_ce_bytes;
> > > > + __u32 tcpi_received_e1_bytes;
> > > > + __u32 tcpi_received_e0_bytes;
> > > > + __u32 tcpi_received_ce_bytes;
> > >
> > > This will break uAPI: new fields must be addded at the end, or must
> > > fill existing holes. Also u32 set in stone in uAPI for a byte counter
> > > looks way too small.
> > >
> > > > @@ -5100,7 +5113,7 @@ static void __init tcp_struct_check(void)
> > > > /* 32bit arches with 8byte alignment on u64 fields might need padding
> > > > * before tcp_clock_cache.
> > > > */
> > > > - CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 109 + 7);
> > > > + CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock,
> > > > + tcp_sock_write_txrx, 122 + 6);
> > >
> > > The above means an additional cacheline in fast-path WRT the current
> > > status. IMHO should be avoided.
> > >
> > > > diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
> > > > 5bd7fc9bcf66..41e45b9aff3f 100644
> > > > --- a/net/ipv4/tcp_input.c
> > > > +++ b/net/ipv4/tcp_input.c
> > > > @@ -70,6 +70,7 @@
> > > > #include <linux/sysctl.h>
> > > > #include <linux/kernel.h>
> > > > #include <linux/prefetch.h>
> > > > +#include <linux/bitops.h>
> > > > #include <net/dst.h>
> > > > #include <net/tcp.h>
> > > > #include <net/proto_memory.h>
> > > > @@ -499,6 +500,144 @@ static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, const struct tcphdr
> > > > return false;
> > > > }
> > > >
> > > > +/* Maps IP ECN field ECT/CE code point to AccECN option field
> > > > +number, given
> > > > + * we are sending fields with Accurate ECN Order 1: ECT(1), CE, ECT(0).
> > > > + */
> > > > +static u8 tcp_ecnfield_to_accecn_optfield(u8 ecnfield) {
> > > > + switch (ecnfield) {
> > > > + case INET_ECN_NOT_ECT:
> > > > + return 0; /* AccECN does not send counts of NOT_ECT */
> > > > + case INET_ECN_ECT_1:
> > > > + return 1;
> > > > + case INET_ECN_CE:
> > > > + return 2;
> > > > + case INET_ECN_ECT_0:
> > > > + return 3;
> > > > + default:
> > > > + WARN_ONCE(1, "bad ECN code point: %d\n", ecnfield);
> > >
> > > No WARN_ONCE() above please: either the 'ecnfield' data is masked vs
> > > INET_ECN_MASK and the WARN_ONCE should not be possible or a remote
> > > sender can deterministically trigger a WARN() which nowadays will in
> > > turn raise a CVE...
> > >
> > > [...]
> > > > +static u32 tcp_accecn_field_init_offset(u8 ecnfield) {
> > > > + switch (ecnfield) {
> > > > + case INET_ECN_NOT_ECT:
> > > > + return 0; /* AccECN does not send counts of NOT_ECT */
> > > > + case INET_ECN_ECT_1:
> > > > + return TCP_ACCECN_E1B_INIT_OFFSET;
> > > > + case INET_ECN_CE:
> > > > + return TCP_ACCECN_CEB_INIT_OFFSET;
> > > > + case INET_ECN_ECT_0:
> > > > + return TCP_ACCECN_E0B_INIT_OFFSET;
> > > > + default:
> > > > + WARN_ONCE(1, "bad ECN code point: %d\n", ecnfield);
> > >
> > > Same as above.
> > >
> > > > + }
> > > > + return 0;
> > > > +}
> > > > +
> > > > +/* Maps AccECN option field #nr to IP ECN field ECT/CE bits */
> > > > +static unsigned int tcp_accecn_optfield_to_ecnfield(unsigned int optfield,
> > > > + bool order) {
> > > > + u8 tmp;
> > > > +
> > > > + optfield = order ? 2 - optfield : optfield;
> > > > + tmp = optfield + 2;
> > > > +
> > > > + return (tmp + (tmp >> 2)) & INET_ECN_MASK; }
> > > > +
> > > > +/* Handles AccECN option ECT and CE 24-bit byte counters update
> > > > +into
> > > > + * the u32 value in tcp_sock. As we're processing TCP options, it
> > > > +is
> > > > + * safe to access from - 1.
> > > > + */
> > > > +static s32 tcp_update_ecn_bytes(u32 *cnt, const char *from, u32
> > > > +init_offset) {
> > > > + u32 truncated = (get_unaligned_be32(from - 1) - init_offset) &
> > > > + 0xFFFFFFU;
> > > > + u32 delta = (truncated - *cnt) & 0xFFFFFFU;
> > > > +
> > > > + /* If delta has the highest bit set (24th bit) indicating
> > > > + * negative, sign extend to correct an estimation using
> > > > + * sign_extend32(delta, 24 - 1)
> > > > + */
> > > > + delta = sign_extend32(delta, 23);
> > > > + *cnt += delta;
> > > > + return (s32)delta;
> > > > +}
> > > > +
> > > > +/* Returns true if the byte counters can be used */ static bool
> > > > +tcp_accecn_process_option(struct tcp_sock *tp,
> > > > + const struct sk_buff *skb,
> > > > + u32 delivered_bytes, int flag) {
> > > > + u8 estimate_ecnfield = tp->est_ecnfield;
> > > > + bool ambiguous_ecn_bytes_incr = false;
> > > > + bool first_changed = false;
> > > > + unsigned int optlen;
> > > > + unsigned char *ptr;
> >
> > u8 would we more appropriate type for binary data.
>
> Hi Ilpo,
>
> Not sure I understand your point, could you elaborate which binary data
> you think shall use u8?
The header/option is binary data so u8 seems the right type for it. So:
u8 *ptr;
--
i.
Powered by blists - more mailing lists