[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID:
<PAXPR07MB7984AE543124B85E845F9E1CA3BD2@PAXPR07MB7984.eurprd07.prod.outlook.com>
Date: Wed, 16 Apr 2025 12:47:30 +0000
From: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>
To: Simon Horman <horms@...nel.org>
CC: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, "dave.taht@...il.com"
<dave.taht@...il.com>, "pabeni@...hat.com" <pabeni@...hat.com>,
"jhs@...atatu.com" <jhs@...atatu.com>, "kuba@...nel.org" <kuba@...nel.org>,
"stephen@...workplumber.org" <stephen@...workplumber.org>,
"xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>, "jiri@...nulli.us"
<jiri@...nulli.us>, "davem@...emloft.net" <davem@...emloft.net>,
"edumazet@...gle.com" <edumazet@...gle.com>, "andrew+netdev@...n.ch"
<andrew+netdev@...n.ch>, "donald.hunter@...il.com" <donald.hunter@...il.com>,
"ast@...erby.net" <ast@...erby.net>, "liuhangbin@...il.com"
<liuhangbin@...il.com>, "shuah@...nel.org" <shuah@...nel.org>,
"linux-kselftest@...r.kernel.org" <linux-kselftest@...r.kernel.org>,
"ij@...nel.org" <ij@...nel.org>, "ncardwell@...gle.com"
<ncardwell@...gle.com>, "Koen De Schepper (Nokia)"
<koen.de_schepper@...ia-bell-labs.com>, g.white <g.white@...lelabs.com>,
"ingemar.s.johansson@...csson.com" <ingemar.s.johansson@...csson.com>,
"mirja.kuehlewind@...csson.com" <mirja.kuehlewind@...csson.com>,
"cheshire@...le.com" <cheshire@...le.com>, "rs.ietf@....at" <rs.ietf@....at>,
"Jason_Livingood@...cast.com" <Jason_Livingood@...cast.com>, vidhi_goel
<vidhi_goel@...le.com>
Subject: RE: [PATCH v3 net-next 05/15] tcp: accecn: add AccECN rx byte
counters
> -----Original Message-----
> From: Simon Horman <horms@...nel.org>
> Sent: Wednesday, April 16, 2025 11:07 AM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>
> Cc: netdev@...r.kernel.org; dave.taht@...il.com; pabeni@...hat.com; jhs@...atatu.com; kuba@...nel.org; stephen@...workplumber.org; xiyou.wangcong@...il.com; jiri@...nulli.us; davem@...emloft.net; edumazet@...gle.com; andrew+netdev@...n.ch; donald.hunter@...il.com; ast@...erby.net; liuhangbin@...il.com; shuah@...nel.org; linux-kselftest@...r.kernel.org; ij@...nel.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white <g.white@...lelabs.com>; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel <vidhi_goel@...le.com>
> Subject: Re: [PATCH v3 net-next 05/15] tcp: accecn: add AccECN rx byte counters
>
>
> CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
> On Mon, Apr 14, 2025 at 03:13:05PM +0200, chia-yu.chang@...ia-bell-labs.com wrote:
> > From: Ilpo Järvinen <ij@...nel.org>
> >
> > These counters track IP ECN field payload byte sums for all arriving
> > (acceptable) packets. The AccECN option (added by a later patch in the
> > series) echoes these counters back to sender side.
> >
> > Signed-off-by: Ilpo Järvinen <ij@...nel.org>
> > Signed-off-by: Neal Cardwell <ncardwell@...gle.com>
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
> > ---
> > include/linux/tcp.h | 1 +
> > include/net/tcp.h | 18 +++++++++++++++++-
> > net/ipv4/tcp.c | 3 ++-
> > net/ipv4/tcp_input.c | 13 +++++++++----
> > net/ipv4/tcp_minisocks.c | 3 ++-
> > 5 files changed, 31 insertions(+), 7 deletions(-)
> >
> > diff --git a/include/linux/tcp.h b/include/linux/tcp.h index
> > af38fff24aa4..9cbfefd693e3 100644
> > --- a/include/linux/tcp.h
> > +++ b/include/linux/tcp.h
> > @@ -303,6 +303,7 @@ struct tcp_sock {
> > u32 delivered; /* Total data packets delivered incl. rexmits */
> > u32 delivered_ce; /* Like the above but only ECE marked packets */
> > u32 received_ce; /* Like the above but for rcvd CE marked pkts */
> > + u32 received_ecn_bytes[3];
> > u8 received_ce_pending:4, /* Not yet transmit cnt of received_ce */
> > unused2:4;
> > u32 app_limited; /* limited until "delivered" reaches this val */
>
> ...
>
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index
> > 73f8cc715bff..278990dba721 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -5092,6 +5092,7 @@ static void __init tcp_struct_check(void)
> > CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, delivered);
> > CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, delivered_ce);
> > CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock,
> > tcp_sock_write_txrx, received_ce);
> > + CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock,
> > + tcp_sock_write_txrx, received_ecn_bytes);
> > CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_limited);
> > CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd);
> > CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock,
> > tcp_sock_write_txrx, rx_opt); @@ -5099,7 +5100,7 @@ static void __init tcp_struct_check(void)
> > /* 32bit arches with 8byte alignment on u64 fields might need padding
> > * before tcp_clock_cache.
> > */
> > - CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 97 + 7);
> > + CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock,
> > + tcp_sock_write_txrx, 109 + 3);
>
> Hi Ilpo,
>
> I think that incrementing 97 to 109 is correct, as 12 bytes have been added to this group.
>
> However, I do not think it is correct to decrement 7 to 3.
>
> On, at least, x86_64, x86_32 and arm64 that decrement does not cause any problems. (I assume it is also fine without the rest of this patch).
>
> But on (32-bit) ARM, this causes the assertion to fail.
> This is because on ARM an extra 4-byte hole is added just after pred_flags.
> And the assertion checks an upper bound on the size of the group.
>
> I assume this extra hole is due to alignment requirements.
> In any case on ARM, with this patch applied, pahole shows the group looking like this:
>
> __u8 __cacheline_group_begin__tcp_sock_write_txrx[0]; /* 1869 0 */
>
> /* XXX 3 bytes hole, try to pack */
>
> __be32 pred_flags; /* 1872 4 */
>
> /* XXX 4 bytes hole, try to pack */
>
> u64 tcp_clock_cache; /* 1880 8 */
> u64 tcp_mstamp; /* 1888 8 */
> u32 rcv_nxt; /* 1896 4 */
> u32 snd_nxt; /* 1900 4 */
> u32 snd_una; /* 1904 4 */
> u32 window_clamp; /* 1908 4 */
> u32 srtt_us; /* 1912 4 */
> u32 packets_out; /* 1916 4 */
> /* --- cacheline 30 boundary (1920 bytes) --- */
> u32 snd_up; /* 1920 4 */
> u32 delivered; /* 1924 4 */
> u32 delivered_ce; /* 1928 4 */
> u32 received_ce; /* 1932 4 */
> u32 received_ecn_bytes[3]; /* 1936 12 */
> u8 received_ce_pending:4; /* 1948: 0 1 */
> u8 unused2:4; /* 1948: 4 1 */
>
> /* XXX 3 bytes hole, try to pack */
>
> u32 app_limited; /* 1952 4 */
> u32 rcv_wnd; /* 1956 4 */
> struct tcp_options_received rx_opt; /* 1960 24 */
> /* --- cacheline 31 boundary (1984 bytes) --- */
> u8 nonagle:4; /* 1984: 0 1 */
> u8 rate_app_limited:1; /* 1984: 4 1 */
>
> /* XXX 3 bits hole, try to pack */
>
> __u8 __cacheline_group_end__tcp_sock_write_txrx[0]; /* 1985 0 */
>
> So we are now at 3 cache lines. And perhaps it is worth trying to pack things a bit. Or perhaps that becomes tricky to get right across different architectures. I didn't explore that.
>
> But, taking the naive approach: with the following update, tcp.c compiles compiles with allmodconfig on x86_64, x86_32, ARM and arm64 (I did not test others).
>
> CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 109 + 7);
>
> For the record, gcc 14.2.0 reports this problem as:
>
> CC net/ipv4/tcp.o
> In file included from <command-line>:
> In function 'tcp_struct_check',
> inlined from 'tcp_init' at net/ipv4/tcp.c:5133:2:
> ././include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_ 1403' declared with attribute error: BUILD_BUG_ON failed: offsetof(struct tcp_sock, __cacheline_group_end__tcp_sock_write_txrx) - offsetofend(struct tcp_sock, __cacheline_group_begin__tcp_sock_write_txrx) > 109 + 3
> 557 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^
> ././include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert'
> 538 | prefix ## suffix(); \
> | ^~~~~~
> ././include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert'
> 557 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
> | ^~~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
> 39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
> | ^~~~~~~~~~~~~~~~~~
> ./include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
> 50 | BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
> | ^~~~~~~~~~~~~~~~
> ./include/linux/cache.h:160:9: note: in expansion of macro 'BUILD_BUG_ON'
> 160 | BUILD_BUG_ON(offsetof(TYPE, __cacheline_group_end__##GROUP) - \
> | ^~~~~~~~~~~~
> net/ipv4/tcp.c:5103:9: note: in expansion of macro 'CACHELINE_ASSERT_GROUP_SIZE'
> 5103 | CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 109 + 3);
> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> ...
Thanks Simon for pointing out this issue, my previous calculation is without considering that for ARM one extra 4-byte hole is added.
I will modify and resubmit in the next version.
Chia-Yu
>
> --
> pw-bot: changes-requested
Powered by blists - more mailing lists