lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250416090701.GK395307@horms.kernel.org>
Date: Wed, 16 Apr 2025 10:07:01 +0100
From: Simon Horman <horms@...nel.org>
To: chia-yu.chang@...ia-bell-labs.com
Cc: netdev@...r.kernel.org, dave.taht@...il.com, pabeni@...hat.com,
	jhs@...atatu.com, kuba@...nel.org, stephen@...workplumber.org,
	xiyou.wangcong@...il.com, jiri@...nulli.us, davem@...emloft.net,
	edumazet@...gle.com, andrew+netdev@...n.ch, donald.hunter@...il.com,
	ast@...erby.net, liuhangbin@...il.com, shuah@...nel.org,
	linux-kselftest@...r.kernel.org, ij@...nel.org,
	ncardwell@...gle.com, koen.de_schepper@...ia-bell-labs.com,
	g.white@...lelabs.com, ingemar.s.johansson@...csson.com,
	mirja.kuehlewind@...csson.com, cheshire@...le.com, rs.ietf@....at,
	Jason_Livingood@...cast.com, vidhi_goel@...le.com
Subject: Re: [PATCH v3 net-next 05/15] tcp: accecn: add AccECN rx byte
 counters

On Mon, Apr 14, 2025 at 03:13:05PM +0200, chia-yu.chang@...ia-bell-labs.com wrote:
> From: Ilpo Järvinen <ij@...nel.org>
> 
> These counters track IP ECN field payload byte sums for all
> arriving (acceptable) packets. The AccECN option (added by
> a later patch in the series) echoes these counters back to
> sender side.
> 
> Signed-off-by: Ilpo Järvinen <ij@...nel.org>
> Signed-off-by: Neal Cardwell <ncardwell@...gle.com>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@...ia-bell-labs.com>
> ---
>  include/linux/tcp.h      |  1 +
>  include/net/tcp.h        | 18 +++++++++++++++++-
>  net/ipv4/tcp.c           |  3 ++-
>  net/ipv4/tcp_input.c     | 13 +++++++++----
>  net/ipv4/tcp_minisocks.c |  3 ++-
>  5 files changed, 31 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/tcp.h b/include/linux/tcp.h
> index af38fff24aa4..9cbfefd693e3 100644
> --- a/include/linux/tcp.h
> +++ b/include/linux/tcp.h
> @@ -303,6 +303,7 @@ struct tcp_sock {
>  	u32	delivered;	/* Total data packets delivered incl. rexmits */
>  	u32	delivered_ce;	/* Like the above but only ECE marked packets */
>  	u32	received_ce;	/* Like the above but for rcvd CE marked pkts */
> +	u32	received_ecn_bytes[3];
>  	u8	received_ce_pending:4, /* Not yet transmit cnt of received_ce */
>  		unused2:4;
>  	u32	app_limited;	/* limited until "delivered" reaches this val */

...

> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 73f8cc715bff..278990dba721 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -5092,6 +5092,7 @@ static void __init tcp_struct_check(void)
>  	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, delivered);
>  	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, delivered_ce);
>  	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ce);
> +	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, received_ecn_bytes);
>  	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, app_limited);
>  	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rcv_wnd);
>  	CACHELINE_ASSERT_GROUP_MEMBER(struct tcp_sock, tcp_sock_write_txrx, rx_opt);
> @@ -5099,7 +5100,7 @@ static void __init tcp_struct_check(void)
>  	/* 32bit arches with 8byte alignment on u64 fields might need padding
>  	 * before tcp_clock_cache.
>  	 */
> -	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 97 + 7);
> +	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 109 + 3);

Hi Ilpo,

I think that incrementing 97 to 109 is correct, as 12 bytes have been
added to this group.

However, I do not think it is correct to decrement 7 to 3.

On, at least, x86_64, x86_32 and arm64 that decrement does not
cause any problems. (I assume it is also fine without the rest of this
patch).

But on (32-bit) ARM, this causes the assertion to fail.
This is because on ARM an extra 4-byte hole is added just after pred_flags.
And the assertion checks an upper bound on the size of the group.

I assume this extra hole is due to alignment requirements.
In any case on ARM, with this patch applied, pahole shows
the group looking like this:

        __u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  1869     0 */

        /* XXX 3 bytes hole, try to pack */

        __be32                     pred_flags;           /*  1872     4 */

        /* XXX 4 bytes hole, try to pack */

        u64                        tcp_clock_cache;      /*  1880     8 */
        u64                        tcp_mstamp;           /*  1888     8 */
        u32                        rcv_nxt;              /*  1896     4 */
        u32                        snd_nxt;              /*  1900     4 */
        u32                        snd_una;              /*  1904     4 */
        u32                        window_clamp;         /*  1908     4 */
        u32                        srtt_us;              /*  1912     4 */
        u32                        packets_out;          /*  1916     4 */
        /* --- cacheline 30 boundary (1920 bytes) --- */
        u32                        snd_up;               /*  1920     4 */
        u32                        delivered;            /*  1924     4 */
        u32                        delivered_ce;         /*  1928     4 */
        u32                        received_ce;          /*  1932     4 */
        u32                        received_ecn_bytes[3]; /*  1936    12 */
        u8                         received_ce_pending:4; /*  1948: 0  1 */
        u8                         unused2:4;            /*  1948: 4  1 */

        /* XXX 3 bytes hole, try to pack */

        u32                        app_limited;          /*  1952     4 */
        u32                        rcv_wnd;              /*  1956     4 */
        struct tcp_options_received rx_opt;              /*  1960    24 */
        /* --- cacheline 31 boundary (1984 bytes) --- */
        u8                         nonagle:4;            /*  1984: 0  1 */
        u8                         rate_app_limited:1;   /*  1984: 4  1 */

        /* XXX 3 bits hole, try to pack */

        __u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  1985     0 */

So we are now at 3 cache lines. And perhaps it is worth trying to pack
things a bit. Or perhaps that becomes tricky to get right across
different architectures. I didn't explore that.

But, taking the naive approach: with the following update, tcp.c compiles
compiles with allmodconfig on x86_64, x86_32, ARM and arm64 (I did not test
others).

	CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 109 + 7);

For the record, gcc 14.2.0 reports this problem as:

  CC      net/ipv4/tcp.o
In file included from <command-line>:
In function 'tcp_struct_check',
    inlined from 'tcp_init' at net/ipv4/tcp.c:5133:2:
././include/linux/compiler_types.h:557:45: error: call to '__compiletime_assert_
1403' declared with attribute error: BUILD_BUG_ON failed: offsetof(struct tcp_sock, __cacheline_group_end__tcp_sock_write_txrx) - offsetofend(struct tcp_sock, __cacheline_group_begin__tcp_sock_write_txrx) > 109 + 3
  557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |                                             ^
././include/linux/compiler_types.h:538:25: note: in definition of macro '__compiletime_assert'
  538 |                         prefix ## suffix();                             \
      |                         ^~~~~~
././include/linux/compiler_types.h:557:9: note: in expansion of macro '_compiletime_assert'
  557 |         _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
      |         ^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:39:37: note: in expansion of macro 'compiletime_assert'
   39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
      |                                     ^~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:50:9: note: in expansion of macro 'BUILD_BUG_ON_MSG'
   50 |         BUILD_BUG_ON_MSG(condition, "BUILD_BUG_ON failed: " #condition)
      |         ^~~~~~~~~~~~~~~~
./include/linux/cache.h:160:9: note: in expansion of macro 'BUILD_BUG_ON'
  160 |         BUILD_BUG_ON(offsetof(TYPE, __cacheline_group_end__##GROUP) - \
      |         ^~~~~~~~~~~~
net/ipv4/tcp.c:5103:9: note: in expansion of macro 'CACHELINE_ASSERT_GROUP_SIZE'
 5103 |         CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 109 + 3);
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~

...

-- 
pw-bot: changes-requested

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ