Message-ID:
 <PAXPR07MB7984B27C12DCE5EFCDCC48ADA389A@PAXPR07MB7984.eurprd07.prod.outlook.com>
Date: Tue, 6 May 2025 13:56:07 +0000
From: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>
To: Paolo Abeni <pabeni@...hat.com>, "horms@...nel.org" <horms@...nel.org>,
	"dsahern@...nel.org" <dsahern@...nel.org>, "kuniyu@...zon.com"
	<kuniyu@...zon.com>, "bpf@...r.kernel.org" <bpf@...r.kernel.org>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "dave.taht@...il.com"
	<dave.taht@...il.com>, "jhs@...atatu.com" <jhs@...atatu.com>,
	"kuba@...nel.org" <kuba@...nel.org>, "stephen@...workplumber.org"
	<stephen@...workplumber.org>, "xiyou.wangcong@...il.com"
	<xiyou.wangcong@...il.com>, "jiri@...nulli.us" <jiri@...nulli.us>,
	"davem@...emloft.net" <davem@...emloft.net>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "andrew+netdev@...n.ch" <andrew+netdev@...n.ch>,
	"donald.hunter@...il.com" <donald.hunter@...il.com>, "ast@...erby.net"
	<ast@...erby.net>, "liuhangbin@...il.com" <liuhangbin@...il.com>,
	"shuah@...nel.org" <shuah@...nel.org>, "linux-kselftest@...r.kernel.org"
	<linux-kselftest@...r.kernel.org>, "ij@...nel.org" <ij@...nel.org>,
	"ncardwell@...gle.com" <ncardwell@...gle.com>, "Koen De Schepper (Nokia)"
	<koen.de_schepper@...ia-bell-labs.com>, g.white <g.white@...lelabs.com>,
	"ingemar.s.johansson@...csson.com" <ingemar.s.johansson@...csson.com>,
	"mirja.kuehlewind@...csson.com" <mirja.kuehlewind@...csson.com>,
	"cheshire@...le.com" <cheshire@...le.com>, "rs.ietf@....at" <rs.ietf@....at>,
	"Jason_Livingood@...cast.com" <Jason_Livingood@...cast.com>, vidhi_goel
	<vidhi_goel@...le.com>
CC: "Olivier Tilmans (Nokia)" <olivier.tilmans@...ia.com>
Subject: RE: [PATCH v5 net-next 03/15] tcp: AccECN core

> -----Original Message-----
> From: Chia-Yu Chang (Nokia) 
> Sent: Monday, May 5, 2025 5:25 PM
> To: Paolo Abeni <pabeni@...hat.com>; horms@...nel.org; dsahern@...nel.org; kuniyu@...zon.com; bpf@...r.kernel.org; netdev@...r.kernel.org; dave.taht@...il.com; jhs@...atatu.com; kuba@...nel.org; stephen@...workplumber.org; xiyou.wangcong@...il.com; jiri@...nulli.us; davem@...emloft.net; edumazet@...gle.com; andrew+netdev@...n.ch; donald.hunter@...il.com; ast@...erby.net; liuhangbin@...il.com; shuah@...nel.org; linux-kselftest@...r.kernel.org; ij@...nel.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white <g.white@...lelabs.com>; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel <vidhi_goel@...le.com>
> Cc: Olivier Tilmans (Nokia) <olivier.tilmans@...ia.com>
> Subject: RE: [PATCH v5 net-next 03/15] tcp: AccECN core
> 
> > -----Original Message-----
> > From: Paolo Abeni <pabeni@...hat.com>
> > Sent: Tuesday, April 29, 2025 12:14 PM
> > To: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; 
> > horms@...nel.org; dsahern@...nel.org; kuniyu@...zon.com; 
> > bpf@...r.kernel.org; netdev@...r.kernel.org; dave.taht@...il.com; 
> > jhs@...atatu.com; kuba@...nel.org; stephen@...workplumber.org; 
> > xiyou.wangcong@...il.com; jiri@...nulli.us; davem@...emloft.net; 
> > edumazet@...gle.com; andrew+netdev@...n.ch; donald.hunter@...il.com; 
> > ast@...erby.net; liuhangbin@...il.com; shuah@...nel.org; 
> > linux-kselftest@...r.kernel.org; ij@...nel.org; ncardwell@...gle.com; 
> > Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; 
> > g.white <g.white@...lelabs.com>; ingemar.s.johansson@...csson.com; 
> > mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; 
> > Jason_Livingood@...cast.com; vidhi_goel <vidhi_goel@...le.com>
> > Cc: Olivier Tilmans (Nokia) <olivier.tilmans@...ia.com>
> > Subject: Re: [PATCH v5 net-next 03/15] tcp: AccECN core
> > 
> > 
> > On 4/22/25 5:35 PM, chia-yu.chang@...ia-bell-labs.com wrote:
> > > @@ -298,6 +298,9 @@ struct tcp_sock {
> > >       u32     snd_up;         /* Urgent pointer               */
> > >       u32     delivered;      /* Total data packets delivered incl. rexmits */
> > >       u32     delivered_ce;   /* Like the above but only ECE marked packets */
> > > +     u32     received_ce;    /* Like the above but for rcvd CE marked pkts */
> > > +     u8      received_ce_pending:4, /* Not yet transmit cnt of received_ce */
> > > +             unused2:4;
> > 
> > AFAICS this uses a 4-byte hole present prior to this patch after "rcv_wnd", leaving a 3-byte hole after 'unused2'. It is probably worth mentioning the hole.
> > 
> > @Eric: would it make sense to use this hole for 'nonagle'/'rate_app_limited' and shrink the 'tcp_sock_write_txrx' group a bit?

Hi,

By moving nonagle/rate_app_limited to the beginning of this group, I managed to reuse the 3-byte hole right after __cacheline_group_begin__tcp_sock_write_txrx.
I will include this change in the next version; the pahole results are below:


/*BEFORE this patch*/
__u8                       __cacheline_group_end__tcp_sock_write_tx[0]; /*  2585     0 */
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2585     0 */

/* XXX 3 bytes hole, try to pack */

__be32                     pred_flags;           /*  2588     4 */
u64                        tcp_clock_cache;      /*  2592     8 */
u64                        tcp_mstamp;           /*  2600     8 */
u32                        rcv_nxt;              /*  2608     4 */
u32                        snd_nxt;              /*  2612     4 */
u32                        snd_una;              /*  2616     4 */
u32                        window_clamp;         /*  2620     4 */
/* --- cacheline 41 boundary (2624 bytes) --- */
u32                        srtt_us;              /*  2624     4 */
u32                        packets_out;          /*  2628     4 */
u32                        snd_up;               /*  2632     4 */
u32                        delivered;            /*  2636     4 */
u32                        delivered_ce;         /*  2640     4 */
u32                        app_limited;          /*  2644     4 */
u32                        rcv_wnd;              /*  2648     4 */
struct tcp_options_received rx_opt;              /*  2652    24 */
u8                         nonagle:4;            /*  2676: 0  1 */
u8                         rate_app_limited:1;   /*  2676: 4  1 */

/* XXX 3 bits hole, try to pack */

__u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  2677     0 */

/* XXX 3 bytes hole, try to pack */

__u8                       __cacheline_group_begin__tcp_sock_write_rx[0] __attribute__((__aligned__(8))); /*  2680     0 */


/*AFTER this patch*/
__u8                       __cacheline_group_end__tcp_sock_write_tx[0]; /*  2585     0 */
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2585     0 */
u8                         nonagle:4;            /*  2585: 0  1 */
u8                         rate_app_limited:1;   /*  2585: 4  1 */

/* XXX 3 bits hole, try to pack */

/* Force alignment to the next boundary: */
u8                         :0;

u8                         received_ce_pending:4; /*  2586: 0  1 */
u8                         unused2:4;            /*  2586: 4  1 */

/* XXX 1 byte hole, try to pack */

__be32                     pred_flags;           /*  2588     4 */
u64                        tcp_clock_cache;      /*  2592     8 */
u64                        tcp_mstamp;           /*  2600     8 */
u32                        rcv_nxt;              /*  2608     4 */
u32                        snd_nxt;              /*  2612     4 */
u32                        snd_una;              /*  2616     4 */
u32                        window_clamp;         /*  2620     4 */
/* --- cacheline 41 boundary (2624 bytes) --- */
u32                        srtt_us;              /*  2624     4 */
u32                        packets_out;          /*  2628     4 */
u32                        snd_up;               /*  2632     4 */
u32                        delivered;            /*  2636     4 */
u32                        delivered_ce;         /*  2640     4 */
u32                        received_ce;          /*  2644     4 */
u32                        app_limited;          /*  2648     4 */
u32                        rcv_wnd;              /*  2652     4 */
struct tcp_options_received rx_opt;              /*  2656    24 */
__u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  2680     0 */
__u8                       __cacheline_group_begin__tcp_sock_write_rx[0] __attribute__((__aligned__(8))); /*  2680     0 */
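
To reproduce the effect outside the kernel, here is a minimal user-space sketch. It is not the kernel code: struct before_layout and struct after_layout are toy structs made up for illustration, prev_byte is a hypothetical stand-in for the preceding fields that end at an unaligned offset, and only the remaining field names are borrowed from the pahole output above. The sizes assume a typical GCC/Clang ABI with 4-byte aligned uint32_t and uint8_t bitfields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy "before" layout: the small flags sit at the end of the group. */
struct before_layout {
	uint8_t  prev_byte;             /* offset 0 (hypothetical stand-in) */
	/* 3-byte hole: pred_flags needs 4-byte alignment */
	uint32_t pred_flags;            /* offset 4 */
	uint32_t rcv_wnd;               /* offset 8 */
	uint8_t  nonagle:4;             /* offset 12 */
	uint8_t  rate_app_limited:1;
	/* 3 bits unused, then 3 bytes of tail padding */
};

/* Toy "after" layout: the flags are moved into the leading hole. */
struct after_layout {
	uint8_t  prev_byte;             /* offset 0 (hypothetical stand-in) */
	uint8_t  nonagle:4;             /* offset 1 */
	uint8_t  rate_app_limited:1;
	uint8_t  :0;                    /* start the next bitfield in a new byte */
	uint8_t  received_ce_pending:4; /* offset 2 */
	uint8_t  unused2:4;
	/* 1-byte hole */
	uint32_t pred_flags;            /* offset 4 */
	uint32_t rcv_wnd;               /* offset 8 */
};

/* Bytes saved by filling the hole instead of appending the flags. */
static inline size_t bytes_saved(void)
{
	return sizeof(struct before_layout) - sizeof(struct after_layout);
}

/* pred_flags stays at the same offset in both layouts. */
static inline void check_layouts(void)
{
	assert(offsetof(struct before_layout, pred_flags) ==
	       offsetof(struct after_layout, pred_flags));
}
```

Running pahole on an object file built from this sketch should show the same pattern as above: a 3-byte hole before pred_flags in the first struct, and a 1-byte hole in the second.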

> 
> Hi Paolo,
> 
> 	Thanks for the feedback, and sorry for my late response.
> 	I can either mention the hole here or move the fields.
> 	However, since the following patches will keep changing the holes, mentioning the hole change in each patch may make it easier to follow.
> 	If this is fine with you, I will make the revision in the next version.
> 
> > 
> > [...]
> > > @@ -5095,7 +5097,7 @@ static void __init tcp_struct_check(void)
> > >       /* 32bit arches with 8byte alignment on u64 fields might need padding
> > >        * before tcp_clock_cache.
> > >        */
> > > -     CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 92 + 4);
> > > +     CACHELINE_ASSERT_GROUP_SIZE(struct tcp_sock, tcp_sock_write_txrx, 97 + 7);
> > 
> > Really? I *think* the change here should not move the cacheline end 
> > around, due to holes. Could you please include the relevant pahole
> > (trimmed) output prior to this patch and after in the commit message?
> > 
> 
> Here is the pahole output before and after this patch.
> Indeed, the patch creates a 3-byte hole after 'unused2', so (5 + 3) = 8 bytes are added to the original 92 + 4.
> The result is 92 + 4 + (5 + 3) = 97 + 7.
> 	
> *BEFORE this patch*
>     __u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2585     0 */
> 
>     /* XXX 3 bytes hole, try to pack */
> 
>     __be32                     pred_flags;           /*  2588     4 */
>     u64                        tcp_clock_cache;      /*  2592     8 */
>     u64                        tcp_mstamp;           /*  2600     8 */
>     u32                        rcv_nxt;              /*  2608     4 */
>     u32                        snd_nxt;              /*  2612     4 */
>     u32                        snd_una;              /*  2616     4 */
>     u32                        window_clamp;         /*  2620     4 */
>     /* --- cacheline 41 boundary (2624 bytes) --- */
>     u32                        srtt_us;              /*  2624     4 */
>     u32                        packets_out;          /*  2628     4 */
>     u32                        snd_up;               /*  2632     4 */
>     u32                        delivered;            /*  2636     4 */
>     u32                        delivered_ce;         /*  2640     4 */
>     u32                        app_limited;          /*  2644     4 */
>     u32                        rcv_wnd;              /*  2648     4 */
>     struct tcp_options_received rx_opt;              /*  2652    24 */
>     u8                         nonagle:4;            /*  2676: 0  1 */
>     u8                         rate_app_limited:1;   /*  2676: 4  1 */
> 
>     /* XXX 3 bits hole, try to pack */
> 
>     __u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  2677     0 */
> 
>     /* XXX 3 bytes hole, try to pack */
> 
> *AFTER this patch*
>     __u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  2585     0 */
> 
>     /* XXX 3 bytes hole, try to pack */              
>     
>     __be32                     pred_flags;           /*  2588     4 */
>     u64                        tcp_clock_cache;      /*  2592     8 */
>     u64                        tcp_mstamp;           /*  2600     8 */
>     u32                        rcv_nxt;              /*  2608     4 */
>     u32                        snd_nxt;              /*  2612     4 */
>     u32                        snd_una;              /*  2616     4 */
>     u32                        window_clamp;         /*  2620     4 */
>     /* --- cacheline 41 boundary (2624 bytes) --- */
>     u32                        srtt_us;              /*  2624     4 */
>     u32                        packets_out;          /*  2628     4 */
>     u32                        snd_up;               /*  2632     4 */
>     u32                        delivered;            /*  2636     4 */
>     u32                        delivered_ce;         /*  2640     4 */ 
>     u32                        received_ce;          /*  2644     4 */
>     u8                         received_ce_pending:4; /*  2648: 0  1 */
>     u8                         unused2:4;            /*  2648: 4  1 */
> 
>     /* XXX 3 bytes hole, try to pack */
> 
>     u32                        app_limited;          /*  2652     4 */
>     u32                        rcv_wnd;              /*  2656     4 */
>     struct tcp_options_received rx_opt;              /*  2660    24 */
>     u8                         nonagle:4;            /*  2684: 0  1 */
>     u8                         rate_app_limited:1;   /*  2684: 4  1 */
> 
>     /* XXX 3 bits hole, try to pack */
> 
>     __u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  2685     0 */
> 
> 
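
The arithmetic above can be checked against the marker offsets in the pahole listings. This is a small sketch, not kernel code: the enum names and group_size() are hypothetical helpers, and only the numeric offsets come from the listings in this message. Reading 97 + 7 as (92 + 5) + (4 + 3) is an assumption about how the sum was regrouped.

```c
#include <assert.h>
#include <stddef.h>

/* Offsets of the tcp_sock_write_txrx group markers, from the pahole output. */
enum {
	TXRX_BEGIN        = 2585, /* __cacheline_group_begin__ in every listing */
	TXRX_END_BEFORE   = 2677, /* group end before the patch */
	TXRX_END_AFTER    = 2685, /* group end after the patch (v5 layout) */
	TXRX_END_REPACKED = 2680, /* group end with nonagle moved to the front */
};

/* Size of a group delimited by its begin and end markers. */
static inline size_t group_size(size_t begin, size_t end)
{
	return end - begin;
}

static inline void check_group_sizes(void)
{
	/* Before: 92 bytes; the assert allows 4 more for 32-bit padding. */
	assert(group_size(TXRX_BEGIN, TXRX_END_BEFORE) == 92);

	/* After: received_ce (4) + bitfield byte (1) + 3-byte hole. */
	assert(group_size(TXRX_BEGIN, TXRX_END_AFTER) == 92 + 5 + 3);

	/* The value passed to CACHELINE_ASSERT_GROUP_SIZE(). */
	assert(92 + 4 + (5 + 3) == 97 + 7);

	/* Repacked layout from the top of this message. */
	assert(group_size(TXRX_BEGIN, TXRX_END_REPACKED) == 95);
}
```

The repacked layout is 5 bytes smaller than the v5 one because the two bitfield bytes now live in the leading hole and the 3-byte hole after 'unused2' is gone.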
> > [...]
> > > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb)
> > >               if (tcp_ca_needs_ecn(sk))
> > >                       tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
> > >
> > > -             if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
> > > +             if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
> > > +                 tcp_ecn_mode_rfc3168(tp)) {
> > >                       /* Better not delay acks, sender can have a very low cwnd */
> > >                       tcp_enter_quickack_mode(sk, 2);
> > >                       tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
> > >               }
> > > -             tp->ecn_flags |= TCP_ECN_SEEN;
> > 
> > At this point it is not entirely clear to me why the removal of the above line is needed/correct.
> 
> The intent is to move where this flag is set from here to tcp_ecn_received_counters().
> 
> This function only runs when data is received, via tcp_event_data_recv().
> 
> tcp_ecn_received_counters(), in contrast, takes effect in more places (e.g., whether or not len <= tcp_header_len), which ensures the ACE counter tracks all segments, including pure ACKs.
> 
> > [...]
> > > @@ -4056,6 +4118,11 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
> > >
> > >       tcp_rack_update_reo_wnd(sk, &rs);
> > >
> > > +     if (tcp_ecn_mode_accecn(tp))
> > > +             ecn_count = tcp_accecn_process(sk, skb,
> > > +                                            tp->delivered - delivered,
> > > +                                            &flag);
> > 
> > AFAICS the above could set FLAG_ECE in flags, meaning the previous
> > tcp_clean_rtx_queue() will run with that flag cleared while the later functions checking it will not. I am wondering whether this inconsistency could cause problems?
> 
> The flag set by tcp_accecn_process() is used by the following functions: tcp_in_ack_event() and tcp_fastretrans_alert().
> 
> This only affects AccECN mode.
> 
> Best regards,
> Chia-Yu
> 
> > 
> > /P
