Message-ID:
 <PAXPR07MB7984C7B7A47A49FA30D24005A37CA@PAXPR07MB7984.eurprd07.prod.outlook.com>
Date: Fri, 20 Jun 2025 08:57:44 +0000
From: "Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>
To: Paolo Abeni <pabeni@...hat.com>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "linux-doc@...r.kernel.org"
	<linux-doc@...r.kernel.org>, "corbet@....net" <corbet@....net>,
	"horms@...nel.org" <horms@...nel.org>, "dsahern@...nel.org"
	<dsahern@...nel.org>, "kuniyu@...zon.com" <kuniyu@...zon.com>,
	"bpf@...r.kernel.org" <bpf@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "dave.taht@...il.com" <dave.taht@...il.com>,
	"jhs@...atatu.com" <jhs@...atatu.com>, "kuba@...nel.org" <kuba@...nel.org>,
	"stephen@...workplumber.org" <stephen@...workplumber.org>,
	"xiyou.wangcong@...il.com" <xiyou.wangcong@...il.com>, "jiri@...nulli.us"
	<jiri@...nulli.us>, "davem@...emloft.net" <davem@...emloft.net>,
	"andrew+netdev@...n.ch" <andrew+netdev@...n.ch>, "donald.hunter@...il.com"
	<donald.hunter@...il.com>, "ast@...erby.net" <ast@...erby.net>,
	"liuhangbin@...il.com" <liuhangbin@...il.com>, "shuah@...nel.org"
	<shuah@...nel.org>, "linux-kselftest@...r.kernel.org"
	<linux-kselftest@...r.kernel.org>, "ij@...nel.org" <ij@...nel.org>,
	"ncardwell@...gle.com" <ncardwell@...gle.com>, "Koen De Schepper (Nokia)"
	<koen.de_schepper@...ia-bell-labs.com>, "g.white@...lelabs.com"
	<g.white@...lelabs.com>, "ingemar.s.johansson@...csson.com"
	<ingemar.s.johansson@...csson.com>, "mirja.kuehlewind@...csson.com"
	<mirja.kuehlewind@...csson.com>, "cheshire@...le.com" <cheshire@...le.com>,
	"rs.ietf@....at" <rs.ietf@....at>, "Jason_Livingood@...cast.com"
	<Jason_Livingood@...cast.com>, "vidhi_goel@...le.com" <vidhi_goel@...le.com>
CC: "Olivier Tilmans (Nokia)" <olivier.tilmans@...ia.com>
Subject: RE: [PATCH v8 net-next 04/15] tcp: AccECN core

> -----Original Message-----
> From: Paolo Abeni <pabeni@...hat.com> 
> Sent: Tuesday, June 17, 2025 10:03 AM
> To: Chia-Yu Chang (Nokia) <chia-yu.chang@...ia-bell-labs.com>; edumazet@...gle.com; linux-doc@...r.kernel.org; corbet@....net; horms@...nel.org; dsahern@...nel.org; kuniyu@...zon.com; bpf@...r.kernel.org; netdev@...r.kernel.org; dave.taht@...il.com; jhs@...atatu.com; kuba@...nel.org; stephen@...workplumber.org; xiyou.wangcong@...il.com; jiri@...nulli.us; davem@...emloft.net; andrew+netdev@...n.ch; donald.hunter@...il.com; ast@...erby.net; liuhangbin@...il.com; shuah@...nel.org; linux-kselftest@...r.kernel.org; ij@...nel.org; ncardwell@...gle.com; Koen De Schepper (Nokia) <koen.de_schepper@...ia-bell-labs.com>; g.white@...lelabs.com; ingemar.s.johansson@...csson.com; mirja.kuehlewind@...csson.com; cheshire@...le.com; rs.ietf@....at; Jason_Livingood@...cast.com; vidhi_goel@...le.com
> Cc: Olivier Tilmans (Nokia) <olivier.tilmans@...ia.com>
> Subject: Re: [PATCH v8 net-next 04/15] tcp: AccECN core
> 
> On 6/10/25 2:53 PM, chia-yu.chang@...ia-bell-labs.com wrote:
> > From: Ilpo Järvinen <ij@...nel.org>
> >
> > This change implements Accurate ECN without negotiation and AccECN 
> > Option (that will be added by later changes). Based on AccECN 
> > specifications:
> >   https://tools.ietf.org/id/draft-ietf-tcpm-accurate-ecn-28.txt
> >
> > Accurate ECN allows feeding back the number of CE (congestion
> > experienced) marks accurately to the sender in contrast to
> > RFC3168 ECN that can only signal one marks-seen-yes/no per RTT.
> > Congestion control algorithms can take advantage of the accurate ECN 
> > information to fine-tune their congestion response to avoid drastic 
> > rate reduction when only mild congestion is encountered.
> >
> > With Accurate ECN, tp->received_ce (r.cep in AccECN spec) keeps track 
> > of how many segments have arrived with a CE mark. Accurate ECN uses 
> > ACE field (ECE, CWR, AE) to communicate the value back to the sender 
> > which updates tp->delivered_ce (s.cep) based on the feedback. This 
> > signalling channel is lossy when ACE field overflow occurs.
> >
> > Conservative strategy is selected here to deal with the ACE overflow, 
> > however, some strategies using the AccECN option later in the overall 
> > patchset mitigate against false overflows detected.
> >
> > The ACE field values on the wire are offset by 
> > TCP_ACCECN_CEP_INIT_OFFSET. Delivered_ce/received_ce count the real CE 
> > marks rather than forcing all downstream users to adapt to the wire 
> > offset.
> >
> > This patch uses the first 1-byte hole and the last 4-byte hole of the 
> > tcp_sock_write_txrx for 'received_ce_pending' and 'received_ce'.
> > Also, the group size of tcp_sock_write_txrx is increased from
> > 91 + 4 to 95 + 4 due to the new u32 received_ce member. Below are the 
> > trimmed pahole outcomes before and after this patch.
> 
> AFAICS 'received_ce' fills the existing 4-byte hole, so the tcp_sock_write_txrx size should now be (95 + 0). Am I misreading something?

Hi Paolo,

First, thanks for the feedback.
The "+ 4" is due to the byte alignment needed on the ARM architecture.
You can see the pahole outputs BEFORE and AFTER this patch below: in both, a 4-byte hole sits before "tcp_clock_cache" so that it starts at a multiple of 8.
The comment in net/ipv4/tcp.c also explains it: "32bit arches with 8byte alignment on u64 fields might need padding before tcp_clock_cache."
Does it make sense to you?

[BEFORE PATCH]
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  1869     0 */
u8                         nonagle:4;            /*  1869: 0  1 */
u8                         rate_app_limited:1;   /*  1869: 4  1 */

/* XXX 3 bits hole, try to pack */
/* XXX 2 bytes hole, try to pack */

__be32                     pred_flags;           /*  1872     4 */

/* XXX 4 bytes hole, try to pack */

u64                        tcp_clock_cache;      /*  1880     8 */
u64                        tcp_mstamp;           /*  1888     8 */
u32                        rcv_nxt;              /*  1896     4 */
u32                        snd_nxt;              /*  1900     4 */
u32                        snd_una;              /*  1904     4 */
u32                        window_clamp;         /*  1908     4 */
u32                        srtt_us;              /*  1912     4 */
u32                        packets_out;          /*  1916     4 */
/* --- cacheline 30 boundary (1920 bytes) --- */
u32                        snd_up;               /*  1920     4 */
u32                        delivered;            /*  1924     4 */
u32                        delivered_ce;         /*  1928     4 */
u32                        app_limited;          /*  1932     4 */
u32                        rcv_wnd;              /*  1936     4 */
struct tcp_options_received rx_opt;              /*  1940    24 */
__u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  1964     0 */


[AFTER PATCH]
__u8                       __cacheline_group_begin__tcp_sock_write_txrx[0]; /*  1869     0 */
u8                         nonagle:4;            /*  1869: 0  1 */
u8                         rate_app_limited:1;   /*  1869: 4  1 */

/* XXX 3 bits hole, try to pack */

/* Force alignment to the next boundary: */
u8                         :0;

u8                         received_ce_pending:4; /*  1870: 0  1 */
u8                         unused2:4;            /*  1870: 4  1 */

/* XXX 1 byte hole, try to pack */

__be32                     pred_flags;           /*  1872     4 */

/* XXX 4 bytes hole, try to pack */

u64                        tcp_clock_cache;      /*  1880     8 */
u64                        tcp_mstamp;           /*  1888     8 */
u32                        rcv_nxt;              /*  1896     4 */
u32                        snd_nxt;              /*  1900     4 */
u32                        snd_una;              /*  1904     4 */
u32                        window_clamp;         /*  1908     4 */
u32                        srtt_us;              /*  1912     4 */
u32                        packets_out;          /*  1916     4 */
/* --- cacheline 30 boundary (1920 bytes) --- */
u32                        snd_up;               /*  1920     4 */
u32                        delivered;            /*  1924     4 */
u32                        delivered_ce;         /*  1928     4 */
u32                        received_ce;          /*  1932     4 */
u32                        app_limited;          /*  1936     4 */
u32                        rcv_wnd;              /*  1940     4 */
struct tcp_options_received rx_opt;              /*  1944    24 */
__u8                       __cacheline_group_end__tcp_sock_write_txrx[0]; /*  1968     0 */


> 
> > @@ -384,17 +387,16 @@ static void tcp_data_ecn_check(struct sock *sk, const struct sk_buff *skb)
> >               if (tcp_ca_needs_ecn(sk))
> >                       tcp_ca_event(sk, CA_EVENT_ECN_IS_CE);
> >
> > -             if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR)) {
> > +             if (!(tp->ecn_flags & TCP_ECN_DEMAND_CWR) &&
> > +                 tcp_ecn_mode_rfc3168(tp)) {
> >                       /* Better not delay acks, sender can have a very low cwnd */
> >                       tcp_enter_quickack_mode(sk, 2);
> >                       tp->ecn_flags |= TCP_ECN_DEMAND_CWR;
> >               }
> > -             tp->ecn_flags |= TCP_ECN_SEEN;
> 
> It's not clear why you need to move this statement earlier in the code path even for ecn_mode_rfc3168(). Either a comment or
> 
>                 if (!tcp_ecn_mode_rfc3168(tp))
>                         break;
> 
> a few lines above could help.
> 
> >               break;
> >       default:
> >               if (tcp_ca_needs_ecn(sk))
> >                       tcp_ca_event(sk, CA_EVENT_ECN_NO_CE);
> > -             tp->ecn_flags |= TCP_ECN_SEEN;
> 
> Same here.
> 
> Thanks,
> 
> Paolo

OK, I will apply the above feedback in the next version. Thanks.

BRs,
Chia-Yu
