Message-ID: <CADVnQy=Gt+PHPJ+EdaXY=xcrgeDwusSBmmWV9+6-=93ZhD4SXw@mail.gmail.com>
Date: Thu, 31 Oct 2024 09:27:49 -0400
From: Neal Cardwell <ncardwell@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Paolo Abeni <pabeni@...hat.com>, chia-yu.chang@...ia-bell-labs.com,
netdev@...r.kernel.org, davem@...emloft.net, stephen@...workplumber.org,
jhs@...atatu.com, kuba@...nel.org, dsahern@...nel.org, ij@...nel.org,
koen.de_schepper@...ia-bell-labs.com, g.white@...lelabs.com,
ingemar.s.johansson@...csson.com, mirja.kuehlewind@...csson.com,
cheshire@...le.com, rs.ietf@....at, Jason_Livingood@...cast.com,
vidhi_goel@...le.com, Olga Albisser <olga@...isser.org>,
Olivier Tilmans <olivier.tilmans@...ia.com>, Henrik Steen <henrist@...rist.net>,
Bob Briscoe <research@...briscoe.net>
Subject: Re: [PATCH v4 net-next 1/1] sched: Add dualpi2 qdisc
On Tue, Oct 29, 2024 at 12:53 PM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Tue, Oct 29, 2024 at 1:56 PM Paolo Abeni <pabeni@...hat.com> wrote:
> >
> > On 10/22/24 00:12, chia-yu.chang@...ia-bell-labs.com wrote:
> > > +/* Default alpha/beta values give a 10dB stability margin with max_rtt=100ms. */
> > > +static void dualpi2_reset_default(struct dualpi2_sched_data *q)
> > > +{
> > > + q->sch->limit = 10000; /* Max 125ms at 1Gbps */
> > > +
> > > + q->pi2.target = 15 * NSEC_PER_MSEC;
> > > + q->pi2.tupdate = 16 * NSEC_PER_MSEC;
> > > + q->pi2.alpha = dualpi2_scale_alpha_beta(41); /* ~0.16 Hz * 256 */
> > > + q->pi2.beta = dualpi2_scale_alpha_beta(819); /* ~3.20 Hz * 256 */
> > > +
> > > + q->step.thresh = 1 * NSEC_PER_MSEC;
> > > + q->step.in_packets = false;
> > > +
> > > + dualpi2_calculate_c_protection(q->sch, q, 10); /* wc=10%, wl=90% */
> > > +
> > > + q->ecn_mask = INET_ECN_ECT_1;
> > > + q->coupling_factor = 2; /* window fairness for equal RTTs */
> > > + q->drop_overload = true; /* Preserve latency by dropping */
> > > + q->drop_early = false; /* PI2 drops on dequeue */
> > > + q->split_gso = true;
> >
> > This is a very unexpected default. Splitting GSO packets earlier WRT the
> > H/W constraints definitely impacts performance in a bad way.
> >
> > Under which conditions is this expected to give better results?
> > It should at least be documented clearly.
>
> I agree, it is very strange to see this orthogonal feature being
> spread in some qdisc.

IMHO it makes sense to offer this split_gso feature in the dualpi2
qdisc because dualpi2 is targeted at reducing latency, mostly at hops
in the last mile of the public Internet, where there can be
orders-of-magnitude disparities in bandwidth between upstream and
downstream links (e.g., packets arriving over 10G Ethernet and leaving
destined for a 10M DSL link). In such cases, GRO may aggregate many
packets received on the fast ingress link into a single large skb, and
the qdisc can reduce latency on the slow egress link by splitting that
skb so that smaller skbs are enqueued there.
> Also, it seems this qdisc could be a mere sch_prio queue, with two
> sch_pie children, or two sch_fq or sch_fq_codel ?

Having two independent children would not allow meeting the dualpi2
goal to "preserve fairness between ECN-capable and non-ECN-capable
traffic" (quoting https://datatracker.ietf.org/doc/rfc9332/).
The main issue is that there may be differing numbers of flows in
the ECN-capable and non-ECN-capable queues, and yet dualpi2 wants to
maintain approximate per-flow fairness on both sides. To do this, it
uses a single qdisc that couples the ECN mark rate in the
ECN-capable queue to the drop rate in the non-ECN-capable queue.
This could probably be made clearer in the commit message.
> Many of us are using fq_codel or fq, there is no way we can switch to
> dualpi2 just to experiment things.

Yes, sites that are using fq_codel or fq do not need to switch to dualpi2.
AFAIK the idea with dualpi2 is to offer a new qdisc for folks
developing hardware for the last mile of the Internet who want
low latency via L4S and approximate per-flow fairness between
L4S and non-L4S traffic, even in the presence of VPN-encrypted traffic
(where flow identifiers are not available to fq_codel or fq fair
queuing).
Sites that don't have VPN traffic or don't care about the VPN issue
can use fq or fq_codel with the ce_threshold parameter to allow low
latency via L4S while achieving approximate per-flow fairness.
best regards,
neal