netdev - Re: [PATCH v4 net-next 1/1] sched: Add dualpi2 qdisc


Message-ID: <CAA93jw7b5D3DXw3=x5hWxvEctLiVhc4ozQSgwogqArOE6pMYcQ@mail.gmail.com>
Date: Thu, 31 Oct 2024 10:27:27 -0700
From: Dave Taht <dave.taht@...il.com>
To: "Koen De Schepper (Nokia)" <koen.de_schepper@...ia-bell-labs.com>
Cc: Eric Dumazet <edumazet@...gle.com>, Neal Cardwell <ncardwell@...gle.com>, 
	Paolo Abeni <pabeni@...hat.com>, 
	"Chia-Yu Chang (Nokia)" <chia-yu.chang@...ia-bell-labs.com>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "davem@...emloft.net" <davem@...emloft.net>, 
	"stephen@...workplumber.org" <stephen@...workplumber.org>, "jhs@...atatu.com" <jhs@...atatu.com>, 
	"kuba@...nel.org" <kuba@...nel.org>, "dsahern@...nel.org" <dsahern@...nel.org>, "ij@...nel.org" <ij@...nel.org>, 
	"g.white@...lelabs.com" <g.white@...lelabs.com>, 
	"ingemar.s.johansson@...csson.com" <ingemar.s.johansson@...csson.com>, 
	"mirja.kuehlewind@...csson.com" <mirja.kuehlewind@...csson.com>, "cheshire@...le.com" <cheshire@...le.com>, 
	"rs.ietf@....at" <rs.ietf@....at>, 
	"Jason_Livingood@...cast.com" <Jason_Livingood@...cast.com>, "vidhi_goel@...le.com" <vidhi_goel@...le.com>, 
	Olga Albisser <olga@...isser.org>, "Olivier Tilmans (Nokia)" <olivier.tilmans@...ia.com>, 
	Henrik Steen <henrist@...rist.net>, Bob Briscoe <research@...briscoe.net>
Subject: Re: [PATCH v4 net-next 1/1] sched: Add dualpi2 qdisc

On Thu, Oct 31, 2024 at 9:46 AM Koen De Schepper (Nokia)
<koen.de_schepper@...ia-bell-labs.com> wrote:
>
>
> From: Eric Dumazet <edumazet@...gle.com>
> Sent: Thursday, October 31, 2024 3:31 PM
> > On Thu, Oct 31, 2024 at 2:28 PM Neal Cardwell <ncardwell@...gle.com> wrote:
> > > On Tue, Oct 29, 2024 at 12:53 PM Eric Dumazet <edumazet@...gle.com> wrote:
> > > > Also, it seems this qdisc could be a mere sch_prio queue, with two
> > > > sch_pie children, or two sch_fq or sch_fq_codel ?
> > >
> > > Having two independent children would not allow meeting the dualpi2
> > > goal to "preserve fairness between ECN-capable and non-ECN-capable
> > > traffic." (quoting text from https://datatracker.ietf.org/doc/rfc9332/
> > > ). The main issue is that there may be differing numbers of flows in
> > > the ECN-capable and non-ECN-capable queues, and yet dualpi2 wants to
> > > maintain approximate per-flow fairness on both sides. To do this, it
> > > uses a single qdisc with coupling of the ECN mark rate in the
> > > ECN-capable queue and drop rate in the non-ECN-capable queue.
> >
> > Not sure I understand this argument.
> >
> > The dequeue seems to use WRR, so this means that instead of prio, this could use net/sched/sch_drr.c, with two PIE instances (with different settings) as children and a proper classifier at enqueue to choose one queue or the other.
> >
> > Reviewing ~1000 lines of code, knowing that in one year another net/sched/sch_fq_dualpi2.c will follow (as net/sched/sch_fq_pie.c followed net/sched/sch_pie.c ) is not exactly appealing to me.
>
> This composition doesn't work. We need more than 2 independent AQMs and a scheduler. The coupling between the queues and the other interworking conditions are essential here, and unfortunately they cannot be achieved with a composition of existing qdiscs.
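
To make the coupling argument concrete, here is a minimal sketch of the
relationship RFC 9332 describes: one base probability p' drives both
queues, squared for Classic traffic and scaled by a factor k for L4S
traffic. This is an illustration, not the code from the patch; the
function and parameter names are hypothetical.

```python
# Sketch of the DualQ coupling described in RFC 9332 (not the kernel code).
# All function and parameter names here are hypothetical.

def coupled_probabilities(p_base, p_native_l4s=0.0, k=2.0):
    """Derive both queues' signals from one shared base probability p'.

    p_base       -- p', output of the single PI controller, in [0, 1]
    p_native_l4s -- marking probability from the L4S queue's own shallow
                    threshold, in [0, 1]
    k            -- coupling factor (RFC 9332 recommends 2 as a default)
    """
    p_classic = p_base ** 2               # Classic drop/mark: p_C = p'^2
    p_coupled = min(k * p_base, 1.0)      # coupled L4S mark: p_CL = k * p'
    p_l4s = max(p_native_l4s, p_coupled)  # L4S queue applies the larger one
    return p_classic, p_l4s
```

Two independent PIE children would each compute their own probability
from their own queue, so a relationship like the one above between the
Classic drop rate and the L4S mark rate could not be expressed.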

I tried to mention that the dualpi concept is not very dual when
hardware mq is in use, since that means one "dualpi" instance per core.

So the essential limitations on dualpi usage are:

Single instance only
gso-splitting only

So it is not suitable as a general purpose data center qdisc because
it simply cannot scale to larger bandwidths.

I think the confusion here stems in part from the other stuff that was
originally submitted (accecn, tcp prague), which needs to be tested
somehow. A path forward seems to be to put a ce_threshold into sch_fq
matching the l4s ecn bit, with a suitable default (which in dualpi is
1ms; self congestion is a thing), then incorporate accecn, then test
prague driving that, and then put a rate limited dualpi instance
somewhere on the path or in the test setup?
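
As a rough illustration of the ce_threshold idea, the sketch below
applies a step mark only to packets carrying ECT(1), the L4S
identifier, once their time in the queue exceeds a threshold. It is not
sch_fq code; the function name and parameters are hypothetical, and the
1ms default is simply the dualpi value mentioned above.

```python
# Sketch of step CE marking restricted to ECT(1) packets.
# Names are hypothetical; this is not the sch_fq implementation.

# ECN codepoints (the two low bits of the IP TOS / traffic class byte).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

def step_mark(ecn, sojourn_us, threshold_us=1000):
    """Return the ECN codepoint a packet carries after dequeue.

    Only ECT(1) packets whose queueing delay exceeds the threshold are
    rewritten to CE; everything else passes through unchanged.
    """
    if ecn == ECT1 and sojourn_us > threshold_us:
        return CE
    return ecn
```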

> Also, we don't expect any FQ and DualQ merger. Using only 2 queues (one for each class, L4S and Classic) is one of the differentiating features of DualQ compared to FQ, giving a lower L4S tail latency than blocking and scheduled FQ qdiscs.

> Adding FQ_ on top or under DualQ would break the goal of DualQ.

Comparing fq_codel or fq_pie to dualQ would probably be enlightening.
Both of these scale to hardware mq.

In dualpi's defence it seems to be an attempt to mimic a hardware
implementation.

> If an FQ_ supporting L4S is needed, then existing FQ_ implementations can be used (like fq_codel) or extended (identifying L4S and using the correct thresholds by default).

Merely having a preferred value for that threshold would be nice. The
threshold first deployed for fq_codel was far too low for production
environments. If 1ms works, cool!

>
> Regards,
> Koen.



-- 
Dave Täht CSO, LibreQos