[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKdVt7AAh0bcx=zEUz0O+oBneOHvq2EjRbyNifQozEv4A@mail.gmail.com>
Date: Mon, 30 Sep 2024 19:55:15 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Willem de Bruijn <willemdebruijn.kernel@...il.com>
Cc: "David S . Miller" <davem@...emloft.net>, Jakub Kicinski <kuba@...nel.org>,
Paolo Abeni <pabeni@...hat.com>, Willem de Bruijn <willemb@...gle.com>,
Jeffrey Ji <jeffreyji@...gle.com>, netdev@...r.kernel.org, eric.dumazet@...il.com
Subject: Re: [PATCH net-next 2/2] net_sched: sch_fq: add the ability to
offload pacing
On Mon, Sep 30, 2024 at 7:33 PM Willem de Bruijn
<willemdebruijn.kernel@...il.com> wrote:
>
> Eric Dumazet wrote:
> > From: Jeffrey Ji <jeffreyji@...gle.com>
> >
> > Some network devices have the ability to offload EDT (Earliest
> > Departure Time) which is the model used for TCP pacing and FQ packet
> > scheduler.
> >
> > Some of them implement the timing wheel mechanism described in
> > https://saeed.github.io/files/carousel-sigcomm17.pdf
> > with an associated 'timing wheel horizon'.
> >
> > This patchs adds to FQ packet scheduler TCA_FQ_OFFLOAD_HORIZON
> > attribute.
> >
> > Its value is capped by the device max_pacing_offload_horizon,
> > added in the prior patch.
> >
> > It allows FQ to let packets within pacing offload horizon
> > to be delivered to the device, which will handle the needed
> > delay without host involvement.
> >
> > Signed-off-by: Jeffrey Ji <jeffreyji@...gle.com>
> > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
>
> > @@ -1100,6 +1105,17 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
> > WRITE_ONCE(q->horizon_drop,
> > nla_get_u8(tb[TCA_FQ_HORIZON_DROP]));
> >
> > + if (tb[TCA_FQ_OFFLOAD_HORIZON]) {
> > + u64 offload_horizon = (u64)NSEC_PER_USEC *
> > + nla_get_u32(tb[TCA_FQ_OFFLOAD_HORIZON]);
> > +
> > + if (offload_horizon <= qdisc_dev(sch)->max_pacing_offload_horizon) {
> > + WRITE_ONCE(q->offload_horizon, offload_horizon);
>
> Do we expect that that an administrator will ever set the offload
> horizon different from the device horizon?
We want to be able to eventually deal with firmware/hardware bugs,
like lack of backpressure on the timer wheel, which probably has some
kind of capacity limit.
I think it is much better to let the admin choose, eventually
disabling the whole thing, or enabling it for a small horizon like
2500 ns.
>
> It might be useful to have a wildcard value that means "match
> hardware ability"?
"ip link" will show the device max capability.
Same story for gso_max_size attribute.
We do not automatically set it to dev->tso_max_size
I do not think we have a precedent for a qdisc/link attribute where
the kernel automatically caps the user
choice with the device capability.
>
> Both here and in the device, realistic values will likely always be
> MSEC scale?
msec granularity proved to be not good for TCP stack, we went to us already.
Fast path compares in ns unit, storing the value in ns removes
multiplies from it.
Powered by blists - more mailing lists