[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAA93jw7aFAtMjTH9_TutXc85vKW1wwnddR2ARRaxhaczfzF04A@mail.gmail.com>
Date:   Sun, 31 Oct 2021 13:34:42 -0700
From:   Dave Taht <dave.taht@...il.com>
To:     Eric Dumazet <edumazet@...gle.com>
Cc:     Neal Cardwell <ncardwell@...gle.com>,
        Asad Sajjad Ahmed <asadsa@....uio.no>,
        "David S. Miller" <davem@...emloft.net>,
        Jakub Kicinski <kuba@...nel.org>,
        netdev <netdev@...r.kernel.org>,
        Toke Høiland-Jørgensen <toke@...hat.com>,
        Ingemar Johansson S <ingemar.s.johansson@...csson.com>,
        Tom Henderson <tomh@...h.org>,
        Bob Briscoe <research@...briscoe.net>,
        Olga Albisser <olga@...isser.org>
Subject: Re: [PATCH net-next] fq_codel: avoid under-utilization with
 ce_threshold at low link rates
On Sun, Oct 31, 2021 at 1:31 PM Dave Taht <dave.taht@...il.com> wrote:
>
> On Fri, Oct 29, 2021 at 7:53 AM Eric Dumazet <edumazet@...gle.com> wrote:
> >
> > On Fri, Oct 29, 2021 at 6:54 AM Neal Cardwell <ncardwell@...gle.com> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 3:15 PM Asad Sajjad Ahmed <asadsa@....uio.no> wrote:
> > > >
> > > > Commit "fq_codel: generalise ce_threshold marking for subset of traffic"
> > > > [1] enables ce_threshold to be used in the Internet, not just in data
> > > > centres.
> > > >
> > > > Because ce_threshold is in time units, it can cause poor utilization at
> > > > low link rates when it represents <1 packet.
> > > > E.g., if link rate <12Mb/s ce_threshold=1ms is <1500B packet.
> > > >
> > > > So, suppress ECN marking unless the backlog is also > 1 MTU.
> > > >
> > > > A similar patch to [1] was tested on an earlier kernel, and a similar
> > > > one-packet check prevented poor utilization at low link rates [2].
> > > >
> > > > [1] commit dfcb63ce1de6 ("fq_codel: generalise ce_threshold marking for subset of traffic")
> > > >
> > > > [2] See right hand column of plots at the end of:
> > > > https://bobbriscoe.net/projects/latency/dctth_journal_draft20190726.pdf
> > > >
> > > > Signed-off-by: Asad Sajjad Ahmed <asadsa@....uio.no>
> > > > Signed-off-by: Olga Albisser <olga@...isser.org>
> > > > ---
> > > >  include/net/codel_impl.h | 3 ++-
> > > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/net/codel_impl.h b/include/net/codel_impl.h
> > > > index 137d40d8cbeb..4e3e8473e776 100644
> > > > --- a/include/net/codel_impl.h
> > > > +++ b/include/net/codel_impl.h
> > > > @@ -248,7 +248,8 @@ static struct sk_buff *codel_dequeue(void *ctx,
> > > >                                                     vars->rec_inv_sqrt);
> > > >         }
> > > >  end:
> > > > -       if (skb && codel_time_after(vars->ldelay, params->ce_threshold)) {
> > > > +       if (skb && codel_time_after(vars->ldelay, params->ce_threshold) &&
> > > > +           *backlog > params->mtu) {
> >
> > I think this idea would apply to codel quite well.  (This helper is
> > common to codel and fq_codel)
> >
> > But with fq_codel my thoughts are:
> >
> > *backlog is the backlog of the qdisc, not the backlog for the flow,
> > and it includes the packet currently being removed from the queue.
> >
> > Setting ce_threshold to 1ms while the link rate is 12Mbs sounds
> > misconfiguration to me.
> >
> > Even if this flow has to transmit one tiny packet every minute, it
> > will get CE mark
> > just because at least one packet from an elephant flow is currently
> > being sent to the wire.
> >
> > BQL won't prevent that at least one packet is being processed while
> > the tiny packet
> > is coming into fq_codel qdisc.
> >
> > vars->ldelay = now - skb_time_func(skb);
> >
> > For tight ce_threshold, vars->ldelay would need to be replaced by
> >
> > now - (time of first codel_dequeue() after this skb has been queued).
> > This seems a bit hard to implement cheaply.
> >
> >
> >
> >
> > > >                 bool set_ce = true;
> > > >
> > > >                 if (params->ce_threshold_mask) {
> > > > --
> > >
> > > Sounds like a good idea, and looks good to me.
> > >
> > > Acked-by: Neal Cardwell <ncardwell@...gle.com>
> > >
> > > Eric, what do you think?
> > >
> > > neal
>
> My 2c:
sigh, I got my bits flipped.
>
> I would prefer this entire patch series be reverted and put back into
> the l4s and BBRv2 trees where it can be thoroughly evaluated, and
> appropriate parameterizations found for bare metal DCs, virtualized
> environments, hosts and routers at a variety of bandwidths.
>
> I'm not huge on arbitrarily cluttering up the fq_codel hot path, or
> the uapi, before that happens.
>
> Also, if anyone has any suggestions as to how to apply this correctly
> to the wifi users in the kernel, I'm all ears.
>
> I think a safer path forward in linux mainline would be to remove ECT0
> from consideration as RFC3168 ECN in the INET_macros (which all the
> existing qdiscs will then pick up) and have the unconfigured linux
> hosts and vm substrates across the internet revert to drop in that
> case. Prague is defined to revert to reno in that case. Throughput, if
> not reductions in local queuing, will be good.
>
> A good test of the ECT0 concept across the wire might be to then have
> the in-kernel dctcp instance start using that instead of ECT1 and see
> what breaks.
>
> I would appreciate a set of before/after benchmarks on a future submission.
>
> It's otherwise not my call, I do wish more folk would read this (
> https://github.com/heistp/l4s-tests#key-findings) before committing
> seemingly untested code.
>
> --
> I tried to build a better future, a few times:
> https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org
>
> Dave Täht CEO, TekLibre, LLC
-- 
I tried to build a better future, a few times:
https://wayforward.archive.org/?site=https%3A%2F%2Fwww.icei.org
Dave Täht CEO, TekLibre, LLC
Powered by blists - more mailing lists
 
