Open Source and information security mailing list archives
 
Message-ID: <-8f3jdd-5pxHr0GW-uu8VtTzqDKDOyJohJ-soIwzRyqJUub186VYIxqNoGOTh8Oxtu1U0CEDl5h3N1c1D1jbn7nIlXUrNo55CHK5KcT23c4=@willsroot.io>
Date: Mon, 11 Aug 2025 16:52:51 +0000
From: William Liu <will@...lsroot.io>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com, pabeni@...hat.com, jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com, horms@...nel.org, savy@...t3mfailure.io, victor@...atatu.com
Subject: Re: [PATCH net v4 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal

On Monday, August 11th, 2025 at 3:30 PM, Jakub Kicinski <kuba@...nel.org> wrote:

> 
> 
> On Sun, 10 Aug 2025 21:06:57 +0000 William Liu wrote:
> 
> > > On Sun, 27 Jul 2025 23:56:32 +0000 William Liu wrote:
> > > 
> > > > Special care is taken for fq_codel_dequeue to account for the
> > > > qdisc_tree_reduce_backlog call in its dequeue handler. The
> > > > cstats reset is moved from the end to the beginning of
> > > > fq_codel_dequeue, so the change handler can use cstats for
> > > > proper backlog reduction accounting purposes. The drop_len and
> > > > drop_count fields are not used elsewhere so this reordering in
> > > > fq_codel_dequeue is ok.
> > > 
> > > Using local variables like we do in other qdiscs will not work?
> > > I think your change will break drop accounting during normal dequeue?
> > 
> > Can you elaborate on this?
> > 
> > I just moved the reset of two cstats fields from the dequeue handler
> > epilogue to the prologue. Those specific cstats fields are not used
> > elsewhere so they should be fine,
> 
> 
> That's the disconnect. AFAICT they are passed to codel_dequeue(),
> and will be used during normal dequeue, as part of normal active
> queue management under traffic..
> 

Yes, that is the only place those values are used. From my understanding, codel_dequeue is only called from fq_codel_dequeue, so moving the reset from the dequeue epilogue to the dequeue prologue preserves the behavior: codel_dequeue always starts from freshly zeroed values.

Is there a case I am not seeing? If so, I can just add additional fields to the fq_codel_sched_data, but wanted to avoid doing that for this one edge case.
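To make the ordering argument concrete, here is a userspace sketch (hypothetical names, not the kernel code): both orderings hand the accounting code counters that start from zero, but only the prologue reset leaves the totals readable after the dequeue returns, which is what the change handler needs.

```c
#include <assert.h>

/* Hypothetical stand-in for the two cstats fields in question. */
struct cstats { unsigned int drop_len, drop_count; };

/* Old ordering: counters are zeroed in the dequeue epilogue. */
static unsigned int dequeue_reset_after(struct cstats *st, unsigned int pkt_len)
{
	st->drop_len += pkt_len;	/* codel_dequeue-style accounting */
	st->drop_count++;
	unsigned int seen = st->drop_len;
	st->drop_len = 0;		/* epilogue reset: info is gone */
	st->drop_count = 0;
	return seen;
}

/* New ordering: counters are zeroed in the dequeue prologue. */
static unsigned int dequeue_reset_before(struct cstats *st, unsigned int pkt_len)
{
	st->drop_len = 0;		/* prologue reset */
	st->drop_count = 0;
	st->drop_len += pkt_len;	/* same accounting as above */
	st->drop_count++;
	return st->drop_len;		/* caller can still read st afterwards */
}
```

In both versions the accounting sees counters that begin at zero; the only observable difference is that with the prologue reset the last dequeue's totals survive the call.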

> > but we need to accumulate their
> > values during limit adjustment. Otherwise the limit adjustment loop
> > could perform erroneous accounting in the final
> > qdisc_tree_reduce_backlog because the dequeue path could have already
> > triggered qdisc_tree_reduce_backlog calls.
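Said another way, the final qdisc_tree_reduce_backlog call must only report the part of the reduction that the dequeue path has not already reported itself. A hypothetical userspace model of that bookkeeping (not the kernel code):

```c
#include <assert.h>

/* Hypothetical model of a qdisc during the limit-shrink loop. */
struct sch_model {
	unsigned int qlen, backlog;
	unsigned int tree_reported;	/* already passed up the tree */
};

/* Dequeue one packet; some dequeue handlers (e.g. fq_codel's) may call
 * qdisc_tree_reduce_backlog themselves for packets they drop. */
static void dequeue_one(struct sch_model *s, unsigned int pkt_len,
			int self_reports)
{
	s->qlen--;
	s->backlog -= pkt_len;
	if (self_reports)
		s->tree_reported += pkt_len;
}

/* Change handler: report only the reduction not yet accounted for. */
static unsigned int shrink_report(struct sch_model *s, const unsigned int *pkts,
				 const int *self, unsigned int limit)
{
	unsigned int prev_backlog = s->backlog;
	int i;

	for (i = 0; s->qlen > limit; i++)
		dequeue_one(s, pkts[i], self[i]);
	/* total reduction minus what the dequeue path already reported */
	return (prev_backlog - s->backlog) - s->tree_reported;
}
```

Without the subtraction of tree_reported, any drop the dequeue handler reported itself would be counted twice.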
> > 
> > > > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> > > > index 902ff5470607..986e71e3362c 100644
> > > > --- a/net/sched/sch_fq.c
> > > > +++ b/net/sched/sch_fq.c
> > > > @@ -1014,10 +1014,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
> > > > struct netlink_ext_ack *extack)
> > > > {
> > > > struct fq_sched_data *q = qdisc_priv(sch);
> > > > + unsigned int prev_qlen, prev_backlog;
> > > > struct nlattr *tb[TCA_FQ_MAX + 1];
> > > > - int err, drop_count = 0;
> > > > - unsigned drop_len = 0;
> > > > u32 fq_log;
> > > > + int err;
> > > > 
> > > > err = nla_parse_nested_deprecated(tb, TCA_FQ_MAX, opt, fq_policy,
> > > > NULL);
> > > > @@ -1135,16 +1135,16 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
> > > > err = fq_resize(sch, fq_log);
> > > > sch_tree_lock(sch);
> > > > }
> > > > +
> > > > + prev_qlen = sch->q.qlen;
> > > > + prev_backlog = sch->qstats.backlog;
> > > > while (sch->q.qlen > sch->limit) {
> > > > struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
> > > > 
> > > > - if (!skb)
> > > > - break;
> > > 
> > > The break condition is removed to align the code across the qdiscs?
> > 
> > That break is no longer needed because qdisc_dequeue_internal handles
> > all the length and backlog size adjustments. The check existed there
> > because qdisc_pkt_len would otherwise run on a NULL skb.
> 
> 
> Ack, tho, theoretically the break also prevents an infinite loop.
> Change is fine, but worth calling this out in the commit message,
> I reckon.
> 
> > > > - drop_len += qdisc_pkt_len(skb);
> > > > rtnl_kfree_skbs(skb, skb);
> > > > - drop_count++;
> > > > }
> > > > - qdisc_tree_reduce_backlog(sch, drop_count, drop_len);
> > > > + qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
> > > > + prev_backlog - sch->qstats.backlog);
> > > 
> > > There is no real change in the math here, right?
> > > Again, you're just changing this to align across the qdiscs?
> > 
> > Yep, aside from using the properly updated qlen and backlog from the
> > revamped qdisc_dequeue_internal.
> 
> 
> Personal preference, but my choice would be to follow the FQ code,
> and count the skbs as they are freed. But up to you, since we hold
> the lock supposedly the changes to backlog can only be due to our
> purging.
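For what it's worth, under the tree lock the two styles should produce identical numbers. A hypothetical userspace sketch comparing them, assuming nothing else touches qlen/backlog while the loop runs (i.e. the lock is held):

```c
#include <assert.h>

/* Hypothetical model of a qdisc's queue counters. */
struct q_model { unsigned int qlen, backlog; };

static unsigned int pop(struct q_model *q, unsigned int pkt_len)
{
	q->qlen--;
	q->backlog -= pkt_len;
	return pkt_len;
}

/* FQ style: count skbs and bytes as they are freed. */
static unsigned int shrink_per_skb(struct q_model *q, const unsigned int *pkts,
				  unsigned int limit, unsigned int *len)
{
	unsigned int count = 0;
	int i;

	*len = 0;
	for (i = 0; q->qlen > limit; i++) {
		*len += pop(q, pkts[i]);
		count++;
	}
	return count;
}

/* Delta style: snapshot qlen/backlog before, subtract afterwards. */
static unsigned int shrink_delta(struct q_model *q, const unsigned int *pkts,
				 unsigned int limit, unsigned int *len)
{
	unsigned int prev_qlen = q->qlen, prev_backlog = q->backlog;
	int i;

	for (i = 0; q->qlen > limit; i++)
		pop(q, pkts[i]);
	*len = prev_backlog - q->backlog;
	return prev_qlen - q->qlen;
}
```

The delta style only diverges if something else modifies the counters concurrently, which the lock rules out here.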
