Message-ID: <n-GjVW0_1R1-ujkLgZIEgnaQKSsNtQ9-7UZiTmDCJsy1EutoUtiGOSahNSxpz2yANsp5olbxItT2X9apTC9btIRepMGAZZVBqWx6ueYE5O4=@willsroot.io>
Date: Sun, 10 Aug 2025 21:06:57 +0000
From: William Liu <will@...lsroot.io>
To: Jakub Kicinski <kuba@...nel.org>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com, pabeni@...hat.com, jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com, horms@...nel.org, savy@...t3mfailure.io, victor@...atatu.com
Subject: Re: [PATCH net v4 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal

On Friday, August 8th, 2025 at 9:27 PM, Jakub Kicinski <kuba@...nel.org> wrote:

> 
> 
> On Sun, 27 Jul 2025 23:56:32 +0000 William Liu wrote:
> 
> > Special care is taken for fq_codel_dequeue to account for the
> > qdisc_tree_reduce_backlog call in its dequeue handler. The
> > cstats reset is moved from the end to the beginning of
> > fq_codel_dequeue, so the change handler can use cstats for
> > proper backlog reduction accounting purposes. The drop_len and
> > drop_count fields are not used elsewhere so this reordering in
> > fq_codel_dequeue is ok.
> 
> 
> Using local variables like we do in other qdiscs will not work?
> I think your change will break drop accounting during normal dequeue?

Can you elaborate on this? 

I just moved the reset of two cstats fields (drop_count and drop_len) from the epilogue of the dequeue handler to its prologue. Those fields are not used anywhere else, so the move itself is safe, but the change handler needs to accumulate their values across the limit-adjustment loop. Otherwise the final qdisc_tree_reduce_backlog in that loop could over-account, because the dequeue path may already have called qdisc_tree_reduce_backlog for packets it dropped internally.

> 
> > diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
> > index 638948be4c50..a24094a638dc 100644
> > --- a/include/net/sch_generic.h
> > +++ b/include/net/sch_generic.h
> > @@ -1038,10 +1038,15 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
> > skb = __skb_dequeue(&sch->gso_skb);
> > if (skb) {
> > sch->q.qlen--;
> > + qdisc_qstats_backlog_dec(sch, skb);
> > + return skb;
> > + }
> > + if (direct) {
> > + skb = __qdisc_dequeue_head(&sch->q);
> > + if (skb)
> > + qdisc_qstats_backlog_dec(sch, skb);
> > return skb;
> > }
> > - if (direct)
> > - return __qdisc_dequeue_head(&sch->q);
> > else
> 
> 
> sorry for a late nit, it wasn't very clear from the diff but
> we end up with
> 
> if (direct) {
> ...
> }
> else
> return ..;
> 
> Please reformat:
> 
> if (direct) {
> ...
> } else {
> ...
> }
> 

Ok, noted.

> > return sch->dequeue(sch);
> > }
> 
> > diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
> > index 902ff5470607..986e71e3362c 100644
> > --- a/net/sched/sch_fq.c
> > +++ b/net/sched/sch_fq.c
> > @@ -1014,10 +1014,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
> > struct netlink_ext_ack *extack)
> > {
> > struct fq_sched_data *q = qdisc_priv(sch);
> > + unsigned int prev_qlen, prev_backlog;
> > struct nlattr *tb[TCA_FQ_MAX + 1];
> > - int err, drop_count = 0;
> > - unsigned drop_len = 0;
> > u32 fq_log;
> > + int err;
> > 
> > err = nla_parse_nested_deprecated(tb, TCA_FQ_MAX, opt, fq_policy,
> > NULL);
> > @@ -1135,16 +1135,16 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
> > err = fq_resize(sch, fq_log);
> > sch_tree_lock(sch);
> > }
> > +
> > + prev_qlen = sch->q.qlen;
> > + prev_backlog = sch->qstats.backlog;
> > while (sch->q.qlen > sch->limit) {
> > struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
> > 
> > - if (!skb)
> > - break;
> 
> 
> The break conditions is removed to align the code across the qdiscs?

That break is no longer needed because qdisc_dequeue_internal now handles all the qlen and backlog adjustments itself. The NULL check only existed to guard the qdisc_pkt_len(skb) call, which is gone from this loop.

> 
> > - drop_len += qdisc_pkt_len(skb);
> > rtnl_kfree_skbs(skb, skb);
> > - drop_count++;
> > }
> > - qdisc_tree_reduce_backlog(sch, drop_count, drop_len);
> > + qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
> > + prev_backlog - sch->qstats.backlog);
> 
> 
> There is no real change in the math here, right?
> Again, you're just changing this to align across the qdiscs?

Yep, aside from the fact that it now relies on the qlen and backlog counters being kept up to date by the revamped qdisc_dequeue_internal.

> --
> pw-bot: cr
