Message-ID: <3VMC2BvvUHZC7kQfL_mtfX8Arwvgb3Sx8jNF-YzqiJ81f_L7nOgIYvQOxVmjeWqWJcIW5rb48oCbzp8iAzjAUPU0iKBk012t0DR7arhqtOY=@willsroot.io>
Date: Tue, 22 Jul 2025 15:52:25 +0000
From: William Liu <will@...lsroot.io>
To: Paolo Abeni <pabeni@...hat.com>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, xiyou.wangcong@...il.com, kuba@...nel.org, jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com, horms@...nel.org, savy@...t3mfailure.io
Subject: Re: [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
On Tuesday, July 22nd, 2025 at 1:32 PM, Paolo Abeni <pabeni@...hat.com> wrote:
Thank you for the review!
>
>
> On 7/19/25 8:08 PM, William Liu wrote:
>
> > This issue applies to the following qdiscs: hhf, fq, fq_codel, and
> > fq_pie, and occurs in their change handlers when adjusting to the new
> > limit. The problems with the values passed to the subsequent
> > qdisc_tree_reduce_backlog call are the following:
> >
> > 1. When the tbf parent runs out of tokens, skbs of these qdiscs will
> > be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
> > which accounts for both qlen and backlog. However, in the case of
> > qdisc_dequeue_internal, ONLY qlen is accounted for when pulling
> > from gso_skb. This means that these qdiscs are missing a
> > qdisc_qstats_backlog_dec when dropping packets to satisfy the
> > new limit in their change handlers.
> >
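To make point 1 concrete for reviewers, the helper currently looks
roughly like this (paraphrased from include/net/sch_generic.h, so
treat it as a sketch rather than the exact source):

static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool direct)
{
        struct sk_buff *skb;

        skb = __skb_dequeue(&sch->gso_skb);
        if (skb) {
                sch->q.qlen--;
                /* missing: qdisc_qstats_backlog_dec(sch, skb); */
                return skb;
        }
        if (direct)
                return __qdisc_dequeue_head(&sch->q);
        else
                return sch->dequeue(sch);
}

Since qdisc_peek_dequeued bumped both qlen and backlog when it stashed
the skb on gso_skb, decrementing only qlen here leaves the backlog
counter inflated.
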
> > One can observe this issue with the following (with tc patched to
> > support a limit of 0):
> >
> > export TARGET=fq
> > tc qdisc del dev lo root
> > tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
> > tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
> > echo ''; echo 'add child'; tc -s -d qdisc show dev lo
> > ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
> > echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
> > tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
> > echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
> > tc qdisc replace dev lo handle 2: parent 1:1 sfq
> > echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
> >
> > The second-to-last show command shows 0 packets but a positive
> > number (74) of backlog bytes. The problem becomes clearer in the
> > last show command, where qdisc_purge_queue triggers
> > qdisc_tree_reduce_backlog with the positive backlog and causes an
> > underflow in the tbf parent's backlog (4096 Mb instead of 0).
> >
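The graft path that exposes the underflow is qdisc_purge_queue, which
is roughly the following (again paraphrasing include/net/sch_generic.h):

static inline void qdisc_purge_queue(struct Qdisc *sch)
{
        __u32 qlen, backlog;

        qdisc_qstats_qlen_backlog(sch, &qlen, &backlog);
        qdisc_reset(sch);
        qdisc_tree_reduce_backlog(sch, qlen, backlog);
}

At that point qlen is already 0 but backlog still holds the stale 74
bytes, so the reduction drives the tbf parent's backlog counter below
zero and it wraps around.
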
> > 2. fq_codel_change is also wrong in the non-gso_skb case. It tracks
> > the amount to drop during the limit adjustment loop through
> > cstats.drop_count and cstats.drop_len, but these fields are also
> > updated in fq_codel_dequeue, and are reset there every time they
> > are non-zero after a call to qdisc_tree_reduce_backlog.
> > If the drop path ever occurs in fq_codel_dequeue and
> > qdisc_dequeue_internal takes the non-gso_skb path, then we would
> > reduce the backlog by an extra packet.
> >
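For reference, the pre-patch pattern in fq_codel_change is roughly
this (paraphrased; details may differ slightly):

        while (sch->q.qlen > sch->limit ||
               q->memory_usage > q->memory_limit) {
                struct sk_buff *skb = qdisc_dequeue_internal(sch, false);

                q->cstats.drop_len += qdisc_pkt_len(skb);
                rtnl_kfree_skbs(skb, skb);
                q->cstats.drop_count++;
        }
        qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
                                  q->cstats.drop_len);

The qdisc_dequeue_internal(sch, false) call ends up in
fq_codel_dequeue, which updates the same cstats fields and may already
have called qdisc_tree_reduce_backlog for them, so the totals used in
the final reduction here can be off.
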
> > To fix these issues, the codepath for all clients of
> > qdisc_dequeue_internal (codel, pie, hhf, fq, fq_pie, and fq_codel)
> > has been simplified: qdisc_dequeue_internal now handles the backlog
> > adjustments for all cases that do not directly use the dequeue
> > handler.
> >
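In other words, the helper ends up shaped roughly like this (a sketch
of the direction, not the exact hunk):

static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool direct)
{
        struct sk_buff *skb;

        skb = __skb_dequeue(&sch->gso_skb);
        if (skb) {
                sch->q.qlen--;
                qdisc_qstats_backlog_dec(sch, skb);
                return skb;
        }
        if (direct) {
                skb = __qdisc_dequeue_head(&sch->q);
                if (skb)
                        qdisc_qstats_backlog_dec(sch, skb);
                return skb;
        }
        return sch->dequeue(sch);
}

so a change handler only has to account for what its own dequeue
handler does.
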
> > Special care is taken for fq_codel_dequeue to account for the
> > qdisc_tree_reduce_backlog call in its dequeue handler. The
> > cstats reset is moved from the end to the beginning of
> > fq_codel_dequeue, so the change handler can use cstats for
> > proper backlog reduction accounting purposes. The drop_len and
> > drop_count fields are not used elsewhere so this reordering in
> > fq_codel_dequeue is ok.
> >
> > Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
> > Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> > Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
> >
> > Signed-off-by: William Liu <will@...lsroot.io>
> > Reviewed-by: Savino Dicanosa <savy@...t3mfailure.io>
>
>
> Please avoid blank lines in the tag area, i.e. between 'Fixes' and the SoB.
>
Ok noted.
> Also I think this could/should be split into several patches, one for
> each affected qdisc.
>
I considered doing that originally, but the commit that introduced qdisc_dequeue_internal had all the changes in one patch: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2d3cbfd6d54a2c39ce3244f33f85c595844bd7b8
Plus, splitting this change into one patch per affected qdisc, with the qdisc_dequeue_internal change on its own, would leave codel and pie with broken backlog accounting in between, unless I group them with the qdisc_dequeue_internal patch. Should I still split it in this case?
> > diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> > index 2a0f3a513bfa..f9e6d76a1712 100644
> > --- a/net/sched/sch_fq_codel.c
> > +++ b/net/sched/sch_fq_codel.c
> > @@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> > struct fq_codel_flow *flow;
> > struct list_head *head;
> >
> > + /* reset these here, as change needs them for proper accounting */
> > + q->cstats.drop_count = 0;
> > + q->cstats.drop_len = 0;
> > +
> > begin:
> > head = &q->new_flows;
> > if (list_empty(head)) {
> > @@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> > if (q->cstats.drop_count) {
>
>
> Why is this 'if' still needed? Isn't drop_count always 0 here?
>
No, the codel_dequeue call in the middle can adjust that value.
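Concretely, the call I mean sits between the (now moved) reset and
that 'if' (arguments paraphrased from sch_fq_codel.c):

        skb = codel_dequeue(sch, &q->backlogs[flow - q->flows],
                            &q->cparams, &flow->cvars, &q->cstats,
                            qdisc_pkt_len, codel_get_enqueue_time,
                            drop_func, dequeue_func);

It is handed &q->cstats, so drop_count can become non-zero again
between the reset at the top of fq_codel_dequeue and that check.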
> /P
Best,
Will