Message-ID: <96927022-92d9-46be-8af4-225cab01b006@redhat.com>
Date: Tue, 22 Jul 2025 15:32:04 +0200
From: Paolo Abeni <pabeni@...hat.com>
To: William Liu <will@...lsroot.io>, netdev@...r.kernel.org
Cc: jhs@...atatu.com, xiyou.wangcong@...il.com, kuba@...nel.org,
jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com,
horms@...nel.org, savy@...t3mfailure.io
Subject: Re: [PATCH net 1/2] net/sched: Fix backlog accounting in
qdisc_dequeue_internal
On 7/19/25 8:08 PM, William Liu wrote:
> This issue applies to the following qdiscs: hhf, fq, fq_codel, and
> fq_pie, and occurs in their change handlers when adjusting to the new
> limit. The values passed to the subsequent qdisc_tree_reduce_backlog
> call have the following problems:
>
> 1. When the tbf parent runs out of tokens, skbs of these qdiscs will
> be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
> which accounts for both qlen and backlog. However, in the case of
> qdisc_dequeue_internal, ONLY qlen is accounted for when pulling
> from gso_skb. This means that these qdiscs are missing a
> qdisc_qstats_backlog_dec when dropping packets to satisfy the
> new limit in their change handlers.
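>
> For reference, the gso_skb branch of qdisc_dequeue_internal currently
> looks roughly like this (a sketch of the include/net/sch_generic.h
> helper, not an exact copy):
>
>     static inline struct sk_buff *
>     qdisc_dequeue_internal(struct Qdisc *sch, bool direct)
>     {
>             struct sk_buff *skb;
>
>             skb = __skb_dequeue(&sch->gso_skb);
>             if (skb) {
>                     sch->q.qlen--;  /* qlen is adjusted here... */
>                     return skb;     /* ...but backlog is not */
>             }
>             if (direct)
>                     return __qdisc_dequeue_head(&sch->q);
>             return sch->dequeue(sch);
>     }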
>
> One can observe this issue with the following (with tc patched to
> support a limit of 0):
>
> export TARGET=fq
> tc qdisc del dev lo root
> tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
> tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
> echo ''; echo 'add child'; tc -s -d qdisc show dev lo
> ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
> echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
> tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
> echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
> tc qdisc replace dev lo handle 2: parent 1:1 sfq
> echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
>
> The second-to-last show command shows 0 packets but a positive
> number (74) of backlog bytes. The problem becomes clearer in the
> last show command, where qdisc_purge_queue triggers
> qdisc_tree_reduce_backlog with the positive backlog and causes an
> underflow in the tbf parent's backlog (4096 Mb instead of 0).
>
> 2. fq_codel_change is also wrong in the non gso_skb case. It tracks
> the amount to drop after the limit adjustment loop through
> cstats.drop_count and cstats.drop_len, but these are also updated
> in fq_codel_dequeue, and reset every time they are non-zero in that
> function after a call to qdisc_tree_reduce_backlog.
> If the drop path ever occurs in fq_codel_dequeue and
> qdisc_dequeue_internal takes the non gso_skb path, then we would
> reduce the backlog by an extra packet.
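>
> For context, the tail of fq_codel_dequeue already flushes and clears
> these counters, roughly (a sketch, not an exact copy):
>
>     if (q->cstats.drop_count && sch->q.qlen) {
>             qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
>                                       q->cstats.drop_len);
>             q->cstats.drop_count = 0;
>             q->cstats.drop_len = 0;
>     }
>
> so a change handler that keeps accumulating into cstats across such
> a dequeue can end up reducing the backlog twice for the same packets.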
>
> To fix these issues, the codepath for all clients of
> qdisc_dequeue_internal (codel, pie, hhf, fq, fq_pie, and fq_codel)
> has been simplified: qdisc_dequeue_internal now handles the backlog
> adjustments for all cases that do not directly use the dequeue
> handler.
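>
> A minimal sketch of that direction (upstream helper names, exact
> body an assumption):
>
>     skb = __skb_dequeue(&sch->gso_skb);
>     if (skb) {
>             sch->q.qlen--;
>             qdisc_qstats_backlog_dec(sch, skb); /* keep backlog in sync */
>             return skb;
>     }
>     if (direct) {
>             skb = __qdisc_dequeue_head(&sch->q);
>             if (skb)
>                     qdisc_qstats_backlog_dec(sch, skb);
>             return skb;
>     }
>     return sch->dequeue(sch);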
>
> Special care is taken for fq_codel_dequeue to account for the
> qdisc_tree_reduce_backlog call in its dequeue handler. The
> cstats reset is moved from the end to the beginning of
> fq_codel_dequeue, so the change handler can use cstats for
> proper backlog reduction accounting. The drop_len and drop_count
> fields are not used elsewhere, so this reordering in
> fq_codel_dequeue is ok.
>
> Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
> Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
>
> Signed-off-by: William Liu <will@...lsroot.io>
> Reviewed-by: Savino Dicanosa <savy@...t3mfailure.io>
Please avoid blank lines in the tag area, i.e. between 'Fixes' and the SoB.
Also I think this could/should be split into several patches, one for
each affected qdisc.
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 2a0f3a513bfa..f9e6d76a1712 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> struct fq_codel_flow *flow;
> struct list_head *head;
>
> + /* reset these here, as change needs them for proper accounting */
> + q->cstats.drop_count = 0;
> + q->cstats.drop_len = 0;
> +
> begin:
> head = &q->new_flows;
> if (list_empty(head)) {
> @@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> if (q->cstats.drop_count) {
Why is this 'if' still needed? Isn't drop_count always 0 here?
/P