Message-ID: <rt_47kiO_w5I_HyL1B4RuHKclPjWvmSJqnlZwSZB5YKxStxbDAsb7lTae0yhrRLJkh9yb7JjV-LpU_xzNCd031YI23Z4Yy83u8-DgYuPsEI=@willsroot.io>
Date: Tue, 08 Jul 2025 20:21:31 +0000
From: William Liu <will@...lsroot.io>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, Jiri Pirko <jiri@...nulli.us>, Jamal Hadi Salim <jhs@...atatu.com>, Savy <savy@...t3mfailure.io>, Paolo Abeni <pabeni@...hat.com>
Subject: Re: [BUG] Inconsistency between qlen and backlog in hhf, fq, fq_codel, and fq_pie causing WARNING in qdisc_tree_reduce_backlog
On Monday, July 7th, 2025 at 6:51 PM, Cong Wang <xiyou.wangcong@...il.com> wrote:
>
>
> On Sun, Jul 06, 2025 at 01:43:08PM +0000, William Liu wrote:
>
> > On Saturday, July 5th, 2025 at 1:04 AM, Cong Wang xiyou.wangcong@...il.com wrote:
> >
> > > On Wed, Jul 02, 2025 at 08:39:45PM +0000, William Liu wrote:
> > >
> > > > Hi,
> > > >
> > > > We write to report a bug in qlen/backlog consistency affecting hhf, fq, fq_codel, and fq_pie when acting as a child of tbf. It was introduced by last month's fix [1] for a null-dereference bug caused by GSO segmentation and a temporary inconsistency in queue state when tbf, having run out of tokens, peeks at its child during tbf_dequeue. We actually reported that earlier bug, but did not notice the subtle problem in the fix until now. We are aware of bugs with similar symptoms reported by Mingi [3] and Lion [4], but those have a different root cause (at least from what we can see of Mingi's report).
> > >
> > > Thanks for your report.
> > >
> > > > This works on the upstream kernel, and we have the following reproducer.
> > > >
> > > > ./tc qdisc del dev lo root
> > > > ./tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms || echo TBF
> > > > ./tc qdisc add dev lo handle 3: parent 1:1 hhf limit 1000 || echo HH
> > > > ping -I lo -f -c1 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
> > > > ./tc qdisc change dev lo handle 3: parent 1:1 hhf limit 0 || echo HH
> > > > ./tc qdisc replace dev lo handle 2: parent 1:1 sfq || echo SFQ
> > > >
> > > > Note that a patched version of tc that supports 0 limits must be built. The symptom of the bug arises in the WARN_ON_ONCE check in qdisc_tree_reduce_backlog [2], where n is 0. You can replace hhf with fq, fq_codel, and fq_pie to trigger warnings as well, though the success rate may vary.
> > > >
> > > > The root cause is the newly introduced function qdisc_dequeue_internal, which the change handlers of the affected qdiscs invoke [5]. When this function dequeues from a non-empty gso_skb list, only qlen is decremented; backlog is not touched. The gso_skb insertion happens in qdisc_peek_dequeued, which tbf calls on these qdiscs when they are its child.
> > > >
> > > > When the qdisc is replaced, tbf_graft runs, and qdisc_purge_queue calls qdisc_tree_reduce_backlog with the inconsistent values, which one can observe by printk-ing the qlen/backlog values passed in.
> > >
> > > If I understand you correctly, the problem is the inconsistent behavior
> > > between qdisc_purge_queue() and qdisc_dequeue_internal()? And it is
> > > because the former does not take care of ->gso_skb?
> >
> > No, there are 2 points of inconsistent behavior.
> >
> > 1. qdisc_dequeue_internal and qdisc_peek_dequeued. In qdisc_peek_dequeued, when an skb comes from the dequeue handler, it is stashed in gso_skb and both qlen and backlog are increased. In qdisc_dequeue_internal, only qlen is decreased when removing from gso_skb.
>
>
> Yes, because qlen is handled by qdisc_dequeue_internal()'s callers to
> control their loop of sch->limit.
>
This makes sense. However, if backlog is not adjusted in that helper, qlen and backlog go out of sync, since qdisc_tree_reduce_backlog only adjusts the counters of parent/ancestor qdiscs.
This should help elucidate the problem:
export TARGET=hhf
./tc qdisc del dev lo root
./tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms || echo TBF
./tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000 || echo HH
echo ''; echo 'add child'; tc -s -d qdisc show dev lo
ping -I lo -f -c1 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
./tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0 || echo HH
echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
./tc qdisc replace dev lo handle 2: parent 1:1 sfq || echo SFQ
echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
Perhaps the original WARNING in qdisc_tree_reduce_backlog about an inconsistent backlog (which Lion's patch removed) was not flagging a real problem then? In any case, this backlog adjustment seems easy enough to account for. Let me know if I can help with this.
On another note, aside from hhf (where the backlog value passed to qdisc_tree_reduce_backlog is 0 due to its different limit-change loop), the other qdiscs cause an underflow in tbf's backlog at the final graft.
>
> > 2. The dequeue limit loop in change handlers and qdisc_dequeue_internal. Each time those loops call qdisc_dequeue_internal, they track their own counts of dropped packets and total dropped bytes before passing those values to qdisc_tree_reduce_backlog. The hhf qdisc is an exception: it just compares the counters before and after its limit-change loop. Would this not lead to double counting in the gso_skb dequeue case of qdisc_dequeue_internal?
>
>
> Right, I think hhf qdisc should align with other Qdisc's to track the
> qlen.
>
> Note: my patch didn't change its behavior, so this issue existed even
> before my patch.
>
> > Also, I took a look at some of the dequeue handlers (which qdisc_dequeue_internal calls when direct is false), and fq_codel_dequeue touches the drop_count and drop_len statistics, which the limit adjustment loop also uses.
>
>
>
> Right, this is why it is hard to move the qlen tracking into
> qdisc_dequeue_internal().
>
> Thanks.