[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <442716ca-ae2e-4fac-8a01-ced3562fd588@mojatatu.com>
Date: Mon, 30 Jun 2025 14:52:19 -0300
From: Victor Nogueira <victor@...atatu.com>
To: Jamal Hadi Salim <jhs@...atatu.com>, Lion Ackermann <nnamrec@...il.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org,
Jiri Pirko <jiri@...nulli.us>, Mingi Cho <mincho@...ori.io>
Subject: Re: Incomplete fix for recent bug in tc / hfsc
On 6/30/25 11:57, Jamal Hadi Salim wrote:
> On Mon, Jun 30, 2025 at 9:36 AM Lion Ackermann <nnamrec@...il.com> wrote:
>>
>> Hi,
>>
>> On 6/30/25 1:34 PM, Jamal Hadi Salim wrote:
>>> Hi,
>>>
>>> On Mon, Jun 30, 2025 at 5:04 AM Lion Ackermann <nnamrec@...il.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 6/29/25 9:50 PM, Cong Wang wrote:
>>>>> On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
>>>>>>> On "What do you think the root cause is here?"
>>>>>>>
>>>>>>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
>>>>>>> all packets in enqueue (mostly in relation to peek()) and that result
>>>>>>> is not being reflected in the return code returned to its parent
>>>>>>> qdisc.
>>>>>>> So, in the example you described in this thread, drr is oblivious to
>>>>>>> the fact that the child qdisc dropped its packet because the call to
>>>>>>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
>>>>>>> activate a class that shouldn't have been activated at all.
>>>>>>>
>>>>>>> You can argue that drr (and other similar qdiscs) may detect this by
>>>>>>> checking the call to qlen_notify (as the drr patch was
>>>>>>> doing), but that seems really counter-intuitive. Imagine writing a new
>>>>>>> qdisc and having to check for that every time you call a child's
>>>>>>> enqueue. Sure your patch solves this, but it also seems like it's not
>>>>>>> fixing the underlying issue (which is drr activating the class in the
>>>>>>> first place). Your patch is simply removing all the classes from their
>>>>>>> active lists when you delete them. And your patch may seem ok for now,
>>>>>>> but I am worried it might break something else in the future that we
>>>>>>> are not seeing.
>>>>>>>
>>>>>>> And do note: All of the examples of the hierarchy I have seen so far,
>>>>>>> that put us in this situation, are nonsensical
>>>>>>>
>>>>>>
>>>>>> At this point my thinking is to apply your patch and then we discuss a
>>>>>> longer term solution. Cong?
>>>>>
>>>>> I agree. If Lion's patch works, it is certainly much better as a bug fix
>>>>> for both -net and -stable.
>>>>>
>>>>> Also for all of those ->qlen_notify() craziness, I think we need to
>>>>> rethink about the architecture, _maybe_ there are better architectural
>>>>> solutions.
>>>>>
>>>>> Thanks!
>>>>
>>>> Just for the record, I agree with all your points and as was stated this
>>>> patch really only does damage prevention. Your proposal of preventing
>>>> hierarchies sounds useful in the long run to keep the backlogs sane.
>>>>
>>>> I did run all the tdc tests on the latest net tree and they passed. Also
>>>> my HFSC reproducer does not trigger with the proposed patch. I do not have
>>>> a simple reproducer at hand for the QFQ tree case that you mentioned. So
>>>> please verify this too if you can.
>>>>
>>>> Otherwise please feel free to go forward with the patch. If I can add
>>>> anything else to the discussion please let me know.
>>>>
>>>
>>> Please post the patch formally as per Cong request. A tdc test case of
>>> the reproducer would also help.
>>>
>>> cheers,
>>> jamal
>>
>> I sent a patch, though I am not terribly familiar with the tdc test case
>> infrastructure. If it is a no-op for you to translate the repro above into
>> the required format, please feel free to do that and post a patch for that.
>> Otherwise I can have a closer look at it tomorrow.
>>
>
> We'll help out this time - but it is a good idea to for you to learn
> how to do it if you are going to keep finding issues on tc ;->
Lion, I attached a patch to this email that edits Cong's original tdc test
case to account for your reproducer. Please resend your patch with it (after
the 24 hour wait period).
cheers,
Victor
View attachment "0001-selftests-tc-testing-Fix-test-case-831d-to-reproduce.patch" of type "text/x-patch" (3141 bytes)
Powered by blists - more mailing lists