lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <442716ca-ae2e-4fac-8a01-ced3562fd588@mojatatu.com>
Date: Mon, 30 Jun 2025 14:52:19 -0300
From: Victor Nogueira <victor@...atatu.com>
To: Jamal Hadi Salim <jhs@...atatu.com>, Lion Ackermann <nnamrec@...il.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org,
 Jiri Pirko <jiri@...nulli.us>, Mingi Cho <mincho@...ori.io>
Subject: Re: Incomplete fix for recent bug in tc / hfsc

On 6/30/25 11:57, Jamal Hadi Salim wrote:
> On Mon, Jun 30, 2025 at 9:36 AM Lion Ackermann <nnamrec@...il.com> wrote:
>>
>> Hi,
>>
>> On 6/30/25 1:34 PM, Jamal Hadi Salim wrote:
>>> Hi,
>>>
>>> On Mon, Jun 30, 2025 at 5:04 AM Lion Ackermann <nnamrec@...il.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> On 6/29/25 9:50 PM, Cong Wang wrote:
>>>>> On Sun, Jun 29, 2025 at 10:29:44AM -0400, Jamal Hadi Salim wrote:
>>>>>>> On "What do you think the root cause is here?"
>>>>>>>
>>>>>>> I believe the root cause is that qdiscs like hfsc and qfq are dropping
>>>>>>> all packets in enqueue (mostly in relation to peek()) and that result
>>>>>>> is not being reflected in the return code returned to its parent
>>>>>>> qdisc.
>>>>>>> So, in the example you described in this thread, drr is oblivious to
>>>>>>> the fact that the child qdisc dropped its packet because the call to
>>>>>>> its child enqueue returned NET_XMIT_SUCCESS. This causes drr to
>>>>>>> activate a class that shouldn't have been activated at all.
>>>>>>>
>>>>>>> You can argue that drr (and other similar qdiscs) may detect this by
>>>>>>> checking the call to qlen_notify (as the drr patch was
>>>>>>> doing), but that seems really counter-intuitive. Imagine writing a new
>>>>>>> qdisc and having to check for that every time you call a child's
>>>>>>> enqueue. Sure  your patch solves this, but it also seems like it's not
>>>>>>> fixing the underlying issue (which is drr activating the class in the
>>>>>>> first place). Your patch is simply removing all the classes from their
>>>>>>> active lists when you delete them. And your patch may seem ok for now,
>>>>>>> but I am worried it might break something else in the future that we
>>>>>>> are not seeing.
>>>>>>>
>>>>>>> And do note: All of the examples of the hierarchy I have seen so far,
>>>>>>> that put us in this situation, are nonsensical
>>>>>>>
>>>>>>
>>>>>> At this point my thinking is to apply your patch and then we discuss a
>>>>>> longer term solution. Cong?
>>>>>
>>>>> I agree. If Lion's patch works, it is certainly much better as a bug fix
>>>>> for both -net and -stable.
>>>>>
>>>>> Also for all of those ->qlen_notify() craziness, I think we need to
>>>>> rethink about the architecture, _maybe_ there are better architectural
>>>>> solutions.
>>>>>
>>>>> Thanks!
>>>>
>>>> Just for the record, I agree with all your points and as was stated this
>>>> patch really only does damage prevention. Your proposal of preventing
>>>> hierarchies sounds useful in the long run to keep the backlogs sane.
>>>>
>>>> I did run all the tdc tests on the latest net tree and they passed. Also
>>>> my HFSC reproducer does not trigger with the proposed patch. I do not have
>>>> a simple reproducer at hand for the QFQ tree case that you mentioned. So
>>>> please verify this too if you can.
>>>>
>>>> Otherwise please feel free to go forward with the patch. If I can add
>>>> anything else to the discussion please let me know.
>>>>
>>>
>>> Please post the patch formally as per Cong request. A tdc test case of
>>> the reproducer would also help.
>>>
>>> cheers,
>>> jamal
>>
>> I sent a patch, though I am not terribly familiar with the tdc test case
>> infrastructure. If it is a no-op for you to translate the repro above into
>> the required format, please feel free to do that and post a patch for that.
>> Otherwise I can have a closer look at it tomorrow.
>>
> 
> We'll help out this time - but it is a good idea to for you to learn
> how to do it if you are going to keep finding issues on tc ;->

Lion, I attached a patch to this email that edits Cong's original tdc test
case to account for your reproducer. Please resend your patch with it (after
the 24 hour wait period).

cheers,
Victor

View attachment "0001-selftests-tc-testing-Fix-test-case-831d-to-reproduce.patch" of type "text/x-patch" (3141 bytes)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ