[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMmuL7-pOqQZMA6Y0WW_zDzpbyRsw0HRHzn0RV=An9gsRw@mail.gmail.com>
Date: Tue, 8 Jul 2025 18:26:28 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: William Liu <will@...lsroot.io>, netdev@...r.kernel.org, victor@...atatu.com,
pctammela@...atatu.com, pabeni@...hat.com, kuba@...nel.org,
stephen@...workplumber.org, dcaratti@...hat.com, savy@...t3mfailure.io,
jiri@...nulli.us, davem@...emloft.net, edumazet@...gle.com, horms@...nel.org,
linux-kernel@...r.kernel.org, torvalds@...ux-foundation.org
Subject: Re: This breaks netem use cases
On Tue, Jul 8, 2025 at 5:32 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> (Cc Linus Torvalds)
>
> On Tue, Jul 08, 2025 at 04:35:37PM -0400, Jamal Hadi Salim wrote:
> > On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > >
> > > (Cc LKML for more audience, since this clearly breaks potentially useful
> > > use cases)
> > >
> > > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> > > > netem_enqueue's duplication prevention logic breaks when a netem
> > > > resides in a qdisc tree with other netems - this can lead to a
> > > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > > Ensure that a duplicating netem cannot exist in a tree with other
> > > > netems.
> > >
> > > As I already warned in your previous patchset, this breaks the following
> > > potentially useful use case:
> > >
> > > sudo tc qdisc add dev eth0 root handle 1: mq
> > > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> > >
> > > I don't see any logical problem of such use case, therefore we should
> > > consider it as valid, we can't break it.
> > >
> >
> > I thought we are trying to provide an intermediate solution to plug an
> > existing hole and come up with a longer term solution.
>
> Breaking valid use cases even for a short period is still no way to go.
> Sorry, Jamal. Since I can't convince you, please ask Linus.
>
> Also, I don't see you have proposed any long term solution. If you
> really have one, please state it clearly and provide a clear timeline to
> users.
>
I explained my approach a few times: We need to come up with a long
term solution that looks at the sanity of hierarchies.
Equivalent to init/change()
Today we only look at netlink requests for a specific qdisc. The new
approach (possibly an ops) will also look at the sanity of configs in
relation to hierarchies.
You can work on it or come with an alternative proposal.
That is not the scope of this discussion though
> > If there are users of such a "potential setup" you show above we are
> > going to find out very quickly.
>
> Please read the above specific example. It is more than just valid, it
> is very reasonable, installing netem for each queue is the right way of
> using netem duplication to avoid the global root spinlock in a multiqueue
> setup.
>
In all my years working on tc I have never seen _anyone_ using
duplication where netem is _not the root qdisc_. And i have done a lot
of "support" in this area.
You can craft any example you want but it needs to be practical - I
dont see the practicality in your example.
Just because we allow arbitrary crafting of hierarchies doesnt mean
they are correct.
The choice is between complicating things to fix a "potential" corner
use case vs simplicity (especially of a short term approach that is
intended to be obsoleted in the long term).
> Breaking users and letting them complain is not a good strategy either.
>
> On the other hand, thanks for acknowledging it breaks users, which
> confirms my point.
>
> I will wait for Linus' response.
>
> > We are working against security people who are finding all sorts of
> > "potential use cases" to create CVEs.
>
> I seriouly doubt the urgency of those CVE's, because none of them can be
> triggered without root. Please don't get me wrong, I already fixed many of
> them, but I believe they can wait, false urgency does not help anything.
>
All tc rules require root including your example - afaik, bounties
are being given for unprivileged user namespaces
> >
> > > >
> > > > Previous approaches suggested in discussions in chronological order:
> > > >
> > > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > > too specific a use case to extend such a struct, though this would
> > > > be a resilient fix and address other previous and potential future
> > > > DOS bugs like the one described in loopy fun [2].
> > >
> > > The link you provid is from 8 years ago, since then the redirection
> > > logic has been improved. I am not sure why it helps to justify your
> > > refusal of this approach.
> > >
> > > I also strongly disagree with "too specific a use case to extend such
> > > a struct", we simply have so many use-case-specific fields within
> > > sk_buff->cb. For example, the tc_skb_cb->zone is very specific
> > > for act_ct.
> > >
> > > skb->cb is precisely designed to be use-case-specific and layer-specific.
> > >
> > > None of the above points stands.
> > >
> >
> > I doubt you have looked at the code based on how you keep coming back
> > with the same points.
>
> Please avoid personal attacks. It helps nothing to your argument here,
> in fact, it will only weaken your arguments.
>
How is this a personal attack? You posted a patch that breaks things further.
I pointed it to you _multiple times_. You still posted it as a solution!
> > > >
> > > > 2) Restrict netem_enqueue recursion depth like in act_mirred with a
> > > > per cpu variable. However, netem_dequeue can call enqueue on its
> > > > child, and the depth restriction could be bypassed if the child is a
> > > > netem.
> > > >
> > > > 3) Use the same approach as in 2, but add metadata in netem_skb_cb
> > > > to handle the netem_dequeue case and track a packet's involvement
> > > > in duplication. This is an overly complex approach, and Jamal
> > > > notes that the skb cb can be overwritten to circumvent this
> > > > safeguard.
> > >
> > > This is not true, except qdisc_skb_cb(skb)->data, other area of
> > > skb->cb is preserved within Qdisc layer.
> > >
> >
> > Your approach has issues as i pointed out. At minimum invest the time
> > please and look at code.
>
> Sure, no one's patch is perfect. I am open to improve any of my patch
> First, let's discard this user-breaking patch for disatractions and
> start focusing other solutions (not necessarily mine).
>
> If it could help you, I can set the author to be you.
You think i am after getting my name on patches after all these years?
The patch is not sent by me - it is William's. There's zero credit on
my name. I could have written the patch myself, instead i guided a new
contributor on a path forward. That is my general approach to these
things.
> I have no interest
> to take any credit out of here, the reason why I sent out a patch is only
> because you asked me to help.
>
I think you are still missing the point. Let William get his patch in.
cheers,
jamal
> > I am certain you could keep changing other code outside of netem and
> > fix all issues you are exposing.
> > We agreed this is for a short term solution and needs to be contained
>
> I never agreed with you on breaking users, to me it is out of table for
> discussion. Just to clarify.
>
> > on just netem, what is the point of this whole long discussion really?
>
> To defend users, obviously.
>
> > We have spent over a month already..
>
> Sorry to hear that, I think you are going to a wrong direction. The
> earlier you switch to a right direction, the sooner we will have a right
> solution without breaking any users.
>
> Once again, please let me know how I specifically can help you out of
> this situation. I am here to solve problems, not to bring one.
>
> (If you need a video call for help, my calendar is open.)
>
> Thanks for your understanding!
Powered by blists - more mailing lists