Message-ID: <aHCe3nznEtF/1MHq@pop-os.localdomain>
Date: Thu, 10 Jul 2025 22:19:26 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: William Liu <will@...lsroot.io>, netdev@...r.kernel.org,
victor@...atatu.com, pctammela@...atatu.com, pabeni@...hat.com,
kuba@...nel.org, stephen@...workplumber.org, dcaratti@...hat.com,
savy@...t3mfailure.io, jiri@...nulli.us, davem@...emloft.net,
edumazet@...gle.com, horms@...nel.org, linux-kernel@...r.kernel.org,
torvalds@...ux-foundation.org
Subject: Re: This breaks netem use cases
On Tue, Jul 08, 2025 at 06:26:28PM -0400, Jamal Hadi Salim wrote:
> On Tue, Jul 8, 2025 at 5:32 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> >
> > (Cc Linus Torvalds)
> >
> > On Tue, Jul 08, 2025 at 04:35:37PM -0400, Jamal Hadi Salim wrote:
> > > On Tue, Jul 8, 2025 at 3:42 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > >
> > > > (Cc LKML for more audience, since this clearly breaks potentially useful
> > > > use cases)
> > > >
> > > > On Tue, Jul 08, 2025 at 04:43:26PM +0000, William Liu wrote:
> > > > > netem_enqueue's duplication prevention logic breaks when a netem
> > > > > resides in a qdisc tree with other netems - this can lead to a
> > > > > soft lockup and OOM loop in netem_dequeue, as seen in [1].
> > > > > Ensure that a duplicating netem cannot exist in a tree with other
> > > > > netems.
> > > >
> > > > As I already warned in your previous patchset, this breaks the following
> > > > potentially useful use case:
> > > >
> > > > sudo tc qdisc add dev eth0 root handle 1: mq
> > > > sudo tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
> > > > sudo tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
> > > >
> > > > I don't see any logical problem of such use case, therefore we should
> > > > consider it as valid, we can't break it.
> > > >
> > >
> > > I thought we are trying to provide an intermediate solution to plug an
> > > existing hole and come up with a longer term solution.
> >
> > Breaking valid use cases even for a short period is still no way to go.
> > Sorry, Jamal. Since I can't convince you, please ask Linus.
> >
> > Also, I don't see you have proposed any long term solution. If you
> > really have one, please state it clearly and provide a clear timeline to
> > users.
> >
>
> I explained my approach a few times: We need to come up with a long
> term solution that looks at the sanity of hierarchies.
I interpret this as you having no long-term solution. Without one, how
do you convince users that you will unbreak them after breaking them?
This looks more and more concerning.
> Equivalent to init/change()
> Today we only look at netlink requests for a specific qdisc. The new
> approach (possibly an ops) will also look at the sanity of configs in
> relation to hierarchies.
> You can work on it or come with an alternative proposal.
You misunderstood this. It was never about me; mentioning me is not even
relevant. I am defending users, so please think of users, not me (or
yourself).
If you thought from the users' perspective, you wouldn't even suggest
breaking any of their use cases for any length of time. Everything you
said here is from your own perspective. Surely you understand all the TC
details, but users don't.
> That is not the scope of this discussion though
>
> > > If there are users of such a "potential setup" you show above we are
> > > going to find out very quickly.
> >
> > Please read the above specific example. It is more than just valid, it
> > is very reasonable, installing netem for each queue is the right way of
> > using netem duplication to avoid the global root spinlock in a multiqueue
> > setup.
> >
>
> In all my years working on tc I have never seen _anyone_ using
> duplication where netem is _not the root qdisc_. And i have done a lot
> of "support" in this area.
> You can craft any example you want but it needs to be practical - I
> dont see the practicality in your example.
The example I provided is real and practical. In fact, it is _the only_
reasonable way to use netem duplication directly on a multiqueue NIC.
I bet you don't have another way (unless you don't care about the global
spinlock) in such a setup.
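For illustration, the per-queue setup from the example quoted earlier generalizes to any number of TX queues. This is a minimal sketch, not a tested deployment script: `gen_netem_cmds` is a hypothetical helper that only prints the tc commands (nothing is applied), and it assumes the same class-to-handle scheme as that example (1:1 -> 10:, 1:2 -> 20:).

```shell
#!/bin/sh
# Print (do not run) the tc commands that attach one duplicating netem
# qdisc to each mq class. The function name and its two parameters
# (device, number of queues) are hypothetical, for illustration only.
gen_netem_cmds() {
    dev=$1
    nqueues=$2
    echo "tc qdisc add dev $dev root handle 1: mq"
    i=1
    while [ "$i" -le "$nqueues" ]; do
        # tc parses class minors and qdisc handles as hex, so queue 1
        # maps to class 1:1 / handle 10:, and queue 10 to 1:a / handle a0:.
        hex=$(printf '%x' "$i")
        echo "tc qdisc add dev $dev parent 1:$hex handle ${hex}0: netem duplicate 100%"
        i=$((i + 1))
    done
}

# Example: print the commands for a 2-queue eth0.
gen_netem_cmds eth0 2
```

Piping the output to a root shell would apply it. The handle scheme above only works for up to 4095 queues, since a handle major cannot exceed ffff.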
> Just because we allow arbitrary crafting of hierarchies doesnt mean
> they are correct.
Can we let users decide? Why do we have the privilege to decide
everything for users? Users are usually more correct than us, which is
why so many bugs actually became features. It is simply not up to us.
Clearly, we serve users, not vice versa. Let's be humble.
> The choice is between complicating things to fix a "potential" corner
> use case vs simplicity (especially of a short term approach that is
> intended to be obsoleted in the long term).
I don't see any simplicity in your patch; it is not maintainable at
all (I already explained why and suggested a better way). 40 LOC vs
4+/4-: you call the 40 LOC simplicity?
And this case is neither a corner case nor a potential one; it is valid
and reasonable. Downplaying use cases only hurts users.
>
>
> > Breaking users and letting them complain is not a good strategy either.
> >
> > On the other hand, thanks for acknowledging it breaks users, which
> > confirms my point.
> >
> > I will wait for Linus' response.
> >
> > > We are working against security people who are finding all sorts of
> > > "potential use cases" to create CVEs.
> >
> > I seriously doubt the urgency of those CVEs, because none of them can be
> > triggered without root. Please don't get me wrong, I already fixed many of
> > them, but I believe they can wait, false urgency does not help anything.
> >
>
> All tc rules require root including your example - afaik, bounties
> are being given for unprivileged user namespaces
Sure, many CVEs have bounties. This does not mean all of them are
urgent. They are important, just not urgent. Creating false urgency
is harmful to decision making.
> > >
> > > > >
> > > > > Previous approaches suggested in discussions in chronological order:
> > > > >
> > > > > 1) Track duplication status or ttl in the sk_buff struct. Considered
> > > > > too specific a use case to extend such a struct, though this would
> > > > > be a resilient fix and address other previous and potential future
> > > > > DOS bugs like the one described in loopy fun [2].
> > > >
> > > > The link you provided is from 8 years ago, since then the redirection
> > > > logic has been improved. I am not sure why it helps to justify your
> > > > refusal of this approach.
> > > >
> > > > I also strongly disagree with "too specific a use case to extend such
> > > > a struct", we simply have so many use-case-specific fields within
> > > > sk_buff->cb. For example, the tc_skb_cb->zone is very specific
> > > > for act_ct.
> > > >
> > > > skb->cb is precisely designed to be use-case-specific and layer-specific.
> > > >
> > > > None of the above points stands.
> > > >
> > >
> > > I doubt you have looked at the code based on how you keep coming back
> > > with the same points.
> >
> > Please avoid personal attacks. It helps nothing to your argument here,
> > in fact, it will only weaken your arguments.
> >
>
> How is this a personal attack? You posted a patch that breaks things further.
> I pointed it to you _multiple times_. You still posted it as a solution!
Maybe you are not helping at all? You kept mentioning "issues" without
even explaining what the issues are.
Here, you keep saying I didn't look at the code base without saying
anything helpful. The fact is that I looked at all the qdisc_skb_cb and
tc_skb_cb use cases, and I tried to place a new field/bit in at least 3
different locations, with multiple failures.
Maybe you need to be helpful and respectful?
Thanks.