Message-ID: <CAM0EoMmC9nuzEB0ydb5VZh8NRZQcfZ=TmFxQ82CLg1S2Ew8ZWw@mail.gmail.com>
Date: Wed, 18 Jun 2025 16:09:53 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: William Liu <will@...lsroot.io>
Cc: Savy <savy@...t3mfailure.io>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Cong Wang <xiyou.wangcong@...il.com>, 
	Victor Nogueira <victor@...atatu.com>, Pedro Tammela <pctammela@...atatu.com>, 
	Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>, 
	Stephen Hemminger <stephen@...workplumber.org>, Davide Caratti <dcaratti@...hat.com>
Subject: Re: [BUG] net/sched: Soft Lockup/Task Hang and OOM Loop in netem_dequeue

On Mon, Jun 16, 2025 at 5:13 PM William Liu <will@...lsroot.io> wrote:
>
>>
> On Monday, June 16th, 2025 at 8:41 PM, Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> >
> >
> > On Thu, Jun 12, 2025 at 11:34 PM William Liu will@...lsroot.io wrote:
> >
> > > On Thursday, June 12th, 2025 at 10:08 PM, Jamal Hadi Salim jhs@...atatu.com wrote:
> > >
> > > > Hi William,
> > > > Apologies again for the latency.
> > > >
> > > > On Mon, Jun 9, 2025 at 11:31 AM William Liu will@...lsroot.io wrote:
> > > >
> > > > > On Monday, June 9th, 2025 at 12:27 PM, Jamal Hadi Salim jhs@...atatu.com wrote:
> > > >
> > > > > > I didn't finish my thought on that: I meant just don't allow a second
> > > > > > netem to be added to a specific tree if one already exists. Don't
> > > > > > bother checking for duplication.
> > > > > >
> > > > > > cheers,
> > > > > > jamal
> > > > > >
> > > > > > > > [1] see "loopy fun" in https://lwn.net/Articles/719297/
> > > > >
> > > > > Hi Jamal,
> > > > >
> > > > > I came up with the following fix last night to disallow adding a netem qdisc if one of its ancestral qdiscs is a netem. It's just a draft - I will clean it up, move qdisc_match_from_root to sch_generic, add test cases, and submit a formal patchset for review if it looks good to you. Please let us know if you catch any edge cases or correctness issues we might be missing.
> > > >
> > > > It is a reasonable approach for fixing the obvious case you are
> > > > facing, but I am still concerned.
> > > > Potentially, if you had another netem on a different branch of the
> > > > tree, it may still be problematic.
> > > > Consider a prio qdisc with 3 bands, each with a netem child with
> > > > duplication on. Your solution will solve it for each branch if one
> > > > tries to add a netem child to any of these netems.
> > > >
> > > > But consider you have a filter on the root qdisc or some intermediate
> > > > qdisc with an associated action that changes skb->prio; when the packet
> > > > hits a netem and gets duplicated, the duplicate goes back to the root,
> > > > where prio may classify it to a different netem, which duplicates
> > > > again, and on and on...
> > > > BTW: I gave the example of skb->prio, but this could also be skb->mark.
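
[Archive note: to make the scenario above concrete, a setup along the following lines could exhibit the cross-branch loop. Everything here is illustrative - the device, handles, and the skbedit rewrite are hypothetical sketches, not taken from the report:]

```shell
# Hypothetical sketch of the cross-branch case: a prio root with a
# duplicating netem under each band, plus a filter whose action
# rewrites skb->priority, so a duplicate that re-enters the root can
# be classified into a *different* netem branch.
tc qdisc add dev eth0 root handle 1: prio bands 3
tc qdisc add dev eth0 parent 1:1 handle 10: netem duplicate 100%
tc qdisc add dev eth0 parent 1:2 handle 20: netem duplicate 100%
tc qdisc add dev eth0 parent 1:3 handle 30: netem duplicate 100%
# action that changes skb->prio before classification (could equally
# be skb->mark with a fw filter plus skbedit)
tc filter add dev eth0 parent 1: matchall action skbedit priority 2
```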
> > >
> > > Ah, good catch. I attached the modified patch below then.
> > >
> > > > Perhaps we should only allow one netem per tree, or allow more but
> > > > check for duplication and allow only one duplicating netem per tree...
> > >
> > > I believe we have to keep it at one netem per tree. The OOM loop can still trigger if a netem without duplication has a netem child with duplication. Consider the following setup:
> > >
> > > tc qdisc add dev lo root handle 1: netem limit 1
> > > tc qdisc add dev lo parent 1: handle 2: netem gap 1 limit 1 duplicate 100% delay 1us reorder 100%
> >
> >
> > I didn't explain it clearly - I meant if you have a netem that has
> > duplication then don't allow another netem within the tree. Your patch
> > is in the right direction but does not check for duplication.
> >
> > > The first netem will store everything in the tfifo queue on netem_enqueue. Since netem_dequeue calls enqueue on the child, and the child will duplicate every packet back to the root, the first netem's netem_dequeue will never exit the tfifo_loop.
> > >
> > > Best,
> > > William
> > >
> > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > index fdd79d3ccd8c..4db5df202403 100644
> > > --- a/net/sched/sch_netem.c
> > > +++ b/net/sched/sch_netem.c
> > > @@ -1085,6 +1085,36 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
> > >  	return ret;
> > >  }
> > >
> > > +static const struct Qdisc_class_ops netem_class_ops;
> > > +
> > > +static bool has_netem_in_tree(struct Qdisc *sch) {
> > > +	struct Qdisc *root, *q;
> > > +	unsigned int i;
> > > +	bool ret = false;
> > > +
> > > +	sch_tree_lock(sch);
> > > +	root = qdisc_root_sleeping(sch);
> >
> >
> > Hrm, starting with qdisc_root_sleeping() seems quirky. Take a look at
> > dump() - and dig into something (off the top of my head) like
> > hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash)
> >
>
> What is the issue here? My understanding is that the hashmap does not hold the root (sch_api.c shows edge cases for the root node when adding to and deleting from the hashmap).
>

True. Ok, so no objection to using qdisc_root_sleeping() - I was
initially comparing it to what you would need to do in dump().
So, more below in that vein (take a look at dump()):

+static const struct Qdisc_class_ops netem_class_ops;
+
+static bool has_netem_in_tree(struct Qdisc *sch) {
+       struct Qdisc *root, *q;
+       unsigned int i;
+       bool ret = false;
+
+       sch_tree_lock(sch);

You already have rtnl held by this point. No need to take this lock.

+       root = qdisc_root_sleeping(sch);
+
+       if (root->ops->cl_ops == &netem_class_ops) {
+               ret = true;
+               goto unlock;
+       }
+


You don't need rcu here - again, look at dump(); hash_for_each() is the
correct thing to do.

+       hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, i, q, hash) {
+               if (q->ops->cl_ops == &netem_class_ops) {
+                       ret = true;
+                       goto unlock;
+               }
+       }
+

And therefore the unlock label etc. become unnecessary.

cheers,
jamal

+unlock:
+       if (ret)
+               pr_warn("Cannot have multiple netems in tree\n");
+
+       sch_tree_unlock(sch);
+       return ret;
+}
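
[Archive note: folding the review comments above into the helper would leave something like the following. This is a compile-untested sketch only, not a submitted patch:]

```c
/* Sketch: has_netem_in_tree() with the review comments applied -
 * no sch_tree_lock() (rtnl is already held on this path), plain
 * hash_for_each() instead of the _rcu variant, and no
 * pr_warn()/unlock label (error messaging moves to the caller's
 * extack).
 */
static bool has_netem_in_tree(struct Qdisc *sch)
{
	struct Qdisc *root = qdisc_root_sleeping(sch);
	struct Qdisc *q;
	unsigned int i;

	if (root->ops->cl_ops == &netem_class_ops)
		return true;

	hash_for_each(qdisc_dev(root)->qdisc_hash, i, q, hash)
		if (q->ops->cl_ops == &netem_class_ops)
			return true;

	return false;
}
```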


> > > +
> > > +	if (root->ops->cl_ops == &netem_class_ops) {
> > > +		ret = true;
> > > +		goto unlock;
> > > +	}
> > > +
> > > +	hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, i, q, hash) {
> > > +		if (q->ops->cl_ops == &netem_class_ops) {
> > > +			ret = true;
> > > +			goto unlock;
> > > +		}
> > > +	}
> > > +
> > > +unlock:
> > > +	if (ret)
> > > +		pr_warn("Cannot have multiple netems in tree\n");
> >
> >
> > No point in the pr_warn()
> >
> > > +
> > > +	sch_tree_unlock(sch);
> > > +	return ret;
> > > +}
> > > +
> > >  static int netem_init(struct Qdisc *sch, struct nlattr *opt,
> > >  		      struct netlink_ext_ack *extack)
> > >  {
> > > @@ -1093,6 +1123,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
> > >
> > >  	qdisc_watchdog_init(&q->watchdog, sch);
> > >
> > > +	if (has_netem_in_tree(sch))
> > > +		return -EINVAL;
> >
> >
> > Set extack to contain the string you had earlier ("Cannot have
> > multiple netems in tree") and user space will be able to see it.
> >
> > It will be easy to check the existing qdisc for netem_sched_data->duplicate
> >
> > Also, please don't forget to run the tdc tests and contribute one or more
> > testcases that capture the essence of what you are pushing here.
> >
> > cheers,
> > jamal
>
> Best,
> William
