lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <KqiixybBnBLRGzU7hRxP5bpji1w9tvkkNJVawNPK04HV4Sq66HwoXkfvY-zUb-igUkh0WT0BdfbBmjpkA0H-tES79qTMRM9OuFH5HUsYwJs=@willsroot.io>
Date: Mon, 16 Jun 2025 21:13:01 +0000
From: William Liu <will@...lsroot.io>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Savy <savy@...t3mfailure.io>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>, Cong Wang <xiyou.wangcong@...il.com>, Victor Nogueira <victor@...atatu.com>, Pedro Tammela <pctammela@...atatu.com>, Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>, Stephen Hemminger <stephen@...workplumber.org>, Davide Caratti <dcaratti@...hat.com>
Subject: Re: [BUG] net/sched: Soft Lockup/Task Hang and OOM Loop in netem_dequeue






On Monday, June 16th, 2025 at 8:41 PM, Jamal Hadi Salim <jhs@...atatu.com> wrote:

> 
> 
> On Thu, Jun 12, 2025 at 11:34 PM William Liu will@...lsroot.io wrote:
> 
> > On Thursday, June 12th, 2025 at 10:08 PM, Jamal Hadi Salim jhs@...atatu.com wrote:
> > 
> > > Hi William,
> > > Apologies again for the latency.
> > > 
> > > On Mon, Jun 9, 2025 at 11:31 AM William Liu will@...lsroot.io wrote:
> > > 
> > > > On Monday, June 9th, 2025 at 12:27 PM, Jamal Hadi Salim jhs@...atatu.com wrote:
> > > 
> > > > > I didnt finish my thought on that: I meant just dont allow a second
> > > > > netem to be added to a specific tree if one already exists. Dont
> > > > > bother checking for duplication.
> > > > > 
> > > > > cheers,
> > > > > jamal
> > > > > 
> > > > > > > [1] see "loopy fun" in https://lwn.net/Articles/719297/
> > > > 
> > > > Hi Jamal,
> > > > 
> > > > I came up with the following fix last night to disallow adding a netem qdisc if one of its ancestral qdiscs is a netem. It's just a draft -I will clean it up, move qdisc_match_from_root to sch_generic, add test cases, and submit a formal patchset for review if it looks good to you. Please let us know if you catch any edge cases or correctness issues we might be missing.
> > > 
> > > It is a reasonable approach for fixing the obvious case you are
> > > facing. But I am still concerned.
> > > Potentially if you had another netem on a different branch of the tree
> > > it may still be problematic.
> > > Consider a prio qdisc with 3 bands each with a netem child with duplication on.
> > > Your solution will solve it for each branch if one tries to add a
> > > netem child to any of these netems.
> > > 
> > > But consider you have a filter on the root qdisc or some intermediate
> > > qdisc and an associated action that changes skb->prio; when it hits
> > > 
> > > netem and gets duplicated then when it goes back to the root it may be
> > > classified by prio to a different netem which will duplicate and on
> > > and on..
> > > BTW: I gave the example of skb->prio but this could be skb->mark.
> > 
> > Ah, good catch. I attached the modified patch below then.
> > 
> > > Perhaps we should only allow one netem per tree or allow more but
> > > check for duplication and only allow one per tree...
> > 
> > I believe we have to keep it at one netem per tree. The OOM loop can still trigger if a netem without duplication has a netem child with duplication. Consider the following setup:
> > 
> > tc qdisc add dev lo root handle 1: netem limit 1
> > tc qdisc add dev lo parent 1: handle 2: netem gap 1 limit 1 duplicate 100% delay 1us reorder 100%
> 
> 
> I didnt explain it clearly - I meant if you have a netem that has
> duplication then dont allow another netem within the tree. Your patch
> is in the right direction but does not check for duplication.
> 
> > The first netem will store everything in the tfifo queue on netem_enqueue. Since netem_dequeue calls enqueue on the child, and the child will duplicate every packet back to the root, the first netem's netem_dequeue will never exit the tfifo_loop.
> > 
> > Best,
> > William
> > 
> > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > index fdd79d3ccd8c..4db5df202403 100644
> > --- a/net/sched/sch_netem.c
> > +++ b/net/sched/sch_netem.c
> > @@ -1085,6 +1085,36 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
> > return ret;
> > }
> > 
> > +static const struct Qdisc_class_ops netem_class_ops;
> > +
> > +static bool has_netem_in_tree(struct Qdisc *sch) {
> > + struct Qdisc *root, *q;
> > + unsigned int i;
> > + bool ret = false;
> > +
> > + sch_tree_lock(sch);
> > + root = qdisc_root_sleeping(sch);
> 
> 
> Hrm. starting with qdisc_root_sleeping() seems quarky. Take a look at
> dump() - and dig into something (off top of my head) like
> hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash)
> 

What is the issue here? My understanding is that the hashmap does not hold the root - sch_api.c shows edge cases for the root node when adding and deleting from the hashmap).

> > +
> > + if (root->ops->cl_ops == &netem_class_ops) {
> > + ret = true;
> > + goto unlock;
> > + }
> > +
> > + hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, i, q, hash) {
> > + if (q->ops->cl_ops == &netem_class_ops) {
> > + ret = true;
> > + goto unlock;
> > + }
> > + }
> > +
> > +unlock:
> > + if (ret)
> > + pr_warn("Cannot have multiple netems in tree\n");
> 
> 
> No point to the pr_warn()
> 
> > +
> > + sch_tree_unlock(sch);
> > + return ret;
> > +}
> > +
> > static int netem_init(struct Qdisc *sch, struct nlattr *opt,
> > struct netlink_ext_ack *extack)
> > {
> > @@ -1093,6 +1123,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
> > 
> > qdisc_watchdog_init(&q->watchdog, sch);
> > 
> > + if (has_netem_in_tree(sch))
> > + return -EINVAL;
> 
> 
> set ext_ack to contain the string you had earlier ("Cannot have
> multiple netems in tree") and user space will be able to see it.
> 
> It will be easy to check the existing qdisc for netem_sched_data->duplicate
> 
> Also, please dont forget to run tdc tests and contribute one or more
> testcase that captures the essence of what you are pushing here.
> 
> cheers,
> jamal

Best,
William

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ