Message-ID: <CAM0EoM=QxAJS4ZK68mup55O7wQFqkQds-p2Us3R0U-W6FK6krw@mail.gmail.com>
Date: Mon, 16 Jun 2025 16:41:09 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: William Liu <will@...lsroot.io>
Cc: Savy <savy@...t3mfailure.io>, 
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Cong Wang <xiyou.wangcong@...il.com>, 
	Victor Nogueira <victor@...atatu.com>, Pedro Tammela <pctammela@...atatu.com>, 
	Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>, 
	Stephen Hemminger <stephen@...workplumber.org>, Davide Caratti <dcaratti@...hat.com>
Subject: Re: [BUG] net/sched: Soft Lockup/Task Hang and OOM Loop in netem_dequeue

On Thu, Jun 12, 2025 at 11:34 PM William Liu <will@...lsroot.io> wrote:
>
> On Thursday, June 12th, 2025 at 10:08 PM, Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> >
> >
> > Hi William,
> > Apologies again for the latency.
> >
> > On Mon, Jun 9, 2025 at 11:31 AM William Liu will@...lsroot.io wrote:
> >
> > > On Monday, June 9th, 2025 at 12:27 PM, Jamal Hadi Salim jhs@...atatu.com wrote:
> >
> > > > I didn't finish my thought on that: I meant just don't allow a
> > > > second netem to be added to a specific tree if one already
> > > > exists. Don't bother checking for duplication.
> > > >
> > > > cheers,
> > > > jamal
> > > >
> > > > > > [1] see "loopy fun" in https://lwn.net/Articles/719297/
> > >
> > > Hi Jamal,
> > >
> > > I came up with the following fix last night to disallow adding a netem qdisc if one of its ancestral qdiscs is a netem. It's just a draft - I will clean it up, move qdisc_match_from_root to sch_generic, add test cases, and submit a formal patchset for review if it looks good to you. Please let us know if you catch any edge cases or correctness issues we might be missing.
> >
> >
> > It is a reasonable approach for fixing the obvious case you are
> > facing, but I am still concerned: if you had another netem on a
> > different branch of the tree it may still be problematic.
> > Consider a prio qdisc with 3 bands, each with a netem child with
> > duplication on. Your solution will solve it for each branch if one
> > tries to add a netem child to any of these netems.
> >
> > But consider you have a filter on the root qdisc or some intermediate
> > qdisc with an associated action that changes skb->prio; when a packet
> > hits netem and gets duplicated, the duplicate goes back to the root
> > and may be classified by prio to a different netem, which will
> > duplicate again, and on and on..
> > BTW: I gave the example of skb->prio, but this could be skb->mark.
> >
>
> Ah, good catch. I attached the modified patch below then.
>
> >
> > Perhaps we should only allow one netem per tree or allow more but
> > check for duplication and only allow one per tree...
> >
>
> I believe we have to keep it at one netem per tree. The OOM loop can still trigger if a netem without duplication has a netem child with duplication. Consider the following setup:
>
> tc qdisc add dev lo root handle 1: netem limit 1
> tc qdisc add dev lo parent 1: handle 2: netem gap 1 limit 1 duplicate 100% delay 1us reorder 100%
>

I didn't explain it clearly - I meant: if you have a netem that has
duplication, then don't allow another netem within the tree. Your patch
is in the right direction but does not check for duplication.

> The first netem will store everything in the tfifo queue on netem_enqueue. Since netem_dequeue calls enqueue on the child, and the child will duplicate every packet back to the root, the first netem's netem_dequeue will never exit the tfifo_loop.
>
> Best,
> William
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..4db5df202403 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -1085,6 +1085,36 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
>         return ret;
>  }
>
> +static const struct Qdisc_class_ops netem_class_ops;
> +
> +static bool has_netem_in_tree(struct Qdisc *sch) {
> +       struct Qdisc *root, *q;
> +       unsigned int i;
> +       bool ret = false;
> +
> +       sch_tree_lock(sch);
> +       root = qdisc_root_sleeping(sch);

Hrm, starting with qdisc_root_sleeping() seems quirky. Take a look at
dump() - and dig into something (off the top of my head) like
hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash)

> +
> +       if (root->ops->cl_ops == &netem_class_ops) {
> +               ret = true;
> +               goto unlock;
> +       }
> +
> +       hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, i, q, hash) {
> +               if (q->ops->cl_ops == &netem_class_ops) {
> +                       ret = true;
> +                       goto unlock;
> +               }
> +       }
> +
> +unlock:
> +       if (ret)
> +               pr_warn("Cannot have multiple netems in tree\n");

There is no point to the pr_warn().

> +
> +       sch_tree_unlock(sch);
> +       return ret;
> +}
> +
>  static int netem_init(struct Qdisc *sch, struct nlattr *opt,
>                       struct netlink_ext_ack *extack)
>  {
> @@ -1093,6 +1123,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
>
>         qdisc_watchdog_init(&q->watchdog, sch);
>
> +       if (has_netem_in_tree(sch))
> +               return -EINVAL;

Set extack to contain the string you had earlier ("Cannot have
multiple netems in tree") and user space will be able to see it.

It will be easy to check the existing qdisc for netem_sched_data->duplicate.
Also, please don't forget to run the tdc tests and contribute one or
more test cases that capture the essence of what you are pushing here.

cheers,
jamal
