[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAM0EoMnd0nZxJW3zpEuBGWTwB3AnJSnj242f9hMpcLdBWdcbfQ@mail.gmail.com>
Date: Thu, 12 Jun 2025 18:08:16 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: William Liu <will@...lsroot.io>
Cc: Savy <savy@...t3mfailure.io>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>, Cong Wang <xiyou.wangcong@...il.com>,
Victor Nogueira <victor@...atatu.com>, Pedro Tammela <pctammela@...atatu.com>,
Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>,
Stephen Hemminger <stephen@...workplumber.org>, Davide Caratti <dcaratti@...hat.com>
Subject: Re: [BUG] net/sched: Soft Lockup/Task Hang and OOM Loop in netem_dequeue
Hi William,
Apologies again for the latency.
On Mon, Jun 9, 2025 at 11:31 AM William Liu <will@...lsroot.io> wrote:
>
> On Monday, June 9th, 2025 at 12:27 PM, Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> >
> > I didnt finish my thought on that: I meant just dont allow a second
> > netem to be added to a specific tree if one already exists. Dont
> > bother checking for duplication.
> >
> > cheers,
> > jamal
> >
> > > > [1] see "loopy fun" in https://lwn.net/Articles/719297/
> > >
> Hi Jamal,
>
> I came up with the following fix last night to disallow adding a netem qdisc if one of its ancestral qdiscs is a netem. It's just a draft -I will clean it up, move qdisc_match_from_root to sch_generic, add test cases, and submit a formal patchset for review if it looks good to you. Please let us know if you catch any edge cases or correctness issues we might be missing.
>
It is a reasonable approach for fixing the obvious case you are
facing. But I am still concerned.
Potentially if you had another netem on a different branch of the tree
it may still be problematic.
Consider a prio qdisc with 3 bands each with a netem child with duplication on.
Your solution will solve it for each branch if one tries to add a
netem child to any of these netems.
But consider you have a filter on the root qdisc or some intermediate
qdisc and an associated action that changes skb->prio; when it hits
netem and gets duplicated then when it goes back to the root it may be
classified by prio to a different netem which will duplicate and on
and on..
BTW: I gave the example of skb->prio but this could be skb->mark.
Perhaps we should only allow one netem per tree or allow more but
check for duplication and only allow one per tree...
cheers,
jamal
> Also, please let us know if you would us to bring in fixes for the 2 other small issues we discussed previously - moving the duplication after the initial enqueue to more accurately respect the limit check, and having loss take priority over duplication.
>
> We tested with the following configurations, all of which are illegal now when we add the second netem (tc prints out RTNETLINK answers: Invalid argument).
>
> Netem parent is netem:
> tc qdisc add dev lo root handle 1: netem limit 1 duplicate 100%
> tc qdisc add dev lo parent 1: handle 2: netem gap 1 limit 1 duplicate 100% delay 1us reorder 100%
>
> Qdisc tree root is netem:
> tc qdisc add dev lo root handle 1:0 netem limit 1 duplicate 100%
> tc qdisc add dev lo parent 1:0 handle 2:0 hfsc
> tc class add dev lo parent 2:0 classid 2:1 hfsc rt m2 10Mbit
> tc qdisc add dev lo parent 2:1 handle 3:0 netem duplicate 100%
> tc class add dev lo parent 2:0 classid 2:2 hfsc rt m2 10Mbit
> tc qdisc add dev lo parent 2:2 handle 4:0 netem duplicate 100%
>
> netem grandparent is netem:
> tc qdisc add dev lo root handle 1:0 tbf rate 8bit burst 100b latency 1s
> tc qdisc add dev lo parent 1:0 handle 2:0 netem gap 1 limit 1 duplicate 100% delay 1us reorder 100%
> tc qdisc add dev lo parent 2:0 handle 3:0 hfsc
> tc class add dev lo parent 3:0 classid 3:1 hfsc rt m2 10Mbit
> tc qdisc add dev lo parent 3:1 handle 4:0 netem duplicate 100%
>
> netem great-grandparent is netem:
> tc qdisc add dev lo root handle 1:0 netem limit 1 duplicate 100%
> tc qdisc add dev lo parent 1:0 handle 2:0 hfsc
> tc class add dev lo parent 2:0 classid 2:1 hfsc rt m2 10Mbit
> tc qdisc add dev lo parent 2:1 handle 3:0 tbf rate 8bit burst 100b latency 1s
> tc qdisc add dev lo parent 3:0 handle 4:0 netem duplicate 100%
>
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..6178cd1453c5 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -1085,6 +1085,52 @@ static int netem_change(struct Qdisc *sch, struct nlattr *opt,
> return ret;
> }
>
> +static const struct Qdisc_class_ops netem_class_ops;
> +
> +static struct Qdisc *qdisc_match_from_root(struct Qdisc *root, u32 handle)
> +{
> + struct Qdisc *q;
> +
> + if (!qdisc_dev(root))
> + return (root->handle == handle ? root : NULL);
> +
> + if (!(root->flags & TCQ_F_BUILTIN) &&
> + root->handle == handle)
> + return root;
> +
> + hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, handle,
> + lockdep_rtnl_is_held()) {
> + if (q->handle == handle)
> + return q;
> + }
> + return NULL;
> +}
> +
> +static bool has_netem_ancestor(struct Qdisc *sch) {
> + struct Qdisc *root, *parent, *curr;
> + bool ret = false;
> +
> + sch_tree_lock(sch);
> + curr = sch;
> + root = qdisc_root_sleeping(sch);
> + parent = qdisc_match_from_root(root, TC_H_MAJ(curr->parent));
> +
> + while (parent != NULL) {
> + if (parent->ops->cl_ops == &netem_class_ops) {
> + ret = true;
> + pr_warn("Ancestral netem already exists, cannot nest netem");
> + goto unlock;
> + }
> +
> + curr = parent;
> + parent = qdisc_match_from_root(root, TC_H_MAJ(curr->parent));
> + }
> +
> +unlock:
> + sch_tree_unlock(sch);
> + return ret;
> +}
> +
> static int netem_init(struct Qdisc *sch, struct nlattr *opt,
> struct netlink_ext_ack *extack)
> {
> @@ -1093,6 +1139,9 @@ static int netem_init(struct Qdisc *sch, struct nlattr *opt,
>
> qdisc_watchdog_init(&q->watchdog, sch);
>
> + if (has_netem_ancestor(sch))
> + return -EINVAL;
> +
> if (!opt)
> return -EINVAL;
>
> @@ -1330,3 +1379,4 @@ module_init(netem_module_init)
> module_exit(netem_module_exit)
> MODULE_LICENSE("GPL");
> MODULE_DESCRIPTION("Network characteristics emulator qdisc");
Powered by blists - more mailing lists