Message-ID: <DglTO9NHmtFTRrCJf07R16_tYUUqoTV7M0hID_k-ryn5mAhe4ADq1mBpAuxNK24ZTnzIPaPq4x1woAtqZGXgAQS4k64C4SGRCfupe3H3dRs=@willsroot.io>
Date: Tue, 20 May 2025 14:04:28 +0000
From: William Liu <will@...lsroot.io>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: "netdev@...r.kernel.org" <netdev@...r.kernel.org>, Savy <savy@...t3mfailure.io>, Cong Wang <xiyou.wangcong@...il.com>, Victor Nogueira <victor@...atatu.com>, Pedro Tammela <pctammela@...atatu.com>, Paolo Abeni <pabeni@...hat.com>, Jakub Kicinski <kuba@...nel.org>, Stephen Hemminger <stephen@...workplumber.org>, Davide Caratti <dcaratti@...hat.com>
Subject: Re: [BUG] net/sched: Soft Lockup/Task Hang and OOM Loop in netem_dequeue
On Saturday, May 17th, 2025 at 2:09 PM, Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
>
> Unfortunately we can't change the approach of looping to the root at
> this point because it may be expected behavior in some setups.
>
> As for the setup in the script you shared: that is most certainly not a
> practical setup.
> So, two approaches to resolve it:
> 1) Either we disallow altogether the ability to have a hierarchy of
> netems in a qdisc tree. That would mean scanning the tree on
> configuration and making sure there is not already a parent netem in
> existence. If you want to be more fine-grained, we can check that netem
> is not trying to duplicate packets.
>
> Or 2) we add a loop counter to make sure - for the case of netem - we
> don't loop more than once (which I believe is the design intent for
> netem).
>
> cheers,
> jamal
Hi Jamal,
We decided to go with the second approach, adding a one-bit flag to sk_buff. We also moved the duplication step so that it happens after the original sk_buff has finished the enqueue process, so the limit check no longer relies on a stale value.
We also considered the first approach, traversing the qdisc tree and checking for netem_ops to identify netem qdiscs, but we feel that this is not very elegant; a rough, untested sketch of what we had in mind is included below for reference.
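(The sketch below is only illustrative and untested: the helper name netem_has_duplicating_ancestor is made up, it assumes it would be called from netem's configuration path with RTNL held, and it assumes netem_qdisc_ops is visible at that point, e.g. via a forward declaration.)

/* Hypothetical sketch of approach 1: at configuration time, walk up the
 * qdisc tree via parent handles and report whether an ancestor netem
 * already has duplication enabled. Untested; name and placement are
 * illustrative only.
 */
static bool netem_has_duplicating_ancestor(struct Qdisc *sch)
{
	struct net_device *dev = qdisc_dev(sch);
	u32 parent = sch->parent;

	while (parent != TC_H_ROOT) {
		/* A qdisc handle is the major part of its children's classids */
		struct Qdisc *p = qdisc_lookup(dev, TC_H_MAJ(parent));

		if (!p)
			break;
		if (p->ops == &netem_qdisc_ops) {
			struct netem_sched_data *q = qdisc_priv(p);

			if (q->duplicate)
				return true;
		}
		parent = p->parent;
	}

	return false;
}

netem's configuration path could then reject a duplicating setup whenever this helper returns true.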
Of course, the approach we chose disables duplication in netem qdiscs
that are children of a netem qdisc with duplication enabled. However, we
are not aware of a good way to do a real de-duplication check like that for an sk_buff. Let us know what you think of the patch below.
Best,
Will
Savy
From 33af24d4bef8b141b608fa513528a804f689f823 Mon Sep 17 00:00:00 2001
From: William Liu <will@...lsroot.io>
Date: Mon, 19 May 2025 08:46:15 -0700
Subject: [PATCH] net/sched: Fix duplication logic in netem_enqueue
netem_enqueue()'s duplication prevention logic is broken when multiple
netem qdiscs with duplication enabled are stacked together, as seen
in [1]. Add a bit to sk_buff to track whether netem has already
duplicated the packet. Also move the duplication logic so that it runs
after the initial packet has finished the enqueue process, to preserve
the accuracy of the limit check.
[1] https://lore.kernel.org/netdev/8DuRWwfqjoRDLDmBMlIfbrsZg9Gx50DHJc1ilxsEBNe2D6NMoigR_eIRIG0LOjMc3r10nUUZtArXx4oZBIdUfZQrwjcQhdinnMis_0G7VEk=@willsroot.io/
Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
Reported-by: William Liu <will@...lsroot.io>
Reported-by: Savino Dicanosa <savy@...t3mfailure.io>
Signed-off-by: William Liu <will@...lsroot.io>
Signed-off-by: Savino Dicanosa <savy@...t3mfailure.io>
---
include/linux/skbuff.h | 4 ++++
net/sched/sch_netem.c | 31 +++++++++++++++----------------
2 files changed, 19 insertions(+), 16 deletions(-)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b974a277975a..e6b53af6322b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -844,6 +844,7 @@ enum skb_tstamp_type {
  *	@csum_level: indicates the number of consecutive checksums found in
  *		the packet minus one that have been verified as
  *		CHECKSUM_UNNECESSARY (max 3)
+ *	@netem_duplicate: indicates that netem has already duplicated this packet
  *	@unreadable: indicates that at least 1 of the fragments in this skb is
  *		unreadable.
  *	@dst_pending_confirm: need to confirm neighbour
@@ -1026,6 +1027,9 @@ struct sk_buff {
 	__u8			slow_gro:1;
 #if IS_ENABLED(CONFIG_IP_SCTP)
 	__u8			csum_not_inet:1;
 #endif
+#if IS_ENABLED(CONFIG_NET_SCH_NETEM)
+	__u8			netem_duplicate:1;
+#endif
 	__u8			unreadable:1;
 #if defined(CONFIG_NET_SCHED) || defined(CONFIG_NET_XGRESS)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index fdd79d3ccd8c..ce6e55b49acb 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 	skb->prev = NULL;
 
 	/* Random duplication */
-	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
+	if (!skb->netem_duplicate && q->duplicate &&
+	    q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
 		++count;
 
 	/* Drop packet? */
@@ -531,21 +532,6 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		return NET_XMIT_DROP;
 	}
 
-	/*
-	 * If doing duplication then re-insert at top of the
-	 * qdisc tree, since parent queuer expects that only one
-	 * skb will be queued.
-	 */
-	if (skb2) {
-		struct Qdisc *rootq = qdisc_root_bh(sch);
-		u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
-
-		q->duplicate = 0;
-		rootq->enqueue(skb2, rootq, to_free);
-		q->duplicate = dupsave;
-		skb2 = NULL;
-	}
-
 	qdisc_qstats_backlog_inc(sch, skb);
 
 	cb = netem_skb_cb(skb);
@@ -613,6 +599,19 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 		sch->qstats.requeues++;
 	}
 
+	/*
+	 * If doing duplication then re-insert at top of the
+	 * qdisc tree, since parent queuer expects that only one
+	 * skb will be queued.
+	 */
+	if (skb2) {
+		struct Qdisc *rootq = qdisc_root_bh(sch);
+
+		skb2->netem_duplicate = 1;
+		rootq->enqueue(skb2, rootq, to_free);
+		skb2 = NULL;
+	}
+
 finish_segs:
 	if (skb2)
 		__qdisc_drop(skb2, to_free);
--
2.43.0