Message-ID: <pGE9OHWRSf4oJwC4gS0oPonBy8_0WsDthxgLzBYGBtMVeT_EDc-HAz8NbhJxcWe0NEUrf_a7Fyq2op5FVFujfc2KyO-I38Yx_HlQhFwB0Cs=@willsroot.io>
Date: Mon, 14 Jul 2025 02:30:26 +0000
From: William Liu <will@...lsroot.io>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: netdev@...r.kernel.org, jhs@...atatu.com, stephen@...workplumber.org, Savino Dicanosa <savy@...t3mfailure.io>
Subject: Re: [Patch v3 net 1/4] net_sched: Implement the right netem duplication behavior

On Sunday, July 13th, 2025 at 9:48 PM, Cong Wang <xiyou.wangcong@...il.com> wrote:

> 
> 
> In the old behavior, duplicated packets were sent back to the root qdisc,
> which could create dangerous infinite loops in hierarchical setups -
> imagine a scenario where each level of a multi-stage netem hierarchy kept
> feeding duplicates back to the top, potentially causing system instability
> or resource exhaustion.
> 
> The new behavior elegantly solves this by enqueueing duplicates to the same
> qdisc that created them, ensuring that packet duplication occurs exactly
> once per netem stage in a controlled, predictable manner. This change
> enables users to safely construct complex network emulation scenarios using
> netem hierarchies (like the 4x multiplication demonstrated in testing)
> without worrying about runaway packet generation, while still preserving
> the intended duplication effects.
> 
> Users can now confidently chain multiple netem qdiscs together to achieve
> sophisticated network impairment combinations, knowing that each stage will
> apply its effects exactly once to the packet flow, making network testing
> scenarios more reliable and results more deterministic.
> 
> I tested netem packet duplication in two configurations:
> 1. Nested netem-to-netem hierarchy using parent/child attachment
> 2. Single netem using prio qdisc with netem leaf
> 
> Setup commands and results:
> 
> Single netem hierarchy (prio + netem):
> tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> tc filter add dev lo parent 1:0 protocol ip matchall classid 1:1
> tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> 
> Result: 2x packet multiplication (1→2 packets)
> 2 echo requests + 4 echo replies = 6 total packets
> 
> Expected behavior: Only one netem stage exists in this hierarchy, so
> 1 ping becomes 2 packets (100% duplication). The 2 echo requests generate
> 2 echo replies, which also get duplicated to 4 replies, yielding the
> predictable total of 6 packets (2 requests + 4 replies).
> 
> Nested netem hierarchy (netem + netem):
> tc qdisc add dev lo root handle 1: netem limit 1000 duplicate 100%
> tc qdisc add dev lo parent 1: handle 2: netem limit 1000 duplicate 100%
> 
> Result: 4x packet multiplication (1→2→4 packets)
> 4 echo requests + 16 echo replies = 20 total packets
> 
> Expected behavior: Root netem duplicates 1 ping to 2 packets, child netem
> receives 2 packets and duplicates each to create 4 total packets. Since
> ping operates bidirectionally, 4 echo requests generate 4 echo replies,
> which also get duplicated through the same hierarchy (4→8→16), resulting
> in the predictable total of 20 packets (4 requests + 16 replies).
> 
> The new netem duplication behavior does not break the documented
> semantics of "creates a copy of the packet before queuing." The man page
> description remains true since duplication occurs before the queuing
> process, creating both original and duplicate packets that are then
> enqueued. The documentation does not specify which qdisc should receive
> the duplicates, only that copying happens before queuing. The implementation
> choice to enqueue duplicates to the same qdisc (rather than root) is an
> internal detail that maintains the documented behavior while preventing
> infinite loops in hierarchical configurations.
> 
> Fixes: 0afb51e72855 ("[PKT_SCHED]: netem: reinsert for duplication")
> Reported-by: William Liu <will@...lsroot.io>
> Reported-by: Savino Dicanosa <savy@...t3mfailure.io>
> Signed-off-by: Cong Wang <xiyou.wangcong@...il.com>
> ---
> net/sched/sch_netem.c | 26 +++++++++++++++-----------
> 1 file changed, 15 insertions(+), 11 deletions(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index fdd79d3ccd8c..191f64bd68ff 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -165,6 +165,7 @@ struct netem_sched_data {
>   */
>  struct netem_skb_cb {
>  	u64		time_to_send;
> +	u8		duplicate : 1;
>  };
>  
>  static inline struct netem_skb_cb *netem_skb_cb(struct sk_buff *skb)
> @@ -460,8 +461,16 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	skb->prev = NULL;
>  
>  	/* Random duplication */
> -	if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> -		++count;
> +	if (q->duplicate) {
> +		bool dup = true;
> +
> +		if (netem_skb_cb(skb)->duplicate) {
> +			netem_skb_cb(skb)->duplicate = 0;
> +			dup = false;
> +		}
> +		if (dup && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> +			++count;
> +	}
>  
>  	/* Drop packet? */
>  	if (loss_event(q)) {
> @@ -532,17 +541,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  	}
>  
>  	/*
> -	 * If doing duplication then re-insert at top of the
> -	 * qdisc tree, since parent queuer expects that only one
> -	 * skb will be queued.
> +	 * If doing duplication then re-insert at the same qdisc,
> +	 * as going back to the root would induce loops.
>  	 */
>  	if (skb2) {
> -		struct Qdisc *rootq = qdisc_root_bh(sch);
> -		u32 dupsave = q->duplicate; /* prevent duplicating a dup... */
> -
> -		q->duplicate = 0;
> -		rootq->enqueue(skb2, rootq, to_free);
> -		q->duplicate = dupsave;
> +		netem_skb_cb(skb2)->duplicate = 1;
> +		qdisc_enqueue(skb2, sch, to_free);
>  		skb2 = NULL;
>  	}
>  
> -- 
> 2.34.1

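For reference, the duplication counts quoted above can be sanity-checked along these lines (a rough sketch, not the exact commands from the patch testing: the two tc lines come from the quoted setup, while the tcpdump/ping part is just one way to count packets and assumes a scratch machine where reconfiguring lo is acceptable):

# Nested netem-on-netem, 100% duplication at each stage (from the quoted setup)
tc qdisc add dev lo root handle 1: netem limit 1000 duplicate 100%
tc qdisc add dev lo parent 1: handle 2: netem limit 1000 duplicate 100%

# Count ICMP packets seen on lo for a single echo request
tcpdump -i lo -nn icmp &
TCPDUMP_PID=$!
ping -c 1 127.0.0.1
sleep 1
kill "$TCPDUMP_PID"

# Restore the default loopback qdisc afterwards
tc qdisc del dev lo root

With the fixed behavior this should show 20 ICMP packets (4 requests + 16 replies), matching the numbers above; the prio + netem setup checks out the same way with 6.
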
FWIW, I suggested changing this behavior to stop enqueueing duplicates from the root a while ago on the security mailing list, in the context of the HFSC rsc bug (the re-entrancy violated assumptions in other qdiscs), but was told that some users might be relying on that behavior and that changing it would break their setups.

If we really want to preserve the ability to have multiple duplicating netems in a tree, I think Jamal had a good suggestion here: rely on the tc_skb_ext skb extension [1].

However, I noted that there are implementation issues that we would have to deal with. Copying what I said there [2]:

"The tc_skb_ext approach has a problem... the config option that enables it is NET_TC_SKB_EXT. I assumed this is a generic name for skb extensions in the tc subsystem, but unfortunately this is hardcoded for NET_CLS_ACT recirculation support.

So what this means is we have the following choices:
1. Make SCH_NETEM depend on NET_CLS_ACT and NET_TC_SKB_EXT
2. Add "|| IS_ENABLED(CONFIG_SCH_NETEM)" next to "IS_ENABLED(CONFIG_NET_TC_SKB_EXT)"
3. Separate NET_TC_SKB_EXT and the idea of recirculation support. But I'm not sure how people feel about renaming config options. And this would require a small change to the Mellanox driver subsystem.

None of these sound too nice to do, and I'm not sure which approach to take. In an ideal world, 3 would be best, but I'm not sure how others would feel about all that just to account for a netem edge case."

Of course, we could add an extra extension enum entry just for netem, but that would only make this messier, imo.
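
For anyone weighing those options against their own setups, a quick way to see which of the relevant config knobs a given test kernel already has enabled (config path assumed to be the usual per-kernel file under /boot; /proc/config.gz works too where available):

grep -E 'CONFIG_NET_CLS_ACT|CONFIG_NET_TC_SKB_EXT|CONFIG_SCH_NETEM' /boot/config-"$(uname -r)"

That at least makes the existing NET_CLS_ACT coupling easy to confirm before picking between 1-3.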

[1] https://lore.kernel.org/netdev/CAM0EoMmBdZBzfUAms5-0hH5qF5ODvxWfgqrbHaGT6p3-uOD6vg@mail.gmail.com/
[2] https://lore.kernel.org/netdev/lhR3z8brE3wSKO4PDITIAGXGGW8vnrt1zIPo7C10g2rH0zdQ1lA8zFOuUBklLOTAgMcw4Z6N5YnqRXRzWnkHO-unr5g62msCAUHow-NmY7k=@willsroot.io/
