Message-ID: <lhR3z8brE3wSKO4PDITIAGXGGW8vnrt1zIPo7C10g2rH0zdQ1lA8zFOuUBklLOTAgMcw4Z6N5YnqRXRzWnkHO-unr5g62msCAUHow-NmY7k=@willsroot.io>
Date: Sun, 06 Jul 2025 14:59:11 +0000
From: William Liu <will@...lsroot.io>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Cong Wang <xiyou.wangcong@...il.com>, netdev@...r.kernel.org, stephen@...workplumber.org, Savino Dicanosa <savy@...t3mfailure.io>
Subject: Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent infinite loops

On Saturday, July 5th, 2025 at 1:52 PM, Jamal Hadi Salim <jhs@...atatu.com> wrote:

> 
> 
> On Fri, Jul 4, 2025 at 8:48 PM Cong Wang xiyou.wangcong@...il.com wrote:
> 
> > On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > 
> > > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim jhs@...atatu.com wrote:
> > > 
> > > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang xiyou.wangcong@...il.com wrote:
> > > > 
> > > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > 
> > > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > > --- a/net/sched/sch_netem.c
> > > > > > +++ b/net/sched/sch_netem.c
> > > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > > skb->prev = NULL;
> > > > > > 
> > > > > > /* Random duplication */
> > > > > > - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > > + if (tc_skb_cb(skb)->duplicate &&
> > > > > 
> > > > > Oops, this is clearly should be !duplicate... It was lost during my
> > > > > stupid copy-n-paste... Sorry for this mistake.
> > > > 
> > > > I understood you earlier, Cong. My view still stands:
> > > > You are adding logic to a common data structure for a use case that
> > 
> > You are exaggerating this. I only added 1 bit to the core data structure,
> > the code logic remains in the netem, so it is contained within netem.
> 
> 
> Try it out ;->
> 
> Here's an even simpler setup:
> 
> sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> sudo tc filter add dev lo parent 1:0 protocol ip bpf obj netem_bug_test.o sec classifier/pass classid 1:1
> sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> then:
> ping -c 1 127.0.0.1
> 
> Note: there are other issues as well but i thought citing the ebpf one
> was sufficient to get the point across.
> 
> > > > really makes no sense. The ROI is not good.
> > 
> > Speaking of ROI, I think you need to look at the patch stats:
> > 
> > William/Your patch:
> > 1 file changed, 40 insertions(+)
> > 
> > My patch:
> > 2 files changed, 4 insertions(+), 4 deletions(-)
> 
> 
> ROI is not just about LOC. The consequences of a patch are also part
> of that formula. And let's not forget the time spent so far debating
> instead of plugging the hole.
> 
> > > > BTW: I am almost certain you will hit other issues when this goes out
> > > > or when you actually start to test and then you will have to fix more
> > > > spots.
> > > 
> > > Here's an example that breaks it:
> > > 
> > > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj netem_bug_test.o sec classifier/pass classid 1:1
> > > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4 duplicate 100% delay 1us reorder 100%
> > > 
> > > And the ping 127.0.0.1 -c 1
> > > I had to fix your patch for correctness (attached)
> > > 
> > > the ebpf prog is trivial - make it just return the classid or even zero.
> > 
> > Interesting, are you sure this works before my patch?
> > 
> > I don't intend to change any logic except closing the infinite loop. IOW,
> > if it didn't work before, I don't expect to make it work with this patch,
> > this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> > Otherwise it would become a feature improvement. (Don't get me wrong, I
> > think this feature should be improved rather than simply forbidden, it just
> > belongs to a different patch.)
> 
> 
> A quick solution is what William had. I asked him to use ext_cb not
> because i think it is a better solution but just so we can move
> forward.
> Agree that for a longer term we need a more generic solution as discussed ...
> 
> cheers,
> jamal

The tc_skb_ext approach has a problem: the config option that enables it is NET_TC_SKB_EXT. I had assumed this was a generic name for skb extensions in the tc subsystem, but unfortunately it is tied specifically to NET_CLS_ACT recirculation support.

That leaves us with the following choices:
1. Make SCH_NETEM depend on NET_CLS_ACT and NET_TC_SKB_EXT.
2. Add "|| IS_ENABLED(CONFIG_SCH_NETEM)" next to each "IS_ENABLED(CONFIG_NET_TC_SKB_EXT)" check.
3. Decouple NET_TC_SKB_EXT from the idea of recirculation support. I'm not sure how people feel about renaming config options, though, and this would require a small change to the Mellanox driver subsystem.

None of these is particularly appealing, and I'm not sure which one to take. In an ideal world, 3 would be best, but I don't know how others would feel about doing all of that just to account for a netem edge case.
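
To make option 2 a bit more concrete, here is a rough user-space sketch of the guard I have in mind. It is not kernel code: the CONFIG_* defines stand in for what the build system would generate, plain defined() stands in for the kernel's IS_ENABLED() (which also handles =m), and HAVE_TC_SKB_EXT and tc_skb_ext_available() are hypothetical names used only for illustration.

```c
/*
 * User-space stand-ins for Kconfig symbols (assumption: in a real kernel
 * build these come from the generated config header, not from this file).
 * This models a config with netem enabled but NET_TC_SKB_EXT disabled.
 */
#define CONFIG_SCH_NETEM 1
/* CONFIG_NET_TC_SKB_EXT is deliberately left undefined. */

/*
 * Option 2: compile the tc skb extension in when either symbol is set.
 * defined() is a simplification of the kernel's IS_ENABLED() macro.
 */
#if defined(CONFIG_NET_TC_SKB_EXT) || defined(CONFIG_SCH_NETEM)
#define HAVE_TC_SKB_EXT 1
#else
#define HAVE_TC_SKB_EXT 0
#endif

/* Hypothetical helper, not a kernel function: reports the guard's result. */
static int tc_skb_ext_available(void)
{
	return HAVE_TC_SKB_EXT;
}
```

With the defines above, tc_skb_ext_available() returns 1 even though CONFIG_NET_TC_SKB_EXT is unset, which is the whole point of option 2. The cost is that every existing NET_TC_SKB_EXT check would need the same extra condition.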

Best,
William
