Message-ID: <CAM0EoM=99ufQSzbYZU=wz8fbYOQ2v+cMa7BX1EM6OHk+dBrE0Q@mail.gmail.com>
Date: Sat, 5 Jul 2025 09:52:05 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Cong Wang <xiyou.wangcong@...il.com>
Cc: netdev@...r.kernel.org, will@...lsroot.io, stephen@...workplumber.org,
Savino Dicanosa <savy@...t3mfailure.io>
Subject: Re: [Patch net 1/2] netem: Fix skb duplication logic to prevent
infinite loops
On Fri, Jul 4, 2025 at 8:48 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
>
> On Wed, Jul 02, 2025 at 11:04:22AM -0400, Jamal Hadi Salim wrote:
> > On Wed, Jul 2, 2025 at 10:12 AM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > >
> > > On Tue, Jul 1, 2025 at 9:57 PM Cong Wang <xiyou.wangcong@...il.com> wrote:
> > > >
> > > > On Tue, Jul 01, 2025 at 04:13:05PM -0700, Cong Wang wrote:
> > > > > diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> > > > > index fdd79d3ccd8c..33de9c3e4d1b 100644
> > > > > --- a/net/sched/sch_netem.c
> > > > > +++ b/net/sched/sch_netem.c
> > > > > @@ -460,7 +460,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
> > > > > skb->prev = NULL;
> > > > >
> > > > > /* Random duplication */
> > > > > - if (q->duplicate && q->duplicate >= get_crandom(&q->dup_cor, &q->prng))
> > > > > + if (tc_skb_cb(skb)->duplicate &&
> > > >
> > > > Oops, this clearly should be !duplicate... It was lost during my
> > > > stupid copy-n-paste... Sorry for this mistake.
> > > >
> > >
> > > I understood you earlier, Cong. My view still stands:
> > > You are adding logic to a common data structure for a use case that
>
> You are exaggerating this. I only added 1 bit to the core data structure,
> the code logic remains in the netem, so it is contained within netem.
Try it out ;->
Here's an even simpler setup:
sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
sudo tc filter add dev lo parent 1:0 protocol ip bpf obj netem_bug_test.o sec classifier/pass classid 1:1
sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
then:
ping -c 1 127.0.0.1
Note: there are other issues as well, but I thought citing the eBPF one
was sufficient to get the point across.
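For anyone reproducing this, the source for netem_bug_test.o is not included in the thread. A minimal sketch that matches the "sec classifier/pass" argument used above could look like the following. The file name, the function name cls_pass, the SEC() helper macro and the build command are assumptions for illustration, not taken from the original test. Returning 0 follows the "return the classid or even zero" description quoted further down; in cls_bpf without direct-action, 0 is understood to mean "no match", in which case the all-zero priomap should still steer the packet to band 0, i.e. class 1:1.

```c
/* netem_bug_test.c: hypothetical source for netem_bug_test.o.
 * Assumed build command (not from the thread):
 *   clang -O2 -target bpf -c netem_bug_test.c -o netem_bug_test.o
 */

/* Only a pointer to the context is used, so a forward declaration
 * stands in for including <linux/bpf.h>. */
struct __sk_buff;

/* Helper macro (an assumption, mirroring common BPF samples): place a
 * symbol in a named ELF section so tc can find it via "sec <name>". */
#define SEC(name) __attribute__((section(name), used))

/* Trivial classifier: ignores the packet and returns 0. */
SEC("classifier/pass")
int cls_pass(struct __sk_buff *skb)
{
	(void)skb;	/* packet contents are irrelevant for this test */
	return 0;
}

/* License string in the section conventionally read by BPF loaders. */
char _license[] SEC("license") = "GPL";
```

Changing the return value to a nonzero classid is the other variant mentioned in the thread; which one is used should not matter for hitting netem at 1:1 in this setup.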
>
> > > really makes no sense. The ROI is not good.
>
> Speaking of ROI, I think you need to look at the patch stats:
>
> William/Your patch:
> 1 file changed, 40 insertions(+)
>
> My patch:
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
ROI is not just about LOC. The consequences of a patch are also part
of that formula. And let's not forget the time spent so far debating
instead of plugging the hole.
>
> > > BTW: I am almost certain you will hit other issues when this goes out
> > > or when you actually start to test and then you will have to fix more
> > > spots.
> > >
> > Here's an example that breaks it:
> >
> > sudo tc qdisc add dev lo root handle 1: prio bands 3 priomap 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> > sudo tc filter add dev lo parent 1:0 protocol ip bpf obj netem_bug_test.o sec classifier/pass classid 1:1
> > sudo tc qdisc add dev lo parent 1:1 handle 10: netem limit 4 duplicate 100%
> > sudo tc qdisc add dev lo parent 10: handle 30: netem gap 1 limit 4 duplicate 100% delay 1us reorder 100%
> >
> > And the ping 127.0.0.1 -c 1
> > I had to fix your patch for correctness (attached)
> >
> >
> > the ebpf prog is trivial - make it just return the classid or even zero.
>
> Interesting, are you sure this works before my patch?
>
> I don't intend to change any logic except closing the infinite loop. IOW,
> if it didn't work before, I don't expect to make it work with this patch,
> this patch merely fixes the infinite loop, which is sufficient as a bug fix.
> Otherwise it would become a feature improvement. (Don't get me wrong, I
> think this feature should be improved rather than simply forbidden, it just
> belongs to a different patch.)
A quick solution is what William had. I asked him to use ext_cb not
because I think it is a better solution, but just so we can move
forward.
Agreed that for the longer term we need a more generic solution, as discussed ...
cheers,
jamal