Date: Thu, 31 Aug 2023 07:17:34 -0400
From: Jamal Hadi Salim <jhs@...atatu.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Michal Kubiak <michal.kubiak@...el.com>, "David S . Miller" <davem@...emloft.net>, 
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>, 
	Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org, 
	eric.dumazet@...il.com, syzbot+e46fbd5289363464bc13@...kaller.appspotmail.com
Subject: Re: [PATCH net] net/sched: fq_pie: avoid stalls in fq_pie_timer()

On Wed, Aug 30, 2023 at 11:53 AM Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Wed, Aug 30, 2023 at 5:50 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> >
> > On Wed, Aug 30, 2023 at 11:00 AM Eric Dumazet <edumazet@...gle.com> wrote:
> > >
> > > On Wed, Aug 30, 2023 at 4:48 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
> > > >
> > > > On Wed, Aug 30, 2023 at 10:30 AM Michal Kubiak <michal.kubiak@...el.com> wrote:
> > > > >
> > > > > On Tue, Aug 29, 2023 at 12:35:41PM +0000, Eric Dumazet wrote:
> > > > > > When setting a high number of flows (limit being 65536),
> > > > > > fq_pie_timer() is currently using too much time as syzbot reported.
> > > > > >
> > > > > > Add logic to yield the cpu every 2048 flows (less than 150 usec
> > > > > > on debug kernels).
> > > > > > It should also help by not blocking qdisc fast paths for too long.
> > > > > > Worst case (65536 flows) would need 31 jiffies for a complete scan.
> > > > > >
> > > > > > Relevant extract from syzbot report:
> > > > > >
> > > > > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2663 jiffies s: 873 root: 0x1/.
> > > > > > rcu: blocking rcu_node structures (internal RCU debug):
> > > > > > Sending NMI from CPU 1 to CPUs 0:
> > > > > > NMI backtrace for cpu 0
> > > > > > CPU: 0 PID: 5177 Comm: syz-executor273 Not tainted 6.5.0-syzkaller-00453-g727dbda16b83 #0
> > > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
> > > > > > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
> > > > > > RIP: 0010:write_comp_data+0x21/0x90 kernel/kcov.c:236
> > > > > > Code: 2e 0f 1f 84 00 00 00 00 00 65 8b 05 01 b2 7d 7e 49 89 f1 89 c6 49 89 d2 81 e6 00 01 00 00 49 89 f8 65 48 8b 14 25 80 b9 03 00 <a9> 00 01 ff 00 74 0e 85 f6 74 59 8b 82 04 16 00 00 85 c0 74 4f 8b
> > > > > > RSP: 0018:ffffc90000007bb8 EFLAGS: 00000206
> > > > > > RAX: 0000000000000101 RBX: ffffc9000dc0d140 RCX: ffffffff885893b0
> > > > > > RDX: ffff88807c075940 RSI: 0000000000000100 RDI: 0000000000000001
> > > > > > RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> > > > > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000dc0d178
> > > > > > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > > > FS:  0000555555d54380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00007f6b442f6130 CR3: 000000006fe1c000 CR4: 00000000003506f0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > Call Trace:
> > > > > >  <NMI>
> > > > > >  </NMI>
> > > > > >  <IRQ>
> > > > > >  pie_calculate_probability+0x480/0x850 net/sched/sch_pie.c:415
> > > > > >  fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387
> > > > > >  call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
> > > > > >
> > > > > > Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
> > > > > > Link: https://lore.kernel.org/lkml/00000000000017ad3f06040bf394@google.com/
> > > > > > Reported-by: syzbot+e46fbd5289363464bc13@...kaller.appspotmail.com
> > > > > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> > > > >
> > > > > The code logic and style look good to me.
> > > > > However, I don't have enough experience with that code to estimate
> > > > > whether 2048 flows per round is enough to avoid stalls under all
> > > > > normal circumstances, so I guess someone else should take a look.
> > > > >
> > > >
> > > > Eric, I had the same question: why 2048 (and why not 12, for example? ;->).
> > > > Would it make more sense to expose that number as an init attribute?
> > > > Not asking you to add it, but I or somebody else could send a
> > > > follow-up patch afterwards.
> > >
> > > This is based on experimentation.
> > >
> > > I started using 1024, then saw that using 2048 was okay.
> > >
> > > I think I gave some numbers in the changelog :
> > > "(less than 150 usec on debug kernels)."
> > >
> > > Spending 150 usec every jiffy seems reasonable to me.
> > >
> > > Honestly, I am not sure anyone was/is using a high number of flows,
> > > given that all qdisc enqueue/dequeue operations were frozen every 15 ms
> > > for 2 or 3 ms on non-debug kernels :/
> >
> > Unfortunately, such numbers tend to depend on the CPU used, etc. Once
> > your patch goes in, we can add a netlink attribute so the user can
> > change this value (keeping 2048 as the default).
> >
>
> Then syzbot will set this new attribute to 65536 and we are back to
> the initial bug.

Hail our new overlord and savior.

cheers,
jamal
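
For context, the pattern under discussion looks roughly like the sketch
below: the timer callback scans at most a fixed batch of flows per tick,
remembers a cursor, and re-arms itself one jiffy later so the scan yields
the CPU between batches. With a batch of 2048 and the 65536-flow maximum,
a full pass takes 32 batches, i.e. 31 extra jiffies after the first,
matching the changelog's worst case. This is a minimal illustration, not
Eric's actual patch: the constant FQ_PIE_BATCH and the scan_cursor and
tupdate_jiffies fields are assumed names, and the real code must hold the
appropriate qdisc lock around the scan.

#define FQ_PIE_BATCH 2048	/* flows scanned per timer tick (assumed name) */

static void fq_pie_timer_sketch(struct timer_list *t)
{
	struct fq_pie_sched_data *q = from_timer(q, t, adapt_timer);
	u32 idx, stop;

	/* Real code must hold the qdisc root lock while touching flows. */
	stop = min_t(u32, q->scan_cursor + FQ_PIE_BATCH, q->flows_cnt);
	for (idx = q->scan_cursor; idx < stop; idx++)
		pie_calculate_probability(&q->p_params, &q->flows[idx].vars,
					  q->flows[idx].backlog);

	if (idx < q->flows_cnt) {
		/* More flows left: remember the cursor, yield the CPU,
		 * and resume the scan on the next jiffy.
		 */
		q->scan_cursor = idx;
		mod_timer(&q->adapt_timer, jiffies + 1);
	} else {
		/* Full pass done: restart after the usual update interval
		 * (tupdate_jiffies is an assumed field name here).
		 */
		q->scan_cursor = 0;
		mod_timer(&q->adapt_timer, jiffies + q->tupdate_jiffies);
	}
}

The design point is that the per-tick budget (under 150 usec on debug
kernels, per the changelog) stays bounded regardless of flows_cnt. As
Eric notes above, making the batch size configurable via netlink would
reintroduce the stall as soon as a user (or syzbot) sets it to the flow
limit.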
