Message-ID: <CANn89iLf6+6679LrTV-c2XZWMEeRc3O0N+++yKNqQUxQzjspJw@mail.gmail.com>
Date: Wed, 30 Aug 2023 17:00:25 +0200
From: Eric Dumazet <edumazet@...gle.com>
To: Jamal Hadi Salim <jhs@...atatu.com>
Cc: Michal Kubiak <michal.kubiak@...el.com>, "David S . Miller" <davem@...emloft.net>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Cong Wang <xiyou.wangcong@...il.com>, Jiri Pirko <jiri@...nulli.us>, netdev@...r.kernel.org,
eric.dumazet@...il.com, syzbot+e46fbd5289363464bc13@...kaller.appspotmail.com
Subject: Re: [PATCH net] net/sched: fq_pie: avoid stalls in fq_pie_timer()
On Wed, Aug 30, 2023 at 4:48 PM Jamal Hadi Salim <jhs@...atatu.com> wrote:
>
> On Wed, Aug 30, 2023 at 10:30 AM Michal Kubiak <michal.kubiak@...el.com> wrote:
> >
> > On Tue, Aug 29, 2023 at 12:35:41PM +0000, Eric Dumazet wrote:
> > > When setting a high number of flows (limit being 65536),
> > > fq_pie_timer() is currently using too much time as syzbot reported.
> > >
> > > Add logic to yield the cpu every 2048 flows (less than 150 usec
> > > on debug kernels).
> > > It should also help by not blocking qdisc fast paths for too long.
> > > Worst case (65536 flows) would need 31 jiffies for a complete scan.
> > >
> > > Relevant extract from syzbot report:
> > >
> > > rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-.... } 2663 jiffies s: 873 root: 0x1/.
> > > rcu: blocking rcu_node structures (internal RCU debug):
> > > Sending NMI from CPU 1 to CPUs 0:
> > > NMI backtrace for cpu 0
> > > CPU: 0 PID: 5177 Comm: syz-executor273 Not tainted 6.5.0-syzkaller-00453-g727dbda16b83 #0
> > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/26/2023
> > > RIP: 0010:check_kcov_mode kernel/kcov.c:173 [inline]
> > > RIP: 0010:write_comp_data+0x21/0x90 kernel/kcov.c:236
> > > Code: 2e 0f 1f 84 00 00 00 00 00 65 8b 05 01 b2 7d 7e 49 89 f1 89 c6 49 89 d2 81 e6 00 01 00 00 49 89 f8 65 48 8b 14 25 80 b9 03 00 <a9> 00 01 ff 00 74 0e 85 f6 74 59 8b 82 04 16 00 00 85 c0 74 4f 8b
> > > RSP: 0018:ffffc90000007bb8 EFLAGS: 00000206
> > > RAX: 0000000000000101 RBX: ffffc9000dc0d140 RCX: ffffffff885893b0
> > > RDX: ffff88807c075940 RSI: 0000000000000100 RDI: 0000000000000001
> > > RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> > > R10: 0000000000000000 R11: 0000000000000000 R12: ffffc9000dc0d178
> > > R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> > > FS: 0000555555d54380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 00007f6b442f6130 CR3: 000000006fe1c000 CR4: 00000000003506f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > <NMI>
> > > </NMI>
> > > <IRQ>
> > > pie_calculate_probability+0x480/0x850 net/sched/sch_pie.c:415
> > > fq_pie_timer+0x1da/0x4f0 net/sched/sch_fq_pie.c:387
> > > call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700
> > >
> > > Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
> > > Link: https://lore.kernel.org/lkml/00000000000017ad3f06040bf394@google.com/
> > > Reported-by: syzbot+e46fbd5289363464bc13@...kaller.appspotmail.com
> > > Signed-off-by: Eric Dumazet <edumazet@...gle.com>
> >
> > The code logic and style looks good to me.
> > However, I don't have experience with that code to estimate if 2048
> > flows per round is enough to avoid stalls for all normal circumstances,
> > so I guess someone else should take a look.
> >
>
> Eric, I had the same question: Why 2048 (why not 12 for example? ;->).
> Could that number make more sense to add as an init attribute? Not
> asking you to add it but i or somebody else could send a followup
> patch after.
This is based on experimentation.
I started with 1024, then saw that 2048 was okay as well.
I think I gave some numbers in the changelog:
"(less than 150 usec on debug kernels)."
Spending 150 usec every jiffy seems reasonable to me.
Honestly, I am not sure anyone was/is using a high number of flows,
given that all qdisc enqueue/dequeue operations were frozen every 15 ms for
2 or 3 ms on non-debug kernels :/
> Other than that:
>
> Reviewed-by: Jamal Hadi Salim <jhs@...atatu.com>
>
> cheers,
> jamal
>
> > Thanks,
> > Reviewed-by: Michal Kubiak <michal.kubiak@...el.com>
> >