[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANn89iKN6Bkm96UUvchq99vr1J_SbHWY7D0DFD3RXU4o74J7qA@mail.gmail.com>
Date: Thu, 18 Jan 2024 13:14:02 +0100
From: Eric Dumazet <edumazet@...gle.com>
To: Paolo Abeni <pabeni@...hat.com>
Cc: Jozsef Kadlecsik <kadlec@...filter.org>, Pablo Neira Ayuso <pablo@...filter.org>,
netfilter-devel@...r.kernel.org, davem@...emloft.net, netdev@...r.kernel.org,
kuba@...nel.org, fw@...len.de
Subject: Re: [PATCH net 14/14] netfilter: ipset: fix performance regression in
swap operation
On Thu, Jan 18, 2024 at 12:08 PM Paolo Abeni <pabeni@...hat.com> wrote:
>
> On Wed, 2024-01-17 at 17:28 +0100, Eric Dumazet wrote:
> > On Wed, Jan 17, 2024 at 5:23 PM Jozsef Kadlecsik <kadlec@...filter.org> wrote:
> > >
> > > Hi,
> > >
> > > On Wed, 17 Jan 2024, Eric Dumazet wrote:
> > >
> > > > On Wed, Jan 17, 2024 at 5:00 PM Pablo Neira Ayuso <pablo@...filter.org> wrote:
> > > > >
> > > > > From: Jozsef Kadlecsik <kadlec@...filter.org>
> > > > >
> > > > > The patch "netfilter: ipset: fix race condition between swap/destroy
> > > > > and kernel side add/del/test", commit 28628fa9 fixes a race condition.
> > > > > But the synchronize_rcu() added to the swap function unnecessarily slows
> > > > > it down: it can safely be moved to destroy and use call_rcu() instead.
> > > > > Thus we can get back the same performance and preventing the race condition
> > > > > at the same time.
> > > >
> > > > ...
> > > >
> > > > >
> > > > > @@ -2357,6 +2369,9 @@ ip_set_net_exit(struct net *net)
> > > > >
> > > > > inst->is_deleted = true; /* flag for ip_set_nfnl_put */
> > > > >
> > > > > + /* Wait for call_rcu() in destroy */
> > > > > + rcu_barrier();
> > > > > +
> > > > > nfnl_lock(NFNL_SUBSYS_IPSET);
> > > > > for (i = 0; i < inst->ip_set_max; i++) {
> > > > > set = ip_set(inst, i);
> > > > > --
> > > > > 2.30.2
> > > > >
> > > >
> > > > If I am reading this right, time for netns dismantles will increase,
> > > > even for netns not using ipset
> > > >
> > > > If there is no other option, please convert "struct pernet_operations
> > > > ip_set_net_ops".exit to an exit_batch() handler,
> > > > to at least have a factorized rcu_barrier();
> > >
> > > You are right, the call to rcu_barrier() can safely be moved to
> > > ip_set_fini(). I'm going to prepare a new version of the patch.
> > >
> > > Thanks for catching it.
> >
> > I do not want to hold the series, your fix can be built as another
> > patch on top of this one.
>
> Given the timing, if we merge this series as is, it could go very soon
> into Linus' tree. I think it would be better to avoid introducing known
> regressions there.
>
> Any strong opinions vs holding this series until the problems are
> fixed? Likely a new PR will be required.
>
If this helps, here is one splat (using linux-next next-20240118)
BUG: sleeping function called from invalid context at kernel/workqueue.c:3348
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 22194, name:
syz-executor.0
preempt_count: 101, expected: 0
RCU nest depth: 0, expected: 0
3 locks held by syz-executor.0/22194:
#0: ffff8880162a2420 (sb_writers#5){.+.+}-{0:0}, at:
ksys_write+0x12f/0x260 fs/read_write.c:643
#1: ffff888042b48960 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
inode_lock include/linux/fs.h:802 [inline]
#1: ffff888042b48960 (&sb->s_type->i_mutex_key#12){+.+.}-{3:3}, at:
shmem_file_write_iter+0x8c/0x140 mm/shmem.c:2883
#2: ffffffff8d5aeb00 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire
include/linux/rcupdate.h:298 [inline]
#2: ffffffff8d5aeb00 (rcu_callback){....}-{0:0}, at: rcu_do_batch
kernel/rcu/tree.c:2152 [inline]
#2: ffffffff8d5aeb00 (rcu_callback){....}-{0:0}, at:
rcu_core+0x7cc/0x16b0 kernel/rcu/tree.c:2433
Preemption disabled at:
[<ffffffff813c061e>] unwind_next_frame+0xce/0x2390
arch/x86/kernel/unwind_orc.c:479
CPU: 0 PID: 22194 Comm: syz-executor.0 Not tainted
6.7.0-next-20240118-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 11/17/2023
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x125/0x1b0 lib/dump_stack.c:106
__might_resched+0x3c0/0x5e0 kernel/sched/core.c:10176
start_flush_work kernel/workqueue.c:3348 [inline]
__flush_work+0x11f/0xa20 kernel/workqueue.c:3410
__cancel_work_timer+0x3f3/0x590 kernel/workqueue.c:3498
hash_ipmac6_destroy+0x337/0x420 net/netfilter/ipset/ip_set_hash_gen.h:454
ip_set_destroy_set+0x65/0x100 net/netfilter/ipset/ip_set_core.c:1180
rcu_do_batch kernel/rcu/tree.c:2158 [inline]
rcu_core+0x828/0x16b0 kernel/rcu/tree.c:2433
__do_softirq+0x218/0x8de kernel/softirq.c:553
invoke_softirq kernel/softirq.c:427 [inline]
__irq_exit_rcu kernel/softirq.c:632 [inline]
irq_exit_rcu+0xb9/0x120 kernel/softirq.c:644
sysvec_apic_timer_interrupt+0x95/0xb0 arch/x86/kernel/apic/apic.c:1076
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:649
RIP: 0010:on_stack arch/x86/include/asm/stacktrace.h:59 [inline]
Powered by blists - more mailing lists