lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQK_FptUD17REjtT1wnRyxZ2dx6sZuePsJQES-q27NKKLA@mail.gmail.com>
Date: Sat, 9 Nov 2024 16:04:45 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Hou Tao <houtao@...weicloud.com>
Cc: Kunwu Chan <kunwu.chan@...ux.dev>, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>, 
	Martin KaFai Lau <martin.lau@...ux.dev>, Eddy Z <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>, 
	KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, 
	Jiri Olsa <jolsa@...nel.org>, Sebastian Sewior <bigeasy@...utronix.de>, clrkwllms@...nel.org, 
	Steven Rostedt <rostedt@...dmis.org>, Thomas Gleixner <tglx@...utronix.de>, bpf <bpf@...r.kernel.org>, 
	LKML <linux-kernel@...r.kernel.org>, linux-rt-devel@...ts.linux.dev, 
	Kunwu Chan <chentao@...inos.cn>, syzbot+b506de56cbbb63148c33@...kaller.appspotmail.com
Subject: Re: [PATCH] bpf: Convert lpm_trie::lock to 'raw_spinlock_t'

On Fri, Nov 8, 2024 at 6:46 PM Hou Tao <houtao@...weicloud.com> wrote:
>
> Hi Alexei,
>
> On 11/9/2024 4:22 AM, Alexei Starovoitov wrote:
> > On Thu, Nov 7, 2024 at 10:32 PM Kunwu Chan <kunwu.chan@...ux.dev> wrote:
> >> From: Kunwu Chan <chentao@...inos.cn>
> >>
> >> When PREEMPT_RT is enabled, 'spinlock_t' becomes preemptible
> >> and bpf program has owned a raw_spinlock under a interrupt handler,
> >> which results in invalid lock acquire context.
> >>
> >> [ BUG: Invalid wait context ]
> >> 6.12.0-rc5-next-20241031-syzkaller #0 Not tainted
> >> -----------------------------
> >> swapper/0/0 is trying to lock:
> >> ffff8880261e7a00 (&trie->lock){....}-{3:3},
> >> at: trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462
> >> other info that might help us debug this:
> >> context-{3:3}
> >> 5 locks held by swapper/0/0:
> >>  #0: ffff888020bb75c8 (&vp_dev->lock){-...}-{3:3},
> >> at: vp_vring_interrupt drivers/virtio/virtio_pci_common.c:80 [inline]
> >>  #0: ffff888020bb75c8 (&vp_dev->lock){-...}-{3:3},
> >> at: vp_interrupt+0x142/0x200 drivers/virtio/virtio_pci_common.c:113
> >>  #1: ffff88814174a120 (&vb->stop_update_lock){-...}-{3:3},
> >> at: spin_lock include/linux/spinlock.h:351 [inline]
> >>  #1: ffff88814174a120 (&vb->stop_update_lock){-...}-{3:3},
> >> at: stats_request+0x6f/0x230 drivers/virtio/virtio_balloon.c:438
> >>  #2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
> >>  #2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
> >>  #2: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: __queue_work+0x199/0xf50 kernel/workqueue.c:2259
> >>  #3: ffff8880b863dd18 (&pool->lock){-.-.}-{2:2},
> >> at: __queue_work+0x759/0xf50
> >>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: rcu_lock_acquire include/linux/rcupdate.h:337 [inline]
> >>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: rcu_read_lock include/linux/rcupdate.h:849 [inline]
> >>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: __bpf_trace_run kernel/trace/bpf_trace.c:2339 [inline]
> >>  #4: ffffffff8e939f20 (rcu_read_lock){....}-{1:3},
> >> at: bpf_trace_run1+0x1d6/0x520 kernel/trace/bpf_trace.c:2380
> >> stack backtrace:
> >> CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted
> >> 6.12.0-rc5-next-20241031-syzkaller #0
> >> Hardware name: Google Compute Engine/Google Compute Engine,
> >> BIOS Google 09/13/2024
> >> Call Trace:
> >>  <IRQ>
> >>  __dump_stack lib/dump_stack.c:94 [inline]
> >>  dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
> >>  print_lock_invalid_wait_context kernel/locking/lockdep.c:4826 [inline]
> >>  check_wait_context kernel/locking/lockdep.c:4898 [inline]
> >>  __lock_acquire+0x15a8/0x2100 kernel/locking/lockdep.c:5176
> >>  lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5849
> >>  __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >>  _raw_spin_lock_irqsave+0xd5/0x120 kernel/locking/spinlock.c:162
> >>  trie_delete_elem+0x96/0x6a0 kernel/bpf/lpm_trie.c:462
> > This trace is from non-RT kernel where spin_lock == raw_spin_lock.
>
> Yes. However, I think the reason for the warning is that lockdep
> considers the case is possible under PREEMPT_RT and it violates the rule
> of lock [1].
>
> [1]:
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=560af5dc839eef08a273908f390cfefefb82aa04
> >
> > I don't think Hou's explanation earlier is correct.
> > https://lore.kernel.org/bpf/e14d8882-4760-7c9c-0cfc-db04eda494ee@huaweicloud.com/
>
> OK. Is the bpf mem allocator part OK for you ?
> >
> >>  bpf_prog_2c29ac5cdc6b1842+0x43/0x47
> >>  bpf_dispatcher_nop_func include/linux/bpf.h:1290 [inline]
> >>  __bpf_prog_run include/linux/filter.h:701 [inline]
> >>  bpf_prog_run include/linux/filter.h:708 [inline]
> >>  __bpf_trace_run kernel/trace/bpf_trace.c:2340 [inline]
> >>  bpf_trace_run1+0x2ca/0x520 kernel/trace/bpf_trace.c:2380
> >>  trace_workqueue_activate_work+0x186/0x1f0 include/trace/events/workqueue.h:59
> >>  __queue_work+0xc7b/0xf50 kernel/workqueue.c:2338
> >>  queue_work_on+0x1c2/0x380 kernel/workqueue.c:2390
> > here irqs are disabled, but raw_spin_lock in lpm should be fine.
> >
> >>  queue_work include/linux/workqueue.h:662 [inline]
> >>  stats_request+0x1a3/0x230 drivers/virtio/virtio_balloon.c:441
> >>  vring_interrupt+0x21d/0x380 drivers/virtio/virtio_ring.c:2595
> >>  vp_vring_interrupt drivers/virtio/virtio_pci_common.c:82 [inline]
> >>  vp_interrupt+0x192/0x200 drivers/virtio/virtio_pci_common.c:113
> >>  __handle_irq_event_percpu+0x29a/0xa80 kernel/irq/handle.c:158
> >>  handle_irq_event_percpu kernel/irq/handle.c:193 [inline]
> >>  handle_irq_event+0x89/0x1f0 kernel/irq/handle.c:210
> >>  handle_fasteoi_irq+0x48a/0xae0 kernel/irq/chip.c:720
> >>  generic_handle_irq_desc include/linux/irqdesc.h:173 [inline]
> >>  handle_irq arch/x86/kernel/irq.c:247 [inline]
> >>  call_irq_handler arch/x86/kernel/irq.c:259 [inline]
> >>  __common_interrupt+0x136/0x230 arch/x86/kernel/irq.c:285
> >>  common_interrupt+0xb4/0xd0 arch/x86/kernel/irq.c:278
> >>  </IRQ>
> >>
> >> Reported-by: syzbot+b506de56cbbb63148c33@...kaller.appspotmail.com
> >> Closes: https://lore.kernel.org/bpf/6723db4a.050a0220.35b515.0168.GAE@google.com/
> >> Fixes: 66150d0dde03 ("bpf, lpm: Make locking RT friendly")
> >> Signed-off-by: Kunwu Chan <chentao@...inos.cn>
> >> ---
> >>  kernel/bpf/lpm_trie.c | 12 ++++++------
> >>  1 file changed, 6 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> >> index 9b60eda0f727..373cdcfa0505 100644
> >> --- a/kernel/bpf/lpm_trie.c
> >> +++ b/kernel/bpf/lpm_trie.c
> >> @@ -35,7 +35,7 @@ struct lpm_trie {
> >>         size_t                          n_entries;
> >>         size_t                          max_prefixlen;
> >>         size_t                          data_size;
> >> -       spinlock_t                      lock;
> >> +       raw_spinlock_t                  lock;
> >>  };
> > We're certainly not going back.
>
> Only switching from spinlock_t to raw_spinlock_t is not enough, running
> it under PREEMPT_RT after the change will still trigger the similar
> lockdep warning. That is because kmalloc() may acquire a spinlock_t as
> well. However, after changing the kmalloc and its variants to bpf memory
> allocator, I think the switch to raw_spinlock_t will be safe. I have
> already written a draft patch set. Will post after after polishing and
> testing it. WDYT ?

Switching lpm to bpf_mem_alloc would address the issue.
Why do you want a switch to raw_spin_lock as well?
kfree_rcu() is already done outside of the lock.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ