Message-ID: <2c3bec7a-812c-0a65-f8c1-b9749430adba@linux.dev>
Date: Fri, 28 Jul 2023 09:16:32 -0700
From: Yonghong Song <yonghong.song@...ux.dev>
To: Jiri Olsa <olsajiri@...il.com>, tglozar@...hat.com
Cc: linux-kernel@...r.kernel.org, john.fastabend@...il.com,
jakub@...udflare.com, davem@...emloft.net, edumazet@...gle.com,
kuba@...nel.org, pabeni@...hat.com, netdev@...r.kernel.org,
bpf@...r.kernel.org
Subject: Re: [PATCH net] bpf: sockmap: Remove preempt_disable in
sock_map_sk_acquire
On 7/28/23 4:48 AM, Jiri Olsa wrote:
> On Fri, Jul 28, 2023 at 08:44:11AM +0200, tglozar@...hat.com wrote:
>> From: Tomas Glozar <tglozar@...hat.com>
>>
>> Disabling preemption in sock_map_sk_acquire conflicts with GFP_ATOMIC
>> allocation later in sk_psock_init_link on PREEMPT_RT kernels, since
>> GFP_ATOMIC might sleep on RT (see bpf: Make BPF and PREEMPT_RT co-exist
>> patchset notes for details).
>>
>> This causes calling bpf_map_update_elem on BPF_MAP_TYPE_SOCKMAP maps to
>> BUG (sleeping function called from invalid context) on RT kernels.
>>
>> preempt_disable was introduced together with lock_sk and rcu_read_lock
>> in commit 99ba2b5aba24e ("bpf: sockhash, disallow bpf_tcp_close and update
>> in parallel"), probably to match disabled migration of BPF programs, and
>> is no longer necessary.
>>
>> Remove preempt_disable to fix BUG in sock_map_update_common on RT.
>
> FYI, I'm not sure it's related, but I started to see the following splat recently:
>
> [ 189.360689][ T658] =============================
> [ 189.361149][ T658] [ BUG: Invalid wait context ]
> [ 189.361588][ T658] 6.5.0-rc2+ #589 Tainted: G OE
> [ 189.362174][ T658] -----------------------------
> [ 189.362660][ T658] test_progs/658 is trying to lock:
> [ 189.363176][ T658] ffff8881702652b8 (&psock->link_lock){....}-{3:3}, at: sock_map_update_common+0x1c4/0x340
> [ 189.364152][ T658] other info that might help us debug this:
> [ 189.364689][ T658] context-{5:5}
> [ 189.365021][ T658] 3 locks held by test_progs/658:
> [ 189.365508][ T658] #0: ffff888177611a80 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_map_update_elem_sys+0x82/0x260
> [ 189.366503][ T658] #1: ffffffff835a3180 (rcu_read_lock){....}-{1:3}, at: sock_map_update_elem_sys+0x78/0x260
> [ 189.367470][ T658] #2: ffff88816cf19240 (&stab->lock){+...}-{2:2}, at: sock_map_update_common+0x12a/0x340
> [ 189.368420][ T658] stack backtrace:
> [ 189.368806][ T658] CPU: 0 PID: 658 Comm: test_progs Tainted: G OE 6.5.0-rc2+ #589 98af30b3c42d747b51da05f1d0e4899e394be6c9
> [ 189.369889][ T658] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
> [ 189.370736][ T658] Call Trace:
> [ 189.371063][ T658] <TASK>
> [ 189.371365][ T658] dump_stack_lvl+0xb2/0x120
> [ 189.371798][ T658] __lock_acquire+0x9ad/0x2470
> [ 189.372243][ T658] ? lock_acquire+0x104/0x350
> [ 189.372680][ T658] lock_acquire+0x104/0x350
> [ 189.373104][ T658] ? sock_map_update_common+0x1c4/0x340
> [ 189.373615][ T658] ? find_held_lock+0x32/0x90
> [ 189.374074][ T658] ? sock_map_update_common+0x12a/0x340
> [ 189.374587][ T658] _raw_spin_lock_bh+0x38/0x80
> [ 189.375060][ T658] ? sock_map_update_common+0x1c4/0x340
> [ 189.375571][ T658] sock_map_update_common+0x1c4/0x340
> [ 189.376118][ T658] sock_map_update_elem_sys+0x184/0x260
> [ 189.376704][ T658] __sys_bpf+0x181f/0x2840
> [ 189.377147][ T658] __x64_sys_bpf+0x1a/0x30
> [ 189.377556][ T658] do_syscall_64+0x38/0x90
> [ 189.377980][ T658] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
> [ 189.378473][ T658] RIP: 0033:0x7fe52f47ab5d
>
> the patch did not help with that
I think the above splat is not related to this patch. In
sock_map_update_common we have:

   raw_spin_lock_bh(&stab->lock);
   sock_map_add_link(psock, link, map, &stab->sks[idx]);
     spin_lock_bh(&psock->link_lock);
     ...
     spin_unlock_bh(&psock->link_lock);
   raw_spin_unlock_bh(&stab->lock);

You probably have CONFIG_PROVE_RAW_LOCK_NESTING turned on
in your config. On an RT kernel, spin_lock_bh becomes a
sleepable 'mutex', while raw_spin_lock_bh remains a real
spin lock. Taking the sleepable psock->link_lock while
holding the raw stab->lock is therefore a potential locking
violation on RT, which is what the lockdep warning is about.

To fix the issue, you can convert the spin_lock_bh on
psock->link_lock to raw_spin_lock_bh to silence the warning.
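
The suggested conversion might look roughly like the sketch below. This
is only an illustration of the idea, not a tested patch: the exact file
and line context are not confirmed here, and any such change would need
to convert the lock's declared type together with every lock/unlock site
that uses it.

```diff
 /* sketch only: changing the call sites requires changing the
  * field's type as well, e.g. in struct sk_psock */
-	spinlock_t			link_lock;
+	raw_spinlock_t			link_lock;

 /* ... and at each place psock->link_lock is taken: */
-	spin_lock_bh(&psock->link_lock);
+	raw_spin_lock_bh(&psock->link_lock);
 	list_add_tail(&link->list, &psock->link);
-	spin_unlock_bh(&psock->link_lock);
+	raw_spin_unlock_bh(&psock->link_lock);
```

With both locks raw, the nesting order stab->lock -> link_lock no longer
mixes a non-sleeping lock with a sleeping one on RT, so the
invalid-wait-context check should be satisfied.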
>
> jirka
>
>>
>> Signed-off-by: Tomas Glozar <tglozar@...hat.com>
>> ---
>> net/core/sock_map.c | 2 --
>> 1 file changed, 2 deletions(-)
>>
>> diff --git a/net/core/sock_map.c b/net/core/sock_map.c
>> index 19538d628714..08ab108206bf 100644
>> --- a/net/core/sock_map.c
>> +++ b/net/core/sock_map.c
>> @@ -115,7 +115,6 @@ static void sock_map_sk_acquire(struct sock *sk)
>> __acquires(&sk->sk_lock.slock)
>> {
>> lock_sock(sk);
>> - preempt_disable();
>> rcu_read_lock();
>> }
>>
>> @@ -123,7 +122,6 @@ static void sock_map_sk_release(struct sock *sk)
>> __releases(&sk->sk_lock.slock)
>> {
>> rcu_read_unlock();
>> - preempt_enable();
>> release_sock(sk);
>> }
>>
>> --
>> 2.39.3
>>
>>
>