netdev - Re: [PATCH bpf-next v1] bpf, sockmap: Fix concurrency issues between memory charge and uncharge

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aCorf4Cq3Fuwiw2h@pop-os.localdomain>
Date: Sun, 18 May 2025 11:48:31 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: bpf@...r.kernel.org, John Fastabend <john.fastabend@...il.com>,
	Jakub Sitnicki <jakub@...udflare.com>,
	"David S. Miller" <davem@...emloft.net>,
	Eric Dumazet <edumazet@...gle.com>,
	Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
	Simon Horman <horms@...nel.org>,
	Cong Wang <cong.wang@...edance.com>,
	Alexei Starovoitov <ast@...nel.org>, netdev@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v1] bpf, sockmap: Fix concurrency issues between
 memory charge and uncharge

On Thu, May 08, 2025 at 02:24:22PM +0800, Jiayuan Chen wrote:
> Triggering WARN_ON_ONCE(sk->sk_forward_alloc) by running the following
> command, followed by pressing Ctrl-C after 2 seconds:
> ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
> '''
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 40 at net/ipv4/af_inet.c inet_sock_destruct
> 
> Call Trace:
> <TASK>
> __sk_destruct+0x46/0x222
> sk_psock_destroy+0x22f/0x242
> process_one_work+0x504/0x8a8
> ? process_one_work+0x39d/0x8a8
> ? __pfx_process_one_work+0x10/0x10
> ? worker_thread+0x44/0x2ae
> ? __list_add_valid_or_report+0x83/0xea
> ? srso_return_thunk+0x5/0x5f
> ? __list_add+0x45/0x52
> process_scheduled_works+0x73/0x82
> worker_thread+0x1ce/0x2ae
> '''
> 
> Reason:
> When we are in the backlog process, we allocate sk_msg and then perform
> the charge process. Meanwhile, in the user process context, the recvmsg()
> operation performs the uncharge process, leading to concurrency issues
> between them.
> 
> The charge process (2 functions):
> 1. sk_rmem_schedule(size) -> sk_forward_alloc increases by PAGE_SIZE
>                              multiples
> 2. sk_mem_charge(size)    -> sk_forward_alloc -= size
> 
> The uncharge process (sk_mem_uncharge()):
> 3. sk_forward_alloc += size
> 4. check if sk_forward_alloc > PAGE_SIZE
> 5. reclaim    -> sk_forward_alloc decreases, possibly becoming 0
> 
> Because the sk performing charge and uncharge is not locked
> (mainly because the backlog process does not lock the socket), therefore,
> steps 1 to 5 will execute concurrently as follows:
> 
> cpu0                                cpu1
> 1
>                                     3
>                                     4   --> sk_forward_alloc >= PAGE_SIZE
>                                     5   --> reclaim sk_forward_alloc
> 2 --> sk_forward_alloc may
>       become negative
> 
> Solution:
> 1. Add locking to the kfree_sk_msg() process, which is only called in the
>    user process context.
> 2. Integrate the charge process into sk_psock_create_ingress_msg() in the
>    backlog process and add locking.
> 3. Reuse the existing psock->ingress_lock.

Reusing the psock->ingress_lock looks weird to me, as it is intended for
locking ingress queue, at least at the time it was introduced.

And technically speaking, it is the sock lock which is supposed to serialize
socket charging.

So is there any better solution here?

Thanks.