[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <aCorf4Cq3Fuwiw2h@pop-os.localdomain>
Date: Sun, 18 May 2025 11:48:31 -0700
From: Cong Wang <xiyou.wangcong@...il.com>
To: Jiayuan Chen <jiayuan.chen@...ux.dev>
Cc: bpf@...r.kernel.org, John Fastabend <john.fastabend@...il.com>,
Jakub Sitnicki <jakub@...udflare.com>,
"David S. Miller" <davem@...emloft.net>,
Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Simon Horman <horms@...nel.org>,
Cong Wang <cong.wang@...edance.com>,
Alexei Starovoitov <ast@...nel.org>, netdev@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH bpf-next v1] bpf, sockmap: Fix concurrency issues between
memory charge and uncharge
On Thu, May 08, 2025 at 02:24:22PM +0800, Jiayuan Chen wrote:
> Triggering WARN_ON_ONCE(sk->sk_forward_alloc) by running the following
> command, followed by pressing Ctrl-C after 2 seconds:
> ./bench sockmap -c 2 -p 1 -a --rx-verdict-ingress
> '''
> ------------[ cut here ]------------
> WARNING: CPU: 2 PID: 40 at net/ipv4/af_inet.c inet_sock_destruct
>
> Call Trace:
> <TASK>
> __sk_destruct+0x46/0x222
> sk_psock_destroy+0x22f/0x242
> process_one_work+0x504/0x8a8
> ? process_one_work+0x39d/0x8a8
> ? __pfx_process_one_work+0x10/0x10
> ? worker_thread+0x44/0x2ae
> ? __list_add_valid_or_report+0x83/0xea
> ? srso_return_thunk+0x5/0x5f
> ? __list_add+0x45/0x52
> process_scheduled_works+0x73/0x82
> worker_thread+0x1ce/0x2ae
> '''
>
> Reason:
> When we are in the backlog process, we allocate sk_msg and then perform
> the charge process. Meanwhile, in the user process context, the recvmsg()
> operation performs the uncharge process, leading to concurrency issues
> between them.
>
> The charge process (2 functions):
> 1. sk_rmem_schedule(size) -> sk_forward_alloc increases by PAGE_SIZE
> multiples
> 2. sk_mem_charge(size) -> sk_forward_alloc -= size
>
> The uncharge process (sk_mem_uncharge()):
> 3. sk_forward_alloc += size
> 4. check if sk_forward_alloc > PAGE_SIZE
> 5. reclaim -> sk_forward_alloc decreases, possibly becoming 0
>
> Because the sk performing charge and uncharge is not locked
> (mainly because the backlog process does not lock the socket), therefore,
> steps 1 to 5 will execute concurrently as follows:
>
> cpu0 cpu1
> 1
> 3
> 4 --> sk_forward_alloc >= PAGE_SIZE
> 5 --> reclaim sk_forward_alloc
> 2 --> sk_forward_alloc may
> become negative
>
> Solution:
> 1. Add locking to the kfree_sk_msg() process, which is only called in the
> user process context.
> 2. Integrate the charge process into sk_psock_create_ingress_msg() in the
> backlog process and add locking.
> 3. Reuse the existing psock->ingress_lock.
Reusing the psock->ingress_lock looks weird to me, as it is intended for
locking ingress queue, at least at the time it was introduced.
And technically speaking, it is the sock lock which is supposed to serialize
socket charging.
So is there any better solution here?
Thanks.
Powered by blists - more mailing lists