[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <1eb1715a-c1b0-4741-8d2d-f66adffcf91d@linux.dev>
Date: Mon, 12 Jan 2026 16:15:17 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Amery Hung <ameryhung@...il.com>
Cc: netdev@...r.kernel.org, alexei.starovoitov@...il.com, andrii@...nel.org,
daniel@...earbox.net, memxor@...il.com, martin.lau@...nel.org,
kpsingh@...nel.org, yonghong.song@...ux.dev, song@...nel.org,
haoluo@...gle.com, kernel-team@...a.com, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 10/16] bpf: Support lockless unlink when
freeing map or local storage
On 1/12/26 2:38 PM, Amery Hung wrote:
>> [ btw, a nit, I think it can use a better function name instead of
>> "lockless". This function still takes the lock if it can. ]
> Does using _nofail suffix make it more clear?
sgtm.
>
>>> sure how destroy() can hold b->lock in a way that causes map_free() to
>>> fail acquiring b->lock.
>> I recall ETIMEDOUT was mentioned to be the likely (only?) case here.
>> Assume the b->lock did fail in map_free here, there are >1 selem(s)
>> using the same b->lock. Is it always true that the selem that failed at
>> the b->lock in map_free() here must race with the very same selem in
>> destroy()?
This is still an open issue/question.
>>
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * Only let destroy() unlink from local_storage->list and do mem_uncharge
>>>>> + * as owner is guaranteed to be valid in destroy().
>>>>> + */
>>>>> + if (local_storage && caller == BPF_LOCAL_STORAGE_DESTROY) {
>>>> If I read here correctly, only bpf_local_storage_destroy() can do the
>>>> bpf_selem_free(). For example, if a bpf_sk_storage_map is going away,
>>>> the selem (which is memcg charged) will stay in the sk until the sk is
>>>> closed?
>>> You read it correctly and Yes there will be stale elements in
>>> local_storage->list.
>>>
>>> I would hope the unlink from local_storage part is doable from
>>> map_free() and destroy(), but it is hard to guarantee (1) mem_uncharge
>>> is done only once (2) while the owner is still valid in a lockless
>>> way.
>> This needs to be addressed. It cannot leave the selem lingering. At
>> least the selem should be freed for the common case (i.e., succeeds in
>> both locks). Lingering selem is ok in case of lock failure. Many sk can
>> be long-lived connections. The user space may want to recreate the map,
>> and it will be limited by the memcg.
> I think this is doable by maintaining a local memory charge in local storage.
>
> So, remove selem->size and have a copy of total selem memory charge in
> a new local_storage->selem_size. Update will be protected by
> local_storage->lock in common paths (not in bpf_selem_unlink_nofail).
> More specifically, charge in bpf_selem_link_storage_nolock() when a
> selem is going to be publicized. uncharge in
> bpf_selem_unlink_storage_nolock() when a selem is being deleted. Then,
> in destroy() we simply get the total amount to be uncharged from the
> owner from local_storage->selem_size.
Right, I had a similar thought before. Because of the nofail/lockless
consideration, I suspect the local_storage->selem_size will need to be
atomic. I am not sure if it is enough though, e.g. there is a debug/warn
on sk->sk_omem_alloc in __sk_destruct to ensure it is 0, so I stopped
thinking on it for now, but it could be a direction to explore.
If it is the case, it is another atomic in the destruct/map_free code
path. I am still open-minded on the nofail requirement for both locks,
but complexity is building up. Kumar has also commented that b->lock in
map_free should not fail. Regardless, I think lets get the nofail code
path for map_free() sorted out first. Then we can move on to handle this
case.
Powered by blists - more mailing lists