netdev - Re: [PATCH bpf-next v3 10/16] bpf: Support lockless unlink when freeing map or local storage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <1eb1715a-c1b0-4741-8d2d-f66adffcf91d@linux.dev>
Date: Mon, 12 Jan 2026 16:15:17 -0800
From: Martin KaFai Lau <martin.lau@...ux.dev>
To: Amery Hung <ameryhung@...il.com>
Cc: netdev@...r.kernel.org, alexei.starovoitov@...il.com, andrii@...nel.org,
 daniel@...earbox.net, memxor@...il.com, martin.lau@...nel.org,
 kpsingh@...nel.org, yonghong.song@...ux.dev, song@...nel.org,
 haoluo@...gle.com, kernel-team@...a.com, bpf@...r.kernel.org
Subject: Re: [PATCH bpf-next v3 10/16] bpf: Support lockless unlink when
 freeing map or local storage



On 1/12/26 2:38 PM, Amery Hung wrote:
>> [ btw, a nit, I think it can use a better function name instead of
>> "lockless". This function still takes the lock if it can. ]
> Does using _nofail suffix make it more clear?

sgtm.

> 
>>> sure how destroy() can hold b->lock in a way that causes map_free() to
>>> fail acquiring b->lock.
>> I recall ETIMEDOUT was mentioned to be the likely (only?) case here.
>> Assume the b->lock did fail in map_free here, there are >1 selem(s)
>> using the same b->lock. Is it always true that the selem that failed at
>> the b->lock in map_free() here must race with the very same selem in
>> destroy()?

This is still an open issue/question.

>>
>>>>> +             }
>>>>> +     }
>>>>> +
>>>>> +     /*
>>>>> +      * Only let destroy() unlink from local_storage->list and do mem_uncharge
>>>>> +      * as owner is guaranteed to be valid in destroy().
>>>>> +      */
>>>>> +     if (local_storage && caller == BPF_LOCAL_STORAGE_DESTROY) {
>>>> If I read here correctly, only bpf_local_storage_destroy() can do the
>>>> bpf_selem_free(). For example, if a bpf_sk_storage_map is going away,
>>>> the selem (which is memcg charged) will stay in the sk until the sk is
>>>> closed?
>>> You read it correctly and Yes there will be stale elements in
>>> local_storage->list.
>>>
>>> I would hope the unlink from local_storage part is doable from
>>> map_free() and destroy(), but it is hard to guarantee (1) mem_uncharge
>>> is done only once (2) while the owner is still valid in a lockless
>>> way.
>> This needs to be addressed. It cannot leave the selem lingering. At
>> least the selem should be freed for the common case (i.e., succeeds in
>> both locks). Lingering selem is ok in case of lock failure. Many sk can
>> be long-lived connections. The user space may want to recreate the map,
>> and it will be limited by the memcg.
> I think this is doable by maintaining a local memory charge in local storage.
> 
> So, remove selem->size and have a copy of total selem memory charge in
> a new local_storage->selem_size. Update will be protected by
> local_storage->lock in common paths (not in bpf_selem_unlink_nofail).
> More specifically, charge in bpf_selem_link_storage_nolock() when a
> selem is going to be publicized. uncharge in
> bpf_selem_unlink_storage_nolock() when a selem is being deleted. Then,
> in destroy() we simply get the total amount to be uncharged from the
> owner from local_storage->selem_size.

Right, I had a similar thought before. Because of the nofail/lockless 
consideration, I suspect the local_storage->selem_size will need to be 
atomic. I am not sure if it is enough though, e.g. there is a debug/warn 
on sk->sk_omem_alloc in __sk_destruct to ensure it is 0, so I stopped 
thinking on it for now, but it could be a direction to explore.

If it is the case, it is another atomic in the destruct/map_free code 
path. I am still open-minded on the nofail requirement for both locks, 
but complexity is building up. Kumar has also commented that b->lock in 
map_free should not fail. Regardless, I think lets get the nofail code 
path for map_free() sorted out first. Then we can move on to handle this 
case.