lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <d191084a-4ab4-8269-640f-1ecf269418a6@huaweicloud.com>
Date: Sat, 8 Feb 2025 18:16:59 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Toke Høiland-Jørgensen <toke@...nel.org>,
 bpf@...r.kernel.org, rcu@...r.kernel.org
Cc: linux-kernel@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
 Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
 Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman
 <eddyz87@...il.com>, Song Liu <song@...nel.org>,
 Yonghong Song <yonghong.song@...ux.dev>,
 John Fastabend <john.fastabend@...il.com>, KP Singh <kpsingh@...nel.org>,
 Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
 Jiri Olsa <jolsa@...nel.org>, "Paul E . McKenney" <paulmck@...nel.org>,
 Cody Haas <chaas@...tgames.com>, Hou Tao <hotforest@...il.com>
Subject: Re: [RESEND] [PATCH bpf-next 2/3] bpf: Overwrite the element in hash
 map atomically

Hi Toke,

On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote:
> Hou Tao <houtao@...weicloud.com> writes:
>
>> +cc Cody Haas
>>
>> Sorry for the resend. I sent the reply in the HTML format.
>>
>> On 2/4/2025 4:28 PM, Hou Tao wrote:
>>> Currently, the update of existing element in hash map involves two
>>> steps:
>>> 1) insert the new element at the head of the hash list
>>> 2) remove the old element
>>>
>>> It is possible that the concurrent lookup operation may fail to find
>>> either the old element or the new element if the lookup operation starts
>>> before the addition and continues after the removal.
>>>
>>> Therefore, replacing the two-step update with an atomic update. After
>>> the change, the update will be atomic in the perspective of the lookup
>>> operation: it will either find the old element or the new element.
>>>
>>> Signed-off-by: Hou Tao <hotforest@...il.com>
>>> ---
>>>  kernel/bpf/hashtab.c | 14 ++++++++------
>>>  1 file changed, 8 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>>> index 4a9eeb7aef85..a28b11ce74c6 100644
>>> --- a/kernel/bpf/hashtab.c
>>> +++ b/kernel/bpf/hashtab.c
>>> @@ -1179,12 +1179,14 @@ static long htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>>>  		goto err;
>>>  	}
>>>  
>>> -	/* add new element to the head of the list, so that
>>> -	 * concurrent search will find it before old elem
>>> -	 */
>>> -	hlist_nulls_add_head_rcu(&l_new->hash_node, head);
>>> -	if (l_old) {
>>> -		hlist_nulls_del_rcu(&l_old->hash_node);
>>> +	if (!l_old) {
>>> +		hlist_nulls_add_head_rcu(&l_new->hash_node, head);
>>> +	} else {
>>> +		/* Replace the old element atomically, so that
>>> +		 * concurrent search will find either the new element or
>>> +		 * the old element.
>>> +		 */
>>> +		hlist_nulls_replace_rcu(&l_new->hash_node, &l_old->hash_node);
>>>  
>>>  		/* l_old has already been stashed in htab->extra_elems, free
>>>  		 * its special fields before it is available for reuse. Also
>>>
>> After thinking about it the second time, the atomic list replacement on
>> the update side is enough to make lookup operation always find the
>> existing element. However, due to the immediate reuse, the lookup may
>> find an unexpected value. Maybe we should disable the immediate reuse
>> for specific map (e.g., htab of maps).
> Hmm, in an RCU-protected data structure, reusing the memory before an
> RCU grace period has elapsed is just as wrong as freeing it, isn't it?
> I.e., the reuse logic should have some kind of call_rcu redirection to
> be completely correct?

Not for all cases. There is SLAB_TYPESAFE_BY_RCU-typed slab. For hash
map, the reuse is also tricky (e.g., the goto again case in
lookup_nulls_elem_raw), however it can not prevent the lookup procedure
from returning unexpected value. I had post a patch set [1] to "fix"
that, but Alexei said it is "a known quirk". Here I am not sure about
whether it is reasonable to disable the reuse for htab of maps only. I
will post a v2 for the patch set.

[1]:
https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/
> -Toke


Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ