lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAO3-PbqOFMMzcy9VBLaTtvihnuCJ2u38RMPnYNPckjF8BzdDkQ@mail.gmail.com>
Date: Thu, 6 Feb 2025 22:12:25 -0600
From: Yan Zhai <yan@...udflare.com>
To: Hou Tao <houtao@...weicloud.com>
Cc: bpf@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>, 
	Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>, 
	Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>, 
	Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>, 
	Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>, 
	Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>, 
	Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>, Brian Vazquez <brianvv@...gle.com>, 
	linux-kernel@...r.kernel.org, linux-kselftest@...r.kernel.org, 
	kernel-team@...udflare.com
Subject: Re: [PATCH bpf] bpf: skip non existing key in generic_map_lookup_batch

On Thu, Feb 6, 2025 at 2:20 AM Yan Zhai <yan@...udflare.com> wrote:
>
> On Thu, Feb 6, 2025 at 12:40 AM Hou Tao <houtao@...weicloud.com> wrote:
> >
> >
> >
> > Yes, the retry logic doesn't change the previous key. Thanks for the
> > clarifying.
> > > And by "skipping to the next key", it's simply
> > >
> > >   if (err == -ENOENT)
> > >        goto next_key;
> > >
> > > Note the "next_key" label was not in the current codebase. It is only
> > > in my posted patch. I don't think this would break lpm_trie unless I
> > > missed something.
> >
> > I was trying to say that the proposed patch may break the lookup_batch
> > operation for lpm_trie, and let me explain step by step:
> >
> > For LPM trie map:
> > (1) ->map_get_next_key(map, prev_key, key) returns a valid key
> >
> > (2) bpf_map_copy_value() return -ENOMENT
> > It means the key must be deleted concurrently.
> >
>
> I see what you mean now, thanks for the detailed explanation!
>
> So for lpm_trie, if an element is deleted between get_next_key and
> copy_value, then retry would still proceed to its next tree node. But
> with or without retry, I think we cannot prevent the key from cycling
> back to the beginning: an element can be deleted after copy_value, so
> the key becomes invalid. After swap with prev_key and call
> bpf_get_next_key, it cycles back to the leftmost. Similar can happen
> if during retry the prev_key also gets invalidated. IIUC the rcu
> locking in lookup really cannot prevent the tree structure from
> changes.
>
> On the other hand, bpf_map_get_next_key manual already specifies "If a
> key is not found, the operation returns zero and sets the next_key
> pointer to the key of the first element.". IMHO it probably makes more
> sense that, regardless of normal or batch lookup, it is ultimately the
> user's responsibility to properly synchronize, or deduplicate what is
> returned from the kernel. I intend to keep the "skip to next" to save
> the unnecessary complexity here, unless there are strong objections
> that I should not.

Thought about this again. It is interesting that hashmap batch lookup
actually guarantees no duplicate in output today. If we want to
achieve the same for lpm_trie, it should have its own batch lookup
routine, which could properly lock the trie against mutation. For the
generic batch, it is still better to be simple. Let me put this in the
V2 patchset for discussion.

best
Yan

>
> > (3) goto next_key
> > It swaps the prev_key and key
> >
> > (4) ->map_get_next_key(map, prev_key, key) again
> > prev_key points to a non-existing key, for LPM trie it will treat just
> > like prev_key=NULL case, the returned key will be duplicated.
> >

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ