[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <CAADnVQJ=MPyoL3YrcCvCUG43R37bB7D6kuqysD-SJCQZE2yQrw@mail.gmail.com>
Date: Fri, 7 Feb 2025 18:13:00 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Yan Zhai <yan@...udflare.com>
Cc: bpf <bpf@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau <martin.lau@...ux.dev>,
Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, KP Singh <kpsingh@...nel.org>,
Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>, Brian Vazquez <brianvv@...gle.com>,
LKML <linux-kernel@...r.kernel.org>,
"open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@...r.kernel.org>, kernel-team <kernel-team@...udflare.com>,
Hou Tao <houtao@...weicloud.com>
Subject: Re: [PATCH v2 bpf 1/2] bpf: skip non exist keys in generic_map_lookup_batch
On Thu, Feb 6, 2025 at 9:45 PM Yan Zhai <yan@...udflare.com> wrote:
>
> The generic_map_lookup_batch currently returns EINTR if it fails with
> ENOENT and retries several times on bpf_map_copy_value. The next batch
> would start from the same location, presuming it's a transient issue.
> This is incorrect if a map can actually have "holes", i.e.
> "get_next_key" can return a key that does not point to a valid value. At
> least the array of maps type may contain such holes legitly. Right now
> these holes show up, generic batch lookup cannot proceed any more. It
> will always fail with EINTR errors.
>
> Rather, do not retry in generic_map_lookup_batch. If it finds a non
> existing element, skip to the next key. This simple solution comes with
> a price that transient errors may not be recovered, and the iteration
> might cycle back to the first key under parallel deletion. For example,
> Hou Tao <houtao@...weicloud.com> pointed out a following scenario:
>
> For LPM trie map:
> (1) ->map_get_next_key(map, prev_key, key) returns a valid key
>
> (2) bpf_map_copy_value() return -ENOMENT
> It means the key must be deleted concurrently.
>
> (3) goto next_key
> It swaps the prev_key and key
>
> (4) ->map_get_next_key(map, prev_key, key) again
> prev_key points to a non-existing key, for LPM trie it will treat just
> like prev_key=NULL case, the returned key will be duplicated.
>
> With the retry logic, the iteration can continue to the key next to the
> deleted one. But if we directly skip to the next key, the iteration loop
> would restart from the first key for the lpm_trie type.
>
> However, not all races may be recovered. For example, if current key is
> deleted after instead of before bpf_map_copy_value, or if the prev_key
> also gets deleted, then the loop will still restart from the first key
> for lpm_tire anyway. For generic lookup it might be better to stay
> simple, i.e. just skip to the next key. To guarantee that the output
> keys are not duplicated, it is better to implement map type specific
> batch operations, which can properly lock the trie and synchronize with
> concurrent mutators.
>
> Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
> Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
> Signed-off-by: Yan Zhai <yan@...udflare.com>
> ---
> v1->v2: incorporate more useful information inside commit message.
> v1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
> ---
> kernel/bpf/syscall.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c420edbfb7c8..5d0a4db6fb85 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1979,8 +1979,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
> void __user *values = u64_to_user_ptr(attr->batch.values);
> void __user *keys = u64_to_user_ptr(attr->batch.keys);
> void *buf, *buf_prevkey, *prev_key, *key, *value;
> - int err, retry = MAP_LOOKUP_RETRIES;
> u32 value_size, cp, max_count;
> + int err;
>
> if (attr->batch.elem_flags & ~BPF_F_LOCK)
> return -EINVAL;
> @@ -2026,14 +2026,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
> err = bpf_map_copy_value(map, key, value,
> attr->batch.elem_flags);
>
> - if (err == -ENOENT) {
> - if (retry) {
> - retry--;
> - continue;
> - }
> - err = -EINTR;
> - break;
> - }
> + if (err == -ENOENT)
> + goto next_key;
>
> if (err)
> goto free_buf;
> @@ -2048,12 +2042,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
> goto free_buf;
> }
>
> + cp++;
> +next_key:
> if (!prev_key)
> prev_key = buf_prevkey;
>
> swap(prev_key, key);
> - retry = MAP_LOOKUP_RETRIES;
> - cp++;
#define MAP_LOOKUP_RETRIES is unused after this patch.
Delete it as well.
Pls keep acks on respin.
pw-bot: cr
Powered by blists - more mailing lists