[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <a74970e1-9545-a975-3501-cfb17277f303@huaweicloud.com>
Date: Sat, 8 Feb 2025 08:51:45 +0800
From: Hou Tao <houtao@...weicloud.com>
To: Yan Zhai <yan@...udflare.com>, bpf@...r.kernel.org
Cc: Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>,
John Fastabend <john.fastabend@...il.com>,
Andrii Nakryiko <andrii@...nel.org>, Martin KaFai Lau
<martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>,
Song Liu <song@...nel.org>, Yonghong Song <yonghong.song@...ux.dev>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>,
Hao Luo <haoluo@...gle.com>, Jiri Olsa <jolsa@...nel.org>,
Mykola Lysenko <mykolal@...com>, Shuah Khan <shuah@...nel.org>,
Brian Vazquez <brianvv@...gle.com>, linux-kernel@...r.kernel.org,
linux-kselftest@...r.kernel.org, kernel-team@...udflare.com
Subject: Re: [PATCH v2 bpf 1/2] bpf: skip non exist keys in
generic_map_lookup_batch
On 2/7/2025 1:45 PM, Yan Zhai wrote:
> The generic_map_lookup_batch currently returns EINTR if it fails with
> ENOENT and retries several times on bpf_map_copy_value. The next batch
> would start from the same location, presuming it's a transient issue.
> This is incorrect if a map can actually have "holes", i.e.
> "get_next_key" can return a key that does not point to a valid value. At
> least the array of maps type may contain such holes legitly. Right now
> these holes show up, generic batch lookup cannot proceed any more. It
> will always fail with EINTR errors.
>
> Rather, do not retry in generic_map_lookup_batch. If it finds a non
> existing element, skip to the next key. This simple solution comes with
> a price that transient errors may not be recovered, and the iteration
> might cycle back to the first key under parallel deletion. For example,
> Hou Tao <houtao@...weicloud.com> pointed out a following scenario:
>
> For LPM trie map:
> (1) ->map_get_next_key(map, prev_key, key) returns a valid key
>
> (2) bpf_map_copy_value() return -ENOMENT
> It means the key must be deleted concurrently.
>
> (3) goto next_key
> It swaps the prev_key and key
>
> (4) ->map_get_next_key(map, prev_key, key) again
> prev_key points to a non-existing key, for LPM trie it will treat just
> like prev_key=NULL case, the returned key will be duplicated.
>
> With the retry logic, the iteration can continue to the key next to the
> deleted one. But if we directly skip to the next key, the iteration loop
> would restart from the first key for the lpm_trie type.
>
> However, not all races may be recovered. For example, if current key is
> deleted after instead of before bpf_map_copy_value, or if the prev_key
> also gets deleted, then the loop will still restart from the first key
> for lpm_tire anyway. For generic lookup it might be better to stay
> simple, i.e. just skip to the next key. To guarantee that the output
> keys are not duplicated, it is better to implement map type specific
> batch operations, which can properly lock the trie and synchronize with
> concurrent mutators.
Make sense.
>
> Fixes: cb4d03ab499d ("bpf: Add generic support for lookup batch op")
> Closes: https://lore.kernel.org/bpf/Z6JXtA1M5jAZx8xD@debian.debian/
> Signed-off-by: Yan Zhai <yan@...udflare.com>
Acked-by: Hou Tao <houtao1@...wei.com>
> ---
> v1->v2: incorporate more useful information inside commit message.
> v1: https://lore.kernel.org/bpf/Z6OYbS4WqQnmzi2z@debian.debian/
> ---
> kernel/bpf/syscall.c | 16 +++++-----------
> 1 file changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index c420edbfb7c8..5d0a4db6fb85 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -1979,8 +1979,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
> void __user *values = u64_to_user_ptr(attr->batch.values);
> void __user *keys = u64_to_user_ptr(attr->batch.keys);
> void *buf, *buf_prevkey, *prev_key, *key, *value;
> - int err, retry = MAP_LOOKUP_RETRIES;
> u32 value_size, cp, max_count;
> + int err;
>
> if (attr->batch.elem_flags & ~BPF_F_LOCK)
> return -EINVAL;
> @@ -2026,14 +2026,8 @@ int generic_map_lookup_batch(struct bpf_map *map,
> err = bpf_map_copy_value(map, key, value,
> attr->batch.elem_flags);
>
> - if (err == -ENOENT) {
> - if (retry) {
> - retry--;
> - continue;
> - }
> - err = -EINTR;
> - break;
> - }
> + if (err == -ENOENT)
> + goto next_key;
>
> if (err)
> goto free_buf;
> @@ -2048,12 +2042,12 @@ int generic_map_lookup_batch(struct bpf_map *map,
> goto free_buf;
> }
>
> + cp++;
> +next_key:
> if (!prev_key)
> prev_key = buf_prevkey;
>
> swap(prev_key, key);
> - retry = MAP_LOOKUP_RETRIES;
> - cp++;
> cond_resched();
> }
>
Powered by blists - more mailing lists