[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAADnVQJU9OWAWFk89P6i1RK6vXkuee5s76suHjF+uP+V4iepqQ@mail.gmail.com>
Date: Wed, 26 Feb 2025 17:59:38 -0800
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Hou Tao <houtao@...weicloud.com>
Cc: Zvi Effron <zeffron@...tgames.com>, Toke Høiland-Jørgensen <toke@...nel.org>,
bpf <bpf@...r.kernel.org>, rcu@...r.kernel.org,
LKML <linux-kernel@...r.kernel.org>, Alexei Starovoitov <ast@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, Andrii Nakryiko <andrii@...nel.org>,
Martin KaFai Lau <martin.lau@...ux.dev>, Eduard Zingerman <eddyz87@...il.com>, Song Liu <song@...nel.org>,
Yonghong Song <yonghong.song@...ux.dev>, John Fastabend <john.fastabend@...il.com>,
KP Singh <kpsingh@...nel.org>, Stanislav Fomichev <sdf@...ichev.me>, Hao Luo <haoluo@...gle.com>,
Jiri Olsa <jolsa@...nel.org>, "Paul E . McKenney" <paulmck@...nel.org>, Cody Haas <chaas@...tgames.com>,
Hou Tao <hotforest@...il.com>
Subject: Re: [RESEND] [PATCH bpf-next 2/3] bpf: Overwrite the element in hash
map atomically
On Wed, Feb 26, 2025 at 5:48 PM Hou Tao <houtao@...weicloud.com> wrote:
>
> Hi,
>
> On 2/27/2025 7:17 AM, Zvi Effron wrote:
> > On Tue, Feb 25, 2025 at 9:42 PM Alexei Starovoitov
> > <alexei.starovoitov@...il.com> wrote:
> >> On Tue, Feb 25, 2025 at 8:05 PM Hou Tao <houtao@...weicloud.com> wrote:
> >>> Hi,
> >>>
> >>> On 2/26/2025 11:24 AM, Alexei Starovoitov wrote:
> >>>> On Sat, Feb 8, 2025 at 2:17 AM Hou Tao <houtao@...weicloud.com> wrote:
> >>>>> Hi Toke,
> >>>>>
> >>>>> On 2/6/2025 11:05 PM, Toke Høiland-Jørgensen wrote:
> >>>>>> Hou Tao <houtao@...weicloud.com> writes:
> >>>>>>
> >>>>>>> +cc Cody Haas
> >>>>>>>
> >>>>>>> Sorry for the resend. I sent the reply in the HTML format.
> >>>>>>>
> >>>>>>> On 2/4/2025 4:28 PM, Hou Tao wrote:
> >>>>>>>> Currently, the update of existing element in hash map involves two
> >>>>>>>> steps:
> >>>>>>>> 1) insert the new element at the head of the hash list
> >>>>>>>> 2) remove the old element
> >>>>>>>>
> >>>>>>>> It is possible that the concurrent lookup operation may fail to find
> >>>>>>>> either the old element or the new element if the lookup operation starts
> >>>>>>>> before the addition and continues after the removal.
> >>>>>>>>
> >>>>>>>> Therefore, replacing the two-step update with an atomic update. After
> >>>>>>>> the change, the update will be atomic in the perspective of the lookup
> >>>>>>>> operation: it will either find the old element or the new element.
> >>>> I'm missing the point.
> >>>> This "atomic" replacement doesn't really solve anything.
> >>>> lookup will see one element.
> >>>> That element could be deleted by another thread.
> >>>> bucket lock and either two step update or single step
> >>>> don't change anything from the pov of bpf prog doing lookup.
> >>> The point is that overwriting an existed element may lead to concurrent
> >>> lookups return ENOENT as demonstrated by the added selftest and the
> >>> patch tried to "fix" that. However, it seems using
> >>> hlist_nulls_replace_rcu() for the overwriting update is still not
> >>> enough. Because when the lookup procedure found the old element, the old
> >>> element may be reusing, the comparison of the map key may fail, and the
> >>> lookup procedure may still return ENOENT.
> >> you mean l_old == l_new ? I don't think it's possible
> >> within htab_map_update_elem(),
> >> but htab_map_delete_elem() doing hlist_nulls_del_rcu()
> >> then free_htab_elem, htab_map_update_elem, alloc, hlist_nulls_add_head_rcu
> >> and just deleted elem is reused to be inserted
> >> into another bucket.
>
> No. I mean the following procedure in which the lookup procedure finds
> the old element and tries to match its key, one update procedure has
> already deleted the old element, and another update procedure is reusing
> the old element:
>
> lookup procedure A
> A: find the old element (instead of the new old)
>
> update procedure B
> B: delete the old element
> update procedure C on the same CPU:
> C: reuse the old element (overwrite its key and insert in
> the same bucket)
>
> A: the key is mismatched and return -ENOENT.
This is fine. It's just normal reuse.
Orthogonal to 'update as insert+delete' issue.
> It can be reproduced by increasing ctx.loop in the selftest.
> >>
> >> I'm not sure whether this new hlist_nulls_replace_rcu()
> >> primitive works with nulls logic.
> >>
> >> So back to the problem statement..
> >> Are you saying that adding new to a head while lookup is in the middle
> >> is causing it to miss an element that
> >> is supposed to be updated assuming atomicity of the update?
> >> While now update_elem() is more like a sequence of delete + insert?
> >>
> >> Hmm.
> > Yes, exactly that. Because update_elem is actually a delete + insert (actually
> > an insert + delete, I think?), it is possible for a concurrent lookup to see no
> > element instead of either the old or new value.
>
> Yep.
> >
> >>> I see. In v2 I will fallback to the original idea: adding a standalone
> >>> update procedure for htab of maps in which it will atomically overwrite
> >>> the map_ptr just like array of maps does.
> >> hold on. is this only for hash-of-maps ?
> > I believe this was also replicated for hash as well as hash-of-maps. Cody can
> > confirm, or use the reproducer he has to test that case.
>
> The fix for hash-of-maps will be much simpler and the returned map
> pointer will be correct. For other types of hash map, beside switching
> to hlist_nulls_replace_rcu(),
It's been a long time since I looked into rcu_nulls details.
Pls help me understand that this new replace_rcu_nulls()
is correct from nulls pov,
If it is then this patch set may be the right answer to non-atomic update.
And for the future, please please focus on "why" part in
the cover letter and commit logs instead of "what".
Since the only thing I got from the log was:
"Currently, the update is not atomic
because the overwrite of existing element happens in a two-steps way,
but the support of atomic update is feasible".
"is feasible" doesn't explain "why".
Link to xdp-newbie question is nice for additional context,
but reviewers should not need to go and read some thread somewhere
to understand "why" part.
All of it should be in the commit log.
> map may still be incorrect (as shown long time ago [1]), so I think
> maybe for other types of map, the atomic update doesn't matter too much.
>
> [1]:
> https://lore.kernel.org/bpf/20221230041151.1231169-1-houtao@huaweicloud.com/
A thread from 3 years ago ?! Sorry, it's not helpful to ask
people to page-in such an old context with lots of follow ups
that may or may not be relevant today.
Powered by blists - more mailing lists