[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <74d4c8e40e61dad369607ecd8b98f58a515479f0.camel@gmail.com>
Date: Fri, 07 Nov 2025 10:19:33 -0800
From: Eduard Zingerman <eddyz87@...il.com>
To: Donglin Peng <dolinux.peng@...il.com>, bot+bpf-ci@...nel.org,
ast@...nel.org, andrii.nakryiko@...il.com
Cc: zhangxiaoqin@...omi.com, linux-kernel@...r.kernel.org,
bpf@...r.kernel.org, alan.maguire@...cle.com, song@...nel.org,
pengdonglin@...omi.com, andrii@...nel.org, daniel@...earbox.net,
martin.lau@...nel.org, yonghong.song@...ux.dev, clm@...a.com,
ihor.solodrai@...ux.dev
Subject: Re: [PATCH v5 6/7] btf: Add lazy sorting validation for binary
search
On Fri, 2025-11-07 at 15:08 +0800, Donglin Peng wrote:
> On Thu, Nov 6, 2025 at 9:47 PM <bot+bpf-ci@...nel.org> wrote:
> >
> > > diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
> > > index 66cb739a0..33c327d3c 100644
> > > --- a/kernel/bpf/btf.c
> > > +++ b/kernel/bpf/btf.c
> > > @@ -552,6 +552,70 @@ u32 btf_nr_types(const struct btf *btf)
> > > return total;
> > > }
> > >
> > > +/* Verifies that BTF types are sorted in ascending order
> > > according to their
> > > + * names, with named types appearing before anonymous types. If
> > > the ordering
> > > + * is correct, counts the number of named types and updates the
> > > BTF object's
> > > + * nr_sorted_types field.
> > > + *
> > > + * Return: true if types are properly sorted, false otherwise
> > > + */
> > > +static bool btf_check_sorted(struct btf *btf)
> > > +{
> > > + const struct btf_type *t;
> > > + int i, n, k = 0, nr_sorted_types;
> > > +
> > > + if (likely(btf->nr_sorted_types != BTF_NEED_SORT_CHECK))
> > > + goto out;
> > > + btf->nr_sorted_types = 0;
> > ^
> >
> > Can multiple threads race when writing to btf->nr_sorted_types
> > here?
> > Looking at btf_find_by_name_kind()->btf_check_sorted(), I see that
> > btf_find_by_name_kind() receives a const pointer but casts away the
> > const
> > to call btf_check_sorted(). The function bpf_find_btf_id() calls
> > btf_find_by_name_kind() without holding any locks (line 737), and
> > later
> > explicitly unlocks before calling it again (lines 756-757).
> >
> > This means multiple threads can concurrently enter
> > btf_check_sorted() and
> > write to btf->nr_sorted_types. While the validation logic is
> > idempotent
> > and all threads would compute the same value, the concurrent writes
> > to the
> > same memory location without synchronization could trigger KCSAN
> > warnings.
> >
> > Should this use atomic operations, or should the validation be
> > performed
> > under a lock during BTF initialization before the BTF becomes
> > visible to
> > multiple threads?
>
> Hi, is it necessary to address this issue?
> For example, by using atomic_try_cmpxchg or WRITE/READ_ONCE? Using
> atomic_try_cmpxchg can prevent race conditions on writes but requires
> an atomic
> variable, while WRITE_ONCE/READ_ONCE can avoid KCSAN warnings. Since
> the race condition is unlikely to cause critical issues, I suggest
> using
> WRITE_ONCE/READ_ONCE.
Probably use WRITE_ONCE/READ_ONCE?
> > > +
> > > + if (btf->nr_types < 2)
> > > + goto out;
> > > +
> > > + nr_sorted_types = 0;
> > > + n = btf_nr_types(btf) - 1;
> > > + for (i = btf_start_id(btf); i < n; i++) {
> > > + k = i + 1;
> > > + if (btf_compare_type_names(&i, &k, btf) > 0)
> > > + goto out;
> > > +
> > > + t = btf_type_by_id(btf, i);
> > > + if (t->name_off)
> > > + nr_sorted_types++;
> > > + }
> > > +
> > > + t = btf_type_by_id(btf, k);
> > > + if (t->name_off)
> > > + nr_sorted_types++;
> > > + if (nr_sorted_types)
> > > + btf->nr_sorted_types = nr_sorted_types;
> > ^
> >
> > The same race condition applies to this write of nr_sorted_types as
> > well.
> >
> > > +
> > > +out:
> > > + return btf->nr_sorted_types > 0;
> > > +}
> >
> > [ ... ]
> >
> > > @@ -610,7 +674,7 @@ s32 btf_find_by_name_kind(const struct btf
> > > *btf, const char *name, u8 kind)
> > > goto out;
> > > }
> > >
> > > - if (btf->nr_sorted_types != BTF_NEED_SORT_CHECK) {
> > > + if (btf_check_sorted((struct btf *)btf)) {
> > ^
> >
> > The const cast here enables the concurrent writes discussed above.
> > Is
> > there a reason to mark the btf parameter as const if we're
> > modifying it?
>
> Hi team, is casting away const an acceptable approach for our
> codebase?
Casting away const is undefined behaviour, e.g. see paragraph 6.7.3.6
N1570 ISO/IEC 9899:201x Programming languages — C.
Both of the problems above can be avoided if kernel will do sorted
check non-lazily. But Andrii and Alexei seem to like that property.
>
> >
> >
> > ---
> > AI reviewed your patch. Please fix the bug or email reply why it's
> > not a bug.
> > See:
> > https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md
> >
> > CI run summary:
> > https://github.com/kernel-patches/bpf/actions/runs/19137195500
Powered by blists - more mailing lists