[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFULd4asiDaj1OTrqWNMr5coyPeqM1NT6v2uEqKvJicRUhekSQ@mail.gmail.com>
Date: Sat, 7 Jun 2025 10:52:51 +0200
From: Uros Bizjak <ubizjak@...il.com>
To: Dave Hansen <dave.hansen@...el.com>
Cc: Jiri Slaby <jirislaby@...nel.org>, x86@...nel.org, linux-mm@...ck.org,
linux-kernel@...r.kernel.org, linux-bcachefs@...r.kernel.org,
linux-arch@...r.kernel.org, netdev@...r.kernel.org,
Nadav Amit <nadav.amit@...il.com>, Dennis Zhou <dennis@...nel.org>, Tejun Heo <tj@...nel.org>,
Christoph Lameter <cl@...ux.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...nel.org>,
Borislav Petkov <bp@...en8.de>, Dave Hansen <dave.hansen@...ux.intel.com>,
"H. Peter Anvin" <hpa@...or.com>, Linus Torvalds <torvalds@...ux-foundation.org>,
Andy Lutomirski <luto@...nel.org>, Brian Gerst <brgerst@...il.com>,
Peter Zijlstra <peterz@...radead.org>, Shung-Hsi Yu <shung-hsi.yu@...e.com>,
Alexei Starovoitov <alexei.starovoitov@...il.com>
Subject: Re: Large modules with 6.15 [was: [PATCH v4 6/6] percpu/x86: Enable
strict percpu checks via named AS qualifiers]
On Fri, Jun 6, 2025 at 5:44 PM Dave Hansen <dave.hansen@...el.com> wrote:
>
> On 6/6/25 02:17, Jiri Slaby wrote:
> > Given this is the second time I hit a bug with this, perhaps introduce
> > an EXPERIMENTAL CONFIG option, so that random users can simply disable
> > it if an issue occurs? Without the need of patching random userspace and
> > changing random kernel headers?
>
> What about something like the attached (untested) patch? That should at
> least get folks back to the old, universal working behavior even when
> using new compilers.
IMO the commit message is unnecessarily overly dramatic. The "nasty
bugs" were in fact:
- unfortunate mix of clang < 19 and new gcc-14 [1], fixed by
robustifying the detection of typeof_unqual
[1] https://lore.kernel.org/lkml/CA+G9fYuP2bHnDvJwfMm6+8Y8UYk74qCw-2HsFyRzJDFiQ5dbpQ@mail.gmail.com/
- sparse doesn't understand new keyword, patch at [2], but sparse is
effectively unmaintained so a workaround is in place
[2] https://lore.kernel.org/linux-sparse/5b8d0dee-8fb6-45af-ba6c-7f74aff9a4b8@stanley.mountain/
- genksyms didn't understand the new keyword, fixed by [3].
[3] https://lore.kernel.org/lkml/174461594538.31282.5752735096854392083.tip-bot2@tip-bot2/
- a performance regression, again due to the unfortunate usage of old
gcc-13 [4]. The new gcc-14 would break compilation due to the missing
__percpu qualifier. This is one of the examples, where new checks
would prevent the issue during the development. Fixed with the help of
gcc-14.
[4] https://lore.kernel.org/all/CAADnVQ+iFBxauKq99=-Xk+BdG+Lv=Xgvwi1dC4fpG0utmXJiiA@mail.gmail.com/
- the issue in this thread, already fixed/worked around. Looking at
the fix, I don't think gcc is at fault, but I speculate that there
could be some invalid assumption about dwarf representation of
variables in non-default address space at play. I'll look at this one
in some more detail.
Please also note that besides the above issues, the GCC type system
and related checks around named address spaces was rock solid; there
were *zero* bugs regarding __percpu variables, and the referred patch
moves *all of them* to __seg_gs named address space. The patch builds
off an equally stable and now well proven GCC named address space
support, so in my opinion, it *is* ready for prime time. As
demonstrated in the above list of issues, it was *never* the compiler
at fault.
Let me reiterate what the patch brings to the table. It prevents
invalid references of per cpu variables to non-percpu locations. One
missing percpu dereference can have disastrous consequences (all CPUs
will access data in the shared space). Currently, the safety builds on
checking sparse logs, but sparse errors don't break the build. With
new checks in place, *every* invalid access is detected and breaks the
build with some 50 lines of errors.
Hiding these checks behind the CONFIG_EXPERT option breaks the
intention of the patch. IMO, it should be always enabled to avoid
errors, mentioned in the previous paragraph, already during the
development time.
I'm much more inclined to James' proposal. Maybe we can disable these
checks in v6.15 stable series, but leave them in v6.16? This would
leave a couple of months for distributions to update libbpf.
Thanks,
Uros.
Powered by blists - more mailing lists