Message-ID: <CAADnVQ+bDD8T6trgGePCUbjAcz36x1P0RqhNy0nRju_ULiw+mg@mail.gmail.com>
Date: Wed, 23 Apr 2025 08:06:57 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Vlastimil Babka <vbabka@...e.cz>
Cc: Dave Airlie <airlied@...il.com>, Shakeel Butt <shakeel.butt@...ux.dev>,
Sebastian Sewior <bigeasy@...utronix.de>, Andrew Morton <akpm@...ux-foundation.org>,
Linus Torvalds <torvalds@...ux-foundation.org>, Alexei Starovoitov <ast@...nel.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: Linux 6.15-rc3

On Wed, Apr 23, 2025 at 1:03 AM Vlastimil Babka <vbabka@...e.cz> wrote:
>
> On 4/23/25 09:14, Vlastimil Babka wrote:
> > On 4/23/25 01:37, Alexei Starovoitov wrote:
> >> On Tue, Apr 22, 2025 at 4:01 PM Dave Airlie <airlied@...il.com> wrote:
> >>>
> >>> > Alexei Starovoitov (2):
> >>> > locking/local_lock, mm: replace localtry_ helpers with
> >>> > local_trylock_t type
> >>>
> >>> This seems to have upset some phoronix nginx workload
> >>> https://www.phoronix.com/review/linux-615-nginx-regression/2
> >>
> >> 3x regression? wow.
> >> Thanks for heads up.
> >> I'm staring at the patch and don't see it.
> >> Adding more experts.
> >
> > Incidentally my work on slab sheaves using local_trylock() got to a phase
> > yesterday when after rebasing on rc3 and some refactoring I was looking at
> > sheaf stats that showed the percpu sheaves were used exactly once per cpu,
> > and other attempts failed. Which would be explained by local_trylock()
> > failing. In the context of rc3 itself it would mean the memcg stocks aren't
> > used at all because they can't be try-locked. Which could make benchmarks
> > unhappy of course, although surprising that it would be that much.
> >
> > What I suspect now is the _Generic() part doesn't work as expected. So consider:
> >
> > local_trylock() (or the _irqsave variant) has no _Generic() part; it does the
> > "if (READ_ONCE(tl->acquired))" and "WRITE_ONCE(tl->acquired, 1)" directly, and
> > succeeds on the first attempt on each cpu where it is executed.
> >
> > local_unlock() goes via __local_lock_release() and since the _Generic() part
> > there doesn't work, we don't do WRITE_ONCE(tl->acquired, 0); so it stays 1.
> >
> > Preempt and irq handling are fine, so nothing like lockdep, preempt debugging or
> > watchdogs gets suspicious; the cpu just can never succeed at local_trylock() again.
> >
> > local_lock(_irqsave()) uses __local_lock_acquire(), which also has a
> > _Generic() part, but since it doesn't work, the "lockdep_assert(tl->acquired
> > == 0);" there isn't triggered either.
> >
> > In fact I've put BUG() in the _Generic() sections of _acquire() and _release()
> > and it didn't trigger, which would prove the code isn't executed. But I don't
> > know why _Generic() doesn't recognize the correct type there.
> >
> > --- a/include/linux/local_lock_internal.h
> > +++ b/include/linux/local_lock_internal.h
> > @@ -104,6 +104,7 @@ do { \
> > _Generic((lock), \
> > local_trylock_t *: ({ \
> > lockdep_assert(tl->acquired == 0); \
> > + BUG(); \
> > WRITE_ONCE(tl->acquired, 1); \
> > }), \
> > default:(void)0); \
> > @@ -173,6 +174,7 @@ do { \
> > _Generic((lock), \
> > local_trylock_t *: ({ \
> > lockdep_assert(tl->acquired == 1); \
> > + BUG(); \
> > WRITE_ONCE(tl->acquired, 0); \
> > }), \
> > default:(void)0); \
> >
>
> Oh I see, replacing the default: with "local_lock_t *:", which is the only
> other expected type, forces the compiler to actually tell me what's wrong:
>
> ./include/linux/local_lock_internal.h:174:26: error: ‘_Generic’ selector of
> type ‘__seg_gs local_lock_t *’ is not compatible with any association

That explains why I and others couldn't reproduce it yesterday,
no matter what we tried.
We're still on gcc-13 and this bit wasn't triggering:
#define CC_HAS_TYPEOF_UNQUAL (__GNUC__ >= 14)
Upgraded the compiler and can confirm everything.
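
For anyone following along, here is a rough userspace sketch of the failure mode described above. It is not the kernel code: the types are simplified stand-ins, all demo_* names are hypothetical, and "volatile" is used only as an assumed substitute for the __seg_gs qualifier, since any qualifier on the pointed-to type makes the pointer incompatible with a plain "local_trylock_t *" association. The second release macro only illustrates that an association for the qualified pointer type does match; it is not meant to represent the actual fix.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel types (assumption, not the real layout). */
typedef struct { int dummy; } local_lock_t;
typedef struct { local_lock_t llock; int acquired; } local_trylock_t;

/* Assumed stand-in for __seg_gs: a qualifier on the pointed-to type. */
#define demo_qual volatile

/* Like the trylock path: no _Generic(), touches 'acquired' directly. */
static bool demo_trylock(local_trylock_t *tl)
{
	if (tl->acquired)
		return false;
	tl->acquired = 1;
	return true;
}

static void demo_clear(local_trylock_t *tl)
{
	tl->acquired = 0;
}

/* Only the unqualified pointer type is listed, so a qualified
 * pointer silently falls through to default and nothing is cleared. */
#define demo_release_broken(lock)					\
	_Generic((lock),						\
		local_trylock_t *: demo_clear((local_trylock_t *)(lock)), \
		default: (void)0)

/* Illustration only: listing the qualified pointer type makes it match. */
#define demo_release_matched(lock)					\
	_Generic((lock),						\
		demo_qual local_trylock_t *:				\
			demo_clear((local_trylock_t *)(lock)),		\
		local_trylock_t *: demo_clear((local_trylock_t *)(lock)), \
		default: (void)0)

/* Count successful trylocks when release goes through the broken macro. */
static int demo_count_broken(int rounds)
{
	local_trylock_t tl = { .acquired = 0 };
	demo_qual local_trylock_t *qp = &tl;
	int ok = 0;

	for (int i = 0; i < rounds; i++) {
		if (demo_trylock(&tl)) {
			ok++;
			demo_release_broken(qp);	/* no-op: 'acquired' stays 1 */
		}
	}
	return ok;
}

/* Same loop, but the release macro has a matching association. */
static int demo_count_matched(int rounds)
{
	local_trylock_t tl = { .acquired = 0 };
	demo_qual local_trylock_t *qp = &tl;
	int ok = 0;

	for (int i = 0; i < rounds; i++) {
		if (demo_trylock(&tl)) {
			ok++;
			demo_release_matched(qp);	/* clears 'acquired' */
		}
	}
	return ok;
}
```

With five rounds, demo_count_broken() succeeds exactly once and demo_count_matched() succeeds every time, which lines up with the "used exactly once per cpu" symptom in the sheaf stats.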