[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CANpmjNPb8vsNd=P3f0xASWf4i6d_9KHGWGGWGD2p+GViubQDPA@mail.gmail.com>
Date: Fri, 31 Jan 2020 20:20:17 +0100
From: Marco Elver <elver@...gle.com>
To: Eric Dumazet <edumazet@...gle.com>
Cc: Peter Zijlstra <peterz@...radead.org>,
Will Deacon <will@...nel.org>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...nel.org>,
"the arch/x86 maintainers" <x86@...nel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: Confused about hlist_unhashed_lockless()
On Fri, 31 Jan 2020 at 19:48, Eric Dumazet <edumazet@...gle.com> wrote:
>
> On Fri, Jan 31, 2020 at 10:43 AM Peter Zijlstra <peterz@...radead.org> wrote:
> >
> > On Fri, Jan 31, 2020 at 08:48:05AM -0800, Eric Dumazet wrote:
> > > BUG: KCSAN: data-race in del_timer / detach_if_pending
> >
> > > diff --git a/include/linux/timer.h b/include/linux/timer.h
> > > index 1e6650ed066d5d28251b0bd385fc37ef94c96532..0dc19a8c39c9e49a7cde3d34bfa4be8871cbc1c2
> > > 100644
> > > --- a/include/linux/timer.h
> > > +++ b/include/linux/timer.h
> > > @@ -164,7 +164,7 @@ static inline void destroy_timer_on_stack(struct
> > > timer_list *timer) { }
> > > */
> > > static inline int timer_pending(const struct timer_list * timer)
> > > {
> > > - return timer->entry.pprev != NULL;
> > > + return !hlist_unhashed_lockless(&timer->entry);
> > > }
> >
> > That's just completely wrong.
> >
> > Aside from any memory barrier issues that might or might not be there
> > (I'm waaaay to tired atm to tell), the above code is perfectly fine.
> >
> > In fact, this is a KCSAN compiler infrastructure 'bug'.
> >
> > Any load that is only compared to zero is immune to load-tearing issues.
We're not just fighting load-tearing, but also other compiler
optimizations. Eric pointed out the loop case:
while (!timer_pending(..)) { }
Without a READ_ONCE, the compiler can turn this into:
if (!timer_pending(..)) {
while (true) { }
}
So, unless we can enumerate all possible cases in which a compare to 0
data race can occur, and prove they're all safe for all compiler
optimizations, could we plase not claim it's a general KCSAN compiler
infrastructure bug.
This also makes me second-guess if the osq_lock case is safe (looking
at it, I think it's still safe, but only because the surrounding ops
imply compiler barriers and the compiler then won't mess up the loop).
> > The correct thing to do here is something like:
> >
> > static inline int timer_pending(const struct timer_list *timer)
> > {
> > /*
> > * KCSAN compiler infrastructure is insuffiently clever to
> > * realize that a 'load compared to zero' is immune to
> > * load-tearing.
> > */
I could agree with this if we knew the context in which this is used.
Unfortunately we do not know the context (unlike osq_lock), and for
all we know it could be used in a loop.
> > return data_race(timer->entry.pprev != NULL);
> > }
>
>
> This is nice, now with have data_race()
Again, if you use 'data_race()' you'll have to prove that it'll be
safe in all cases, for all compiler versions current and present. The
READ_ONCE would take that burden from you.
Thanks,
-- Marco
> Remember these patches were sent 2 months ago, at a time we were
> trying to sort out things.
>
> data_race() was merged a few days ago.
Powered by blists - more mailing lists