Message-ID: <20200613234030.GA25146@paulmck-ThinkPad-P72>
Date: Sat, 13 Jun 2020 16:40:30 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, linux-kernel@...r.kernel.org
Subject: Re: BUG: kernel NULL pointer dereference from check_preempt_wakeup()
On Sat, Jun 13, 2020 at 07:57:19AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 13, 2020 at 09:26:40AM +0200, Thomas Gleixner wrote:
> > "Paul E. McKenney" <paulmck@...nel.org> writes:
> > > And an update based on your patch (https://paste.debian.net/1151802/)
> > > against 44ebe016df3a ("Merge branch 'proc-linus' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace").
> >
> > I'm running this patch since midnight on top of x86/entry. Still no NULL
> > pointer deref.
> >
> > The cross-check with plain x86/entry has triggered it on all instances
> > by now.
>
> That is consistent with my experience. I have not yet seen a NULL pointer
> dereference with Peter's patch. As I said earlier, tests thus far
> at my end give 95% confidence that it is a fix for the NULL pointer
> problem.
>
> I have seen two other problems, but I haven't yet seen them often enough
> to have any confidence as to what they are related to. The RCU CPU
> stall warning happened only once, so it might have been introduced in
> mainline sometime in the last few days. The BUG was with Peter's patch
> on an intermediate state of x86/entry, so it might be specific to that
> intermediate state. Or to my commit/patch confusion, perhaps.
>
> > So it looks like you're on to something here.
>
> Let's recap.
>
> I ran 140 hours each of TREE04 and TREE05 with Peter's patch on top of
> x86/entry in -tip with no complaints of any kind. So that is good,
> and it means we have a good fix for the too-short grace periods.
> I already verified TASKS03 yesterday (not to be confused with TREE03).
> So we have a clean bill of health for x86/entry from my end with respect
> to too-short grace periods with insanely high confidence.
>
> I have started 28*TREE03 for a few hours with Peter's patch on top
> of x86/entry in -tip, which I expect will reproduce your result of
> no NULL pointer. If so (as I fully expect it to), I will join you in
> proclaiming Peter's patch to be a fix for the NULL pointer problem.

It did pass, so I hereby join you in proclaiming Peter's patch to be
a fix for the NULL pointer problem. ;-)

And a big "Thank You" to you guys for tracking this one down. It was
not at all straightforward!

> Then I follow up on https://paste.debian.net/1151842 and also on
> https://paste.debian.net/1151809.
>
> First, I run TREE03 longer on 44ebe016df3a ("Merge branch 'proc-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
> in mainline without Peter's patch ignoring any occurrences of the NULL
> pointer problem to see what happens. If that reproduces the RCU CPU
> stall in https://paste.debian.net/1151842 or the BUG on line 1046 of
> kernel/sched/rt.c in https://paste.debian.net/1151809, I will attempt
> to bisect those in mainline.
And the run on mainline without Peter's patch did in fact reproduce the
RCU CPU stall warning. So this is a mainline bug that I will track down
separately. This appears to be a failure to awaken RCU's grace-period
kthread, with the kthread remaining in 0x402 sleeping state for more
than 21 seconds, which is a bit excessive for a three-jiffy sleep. On
the other hand, many of the other CPUs seem to be stuck in stop-machine.
The stall persists.
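
For reference, the 0x402 state can be decoded against the task-state bits in include/linux/sched.h. The sketch below does that in Python. The bit values are an assumption recalled from kernels of roughly this vintage, and the helper name decode_task_state is made up for illustration; check the header for the kernel actually under test.

```python
# Task-state bits as recalled from include/linux/sched.h around v5.7.
# ASSUMPTION: these values are not taken from this thread; verify them
# against the header of the kernel you are running.
TASK_STATE_BITS = {
    0x0001: "TASK_INTERRUPTIBLE",
    0x0002: "TASK_UNINTERRUPTIBLE",
    0x0004: "__TASK_STOPPED",
    0x0008: "__TASK_TRACED",
    0x0040: "TASK_PARKED",
    0x0080: "TASK_DEAD",
    0x0100: "TASK_WAKEKILL",
    0x0200: "TASK_WAKING",
    0x0400: "TASK_NOLOAD",
    0x0800: "TASK_NEW",
}


def decode_task_state(state):
    """Return the names of the bits set in a task state value."""
    if state == 0:
        return ["TASK_RUNNING"]
    names = [name for bit, name in sorted(TASK_STATE_BITS.items()) if state & bit]
    # Report any bits this table does not know about instead of dropping them.
    leftover = state & ~sum(TASK_STATE_BITS)
    if leftover:
        names.append(hex(leftover))
    return names


print(decode_task_state(0x402))
```

Under those assumed values, 0x402 is TASK_UNINTERRUPTIBLE together with TASK_NOLOAD, which is the combination the kernel calls TASK_IDLE, that is, an ordinary idle sleep that does not count toward load average.
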
This happened one time in 112 hours of TREE03 rcutorture, so bisection
will take some time, assuming that it works at all in this case. ;-)
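
To put a rough number on "some time": the sketch below assumes that failures arrive as a Poisson process at the observed rate of one per 112 hours, which is itself a shaky estimate from a single event. It computes how many failure-free hours a bisection step needs before a kernel can be called good at a given confidence. The Poisson model and the function name hours_for_confidence are assumptions made for illustration.

```python
import math


def hours_for_confidence(mean_hours_between_failures, confidence):
    """Failure-free test hours after which a still-buggy kernel would
    have failed at least once with probability `confidence`.

    ASSUMPTION: failures are Poisson arrivals with the given mean spacing.
    """
    # P(no failure in t hours) = exp(-t / mean).
    # Solve exp(-t / mean) = 1 - confidence for t.
    return -mean_hours_between_failures * math.log(1.0 - confidence)


for conf in (0.90, 0.95, 0.99):
    hours = hours_for_confidence(112, conf)
    print(f"{conf:.0%}: {hours:.0f} failure-free hours per bisection step")
```

At 95% confidence this comes to roughly 336 test hours per step. If those hours can be spread across 28 concurrent instances, as in the 28*TREE03 run, and the instances fail independently, that is about 12 hours of wall-clock time per step.
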

So Peter's patch is fully in the clear:

Tested-by: Paul E. McKenney <paulmck@...nel.org>

							Thanx, Paul

> If neither of those two reproduce, on to other things.
>
> Seem reasonable?
>
> Thanx, Paul