Date:   Sat, 13 Jun 2020 16:40:30 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Thomas Gleixner <tglx@...utronix.de>
Cc:     Peter Zijlstra <peterz@...radead.org>, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, linux-kernel@...r.kernel.org
Subject: Re: BUG: kernel NULL pointer dereference from check_preempt_wakeup()

On Sat, Jun 13, 2020 at 07:57:19AM -0700, Paul E. McKenney wrote:
> On Sat, Jun 13, 2020 at 09:26:40AM +0200, Thomas Gleixner wrote:
> > "Paul E. McKenney" <paulmck@...nel.org> writes:
> > > And an update based on your patch (https://paste.debian.net/1151802/)
> > > against 44ebe016df3a ("Merge branch 'proc-linus' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace").
> > 
> > I have been running this patch on top of x86/entry since midnight. Still
> > no NULL pointer deref.
> > 
> > The cross-check with plain x86/entry has triggered it on all instances
> > by now.
> 
> That is consistent with my experience.  I have not yet seen a NULL pointer
> dereference with Peter's patch.  As I said earlier, tests thus far
> at my end give 95% confidence that it is a fix for the NULL pointer
> problem.
> 
> I have seen two other problems, but I haven't yet seen them often enough
> to have any confidence as to what they are related to.  The RCU CPU
> stall warning happened only once, so it might have been introduced in
> mainline sometime in the last few days.  The BUG was with Peter's patch
> on an intermediate state of x86/entry, so it might be specific to that
> intermediate state.  Or to my commit/patch confusion, perhaps.
> 
> > So it looks like you're onto something here.
> 
> Let's recap.
> 
> I ran 140 hours each of TREE04 and TREE05 with Peter's patch on top of
> x86/entry in -tip with no complaints of any kind.  So that is good,
> and it means we have a good fix for the too-short grace periods.
> I already verified TASKS03 yesterday (not to be confused with TREE03).
> So we have a clean bill of health for x86/entry from my end with respect
> to too-short grace periods with insanely high confidence.
> 
> I have started 28*TREE03 for a few hours with Peter's patch on top
> of x86/entry in -tip, which I expect will reproduce your result of
> no NULL pointer dereference.  If so (as I fully expect it to), I will join you in
> proclaiming Peter's patch to be a fix for the NULL pointer problem.

It did pass, so I hereby join you in proclaiming Peter's patch to be
a fix for the NULL pointer problem.  ;-)

And a big "Thank You" to you guys for tracking this one down.  It was
not at all straightforward!

> Then I will follow up on https://paste.debian.net/1151842 and also on
> https://paste.debian.net/1151809.
> 
> First, I will run TREE03 longer on 44ebe016df3a ("Merge branch 'proc-linus' of
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace")
> in mainline without Peter's patch, ignoring any occurrences of the NULL
> pointer problem to see what happens.  If that reproduces the RCU CPU
> stall in https://paste.debian.net/1151842 or the BUG on line 1046 of
> kernel/sched/rt.c in https://paste.debian.net/1151809, I will attempt
> to bisect those in mainline.

And the run on mainline without Peter's patch did in fact reproduce the
RCU CPU stall warning.  So this is a mainline bug that I will track down
separately.  This appears to be a failure to awaken RCU's grace-period
kthread, with the kthread remaining in sleep state 0x402 for more
than 21 seconds, which is a bit excessive for a three-jiffy sleep. On
the other hand, many of the other CPUs seem to be stuck in stop-machine.
The stall persists.

This happened one time in 112 hours of TREE03 rcutorture, so bisection
will take some time, assuming that it works at all in this case.  ;-)

So Peter's patch is fully in the clear:

Tested-by: Paul E. McKenney <paulmck@...nel.org>

							Thanx, Paul

> If neither of those two reproduces, on to other things.
> 
> Seem reasonable?
> 
> 							Thanx, Paul
