Message-Id: <20170320125000.GG3637@linux.vnet.ibm.com>
Date:   Mon, 20 Mar 2017 05:50:00 -0700
From:   "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>
To:     Tomeu Vizoso <tomeu@...euvizoso.net>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Ingo Molnar <mingo@...hat.com>, fweisbec@...il.com
Subject: Re: RCU used on incoming CPU before rcu_cpu_starting() called

On Mon, Mar 20, 2017 at 09:32:37AM +0100, Tomeu Vizoso wrote:
> On 9 March 2017 at 16:50, Paul E. McKenney <paulmck@...ux.vnet.ibm.com> wrote:
> >
> > On Thu, Mar 09, 2017 at 07:29:26AM -0800, Paul E. McKenney wrote:
> > > On Thu, Mar 09, 2017 at 04:12:55PM +0100, Peter Zijlstra wrote:
> > > > On Thu, Mar 09, 2017 at 02:08:23PM +0100, Thomas Gleixner wrote:
> > > > > On Wed, 8 Mar 2017, Paul E. McKenney wrote:
> > > > > > [   30.694013]  lockdep_rcu_suspicious+0xe7/0x120
> > > > > > [   30.694013]  get_work_pool+0x82/0x90
> > > > > > [   30.694013]  __queue_work+0x70/0x5f0
> > > > > > [   30.694013]  queue_work_on+0x33/0x70
> > > > > > [   30.694013]  clear_sched_clock_stable+0x33/0x40
> > > > > > [   30.694013]  early_init_intel+0xe7/0x2f0
> > > > > > [   30.694013]  init_intel+0x11/0x350
> > > > > > [   30.694013]  identify_cpu+0x344/0x5a0
> > > > > > [   30.694013]  identify_secondary_cpu+0x18/0x80
> > > > > > [   30.694013]  smp_store_cpu_info+0x39/0x40
> > > > > > [   30.694013]  start_secondary+0x4e/0x100
> > > > > > [   30.694013]  start_cpu+0x14/0x14
> > > > > >
> > > > > > Here is the relevant code from x86's smp_callin():
> > > > > >
> > > > > >         /*
> > > > > >          * Save our processor parameters. Note: this information
> > > > > >          * is needed for clock calibration.
> > > > > >          */
> > > > > >         smp_store_cpu_info(cpuid);
> > > > > >
> > > > > > The problem is that smp_store_cpu_info() indirectly invokes
> > > > > > schedule_work(), which wants to use RCU.  But RCU isn't informed
> > > > > > of the incoming CPU until the call to notify_cpu_starting(), which
> > > > > > causes lockdep to complain bitterly about the use of RCU by the
> > > > > > premature call to schedule_work().
> > > > >
> > > > > > Right. And that wants to be fixed, not hacked around by silencing RCU.
> > > > >
> > > > > Peter????
> > > >
> > > > I'm thinking this is hotplug? 30 seconds after boot is far too late for
> > > > SMP bringup, or you have a stupid slow machine.
> > >
> > > And this certainly does qualify as "shortly", thank you!
> > >
> > > Yes, this only happens on hotplug with lockdep enabled, specifically
> > > on rcutorture scenarios TASKS01 and TREE05.
> > >
> > > > Because it only calls schedule_work() after SMP-init. In which case
> > > > there's then two cases, either:
> > > >
> > > >  - TSC was stable, hotplug wrecked it, TSC is now unstable, and we're
> > > >    screwed.
> > > >
> > > >  - TSC was unstable, hotplug triggers and we want to mark it unstable
> > > >    _again_.
> > > >
> > > > > If this is the second, the below should fix it, if it's the first, I've
> > > > > no idea yet on how to fix that properly :/
> > >
> > > I have applied this patch and started tests on TREE05 and TASKS01, should
> > > get results shortly.
> >
> > And the below patch passed light rcutorture testing, so looking good!
> 
> I'm having trouble finding this patch in linux-next, has it been pushed already?

Peter pointed out that the following v4.11-rc2 commit should fix the
problem, see
Message-ID: <20170316155310.afq6zfzkzrnsqm5n@...ez.programming.kicks-ass.net>:

  f94c8d116997 ("sched/clock, x86/tsc: Rework the x86 'unstable' sched_clock() interface")

I rebased to v4.11-rc2 and haven't seen the problem, so I dropped the
patch referred to above.

I am not sure whether Peter is sending another patch or was instead
going to amend f94c8d116997's changelog.

							Thanx, Paul
