lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20160122223329.GC25832@tassilo.jf.intel.com>
Date:	Fri, 22 Jan 2016 14:33:29 -0800
From:	Andi Kleen <ak@...ux.intel.com>
To:	Dave Jones <davej@...emonkey.org.uk>,
	Linux Kernel <linux-kernel@...r.kernel.org>, x86@...nel.org
Cc:	tglx@...utronix.de, paulmck@...ux.vnet.ibm.com, rostedt@...dmis.org
Subject: Re: suspicious RCU usage in msr tracing.

On Thu, Jan 21, 2016 at 07:34:27PM -0500, Dave Jones wrote:
> Just hit this on Linus' current tree.
> 
> [ INFO: suspicious RCU usage. ]
> 4.4.0-think+ #1 Tainted: G        W      
> -------------------------------
> ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
> 
> other info that might help us debug this:
> 
> 
> RCU used illegally from idle CPU!
> rcu_scheduler_active = 1, debug_locks = 0
> RCU used illegally from extended quiescent state!
> no locks held by swapper/3/0.
> 
> stack backtrace:
> CPU: 3 PID: 0 Comm: swapper/3 Tainted: G        W       4.4.0-think+ #1
>  ffffffff8ef82ac0 c4dd1c3486ada576 ffff880468e07f08 ffffffff8e566ae1
>  ffff880464905340 ffff880468e07f38 ffffffff8e135bf8 ffffffff8f665b00
>  ffff880464918000 0000000000000000 ffff880464920000 ffff880468e07f70
> Call Trace:
>  <IRQ>  [<ffffffff8e566ae1>] dump_stack+0x4e/0x7d
>  [<ffffffff8e135bf8>] lockdep_rcu_suspicious+0xf8/0x110
>  [<ffffffff8e5b3f36>] do_trace_write_msr+0x136/0x140
>  [<ffffffff8e061753>] native_apic_msr_eoi_write+0x23/0x30
>  [<ffffffff8e054456>] smp_call_function_single_interrupt+0x36/0x50
>  [<ffffffff8e05447e>] smp_call_function_interrupt+0xe/0x10
>  [<ffffffff8ed1bdc0>] call_function_interrupt+0x90/0xa0
>  <EOI>  [<ffffffff8e2f1b20>] ? __asan_store4+0x80/0x80
>  [<ffffffff8eac6477>] ? poll_idle+0x67/0xc0
>  [<ffffffff8eac5a94>] cpuidle_enter_state+0x174/0x430
>  [<ffffffff8eac5da7>] cpuidle_enter+0x17/0x20
>  [<ffffffff8e128ff5>] cpu_startup_entry+0x4c5/0x5a0
>  [<ffffffff8e128b30>] ? default_idle_call+0x60/0x60
>  [<ffffffff8e18d354>] ? clockevents_config_and_register+0x64/0x70
>  [<ffffffff8e055489>] start_secondary+0x269/0x300
>  [<ffffffff8e055220>] ? set_cpu_sibling_map+0x970/0x970
> ------------[ cut here ]------------
> 
>  44 DEFINE_EVENT(msr_trace_class, write_msr,
>  45              TP_PROTO(unsigned msr, u64 val, int failed),
>  46              TP_ARGS(msr, val, failed)
>  47 );
> 
> Andi, could this be caused by 7f47d8cc039f8746e0038fe05f1ddcb15a2e27f0 ?

Yes, it's likely. The trace points use RCU, and with X2APIC on there is a 
MSR write with trace point at the end up of interrupts to ack the APIC.

This is what the IPI uses:

static inline void entering_ack_irq(void)
{
        ack_APIC_irq();
        entering_irq();
}

I assume the problem is that RCU thinks we're in idle until
the entering_irq(), but the MSR write with the new trace point
is already in the ack_APIC_irq()

Thomas, can the two call be reordered? I assume it's safe.

-Andi

-- 
ak@...ux.intel.com -- Speaking for myself only

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ