Date:	Wed, 1 Apr 2015 14:39:13 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Chris J Arges <chris.j.arges@...onical.com>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Rafael David Tinoco <inaddy@...ntu.com>,
	Peter Anvin <hpa@...or.com>,
	Jiang Liu <jiang.liu@...ux.intel.com>,
	Peter Zijlstra <peterz@...radead.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Jens Axboe <axboe@...nel.dk>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Gema Gomez <gema.gomez-solano@...onical.com>,
	the arch/x86 maintainers <x86@...nel.org>
Subject: Re: [debug PATCHes] Re: smp_call_function_single lockups


* Chris J Arges <chris.j.arges@...onical.com> wrote:

> This was only tested on the L1, so I can put this on the L0 host and run
> this as well. The results:
> 
> [  124.897002] apic: vector c1, new-domain move in progress                    
> [  124.954827] apic: vector d1, sent cleanup vector, move completed            
> [  163.477270] apic: vector d1, new-domain move in progress                    
> [  164.041938] apic: vector e1, sent cleanup vector, move completed            
> [  213.466971] apic: vector e1, new-domain move in progress                    
> [  213.775639] apic: vector 22, sent cleanup vector, move completed            
> [  365.996747] apic: vector 22, new-domain move in progress                    
> [  366.011136] apic: vector 42, sent cleanup vector, move completed            
> [  393.836032] apic: vector 42, new-domain move in progress                    
> [  393.837727] apic: vector 52, sent cleanup vector, move completed            
> [  454.977514] apic: vector 52, new-domain move in progress                    
> [  454.978880] apic: vector 62, sent cleanup vector, move completed            
> [  467.055730] apic: vector 62, new-domain move in progress                    
> [  467.058129] apic: vector 72, sent cleanup vector, move completed            
> [  545.280125] apic: vector 72, new-domain move in progress                    
> [  545.282801] apic: vector 82, sent cleanup vector, move completed            
> [  567.631652] apic: vector 82, new-domain move in progress                    
> [  567.632207] apic: vector 92, sent cleanup vector, move completed            
> [  628.940638] apic: vector 92, new-domain move in progress                    
> [  628.965274] apic: vector a2, sent cleanup vector, move completed            
> [  635.187433] apic: vector a2, new-domain move in progress                    
> [  635.191643] apic: vector b2, sent cleanup vector, move completed            
> [  673.548020] apic: vector b2, new-domain move in progress                    
> [  673.553843] apic: vector c2, sent cleanup vector, move completed            
> [  688.221906] apic: vector c2, new-domain move in progress                    
> [  688.229487] apic: vector d2, sent cleanup vector, move completed            
> [  723.818916] apic: vector d2, new-domain move in progress                    
> [  723.828970] apic: vector e2, sent cleanup vector, move completed            
> [  733.485435] apic: vector e2, new-domain move in progress                    
> [  733.615007] apic: vector 23, sent cleanup vector, move completed            
> [  824.092036] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [ksmd:26] 

Are these all the messages? Looks like Linus's warnings went away, or 
did you filter them out?

But ... the affinity setting message does not appear to trigger, and 
that's the only real race I can see in the code. Also, the frequency 
of these messages appears to be low, while the race window is narrow. 
So I'm not sure the problem is related to the irq-move mechanism.
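
To put rough numbers on "low frequency", the quoted messages can be paired 
up mechanically. The following is a minimal userspace sketch, not part of 
any debugging patch: the names move_stats, move_stats_init() and 
move_stats_feed() are made up for illustration, and it assumes the two 
message strings appear exactly as in the log above. It measures the time 
between the "move in progress" and "move completed" printouts, which is 
only a proxy for the actual race window.

```c
#include <stdio.h>
#include <string.h>

/* Running state while scanning the "apic: vector ..." debug lines. */
struct move_stats {
	double start;    /* timestamp of the pending move, or -1.0 if none */
	double longest;  /* longest start-to-completion gap so far, seconds */
	int completed;   /* number of start/complete pairs seen */
};

void move_stats_init(struct move_stats *s)
{
	s->start = -1.0;
	s->longest = 0.0;
	s->completed = 0;
}

/*
 * Feed one dmesg line (with or without a leading "> " quote marker).
 * Returns the gap in seconds if this line completes a pending move,
 * and -1.0 for every other line.
 */
double move_stats_feed(struct move_stats *s, const char *line)
{
	const char *p = strchr(line, '[');
	double ts, len;

	/* Expect a printk timestamp of the form "[  124.897002]". */
	if (!p || sscanf(p, "[ %lf", &ts) != 1)
		return -1.0;

	if (strstr(p, "new-domain move in progress")) {
		s->start = ts;
		return -1.0;
	}

	if (strstr(p, "move completed") && s->start >= 0.0) {
		len = ts - s->start;
		if (len > s->longest)
			s->longest = len;
		s->completed++;
		s->start = -1.0;
		return len;
	}

	return -1.0;
}
```

Feeding the quoted log line by line through move_stats_feed() pairs up 15 
moves over roughly 610 seconds, with gaps between about 0.6 ms and about 
565 ms. The last completed move is at 733.6 s, about 90 s before the 
soft-lockup report at 824.1 s.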

One thing that appears to be weird: why is there irq-movement activity 
to begin with? Is something changing irq-affinities?

Could you put a dump_stack() into the call? Something like the patch 
below, in addition to all patches so far. (If it conflicts with the 
previous debugging patches, just add the code manually after the 
debug printout.)

Thanks,

	Ingo

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 6cedd7914581..79d6de6fdf0a 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -144,6 +144,8 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
 			cfg->move_in_progress =
 			   cpumask_intersects(cfg->old_domain, cpu_online_mask);
 			cpumask_and(cfg->domain, cfg->domain, tmp_mask);
+			if (cfg->move_in_progress)
+				dump_stack();
 			break;
 		}
 