Message-ID: <54A1638A.1050800@oracle.com>
Date:	Mon, 29 Dec 2014 09:22:02 -0500
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Davidlohr Bueso <dave@...olabs.net>
CC:	Li Bin <huawei.libin@...wei.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>, rui.xiang@...wei.com,
	wengmeiling.weng@...wei.com
Subject: Re: sched: spinlock recursion in sched_rr_get_interval

On 12/28/2014 03:17 PM, Davidlohr Bueso wrote:
> On Sat, 2014-12-27 at 10:52 -0500, Sasha Levin wrote:
>> > There's a chance that lock->owner would change, but how would you explain
>> > it changing to 'current'?
> So yeah, the above only deals with the weird printk values, not the
> actual issue that triggers the BUG_ON. Let's sort this out first and at
> least get correct data.

Is there an issue with weird printk values? I haven't seen a report of
anything like that, nor have I seen it myself.

>> > That is, what race condition specifically creates the
>> > 'lock->owner == current' situation in the debug check?
> Why do you suspect a race as opposed to a legitimate recursion issue?
> Although after staring at the code for a while, I cannot see foul play
> in sched_rr_get_interval.
> 
> Given that all reports show bogus contending CPU and .owner_cpu values,
> I do wonder if this is actually a symptom of the BUG_ON, with something
> fishy going on, although I have no evidence to support that. I also ran
> into https://lkml.org/lkml/2014/11/7/762, which shows the same bogus
> values yet a totally different stack.
> 
> Sasha, I ran trinity with CONFIG_DEBUG_SPINLOCK=y all night without
> triggering anything. How are you hitting this?

I don't have any reliable way of reproducing it. The only two things I
can think of are:

 - Try running as root in a disposable VM
 - Try running with a really high load (I use ~800 children on 16-vcpu
   guests).
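For completeness, the spinlock-debugging side of the .config I'd expect to
matter here is roughly the following (option names as in a 3.x-era tree;
only CONFIG_DEBUG_SPINLOCK was explicitly mentioned above, the rest are the
usual companions):

```
# .config fragment (sketch; check Kconfig in your tree)
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
```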


Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
