linux-kernel - Re: sched: odd values for effective load calculations

Message-ID: <548FBA62.5090603@oracle.com>
Date:	Mon, 15 Dec 2014 23:51:46 -0500
From:	Sasha Levin <sasha.levin@...cle.com>
To:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>
CC:	LKML <linux-kernel@...r.kernel.org>, Dave Jones <davej@...hat.com>,
	Andrey Ryabinin <a.ryabinin@...sung.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: sched: odd values for effective load calculations

On 12/15/2014 07:12 AM, Peter Zijlstra wrote:
> 
> Sorry for the long delay, I was out for a few weeks due to having become
> a dad for the second time.

Congrats! May you be able to sleep at night sooner rather than later.

> On Sat, Dec 13, 2014 at 09:30:12AM +0100, Ingo Molnar wrote:
>> * Sasha Levin <levinsasha928@...il.com> wrote:
>>
>>> Hi all,
>>>
>>> I was fuzzing with trinity inside a KVM tools guest, running the latest -next
>>> kernel along with the undefined behaviour sanitizer patch, and hit the following:
>>>
>>> [  787.894288] ================================================================================
>>> [  787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
>>> [  787.898981] signed integer overflow:
>>> [  787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
> 
> So that's:
> 
> 	this_eff_load *= this_load +
> 		effective_load(tg, this_cpu, weight, weight);
> 
> Going by the numbers the 101500 must be 'this_eff_load', 100 * ~1024
> makes that. Which makes the rhs 'large'. Do you have
> CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
> are you using?

CONFIG_FAIR_GROUP_SCHED is enabled. There's no cgroup set-up initially,
but I figure that trinity is able to do crazy things here.
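
For reference, the arithmetic in the UBSan report can be checked in user
space. The sketch below is only an illustration, not kernel code: the helper
names mul_overflows_s64() and check_report_values() are hypothetical, and it
assumes a compiler that provides the __builtin_mul_overflow builtin (GCC 5+
or a recent Clang). The product of the two reported operands is roughly
3.67e19, while a signed 64-bit integer tops out at about 9.22e18, so the
multiplication cannot be represented in 'long long int'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel function: returns true if a * b
 * does not fit in a signed 64-bit integer. On success (no overflow)
 * the product is stored in *res.
 */
static bool mul_overflows_s64(int64_t a, int64_t b, int64_t *res)
{
	return __builtin_mul_overflow(a, b, res);
}

/* Self-check using the two operands from the UBSan report. */
static void check_report_values(void)
{
	int64_t res;

	/* 361516561629678 * 101500 is about 3.67e19, above INT64_MAX. */
	assert(mul_overflows_s64(361516561629678LL, 101500LL, &res));

	/* A load on the order of one nice-0 task (1024) fits easily. */
	assert(!mul_overflows_s64(1024, 101500, &res));
	assert(res == 103936000);
}
```

Calling check_report_values() from a test program should pass silently if
the reasoning above holds; it says nothing about how the left-hand operand
became that large in the first place.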

> In any case, bit sad this doesn't have a register dump included :/
> 
> Is this easy to reproduce or something that happened once?

It's fairly reproducible; I've seen it happen quite a few times. What other
information might be useful?

>>> The values for effective load seem a bit off (and are overflowing!).
>>
>> It definitely looks like a bug in SMP load balancing!
> 
> Yeah, although theoretically (and somewhat practical) this can be
> triggered in more places if you manage to run up the 'weight' with
> enough tasks.
> 
> That said, it should at worst result in 'funny' balancing behaviour, not
> anything else.

I'm not sure if you've caught up on the RCU stall issue we've been trying
to track down (https://lkml.org/lkml/2014/11/14/656), but could this "funny"
balancing behaviour be "funny" enough to cause a stall?


Thanks,
Sasha
