Message-ID: <548FBA62.5090603@oracle.com>
Date: Mon, 15 Dec 2014 23:51:46 -0500
From: Sasha Levin <sasha.levin@...cle.com>
To: Peter Zijlstra <peterz@...radead.org>,
Ingo Molnar <mingo@...nel.org>
CC: LKML <linux-kernel@...r.kernel.org>, Dave Jones <davej@...hat.com>,
Andrey Ryabinin <a.ryabinin@...sung.com>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: sched: odd values for effective load calculations
On 12/15/2014 07:12 AM, Peter Zijlstra wrote:
>
> Sorry for the long delay, I was out for a few weeks due to having become
> a dad for the second time.
Congrats! May you be able to sleep at night sooner rather than later.
> On Sat, Dec 13, 2014 at 09:30:12AM +0100, Ingo Molnar wrote:
>> * Sasha Levin <levinsasha928@...il.com> wrote:
>>
>>> Hi all,
>>>
>>> I was fuzzing with trinity inside a KVM tools guest, running the latest -next
>>> kernel along with the undefined behaviour sanitizer patch, and hit the following:
>>>
>>> [ 787.894288] ================================================================================
>>> [ 787.897074] UBSan: Undefined behaviour in kernel/sched/fair.c:4541:17
>>> [ 787.898981] signed integer overflow:
>>> [ 787.900066] 361516561629678 * 101500 cannot be represented in type 'long long int'
>
> So that's:
>
> this_eff_load *= this_load +
> effective_load(tg, this_cpu, weight, weight);
>
> Going by the numbers, the 101500 must be 'this_eff_load'; 100 * ~1024
> makes that. Which makes the rhs 'large'. Do you have
> CONFIG_FAIR_GROUP_SCHED enabled? If so, what kind of cgroup hierarchy
> are you using?
CONFIG_FAIR_GROUP_SCHED is enabled. There's no cgroup set-up initially,
but I figure that trinity is able to do crazy things here.
> In any case, bit sad this doesn't have a register dump included :/
>
> Is this easy to reproduce or something that happened once?
It's fairly reproducible; I've seen it happen quite a few times. What other
information might be useful?
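
FWIW, the numbers from the report really do need more than 63 bits. A quick
userspace sketch to sanity-check (not kernel code; eff_load/load_sum are just
my names for the two factors, going by your reading above):

#include <limits.h>
#include <stdio.h>

int main(void)
{
	/* Factors taken from the UBSan report above. */
	long long eff_load = 101500LL;           /* this_eff_load, ~100 * 1024 */
	long long load_sum = 361516561629678LL;  /* this_load + effective_load() */

	/* Classic pre-multiplication overflow check, valid for positive operands. */
	if (load_sum > LLONG_MAX / eff_load)
		printf("%lld * %lld overflows long long\n", eff_load, load_sum);

	/* Unsigned multiplication wraps mod 2^64 instead of being UB;
	 * converting back shows the value the kernel would end up with. */
	unsigned long long wrapped = (unsigned long long)eff_load *
				     (unsigned long long)load_sum;
	printf("wrapped value as signed: %lld\n", (long long)wrapped);

	return 0;
}

The wrapped value comes out negative (about -2.0e17), so if I'm reading
wake_affine() right, the subsequent this_eff_load <= prev_eff_load comparison
can go the wrong way rather than just being a bit off.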
>>> The values for effective load seem a bit off (and are overflowing!).
>>
>> It definitely looks like a bug in SMP load balancing!
>
> Yeah, although theoretically (and somewhat practically) this can be
> triggered in more places if you manage to run up the 'weight' with
> enough tasks.
>
> That said, it should at worst result in 'funny' balancing behaviour, not
> anything else.
I'm not sure if you've caught up on the RCU stall issue we've been trying
to track down (https://lkml.org/lkml/2014/11/14/656), but could this "funny"
balancing behaviour be "funny" enough to cause a stall?
Thanks,
Sasha