linux-kernel - Re: Divide-by-zero in post_init_entity_util

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160609102946.GO30909@twins.programming.kicks-ass.net>
Date:	Thu, 9 Jun 2016 12:29:46 +0200
From:	Peter Zijlstra <peterz@...radead.org>
To:	Chris Wilson <chris@...is-wilson.co.uk>
Cc:	Andrey Ryabinin <aryabinin@...tuozzo.com>,
	Yuyang Du <yuyang.du@...el.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Mike Galbraith <efault@....de>,
	Thomas Gleixner <tglx@...utronix.de>, bsegall@...gle.com,
	morten.rasmussen@....com, pjt@...gle.com, steve.muckle@...aro.org,
	linux-kernel@...r.kernel.org
Subject: Re: Divide-by-zero in post_init_entity_util_avg

On Thu, Jun 09, 2016 at 10:01:42AM +0100, Chris Wilson wrote:
> 
> [15774.966082] divide error: 0000 [#1] SMP
> [15774.966137] Modules linked in: i915 intel_gtt
> [15774.966208] CPU: 1 PID: 15319 Comm: gemscript Not tainted 4.7.0-rc1+ #330
> [15774.966252] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
> [15774.966317] task: ffff880276a55e80 ti: ffff880272c10000 task.ti: ffff880272c10000
> [15774.966377] RIP: 0010:[<ffffffff810b12e3>]  [<ffffffff810b12e3>] post_init_entity_util_avg+0x43/0x80
> [15774.966463] RSP: 0018:ffff880272c13e30  EFLAGS: 00010057
> [15774.966504] RAX: 0000000022f00000 RBX: ffff880276a51f80 RCX: 00000000000000e8
> [15774.966550] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880276a52000
> [15774.966593] RBP: ffff880272c13e30 R08: 0000000000000000 R09: 0000000000000008
> [15774.966635] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880276a52544
> [15774.966678] R13: ffff880276a52000 R14: ffff880276a521d8 R15: 0000000000003bda
> [15774.966721] FS:  00007fe4df95a740(0000) GS:ffff88027fd00000(0000) knlGS:0000000000000000
> [15774.966781] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [15774.966822] CR2: 00007fe4de7f0378 CR3: 0000000275e15000 CR4: 00000000001006e0
> [15774.966860] Stack:
> [15774.966894]  ffff880272c13e68 ffffffff810a69c5 0000000000000206 0000000000003bd7
> [15774.966987]  0000000000000000 0000000000000000 ffff880276a51f80 ffff880272c13ee8
> [15774.967073]  ffffffff8107a364 0000000000003bd7 ffffffffffffffff 0000000000000000
> [15774.967162] Call Trace:
> [15774.967208]  [<ffffffff810a69c5>] wake_up_new_task+0x95/0x130
> [15774.967256]  [<ffffffff8107a364>] _do_fork+0x184/0x400
> [15774.967302]  [<ffffffff8107a6b7>] SyS_clone+0x37/0x50
> [15774.967353]  [<ffffffff8100197d>] do_syscall_64+0x5d/0xe0
> [15774.967402]  [<ffffffff815719bc>] entry_SYSCALL64_slow_path+0x25/0x25
> [15774.967444] Code: 89 d1 48 c1 e9 3f 48 01 d1 48 d1 f9 48 85 c9 7e 38 48 85 c0 74 35 48 0f af 07 31 d2 48 89 87 e0 00 00 00 48 8b 76 70 48 83 c6 01 <48> f7 f6 48 39 c8 77 18 48 89 c1 48 89 87 e0 00 00 00 69 c9 7e 
> [15774.968086] RIP  [<ffffffff810b12e3>] post_init_entity_util_avg+0x43/0x80
> [15774.968142]  RSP <ffff880272c13e30>
> 
> I've presumed commit 2b8c41daba327 ("sched/fair: Initiate a new task's
> util avg to a bounded value") to be at fault, hence the CCs. Though it
> may just be a victim.
> 
> gdb says 0x43/0x80 is
> 
>    725			if (cfs_rq->avg.util_avg != 0) {
>    726				sa->util_avg  = cfs_rq->avg.util_avg * se->load.weight;
> -> 727				sa->util_avg /= (cfs_rq->avg.load_avg + 1);
>    728	
>    729				if (sa->util_avg > cap)
>    730					sa->util_avg = cap;
>    731			} else {
> 
> I've run the same fork-heavy workload that seemed to hit the initial
> fault under kasan. kasan has not reported any errors, nor has the bug
> reoccurred after a day (earlier I had a couple of panics within a few
> hours). 
> 
> Is it possible for a race window where cfg_rq->avg.load_avg is indeed
> -1? Any evidence of other memcorruption in the above?
> -Chris

Maybe; I need to look harder and at the generated code; but at the very
least serialization isn't right.

We compute/update averages under rq->lock, but post_init_entity_avg()
looks at these values without holding it.

Something like the below should cure that, something similar should
probably be done for the other callsite as well, I'll look harder after
lunch.


diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 385c947482e1..289d99d91883 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2536,9 +2536,9 @@ void wake_up_new_task(struct task_struct *p)
 	set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));
 #endif
 	/* Post initialize new task's util average when its cfs_rq is set */
-	post_init_entity_util_avg(&p->se);
 
 	rq = __task_rq_lock(p, &rf);
+	post_init_entity_util_avg(&p->se);
 	activate_task(rq, p, 0);
 	p->on_rq = TASK_ON_RQ_QUEUED;
 	trace_sched_wakeup_new(p);