lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20141219002956.GA25405@intel.com>
Date:	Fri, 19 Dec 2014 08:29:56 +0800
From:	Yuyang Du <yuyang.du@...el.com>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	LKML <linux-kernel@...r.kernel.org>,
	Dave Jones <davej@...hat.com>,
	Andrey Ryabinin <a.ryabinin@...sung.com>,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: sched: odd values for effective load calculations

On Tue, Dec 16, 2014 at 10:09:48AM +0800, Yuyang Du wrote:
> 
> Sasha, it might be helpful to see this_load is from:
> 
> this_load1: this_load = target_load(this_cpu, idx);
> 
> or
> 
> this_load2: this_load += effective_load(tg, this_cpu, -weight, -weight);
> 
> It really does not seem to be this_load1, while the calc of effective_load is a bit
> complicated to see what the problem is.

Hi all,

I finally managed to reproduce this, but not by trinity, just by keeping rebooting.

Indeed, the problem is from:

this_load2: this_load += effective_load(tg, this_cpu, -weight, -weight);

After digging into effective_load(), the root cause is:

wl = (w * tg->shares) / W;

if we have negative w, then it will be cast to unsigned long, and then may or may not
overflow, and end up an insane number.

I tried this in userspace, interestingly if we have:

wl = w * tg->shares;
wl /= W;

the result is ok, but not ok with the lines combined as the original one.

Anyway, the following patch can fix this.

---
Subject: [PATCH] sched: Fix long and unsigned long multiplication error in
 effective_load

In effective_load, we have (long w * unsigned long tg->shares) / long W,
when w is negative, it is cast to unsigned long and hence the product is
insanely large. Fix this by casting tg->shares to long.

Reported-by: Sasha Levin <sasha.levin@...cle.com>
Signed-off-by: Yuyang Du <yuyang.du@...el.com>
---
 kernel/sched/fair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index df2cdf7..6b99659 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4424,7 +4424,7 @@ static long effective_load(struct task_group *tg, int cpu, long wl, long wg)
 		 * wl = S * s'_i; see (2)
 		 */
 		if (W > 0 && w < W)
-			wl = (w * tg->shares) / W;
+			wl = (w * (long)tg->shares) / W;
 		else
 			wl = tg->shares;
 
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ