linux-kernel - Re: [PROBLEM] possible divide by 0 in kernel/sched/cputime.c scale

Message-ID: <20131125014302.6a4394ce@spike>
Date:	Mon, 25 Nov 2013 01:43:02 +0100
From:	Christian Engelmayer <cengelma@....at>
To:	Stanislaw Gruszka <sgruszka@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>
Cc:	Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
	Thomas Gleixner <tglx@...utronix.de>, fweisbec@...il.com,
	Paul Turner <pjt@...gle.com>
Subject: Re: [PROBLEM] possible divide by 0 in kernel/sched/cputime.c
 scale_stime()

On Mon, 18 Nov 2013 18:27:06 +0100, Peter Zijlstra <peterz@...radead.org> wrote:
> That is not actually correct in the case time wraps.
> 
> There's a further problem with this code though -- ever since Frederic
> added NO_HZ_FULL a CPU can in fact aggregate a runtime delta larger than
> 4 seconds, due to running without a tick.
> 
> Therefore we need to be able to deal with u64 deltas.
> 
> The below is a compile tested only attempt to deal with both these
> problems. Comments?

I ran with this patch applied during daily use. I did no systematic testing,
but saw no user-perceived regressions either. The originally reported
divide-by-0 scenario could no longer be reproduced with this change.

> +/* 
> + * delta_exec * weight / lw.weight
> + *   OR
> + * (delta_exec * (weight * lw->inv_weight)) >> WMULT_SHIFT
> + *
> + * Either weight := NICE_0_LOAD and lw \e prio_to_wmult[], in which case
> + * we're guaranteed shift stays positive because inv_weight is guaranteed to
> + * fit 32 bits, and NICE_0_LOAD gives another 10 bits; therefore shift >= 22.
> + *
> + * Or, weight =< lw.weight (because lw.weight is the runqueue weight), thus
> + * XXX mind got twisted, but I'm fairly sure shift will stay positive.
> + *
> + */
> +static u64 __calc_delta(u64 delta_exec, unsigned long weight, struct load_weight *lw)

The patch itself seems comprehensible to me, although I have to admit that I
would have to read into the code more deeply in order to understand why the
changed __calc_delta() will always prove correct.
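
To make the arithmetic in the quoted comment concrete, here is a minimal
userspace sketch of the fixed-point form, (delta_exec * (weight *
inv_weight)) >> shift, including the shift reduction the comment worries
about. It is not the body of the patch, which is not quoted here. All
sketch_* names, the struct layout and the way the inverse is computed are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_WMULT_SHIFT 32	/* hypothetical stand-in for WMULT_SHIFT */

/* Hypothetical stand-in for struct load_weight. */
struct sketch_load_weight {
	uint64_t weight;
	uint32_t inv_weight;	/* roughly 2^32 / weight */
};

/* Assumed helper: precompute the 32-bit fixed-point inverse of the weight. */
static void sketch_update_inv_weight(struct sketch_load_weight *lw)
{
	if (lw->weight <= 1)
		lw->inv_weight = UINT32_MAX;	/* clamp: 2^32 itself does not fit */
	else if (lw->weight >> 32)
		lw->inv_weight = 1;		/* weight too large for the inverse */
	else
		lw->inv_weight = (uint32_t)((UINT64_C(1) << 32) / lw->weight);
}

/* (a * mul) >> shift for 0 <= shift <= 32, without a 128-bit intermediate. */
static uint64_t sketch_mul_u64_u32_shr(uint64_t a, uint32_t mul,
				       unsigned int shift)
{
	uint32_t ah = (uint32_t)(a >> 32);
	uint32_t al = (uint32_t)a;
	uint64_t ret;

	/* A negative shift cast to unsigned also trips this check. */
	assert(shift <= 32);
	ret = ((uint64_t)al * mul) >> shift;
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);
	return ret;
}

static uint64_t sketch_calc_delta(uint64_t delta_exec, uint64_t weight,
				  struct sketch_load_weight *lw)
{
	uint64_t fact = weight;
	int shift = SKETCH_WMULT_SHIFT;

	sketch_update_inv_weight(lw);

	/* Reduce weight to 32 bits so the product below fits in 64 bits. */
	while (fact >> 32) {
		fact >>= 1;
		shift--;
	}

	fact = (uint64_t)(uint32_t)fact * lw->inv_weight;

	/* Reduce the combined factor to 32 bits, giving up low-order bits. */
	while (fact >> 32) {
		fact >>= 1;
		shift--;
	}

	return sketch_mul_u64_u32_shr(delta_exec, (uint32_t)fact,
				      (unsigned int)shift);
}
```

With weight == lw.weight == 1024 the factor becomes 2^32, one reduction step
leaves shift at 31, and a delta above 32 bits (for example 5000000000 ns)
comes back unchanged. With a weight that is not a power of two the result is
only approximate, because the inverse is truncated.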

On Mon, 18 Nov 2013 15:19:56 +0100, Peter Zijlstra <peterz@...radead.org> wrote:
> I'm not sure what tool you used to generate that, but it's broken, that's
> model 0x25 (37), it somehow truncates the upper model bits.

Correct, that was the fairly outdated cpuid (http://www.ka9q.net/code/cpuid)
currently shipped with Ubuntu 13.10. Debian already switched to packaging a
maintained version (http://www.etallen.com/cpuid.html).
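
For reference, a minimal sketch of how the displayed family and model are
assembled from EAX of CPUID leaf 1 according to the Intel rules, and of one
way a tool could end up reporting a truncated model by ignoring the extended
model field. The sketch_* names are made up for illustration, and whether the
old cpuid tool fails in exactly this way is an assumption. The signature
0x00020652 mentioned below is likewise just an example value that decodes to
family 6, model 0x25; it is not taken from the original report.

```c
#include <stdint.h>

/* Displayed family: base family, plus extended family if base is 0xf. */
static unsigned int sketch_cpuid_family(uint32_t eax)
{
	unsigned int family = (eax >> 8) & 0xf;

	if (family == 0xf)
		family += (eax >> 20) & 0xff;
	return family;
}

/*
 * Displayed model: base model in bits 7:4, with the extended model
 * (bits 19:16) as the upper nibble when the base family is 6 or 0xf.
 */
static unsigned int sketch_cpuid_model(uint32_t eax)
{
	unsigned int base_family = (eax >> 8) & 0xf;
	unsigned int model = (eax >> 4) & 0xf;

	if (base_family == 0x6 || base_family == 0xf)
		model |= ((eax >> 16) & 0xf) << 4;
	return model;
}

/* What a tool reports if it only looks at the base model field. */
static unsigned int sketch_cpuid_model_truncated(uint32_t eax)
{
	return (eax >> 4) & 0xf;
}
```

For eax == 0x00020652 the full decode yields family 6, model 0x25 (37), while
the truncated variant yields model 5.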

> That said, it's a Westmere core and I've seen wsm-ep (dual socket)
> machines lose their TSC sync quite regularly, but this would be the
> first case a single socket wsm would lose its TSC sync.
>
> That leads me to believe your BIOS is screwing you over with SMIs or the
> like.

After rechecking the running microcode, as hinted by Henrique de Moraes Holschuh
off-list, and running the Intel BIOS Implementation Test Suite
(http://biosbits.org), that seems to be an educated guess to me.

Regards,
Christian