linux-kernel - Re: [PATCH] x86/tsc: improve arithmetic division

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <35AFFA5A-B499-4D64-9E98-42B9A642EB0F@zytor.com>
Date:   Sat, 01 Feb 2020 14:29:35 -0800
From:   hpa@...or.com
To:     Wen Yang <wenyang@...ux.alibaba.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Borislav Petkov <bp@...en8.de>, Ingo Molnar <mingo@...hat.com>
CC:     x86@...nel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] x86/tsc: improve arithmetic division

On January 30, 2020 5:08:38 AM PST, Wen Yang <wenyang@...ux.alibaba.com> wrote:
>do_div() does a 64-by-32 division. Use div64_ul64() or div64_ul()
>instead of it if the divisor is 'ul64' or 'unsigned long', to avoid
>truncation to lower 32-bit.
>And as a nice side effect also cleans up the function a bit.
>
>Signed-off-by: Wen Yang <wenyang@...ux.alibaba.com>
>Cc: Thomas Gleixner <tglx@...utronix.de>
>Cc: Ingo Molnar <mingo@...hat.com>
>Cc: Borislav Petkov <bp@...en8.de>
>Cc: "H. Peter Anvin" <hpa@...or.com>
>Cc: x86@...nel.org
>Cc: linux-kernel@...r.kernel.org
>---
> arch/x86/kernel/tsc.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
>diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
>index 7e322e2daaf5..4c0320e68699 100644
>--- a/arch/x86/kernel/tsc.c
>+++ b/arch/x86/kernel/tsc.c
>@@ -357,9 +357,7 @@ static unsigned long calc_pmtimer_ref(u64 deltatsc,
>u64 pm1, u64 pm2)
> 	pm2 -= pm1;
> 	tmp = pm2 * 1000000000LL;
> 	do_div(tmp, PMTMR_TICKS_PER_SEC);
>-	do_div(deltatsc, tmp);
>-
>-	return (unsigned long) deltatsc;
>+	return (unsigned long) div64_u64(deltatsc, tmp);
> }
> 
> #define CAL_MS		10
>@@ -778,8 +776,7 @@ static unsigned long
>pit_hpet_ptimer_calibrate_cpu(void)
> 		tsc_ref_min = min(tsc_ref_min, (unsigned long) tsc2);
> 
> 		/* Check the reference deviation */
>-		delta = ((u64) tsc_pit_min) * 100;
>-		do_div(delta, tsc_ref_min);
>+		delta = div64_ul(((u64) tsc_pit_min) * 100, tsc_ref_min);
> 
> 		/*
> 		 * If both calibration results are inside a 10% window

This is a *lot* more expensive on 32 bits (something like 10x) and as the output is truncated to unsigned long anyway, it is also unnecessary.

We don't use the remainder, so using do_div() is not merely unnecessary but almost certainly generates worse code: we are multiplying and then dividing by a constant, and most of the time gcc can optimize that into a single multiply/shift operation; otherwise we can do that optimization for it (see timeconst.bc.)

The one thing that gcc can't necessary do automatically is to know when a 64/32 → 32 division is safe; C semantics are truncation, but the CPU will trap. If it can turn it into a multiply then that problem obviously goes away.

So first I would test with regular / operators and see what code comes out.

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.