linux-kernel - Re: [PATCH] Make shr to divide by power of 2


Date:	Sun, 9 Aug 2009 12:40:48 +0300
From:	Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To:	Andi Kleen <andi@...stfloor.org>
Cc:	Robert Hancock <hancockrwd@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Make shr to divide by power of 2

On (08/08/09 10:22), Robert Hancock wrote:
> Actually, the Intel Architecture Optimization Reference Manual doesn't
> say divide may be faster, but it does say that "On processors based on
> Intel NetBurst microarchitecture, latencies of some instructions are
> relatively significant (including shifts, rotates, integer multiplies,
> and moves from memory with sign extension)." and that "The SHIFT and
> ROTATE instructions have a longer latency on processor with a CPUID
> signature corresponding to family 15 and model encoding of 0, 1, or 2.
> The latency of a sequence of adds will be shorter for left shifts of three or less."

The Intel Architecture Optimization Reference Manual does give latency figures:

Table C-13a. General Purpose Instructions
Instruction		Latency				Throughput
IDIV 		| 11-21	13-23	17-41	22	| 5-13	5-14	12-36	22
SAL/SAR/SHL/SHR	| 1	1	1		| 0.33	0.33	0.33

For example,
Table 12-2. Intel® Atom™ Microarchitecture Instructions Latency Data
Instruction                     | Latency         | Throughput
IDIV r/m8; IDIV r/m16;          | 33; 42; 57; 197 | 32; 41; 56; 196
IDIV r/m32; IDIV r/m64          |                 |
ROL; ROR; SAL; SAR; SHL; SHR    | 1               | 1
*SHLD/SHRD                      | 4; 2-11         | 3; 1-10



On (08/08/09 09:35), Andi Kleen wrote:
> DIV should be always slower than a SHIFT.
>
> But it has nothing really to do with the CPU. The point is that the compiler
> always selects a suitable one by itself. Rewriting x / 2 to x >> 1 is
> one of the easiest exercises in compiler optimizations.
>
> The only case when the compiler cannot do this easily by itself is
> when the dividend is not a constant.
>

        int width = (vc->vc_font.width + 7) >> 3;
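
To illustrate the quoted point about constant divisors, here is a minimal sketch (all function names are hypothetical, none come from the kernel). For unsigned operands, x / 8 and x >> 3 are the same operation. For signed operands they differ on negative inputs, because division truncates toward zero while an arithmetic shift rounds toward negative infinity (for example, -1 / 8 is 0 but -1 >> 3 is -1 with GCC). Compilers typically handle signed division by a power of two with a small bias before the shift; the sketch assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined but GCC guarantees.

```c
#include <assert.h>

/* Hypothetical helpers, for illustration only. */

/* Unsigned: division by 8 and a right shift by 3 are identical. */
static unsigned int div8_unsigned(unsigned int x) { return x / 8; }
static unsigned int shr3_unsigned(unsigned int x) { return x >> 3; }

/* Signed: C division truncates toward zero. */
static int div8_signed(int x) { return x / 8; }

/*
 * Signed division by 8 expressed with a shift. Adding 7 to negative
 * inputs makes the shift round toward zero, matching x / 8.
 * Assumes >> on a negative int is an arithmetic shift.
 */
static int div8_signed_by_shift(int x)
{
	int bias = (x < 0) ? 7 : 0;

	return (x + bias) >> 3;
}

/* Compare both spellings over a small range of inputs. */
static void check_div8_equivalence(void)
{
	for (unsigned int u = 0; u <= 1000; u++)
		assert(div8_unsigned(u) == shr3_unsigned(u));
	for (int s = -1000; s <= 1000; s++)
		assert(div8_signed(s) == div8_signed_by_shift(s));
}
```

Calling check_div8_equivalence() from a test program exercises both pairs. Note that vc_font.width is never negative in practice, so for the expression above the plain shift and the division give the same value.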

> That said -Os sometimes screws us up on this, but it's still not worth
> doing this change manually.
>

My point is that the code should look consistent.
There are 5 occurrences of the form
        int width = (vc->vc_font.width + 7) >> 3;
(not exactly this line, but variants of vc->vc_font.width (+ 7)? >> 3)

and _only_ one of the form
        int width = (vc->vc_font.width + 7) / 8;
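
For reference, both spellings round a bit count up to whole bytes, i.e. they compute ceil(width / 8). A minimal sketch, assuming width is non-negative (as a font width would be); the helper names are hypothetical and only serve to compare the two forms:

```c
#include <assert.h>

/* Hypothetical helpers: number of bytes needed to hold `width` bits. */
static int bytes_for_bits_shift(int width)
{
	return (width + 7) >> 3;	/* shift spelling */
}

static int bytes_for_bits_div(int width)
{
	return (width + 7) / 8;		/* division spelling */
}

/* Both spellings agree for every non-negative width in a small range. */
static void check_bytes_for_bits(void)
{
	for (int w = 0; w <= 4096; w++)
		assert(bytes_for_bits_shift(w) == bytes_for_bits_div(w));
}
```

For a width of 8 both return 1, and for a width of 9 both return 2, so the choice between them is purely a matter of consistent style.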

P.S.
Sorry, hit "reply", not "reply to all".

        Sergey
