Message-ID: <20090809094048.GA3100@localdomain.by>
Date: Sun, 9 Aug 2009 12:40:48 +0300
From: Sergey Senozhatsky <sergey.senozhatsky@...il.com>
To: Andi Kleen <andi@...stfloor.org>
Cc: Robert Hancock <hancockrwd@...il.com>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [PATCH] Make shr to divide by power of 2
On (08/08/09 10:22), Robert Hancock wrote:
> Actually, the Intel Architecture Optimization Reference Manual doesn't
> say divide may be faster, but it does say that "On processors based on
> Intel NetBurst microarchitecture, latencies of some instructions are
> relatively significant (including shifts, rotates, integer multiplies,
> and moves from memory with sign extension)." and that "The SHIFT and
> ROTATE instructions have a longer latency on processor with a CPUID
> signature corresponding to family 15 and model encoding of 0, 1, or 2.
> The latency of a sequence of adds will be shorter for left shifts of three or less."
The Intel Architecture Optimization Reference Manual does give latency figures:
Table C-13a. General Purpose Instructions
Instruction Latency Throughput
IDIV | 11-21 13-23 17-41 22 | 5-13 5-14 12-36 22
SAL/SAR/SHL/SHR | 1 1 1 | 0.33 0.33 0.33
For example,
Table 12-2. Intel® Atom™ Microarchitecture Instructions Latency Data
Instruction Latency Throughput
IDIV r/m8; IDIV r/m16; | 33;42; | 32;41;56;196
IDIV r/m32; IDIV r/m64; | 57;197 |
| |
ROL; ROR; SAL; | 1 | 1
SAR; SHL; SHR | |
*SHLD/SHRD |4;2-11 |3;1-10
On (08/08/09 09:35), Andi Kleen wrote:
> DIV should be always slower than a SHIFT.
>
> But it has nothing really to do with the CPU. The point is that the compiler
> always selects a suitable one by itself. Rewriting x / 2 to x >> 1 is
> one of the easiest exercises in compiler optimizations.
>
> The only case when the compiler cannot do this easily by itself is
> when the dividend is not a constant.
>
int width = (vc->vc_font.width + 7) >> 3;
> That said -Os sometimes screws us up on this, but it's still not worth
> doing this change manually.
>
My point is that the code should 'look the same'.
I mean there are 5 places that use a shift:
int width = (vc->vc_font.width + 7) >> 3;
(not exactly this line, but some form of vc->vc_font.width (+ 7)? >> 3)
and _only_ one place that uses a division:
int width = (vc->vc_font.width + 7) / 8;
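
To illustrate that the two spellings compute the same thing, here is a minimal sketch. It assumes the width is unsigned; the helper names (row_bytes_div, row_bytes_shr, forms_agree) are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>

/* Hypothetical helper: bytes needed for one glyph row, written as a division. */
static unsigned int row_bytes_div(unsigned int width)
{
	return (width + 7) / 8;
}

/* Hypothetical helper: the same rounding-up computation, written as a shift. */
static unsigned int row_bytes_shr(unsigned int width)
{
	return (width + 7) >> 3;
}

/* Returns 1 if both forms agree for every width in [0, limit], else 0. */
static int forms_agree(unsigned int limit)
{
	unsigned int w;

	for (w = 0; w <= limit; w++)
		if (row_bytes_div(w) != row_bytes_shr(w))
			return 0;
	return 1;
}

/* Small self-check over a range of plausible font widths. */
static void check_forms(void)
{
	assert(row_bytes_div(8) == 1);
	assert(row_bytes_shr(9) == 2);
	assert(forms_agree(1024));
}
```

For unsigned operands the two forms are interchangeable, so the choice is purely about consistency. For a negative signed operand they can differ: / 8 truncates toward zero, while the result of >> 3 is implementation-defined in C.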
P.S.
Sorry, hit "reply", not "reply to all".
Sergey