lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-id: <4574A508.1090805@cosmosbay.com>
Date:	Mon, 04 Dec 2006 23:45:28 +0100
From:	Eric Dumazet <dada1@...mosbay.com>
To:	David Miller <davem@...emloft.net>
Cc:	akpm@...l.org, clameter@....com, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] SLAB : use a multiply instead of a divide in obj_to_index()

David Miller a écrit :
> From: Eric Dumazet <dada1@...mosbay.com>
> Date: Mon, 04 Dec 2006 22:34:29 +0100
> 
>> On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1 cycle
>> for a multiply.
> 
> For UltraSPARC I and II (which is what this 200mhz guy probably is),
> it's 4 cycle latency for a multiply (32-bit or 64-bit) and 68 cycles
> for a 64-bit divide (32-bit divide is 37 cycles).

I must have Ultra-2 (running Solaris :( )

         for (ui = 0 ; ui < 100000000 ; ui++)
                 val += reciprocal_divide(ui, reciproc);

    100000cb0:   83 31 20 00     srl  %g4, 0, %g1
    100000cb4:   82 48 40 12     mulx  %g1, %l2, %g1
    100000cb8:   83 30 70 20     srlx  %g1, 0x20, %g1
    100000cbc:   88 01 20 01     inc  %g4
    100000cc0:   80 a1 00 05     cmp  %g4, %g5
    100000cc4:   08 4f ff fb     bleu  %icc, 100000cb0
    100000cc8:   b0 06 00 01     add  %i0, %g1, %i0

I confirm that this block uses 20 cycles/iteration,
while next one uses 72 cycles/iteration

         for (ui = 0 ; ui < 100000000 ; ui++)
                 val += ui / value;

    100000ca8:   83 31 20 00     srl  %g4, 0, %g1
    100000cac:   82 68 40 11     udivx  %g1, %l1, %g1
    100000cb0:   88 01 20 01     inc  %g4
    100000cb4:   80 a1 00 05     cmp  %g4, %g5
    100000cb8:   08 4f ff fc     bleu  %icc, 100000ca8
    100000cbc:   b0 06 00 01     add  %i0, %g1, %i0


> 
> UltraSPARC-III and IV are worse, 6 cycles for multiply and 40/71
> cycles (32/64-bit) for integer divides.
> 
> Niagara is even worse :-)  11 cycle integer multiply and a 72 cycle
> integer divide (regardless of 32-bit or 64-bit).
> 
> (more details in gcc/config/sparc/sparc.c:{ultrasparc,ultrasparc3,niagara}_cost).
> 
> So this change has tons of merit for sparc64 chips at least :-)
> 
> Also, the multiply can parallelize with other operations but it
> seems that integer divide stalls the pipe for most of the duration
> of the calculation.  So this makes the divide even worse.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ