lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 17 Feb 2012 04:49:08 -0800
From:	Paul Turner <pjt@...gle.com>
To:	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Cc:	linux-kernel@...r.kernel.org, Venki Pallipadi <venki@...gle.com>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	Mike Galbraith <efault@....de>,
	Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
	Ben Segall <bsegall@...gle.com>, Ingo Molnar <mingo@...e.hu>,
	Vaidyanathan Srinivasan <svaidy@...ux.vnet.ibm.com>
Subject: Re: [RFC PATCH 13/14] sched: make __update_entity_runnable_avg() fast

On Mon, Feb 6, 2012 at 12:48 PM, Peter Zijlstra <a.p.zijlstra@...llo.nl> wrote:
> On Wed, 2012-02-01 at 17:38 -0800, Paul Turner wrote:
>> +/* Precomputed fixed inverse multiplies for multiplication by y^n */
>> +const static u32 runnable_avg_yN_inv[] = {
>> +       0xffffffff,0xfa83b2db,0xf5257d15,0xefe4b99b,0xeac0c6e7,0xe5b906e7,
>> +       0xe0ccdeec,0xdbfbb797,0xd744fcca,0xd2a81d91,0xce248c15,0xc9b9bd86,
>> +       0xc5672a11,0xc12c4cca,0xbd08a39f,0xb8fbaf47,0xb504f333,0xb123f581,
>> +       0xad583eea,0xa9a15ab4,0xa5fed6a9,0xa2704303,0x9ef53260,0x9b8d39b9,
>> +       0x9837f051,0x94f4efa8,0x91c3d373,0x8ea4398b,0x8b95c1e3,0x88980e80,
>> +       0x85aac367,0x82cd8698,
>> +};
>
> I wonder if modern Intel isn't at the point where computing this thing
> is cheaper than the cacheline miss. You can compute y^n in O(log n) time
> and with n < 32 that's 5 multiplications (see fixed_power_int). Add to
> that the division.
>
> Of course there's platforms, ARM?, where reverse is likely true. Bugger
> that.
>
>> +/* Precomputed \Sum y^k { 1<=k<=n } */
>> +const static u32 runnable_avg_yN_sum[] = {
>> +           0, 1002, 1982, 2941, 3880, 4798, 5697, 6576, 7437, 8279, 9103,
>> +        9909,10698,11470,12226,12966,13690,14398,15091,15769,16433,17082,
>> +       17718,18340,18949,19545,20128,20698,21256,21802,22336,22859,23371,
>> +};
>
> Right, can't see a fast way to compute this..
>
> The asymmetry in the tables annoys me though 32 vs 33 entries,

We could shrink yN sum to 32 entries but then
__compute_runnable_contrib() ends up a lot uglier with a bunch of
conditionals about the n=0 case.  (and if we did that the same
argument could then be made for making runnable_avg_yN_inv 31
entries.. at which point we're asymmetric again.. ahhhhh!! must go
sort my pencils..)

>hex vs dec :-)
>

So I personally think hex versus dec is somewhat reasonable in that:

\Sum y^n converges to a/(1-r) -- about ~47k -- which is a very human
readable number.  It's easy to do the mental math on what a given
entry in that table is going to do to your runnable_avg / runnable_sum
from just looking at it.

Conversely, for the inverse multiplies they're the completely
unintelligible ~0*y^k.  We could write them in decimal, but the table
would just be enormous.

Since the second case would be enormously ugly and we already have
precedent for using hex for inverses wmult; the only unified option is
then to use hex for both.  But then I personally certainly can't
eyeball the numbers for \Sum y^n in the same fashion.  But perhaps
this is something we can lose.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ