linux-kernel - Re: [PATCH] x86 rwsem optimization extreme

Message-ID: <4B7CC14F.7000802@redhat.com>
Date:	Wed, 17 Feb 2010 18:25:51 -1000
From:	Zachary Amsden <zamsden@...hat.com>
To:	"H. Peter Anvin" <hpa@...or.com>
CC:	Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, Thomas Gleixner <tglx@...utronix.de>,
	Ingo Molnar <mingo@...hat.com>, x86@...nel.org,
	Avi Kivity <avi@...hat.com>
Subject: Re: [PATCH] x86 rwsem optimization extreme

> On 02/17/2010 05:53 PM, Linus Torvalds wrote:
>>   - but adc _throughput_ is also typically much higher, which indicates
>>     that even if you do flag renaming, the 'adc' quite likely only
>>     schedules in a single ALU unit.
>>
>> For example, on a Pentium, adc/sbb can only go in the U pipe, and I think
>> the same is true of 'stc'. Now, nobody likely cares about Pentiums any
>> more, but the point is, 'adc' does often have constraints that a regular
>> 'add' does not, and this is an example where a 'stc+adc' pair would at the
>> very least have to be scheduled with an instruction in between.
>>      
> No doubt.  I doubt it much matters in this context, but either way I
> think the patch is probably a bad idea... for much the same reason as my
> incl hack was - since the code isn't actually inline, saving a handful of
> bytes is not the right tradeoff.
>
> 	-hpa
>
>    

Incidentally, the cost of putting all the rwsem code inline, using the
straightforward approach, on git-tip with an x86_64 defconfig and gcc
4.4.3, is 3565 bytes out of 20971778 bytes total, or about 0.017%.

That's small enough to actually consider it.
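
For anyone who wants to redo the arithmetic, here is a minimal sketch in C.
The helper name size_overhead_percent is made up for illustration; the two
byte counts are the ones quoted above.

```c
/* Relative size increase, in percent, of `delta` bytes in an image of
 * `total` bytes. Hypothetical helper, not part of any kernel tooling. */
static double size_overhead_percent(long delta, long total)
{
    return 100.0 * (double)delta / (double)total;
}
```

Calling size_overhead_percent(3565, 20971778) gives roughly 0.017, i.e.
the inlined version grows the image by under two hundredths of a percent.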

Even smaller if you leave trylock as a function... actually no, that 
didn't work, size increased.  I'm guessing many call sites also end up 
calling the explicit form as a fallback.

If you inline only read_lock functions and write release, nope, that 
didn't work either.

If you inline only read_lock functions, that still isn't it.  Many other 
permutations are possible, but I've wasted enough time.

Although, with a more clever inline implementation, if some of the 
constraints to %rdx go away...
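
To make the shape of the tradeoff concrete, here is a rough sketch of an
inline fast path that falls back to an out-of-line slow path. This is NOT
the kernel's rwsem code: it is portable C11 with <stdatomic.h> rather than
x86 inline asm, the slow path spins instead of sleeping, and every name
(sketch_rwsem, SKETCH_WRITER_BIAS, the sketch_* functions) is hypothetical.
The noinline attribute is a GCC/Clang extension. Written this way, the
compiler picks the registers for the slow-path call itself.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bias: a negative count means a writer holds the lock.
 * The sketch assumes far fewer than 65536 concurrent readers. */
#define SKETCH_WRITER_BIAS (-65536L)

struct sketch_rwsem {
    atomic_long count;  /* >= 0: number of active readers */
};

/* Slow path: kept out of line so each call site only pays for a call. */
__attribute__((noinline))
static void sketch_down_read_slow(struct sketch_rwsem *sem)
{
    /* Undo the optimistic increment made by the fast path. */
    atomic_fetch_sub_explicit(&sem->count, 1, memory_order_relaxed);
    for (;;) {
        long c = atomic_load_explicit(&sem->count, memory_order_relaxed);
        /* Retry the increment only while no writer is present. */
        if (c >= 0 &&
            atomic_compare_exchange_weak_explicit(&sem->count, &c, c + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

/* Fast path: one atomic add and one branch, cheap enough to inline. */
static inline void sketch_down_read(struct sketch_rwsem *sem)
{
    if (atomic_fetch_add_explicit(&sem->count, 1, memory_order_acquire) < 0)
        sketch_down_read_slow(sem);
}

static inline void sketch_up_read(struct sketch_rwsem *sem)
{
    atomic_fetch_sub_explicit(&sem->count, 1, memory_order_release);
}

/* Writer side, trylock only: succeeds when there are no readers or writer. */
static inline bool sketch_down_write_trylock(struct sketch_rwsem *sem)
{
    long expected = 0;
    return atomic_compare_exchange_strong_explicit(&sem->count, &expected,
                                                   SKETCH_WRITER_BIAS,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void sketch_up_write(struct sketch_rwsem *sem)
{
    /* Subtracting the negative bias adds it back, leaving any transient
     * reader increments intact. */
    atomic_fetch_sub_explicit(&sem->count, SKETCH_WRITER_BIAS,
                              memory_order_release);
}
```

The size question is then how many bytes the inlined add-and-branch costs
at each call site compared with a plain call to a shared function.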

Zach