Date:	Sat, 15 Aug 2009 18:32:14 +0200
From:	Manfred Spraul <manfred@...orfullife.com>
To:	Nick Piggin <npiggin@...e.de>
CC:	Andrew Morton <akpm@...ux-foundation.org>,
	Nadia Derbey <Nadia.Derbey@...l.net>,
	Pierre Peiffer <peifferp@...il.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] [patch 4a/4] ipc: sem optimise simple operations

On 08/15/2009 04:49 PM, Nick Piggin wrote:
>
> I don't see how you've argued that yours is better.
>
>    
Lower number of new code lines.
Lower total code size increase.
Lower number of separate codepaths.
Lower runtime memory consumption.
Two separate patches for the two algorithm improvements.

The main advantage of your version is that you optimize more cases.
> If you are worried about memory consumption, we can add _rcu variants
> to hlists and use them.
There is no need for _rcu, the whole code runs under a spinlock.
Thus the wait_for_zero queue could be converted to a hlist immediately.
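
To illustrate the memory argument, here is a minimal userspace sketch. The three list types are stand-ins modelled on the kernel's <linux/list.h>; struct zero_waiter, zq_add_head() and zq_del() are hypothetical names made up for this example and are not taken from either patch. It assumes the caller already holds the lock, so no _rcu variants are involved.

```c
#include <stddef.h>

/* Userspace stand-ins modelled on the kernel's <linux/list.h> types. */
struct list_head  { struct list_head *next, *prev; };   /* two pointers per head */
struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };        /* one pointer per head */

/* Hypothetical wait-for-zero waiter; only the list linkage matters here. */
struct zero_waiter {
	struct hlist_node link;
	int pid;
};

/* Add a waiter at the head of the queue. Caller holds the lock. */
static void zq_add_head(struct hlist_head *h, struct hlist_node *n)
{
	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Remove a waiter without needing a pointer to the head. Caller holds the lock. */
static void zq_del(struct hlist_node *n)
{
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}
```

The saving is one pointer per queue head, which adds up when there is one head per semaphore. Note that this sketch inserts at the head; whether the zero queue needs FIFO order is a separate question.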

Hmm: Did you track my proposals for your version?

- exit_sem() is not a hot path.
I would propose to treat every exit_sem() as an update_queue(), instead
of calling update_queue_simple() for every individual UNDO.

- create an unlink_queue() helper that contains the updates to q->lists 
and sma->complex_count.
Three copies of the same unlink code invite errors.

- now: use a hlist for the zero queue.
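
As a rough sketch of the unlink_queue() helper proposed above: the structs below are simplified userspace stand-ins, and the field names (list, simple_list, nsops, pending) are assumptions made for illustration, not the layout from either patch. The point is only that the list removal and the complex_count bookkeeping live in one place.

```c
#include <stddef.h>

/* Minimal doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/* Assumed layout: every pending op is on the global list; single-sop
 * ("simple") ops are additionally on a per-semaphore list. */
struct sem_queue {
	struct list_head list;
	struct list_head simple_list;
	int nsops;
};

struct sem_array {
	struct list_head pending;
	int complex_count;	/* number of pending multi-sop operations */
};

/* Single place that undoes the enqueue. Caller holds the lock. */
static void unlink_queue(struct sem_array *sma, struct sem_queue *q)
{
	list_del(&q->list);
	if (q->nsops == 1)
		list_del(&q->simple_list);
	else
		sma->complex_count--;
}
```

Every path that dequeues an operation would then call unlink_queue() instead of open-coding the list_del() calls and the counter update.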

>   And if you are worried about text size, then
> I would bet my version actually uses less icache in the case of
> simple ops being used.
>    
It depends. With inlining disabled, and counting all helper functions 
that differ:

My proposal: 301 bytes for update_queue.

"simple", only negv: 226 bytes
"simple", negv+zero: 354 bytes
simple+complex: 526 bytes.

Thus with only +-1 simple ops, your version uses less icache. If both 
+-1 and 0 ops are used, your version uses more icache.

Could you please send me your benchmark app?

--
     Manfred