lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Fri, 18 May 2007 09:35:17 +0200
From:	Jarek Poplawski <jarkao2@...pl>
To:	Oleg Nesterov <oleg@...sign.ru>
Cc:	Tejun Heo <htejun@...il.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Chinner <dgc@....com>,
	David Howells <dhowells@...hat.com>,
	Gautham Shenoy <ego@...ibm.com>, Ingo Molnar <mingo@...e.hu>,
	Srivatsa Vaddagiri <vatsa@...ibm.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] make cancel_rearming_delayed_work() reliable

On Wed, May 16, 2007 at 10:52:03PM +0400, Oleg Nesterov wrote:
...
> Ah, but this is something different. Both lock/unlock are full barriers,
> but they protect only one direction. A memory op must not leak out of the
> critical section, but it may leak in.
> 
> 	A = B;		// 1
> 	lock();		// 2
> 	C = D;		// 3
> 
> this can be re-ordered to
> 
> 	lock();		// 2
> 	C = D;		// 3
> 	A = B;		// 1
> 
> but 2 and 3 must not be re-ordered.
> 
> To be sure, I contacted Paul E. McKenney privately, and his reply is
> 
> 	> No.  See for example IA64 in file include/asm-ia64/spinlock.h,
> 	> line 34 for spin_lock() and line 92 for spin_unlock().  The
> 	> spin_lock() case uses a ,acq completer, which will allow preceding
> 	> reads to be reordered into the critical section.  The spin_unlock()
> 	> uses the ,rel completer, which will allow subsequent writes to be
> 	> reordered into the critical section.  The locking primitives are
> 	> guaranteed to keep accesses bound within the critical section, but
> 	> are free to let outside accesses be reordered into the critical
> 	> section.
> 	>
> 	> Download the Itanium Volume 2 manual:
> 	>
> 	>         http://developer.intel.com/design/itanium/manuals/245318.htm
> 	>
> 	> Table 2.3 on page 2:489 (physical page 509) shows an example of how
> 	> the rel and acq completers work.

Hi,

I managed to find some time to look at this all once more,
and here is my current opinion:

1. I think this above mentioned lock vs. barrier behavior is
in compliance with memory-barriers.txt, and BTW I'd like to
thank and congratulate David Howells of very good style
(and beautiful pictures).

2. IMHO the current solution with smp barriers is very good:
these barriers are really needed and they should be enough.
Oleg repeats all the time he hates barriers, but I think
it's wrong approach - they should be seen as something
natural for programming modern processors and trying to
change the code only to avoid them is wrong (unless it
could be done with some minor changes).

3. The alternative solution without barriers, based on the
idea of Tejun Heo and presented in the patch proposal from
2007-05-13, could be probably a little faster (for some
processors), after some changes:

- I think, instead of separate __set_bit and __clear_bit,
WORK_STRUCT_QUEUED bit should be changed during set_wq_data
(additional parameter or new variant eg: set_wq_data_queued),
and similarly something like  work_clear_pending_queued -
so, without additional operations instead of these barriers.

- changing this if condition in try_to_grab_pending.

But there is a question - is this worth of one (last) bit
occupied only for optimisation (but not always)?

So, maybe it would be better to think about solving those
other problems in workqueue, mentioned by Oleg somewhere?

Regards,
Jarek P.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists