linux-kernel - Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless update of refcount

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <51CF4BBF.30608@hp.com>
Date:	Sat, 29 Jun 2013 17:03:59 -0400
From:	Waiman Long <waiman.long@...com>
To:	Thomas Gleixner <tglx@...utronix.de>
CC:	Alexander Viro <viro@...iv.linux.org.uk>,
	Jeff Layton <jlayton@...hat.com>,
	Miklos Szeredi <mszeredi@...e.cz>,
	Ingo Molnar <mingo@...hat.com>, linux-fsdevel@...r.kernel.org,
	LKML <linux-kernel@...r.kernel.org>,
	Linus Torvalds <torvalds@...ux-foundation.org>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Andi Kleen <andi@...stfloor.org>,
	"Chandramouleeswaran, Aswin" <aswin@...com>,
	"Norton, Scott J" <scott.norton@...com>,
	Peter Zijlstra <peterz@...radead.org>,
	Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v2 1/2] spinlock: New spinlock_refcount.h for lockless
 update of refcount

On 06/27/2013 10:44 AM, Thomas Gleixner wrote:
> On Wed, 26 Jun 2013, Waiman Long wrote:
>> On 06/26/2013 07:06 PM, Thomas Gleixner wrote:
>>> On Wed, 26 Jun 2013, Waiman Long wrote:
>>> This is a complete design disaster. You are violating every single
>>> rule of proper layering. The differentiation of spinlock, raw_spinlock
>>> and arch_spinlock is there for a reason and you just take the current
>>> implementation and map your desired function to it. If we take this,
>>> then we fundamentally ruled out a change to the mappings of spinlock,
>>> raw_spinlock and arch_spinlock. This layering is on purpose and it
>>> took a lot of effort to make it as clean as it is now. Your proposed
>>> change undoes that and completely wreckages preempt-rt.
>> Would you mind enlighten me how this change will break preempt-rt?
> The whole spinlock layering was done for RT. In mainline spinlock is
> mapped to raw_spinlock. On RT spinlock is mapped to a PI aware
> rtmutex.
>
> So your mixing of the various layers and the assumption of lock plus
> count being adjacent, does not work on RT at all.

Thank for the explanation. I had downloaded and looked at the RT patch. 
My code won't work for the full RT kernel. I guess that is no way to 
work around this and only logical choice is to disable it for the full 
RT kernel.

>> The only architecture that will break, according to data in the
>> respectively arch/*/include/asm/spinlock_types.h files is PA-RISC
>> 1.x (it is OK in PA-RISC 2.0) whose arch_spinlock type has a size of
>> 16 bytes. I am not sure if 32-bit PA-RISC 1.x is still supported or
>> not, but we can always disable the optimization for this case.
> You have to do that right from the beginning with a proper data
> structure and proper accessor functions. Introducing wreckage and then
> retroactivly fixing it, is not a really good idea.

For architecture that needs a larger than 32-bit arch_spin_lock type, 
the optimization code will be disabled just like the non-SMP and full RT 
cases.

>>> So what you really want is a new data structure, e.g. something like
>>> struct spinlock_refcount() and a proper set of new accessor functions
>>> instead of an adhoc interface which is designed solely for the needs
>>> of dcache improvements.
>> I had thought about that. The only downside is we need more code changes to
>> make existing code to use the new infrastructure. One of the drivers in my
> That's not a downside. It makes the usage of the infrastructure more
> obvious and not hidden behind macro magic. And the changes are trivial
> to script.

Making the code from d_lock to lockref.lock, for example, is trivial. 
However, there are about 40 different files that need to be changed with 
different maintainers. Do I need to get an ACK from all of them or can I 
just go ahead with such trivial changes without their ACK? You know it 
can be hard to get responses from so many maintainers in a timely 
manner. This is actually my main concern.

>> design is to minimize change to existing code. Personally, I have no
>> objection of doing that if others think this is the right way to go.
> Definitely. Something like this would be ok:
>
> struct lock_refcnt {
>         int                 refcnt;
>         struct spinlock     lock;
> };
>
> It does not require a gazillion of ifdefs and works for
> UP,SMP,DEBUG....
>
> extern bool lock_refcnt_mod(struct lock_refcnt *lr, int mod, int cond);
>
> You also want something like:
>
> extern void lock_refcnt_disable(void);
>
> So we can runtime disable it e.g. when lock elision is detected and
> active. So you can force lock_refcnt_mod() to return false.
>
> static inline bool lock_refcnt_inc(struct lock_refcnt *sr)
> {
> #ifdef CONFIG_HAVE_LOCK_REFCNT
> 	return lock_refcnt_mod(sr, 1, INTMAX);
> #else
> 	return false;
> #endif
> }
>
> That does not break any code as long as CONFIG_HAVE_SPINLOCK_REFCNT=n.
>
> So we can enable it per architecture and make it depend on SMP. For RT
> we simply can force this switch to n.
>
> The other question is the semantic of these refcount functions. From
> your description the usage pattern is:
>
>       if (refcnt_xxx())
> 	   return;
>       /* Slow path */
>       spin_lock();
>       ...
>       spin_unlock();
>
> So it might be sensible to make this explicit:
>
> static inline bool refcnt_inc_or_lock(struct lock_refcnt *lr))
> {
> #ifdef CONFIG_HAVE_SPINLOCK_REFCNT
> 	if (lock_refcnt_mod(lr, 1, INTMAX))
> 	   return true;
> #endif
> 	spin_lock(&lr->lock);
> 	return false;
> }

Yes, it is a good idea to have a config variable to enable/disable it as 
long as the default is "y". Of course, an full RT kernel or an non-SMP 
kernel will have it disabled.

Regards,
Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/