linux-kernel - Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to 'xchg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20160315123714.GA12289@gmail.com>
Date:	Tue, 15 Mar 2016 13:37:14 +0100
From:	Ingo Molnar <mingo@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Frédéric Weisbecker <fweisbec@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>,
	Andrew Morton <akpm@...ux-foundation.org>
Subject: Re: [PATCH] atomic: Fix bugs in 'fetch_or()' and rename it to
 'xchg_or()'


* Ingo Molnar <mingo@...nel.org> wrote:

> 
> * Ingo Molnar <mingo@...nel.org> wrote:
> 
> > But IMHO this really highlights a fundamental weakness of all this macro magic, 
> > it's all way too fragile.
> > 
> > Why don't we introduce a boring family of APIs:
> > 
> > 	cmpxchg_8()
> > 	cmpxchg_16()
> > 	cmpxchg_32()
> > 	cmpxchg_64()
> > 
> > 	xchg_or_32()
> > 	xchg_or_64()
> > 	...
> > 
> > ... with none of this pesky auto-typing property and none of the 
> > macro-inside-a-macro crap? We could do clean types and would write them all in 
> > proper C, not fragile CPP.
> > 
> > It's not like we migrate between the types all that frequently - and even if we 
> > do, it's trivial.
> > 
> > hm?
> 
> So if we are still on the same page at this point, we'd have to add a pointer 
> variant too I suspect:
> 
> 	cmpxchg_ptr()
> 	xchg_ptr()
> 
> ... whose bitness may differ between architectures(subarches), but it would still 
> be a single variant per architecture, i.e. still with pretty clear type 
> propagation and with a very clear notion of which architecture supports what.
> 
> It looks like a lot of work, but it's all low complexity work AFAICS that could be 
> partly automated.

Btw., if we do all this, we could still add auto-type API variants, but now they 
would be implemented at the highest level, with none of the auto-type complexity 
pushed down to the architecture level. Architectures just provide their set of 
APIs for a given list of types, and that's it.

I hate to see all the auto-typing complexity pushed down to the arch assembly 
level:

/*
 * Atomic compare and exchange.  Compare OLD with MEM, if identical,
 * store NEW in MEM.  Return the initial value in MEM.  Success is
 * indicated by comparing RETURN with OLD.
 */
#define __raw_cmpxchg(ptr, old, new, size, lock)			\
({									\
	__typeof__(*(ptr)) __ret;					\
	__typeof__(*(ptr)) __old = (old);				\
	__typeof__(*(ptr)) __new = (new);				\
	switch (size) {							\
	case __X86_CASE_B:						\
	{								\
		volatile u8 *__ptr = (volatile u8 *)(ptr);		\
		asm volatile(lock "cmpxchgb %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "q" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_W:						\
	{								\
		volatile u16 *__ptr = (volatile u16 *)(ptr);		\
		asm volatile(lock "cmpxchgw %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_L:						\
	{								\
		volatile u32 *__ptr = (volatile u32 *)(ptr);		\
		asm volatile(lock "cmpxchgl %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	case __X86_CASE_Q:						\
	{								\
		volatile u64 *__ptr = (volatile u64 *)(ptr);		\
		asm volatile(lock "cmpxchgq %2,%1"			\
			     : "=a" (__ret), "+m" (*__ptr)		\
			     : "r" (__new), "0" (__old)			\
			     : "memory");				\
		break;							\
	}								\
	default:							\
		__cmpxchg_wrong_size();					\
	}								\
	__ret;								\
})

it makes things harder to read, harder to debug and harder to optimize.

Thanks,

	Ingo