Date:	Thu, 05 Feb 2015 14:37:14 -0800
From:	Davidlohr Bueso <dave@...olabs.net>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Sasha Levin <sasha.levin@...cle.com>,
	Waiman Long <Waiman.Long@...com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrey Ryabinin <a.ryabinin@...sung.com>,
	Dave Jones <davej@...emonkey.org.uk>,
	LKML <linux-kernel@...r.kernel.org>,
	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
Subject: Re: sched: memory corruption on completing completions

On Thu, 2015-02-05 at 13:34 -0800, Linus Torvalds wrote:
> On Thu, Feb 5, 2015 at 1:02 PM, Sasha Levin <sasha.levin@...cle.com> wrote:
> >
> > Interestingly enough, according to that article this behaviour seems to be
> > "by design":
> 
> Oh, it's definitely by design, it's just that the design looked at
> spinlocks without the admittedly very subtle issue of lifetime vs
> unlocking.
> 
> Spinlocks (and completions) are special - for other locks we have
> basically allowed lifetimes to be separate from the lock state, and if
> you have a data structure with a mutex in it, you'll have to have some
> separate lifetime rule outside of the lock itself. But spinlocks and
> completions have their locking state tied into their lifetime.

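To make that lifetime tie concrete, here is a made-up sketch (not Sasha's actual report; struct my_work and queue_the_work_somewhere() are just placeholders) of the pattern that blows up when the unlock path touches the lock after the releasing store:

#include <linux/completion.h>

struct my_work {			/* hypothetical user */
	struct completion done;
};

static int waiter_thread(void *arg)
{
	struct my_work w;

	init_completion(&w.done);
	queue_the_work_somewhere(&w);	/* hand &w to the completer side */
	wait_for_completion(&w.done);
	return 0;			/* w's stack slot is dead from here on */
}

static void completer(struct my_work *w)
{
	/*
	 * complete() takes w->done.wait.lock, bumps ->done, wakes the
	 * waiter and unlocks.  The moment the waiter sees ->done it can
	 * return, and its stack frame -- spinlock included -- gets reused.
	 * Any unlock-side access after the releasing store (exactly what
	 * the paravirt slowpath check does) is then a use-after-free.
	 */
	complete(&w->done);
}
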
For spinlocks I find that tight coupling very much a virtue. Tight
lifetimes keep the overall locking logic *simple* and keep people from
being "smart" and bloating up spinlocks. Similarly, I hate how the
paravirt alternative blends in with the regular (sane) bare-metal code.
What was preventing something like this instead?

#ifdef CONFIG_PARAVIRT_SPINLOCKS
static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	/* Paravirt compiled in but not running on a hypervisor:
	 * do the plain ticket unlock and be done with it. */
	if (!static_key_false(&paravirt_ticketlocks_enabled)) {
		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
		return;
	}

	add_smp(&lock->tickets.head, TICKET_LOCK_INC);
	/* Do slowpath tail stuff... */
}
#else
static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
}
#endif

I just don't see the point of all this TICKET_SLOWPATH_FLAG business:

#ifdef CONFIG_PARAVIRT_SPINLOCKS
#define __TICKET_LOCK_INC	2
#define TICKET_SLOWPATH_FLAG	((__ticket_t)1)
#else
#define __TICKET_LOCK_INC	1
#define TICKET_SLOWPATH_FLAG	((__ticket_t)0)
#endif

when it only exists for paravirt -- and the word "slowpath" makes it
sound like a step in the generic algorithm rather than a paravirt-only
hook. Let's keep the code for simple locks simple.
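
For contrast, this is roughly what the shared unlock path looks like
today (paraphrased from memory of arch/x86/include/asm/spinlock.h, so it
may not match the tree exactly) -- the static key and the slowpath flag
check sit right in code that every configuration compiles:

static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
	if (TICKET_SLOWPATH_FLAG &&
	    static_key_false(&paravirt_ticketlocks_enabled)) {
		arch_spinlock_t prev;

		prev = *lock;
		/* Release the lock; add_smp() is a full barrier. */
		add_smp(&lock->tickets.head, TICKET_LOCK_INC);

		/* Kick a halted waiter if one flagged the slowpath. */
		if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
			__ticket_unlock_slowpath(lock, prev);
	} else
		__add(&lock->tickets.head, TICKET_LOCK_INC, UNLOCK_LOCK_PREFIX);
}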

> Completions are so very much by design (because dynamic completions on
> the stack is one of the core use cases), and spinlocks do it because
> in some cases you cannot sanely avoid it (and one of those cases is
> the implementation of completions - they aren't actually first-class
> locking primitives of their own, although they actually *used* to be,
> originally).
> 
> It is possible that the paravirt spinlocks could be saved by:
> 
>  - moving the clearing of TICKET_SLOWPATH_FLAG into the fastpath locking code.

Ouch, to avoid deadlocks they explicitly need the unlock to occur before
the slowpath tail flag is read -- otherwise a waiter can set the flag and
halt right after the check and never get kicked.
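
To spell that out, here is a hypothetical sketch of what goes wrong if
the flag is sampled before the releasing store (pv_kick_waiter() is a
made-up stand-in for the real kick, e.g. __ticket_unlock_kick()):

static inline void broken_unlock(arch_spinlock_t *lock)
{
	/* Sample the flag *before* releasing -- this is the problem. */
	bool slow = lock->tickets.tail & TICKET_SLOWPATH_FLAG;

	/*
	 * RACE WINDOW: a waiter can give up spinning right here, set
	 * TICKET_SLOWPATH_FLAG and halt, after we already sampled it...
	 */
	add_smp(&lock->tickets.head, TICKET_LOCK_INC);	/* release */

	if (slow)
		pv_kick_waiter(lock);	/* hypothetical kick helper */

	/*
	 * ...so the halted waiter is never kicked and sleeps forever on
	 * a lock that is actually free.  Hence: unlock first, then read
	 * the tail flag.
	 */
}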

>  - making sure that the "unlock" path never does a *write* to the
> possibly stale lock. KASan would still complain about the read, but we
> could just say that it's a speculative read - bad form, but not
> disastrous.

Yeah, you just cannot have a slowpath without reads or writes :D

Thanks,
Davidlohr

