linux-kernel - Re: sched: memory corruption on completing completions

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CA+55aFzBPrvOUcAhmxsYRCfyq2wF1=RBJ_qSxeG78pq90Hp7=Q@mail.gmail.com>
Date:	Thu, 5 Feb 2015 13:34:40 -0800
From:	Linus Torvalds <torvalds@...ux-foundation.org>
To:	Sasha Levin <sasha.levin@...cle.com>
Cc:	Davidlohr Bueso <dave@...olabs.net>,
	Waiman Long <Waiman.Long@...com>,
	Peter Zijlstra <peterz@...radead.org>,
	Ingo Molnar <mingo@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Andrey Ryabinin <a.ryabinin@...sung.com>,
	Dave Jones <davej@...emonkey.org.uk>,
	LKML <linux-kernel@...r.kernel.org>,
	Raghavendra K T <raghavendra.kt@...ux.vnet.ibm.com>
Subject: Re: sched: memory corruption on completing completions

On Thu, Feb 5, 2015 at 1:02 PM, Sasha Levin <sasha.levin@...cle.com> wrote:
>
> Interestingly enough, according to that article this behaviour seems to be
> "by design":

Oh, it's definitely by design, it's just that the design looked at
spinlocks without the admittedly very subtle issue of lifetime vs
unlocking.

Spinlocks (and completions) are special - for other locks we have
basically allowed lifetimes to be separate from the lock state, and if
you have a data structure with a mutex in it, you'll have to have some
separate lifetime rule outside of the lock itself. But spinlocks and
completions have their locking state tied into their lifetime.
Completions are so very much by design (because dynamic completions on
the stack is one of the core use cases), and spinlocks do it because
in some cases you cannot sanely avoid it (and one of those cases is
the implementation of completions - they aren't actually first-class
locking primitives of their own, although they actually *used* to be,
originally).

It is possible that the paravirt spinlocks could be saved by:

 - moving the clearing of TICKET_SLOWPATH_FLAG into the fastpath locking code.

 - making sure that the "unlock" path never does a *write* to the
possibly stale lock. KASan would still complain about the read, but we
could just say that it's a speculative read - bad form, but not
disastrous.

but right now they are clearly horribly broken. Because the unlock
path doesn't just read the value, it can end up writing to it too.

Looking at the Xen version, for example, the unlock_kick part looks
fine. It just compares the pointer to the lock against its data
structures, and doesn't touch the lock itself. Which is fine.

But right now, the real problem is that "__ticket_unlock_slowpath()"
will clear the TICKET_SLOWPATH_FLAG on a lock that has already been
unlocked - which means that it may actually *change* a byte that has
already been freed. It might not be a spinlock any more, it might have
been reused for something else, and might be anything else, and
clearing TICKET_SLOWPATH_FLAG might be clearing a random bit in a
kernel pointer or other value..

It is *very* hard to hit, though. And it obviously doesn't happen in
raw hardware, but paravirt people need to think hard about this.

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/