linux-kernel - Re: [PATCH 07/12] x86/virt/guest/xen: Remove use of pgd


Message-ID: <20150612080425.GC8759@gmail.com>
Date:	Fri, 12 Jun 2015 10:04:25 +0200
From:	Ingo Molnar <mingo@...nel.org>
To:	Linus Torvalds <torvalds@...ux-foundation.org>
Cc:	Waiman Long <Waiman.Long@...com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	Borislav Petkov <bp@...en8.de>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Andy Lutomirski <luto@...capital.net>,
	linux-mml@...r.kernel.org,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Brian Gerst <brgerst@...il.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 07/12] x86/virt/guest/xen: Remove use of pgd_list from
 the Xen guest code

* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> On Jun 12, 2015 00:23, "Ingo Molnar" <mingo@...nel.org> wrote:
> >
> > We might make it so: but that would mean restricting certain clone_flags 
> > variants - not sure that's possible with our current ABI usage?
> 
> We already do that. You can't share signal info unless you share the mm. And a 
> shared signal state is what defines a thread group.
> 
> So I think the only issue is that ->mm can become NULL when the thread group 
> leader dies - a non-NULL mm should always be shared among all threads.

Indeed, we do that in exit_mm().

So we could add tsk->mm_leader or so, which does not get cleared and which the 
scheduler does not look at, but I'm not sure it's entirely safe that way: we don't 
have a refcount, and when the last thread exits it becomes bogus for a small 
window until the zombie leader is unlinked from the task list.

To close that race we'd have __mmdrop() or so clear out tsk->mm_leader - but the 
task doing the mmdrop() might be a lazy thread totally unrelated to the original 
thread group so we don't know which tsk->mm_leader to clear out.

To solve that we'd have to track the leader owning an MM in mm_struct - which gets 
interesting for the exec() case where the thread group gets a new leader, so we'd 
have to re-link the mm's leader pointer there.

So unless I missed some simpler solution, there are a good number of steps where
this could go wrong, in small-looking race windows - how about we just live with
iterating through all tasks instead of just all processes, once per 512 GB of
memory mapped?

Thanks,

	Ingo