Message-ID: <20150612080425.GC8759@gmail.com>
Date: Fri, 12 Jun 2015 10:04:25 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Waiman Long <Waiman.Long@...com>,
Thomas Gleixner <tglx@...utronix.de>,
Denys Vlasenko <dvlasenk@...hat.com>,
Borislav Petkov <bp@...en8.de>,
Andrew Morton <akpm@...ux-foundation.org>,
Oleg Nesterov <oleg@...hat.com>,
Andy Lutomirski <luto@...capital.net>,
linux-mml@...r.kernel.org,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
Brian Gerst <brgerst@...il.com>,
"H. Peter Anvin" <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH 07/12] x86/virt/guest/xen: Remove use of pgd_list from
the Xen guest code
* Linus Torvalds <torvalds@...ux-foundation.org> wrote:
> On Jun 12, 2015 00:23, "Ingo Molnar" <mingo@...nel.org> wrote:
> >
> > We might make it so: but that would mean restricting certain clone_flags
> > variants - not sure that's possible with our current ABI usage?
>
> We already do that. You can't share signal info unless you share the mm. And a
> shared signal state is what defines a thread group.
>
> So I think the only issue is that ->mm can become NULL when the thread group
> leader dies - a non-NULL mm should always be shared among all threads.
Indeed, we do that in exit_mm().
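To make that failure mode concrete, here is a small userspace model of it. This is only a sketch, not kernel code: struct task, struct mm, the ->leader field and all function names below are made up for illustration. It shows that a walk over thread group leaders alone can miss an mm that is still in use, once the leader's ->mm has been cleared:

```c
#include <stddef.h>

/* Userspace model only; these are not the kernel's task_struct/mm_struct. */
struct mm {
    int id;
};

struct task {
    struct mm *mm;        /* NULL once this task has dropped its mm on exit */
    struct task *leader;  /* thread group leader; a leader points at itself */
};

/* Count users of 'mm' while visiting only thread group leaders. */
static size_t count_users_leaders_only(const struct task *tasks, size_t n,
                                       const struct mm *mm)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].leader == &tasks[i] && tasks[i].mm == mm)
            hits++;
    return hits;
}

/* Count users of 'mm' while visiting every task, threads included. */
static size_t count_users_all_tasks(const struct task *tasks, size_t n,
                                    const struct mm *mm)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (tasks[i].mm == mm)
            hits++;
    return hits;
}

/* One thread group of three tasks whose leader has already exited. */
void model_zombie_leader(size_t *leaders_only, size_t *all_tasks)
{
    struct mm mm = { .id = 1 };
    struct task tasks[3];

    for (size_t i = 0; i < 3; i++) {
        tasks[i].mm = &mm;
        tasks[i].leader = &tasks[0];
    }
    tasks[0].mm = NULL;   /* zombie leader: its ->mm has been cleared */

    *leaders_only = count_users_leaders_only(tasks, 3, &mm);
    *all_tasks = count_users_all_tasks(tasks, 3, &mm);
}
```

Calling model_zombie_leader() yields 0 for the leaders-only walk and 2 for the all-tasks walk: the two surviving threads still use the mm, but no leader does.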
So we could add tsk->mm_leader or so, which does not get cleared and which the
scheduler does not look at, but I'm not sure it's entirely safe that way: we don't
have a refcount, and when the last thread exits it becomes bogus for a small
window until the zombie leader is unlinked from the task list.
To close that race we'd have __mmdrop() or so clear out tsk->mm_leader - but the
task doing the mmdrop() might be a lazy thread totally unrelated to the original
thread group so we don't know which tsk->mm_leader to clear out.
To solve that we'd have to track the leader owning an MM in mm_struct - which gets
interesting for the exec() case where the thread group gets a new leader, so we'd
have to re-link the mm's leader pointer there.
So unless I missed some simpler solution, there are a good number of steps where
this could go wrong, in small-looking race windows - how about we just live with
iterating through all tasks instead of just all processes, once per 512 GB of
memory mapped?
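For reference, the 512 GB figure presumably corresponds to the address range covered by a single top-level (PGD) entry with x86-64 4-level paging: 4 KiB pages and 9 index bits per level. A quick sketch of the arithmetic, with constant and function names that are illustrative only and not the kernel's:

```c
#include <stdint.h>

/* x86-64 4-level paging, 4 KiB pages: 12 offset bits, 9 index bits per level. */
enum {
    MODEL_PAGE_SHIFT = 12,
    MODEL_BITS_PER_LEVEL = 9,
};

/*
 * Bytes of virtual address space covered by one entry at a given level
 * (0 = PTE, 1 = PMD, 2 = PUD, 3 = PGD).
 */
uint64_t bytes_per_entry(unsigned int level)
{
    return (uint64_t)1 << (MODEL_PAGE_SHIFT + MODEL_BITS_PER_LEVEL * level);
}
```

With these assumptions bytes_per_entry(3) is 2^39 bytes, i.e. 512 GiB, so the full task walk would only be needed when a new top-level entry has to be populated.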
Thanks,
Ingo