[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <52964EC7.9000504@zytor.com>
Date:	Wed, 27 Nov 2013 11:57:59 -0800
From:	"H. Peter Anvin" <hpa@...or.com>
To:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Andy Lutomirski <luto@...capital.net>
CC:	Andi Kleen <andi@...stfloor.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	Andi Kleen <ak@...ux.intel.com>
Subject: Re: [PATCH] Add a text_poke syscall v2
On 11/26/2013 12:03 PM, Linus Torvalds wrote:
> On Tue, Nov 26, 2013 at 11:05 AM, Andy Lutomirski <luto@...capital.net> wrote:
>>
>> IIRC someone proposed that, rather than specifying a "handler", that any
>> user thread that traps just wait until the poke completes.  This would
>> complicate the kernel implementation a bit, but it would make the user
>> code a good deal simpler.  Is there any reason that this is a bad idea?
> 
> Please do it this way instead, because user space will get the
> callback version wrong and then - because it never actually triggers
> in practice in normal situations - it will cause very *very* subtle
> bugs that we can't fix.
> 
> Making the kernel serialize the accesses is the right thing to do.
> Just a new per-mm mutex should trivially do it, then you don't even
> have to check the "current->mm == bp_target_mm" thing at all, you just
> make the bp handler do a simple
> 
>     if (mutex_lock_interruptible(&mm->text_poke_mutex) >= 0)
>         mutex_unlock(&mm->text_poke_mutex);
> 
> and return. All done.
> 
> Plus the callback thing is pointless if we can do the instruction
> switch atomically (which would be true for UP, for single-thread, and
> potentially for certain sizes/alignments coupled with known rules for
> particular micro-architectures). So it's not a particularly good
> interface anyway.
> 
> Btw, I also think that there's a separate problem wrt shared pages.
> Should we perhaps only allow this in private mappings? Because right
> now it has that "current->mm == bp_target_mm" thing, and generally it
> only works on one particular mm, but by using "get_user_pages_fast(,
> 1,..)" it really only requires write permissions on the page. So it
> could be shared mapping, and I could easily see people doing that on
> purpose ("open executable file, then use text_poke() to change it for
> this architecture") and THAT DOES NOT WORK with the current patch if
> somebody else is also running that app..
> 
I have already said I don't think this is a good interface as it doesn't
provide sane semantics for batching changes -- the whole handler and
timeout mechanism (which isn't even implemented yet, and as a result
probably will never be used by userspace) is basically a hack for that.
The other alternative is that we only expose the "synchronize all cores"
operation as a system call, and let user space do the rest of the work.
 Anything else can be done in user space, and a lot of the subtleties
with locking pages, user space access and so on simply go away... user
space will be responsible or will get protection faults.
	-hpa
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Powered by blists - more mailing lists
 
