Message-ID: <20110822011645.GM2203@ZenIV.linux.org.uk>
Date: Mon, 22 Aug 2011 02:16:45 +0100
From: Al Viro <viro@...IV.linux.org.uk>
To: Andrew Lutomirski <luto@....edu>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
"H. Peter Anvin" <hpa@...or.com>, mingo@...hat.com,
Richard Weinberger <richard@....at>,
user-mode-linux-devel@...ts.sourceforge.net,
linux-kernel@...r.kernel.org
Subject: Re: SYSCALL, ptrace and syscall restart breakages (Re: [RFC] weird
crap with vdso on uml/i386)
On Sun, Aug 21, 2011 at 08:44:12PM -0400, Andrew Lutomirski wrote:
> This is, IMO, gross -- if the values in pt_regs matched what they were
> when sysenter / syscall was issued, then we'd be fine -- we could
> restart the syscall and everything would work. Apparently ptrace
> users have a problem with that, so we're stuck with the "lie" (i.e.
> reporting values as of __kernel_vsyscall, not as of the actual kernel
> entry).
Um, _no_. If nothing else, pt_regs is seen by sys_.... And they don't
bloody know or care how the syscall had been entered.
> Which suggests an easy-ish fix: if sysenter is used or if syscall is
> entered from the EIP it is supposed to be entered from, then just
> change ip in the argument save to point to the int 0x80 instruction.
> This might also require tweaking the userspace stack. That way,
> restart would hit int 0x80 instead of syscall/sysenter and the
> registers are exactly as expected.
Huh? Actions after SYSENTER differ from those after int 0x80. If nothing
else, you would need to tweak saved userland stack pointer as well. It is
possible, but I seriously doubt that it's a better way to deal with that
mess. And in any case, SYSEXIT buggers CX/DX, so we'd need two separate
post-syscall sequences in vdso. Yucky... I really don't like it.
The really ugly part for the SYSCALL variant is that right now we *can*
do things like this:
read_it:
	pushl %ebp
	movl $__NR_read, %eax
	movl $0, %ebx
	movl $array, %ebp
	movl $100, %edx
	syscall
	movl $__USER32_DS, %ecx
	movl %ecx, %ss
	popl %ebp
	ret
anywhere in your userland and have it act as an equivalent of
int read_it(void)
{
	return read(0, array, 100);
}
Is that ability part of the userland ABI, or are we declaring it hopelessly
wrong and requiring everyone to go through the function in vdso32? Linus?
As it is, I don't see any cheap ways to deal with restarts if that thing
has to be preserved. For sysenter it's flatly prohibited and that allows
us to play such games with adjusted return address. Here, OTOH...
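
To make the difference concrete, here is a toy model in C (not kernel code).
It assumes that a restart amounts to backing the saved return address up by
one two-byte instruction, and that the sysenter stub keeps an int $0x80
immediately before its fixed return point. The toy_* names, the byte layouts
and the amount of padding are illustrative assumptions, not taken from the
actual vdso.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* int $0x80, sysenter and syscall are all two-byte instructions. */
#define TOY_INSN_LEN 2

enum toy_insn { TOY_INT80, TOY_SYSENTER, TOY_SYSCALL, TOY_OTHER };

/* Classify the two-byte instruction found at code[off]. */
static enum toy_insn toy_decode(const uint8_t *code, size_t len, size_t off)
{
	static const uint8_t int80[]    = { 0xcd, 0x80 };
	static const uint8_t sysenter[] = { 0x0f, 0x34 };
	static const uint8_t syscall_[] = { 0x0f, 0x05 };

	assert(off + TOY_INSN_LEN <= len);
	if (!memcmp(code + off, int80, TOY_INSN_LEN))
		return TOY_INT80;
	if (!memcmp(code + off, sysenter, TOY_INSN_LEN))
		return TOY_SYSENTER;
	if (!memcmp(code + off, syscall_, TOY_INSN_LEN))
		return TOY_SYSCALL;
	return TOY_OTHER;
}

/* Assumed restart model: rewind the saved ip by one entry instruction. */
static size_t toy_restart_ip(size_t saved_ip)
{
	return saved_ip - TOY_INSN_LEN;
}

/* Hypothetical sysenter stub: the return point is fixed, so an int $0x80
 * can be parked right in front of it (padding length is illustrative). */
static const uint8_t toy_sysenter_stub[] = {
	0x0f, 0x34,					/* sysenter */
	0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90,	/* nop padding */
	0xcd, 0x80,					/* int $0x80: restart lands here */
	0x5d,						/* popl %ebp: fixed return point */
};
#define TOY_SYSENTER_RETURN 11

/* Open-coded SYSCALL somewhere in userland: the return point is simply
 * the next instruction, so rewinding lands on SYSCALL itself. */
static const uint8_t toy_open_coded[] = {
	0x0f, 0x05,	/* syscall */
	0x90,		/* placeholder for whatever follows */
};
#define TOY_SYSCALL_RETURN 2
```

In this model, decoding at toy_restart_ip(TOY_SYSENTER_RETURN) in the stub
gives TOY_INT80, while the same rewind in the open-coded sequence gives
TOY_SYSCALL: the restart re-executes SYSCALL at an address the kernel does
not control, which is why the adjusted-return-address trick is not available
there.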