Message-ID: <CAObL_7HGJqP-hdXpdRLYkksfeVtQ1z1ngP58gdTnWwD57duaXA@mail.gmail.com>
Date:	Sun, 21 Aug 2011 09:37:18 -0400
From:	Andrew Lutomirski <luto@....edu>
To:	Al Viro <viro@...iv.linux.org.uk>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	"H. Peter Anvin" <hpa@...or.com>, mingo@...hat.com,
	Richard Weinberger <richard@....at>,
	user-mode-linux-devel@...ts.sourceforge.net,
	linux-kernel@...r.kernel.org
Subject: Re: SYSCALL, ptrace and syscall restart breakages (Re: [RFC] weird
 crap with vdso on uml/i386)

On Sun, Aug 21, 2011 at 7:24 AM, Andrew Lutomirski <luto@....edu> wrote:
> On Sun, Aug 21, 2011 at 4:42 AM, Al Viro <viro@...iv.linux.org.uk> wrote:
>> On Sun, Aug 21, 2011 at 07:34:43AM +0100, Al Viro wrote:
>>> Suppose we have a traced process.  foo6() is called and the thing is
>>> stopped before the sys_foo6() is reached kernel-side.  The sixth argument
>>> is on stack, ebp is set to user esp.  SYSENTER happens, we read the
>>> 6th argument from userland stack and put it along with the rest into
>>> pt_regs.  tracer examines the arguments, modifies them (including the last
>>> one) and lets the tracee run free - e.g. detaches from the tracee.
>>>
>>> What should happen if we happen to get a signal that would restart that
>>> sucker?  Granted, it's not going to happen with mmap() - it doesn't, AFAICS,
>>> do anything of that kind.  However, I wouldn't bet a dime on other 6-argument
>>> syscalls not stepping on that.  sendto() and recvfrom(), in particular...
>>>
>>> OK, we return to userland.  The sixth argument is placed into %ebp.  Linus'
>>> "pig and proud of that" trick works and we end up slapping userland
>>> %esp into %ebp and hitting SYSENTER again.  Only one problem, though -
>>> the sixth argument on user stack is completely unaffected by what tracer
>>> had done.  Unlike the rest of arguments, that *are* changed.
>>>
>>> We could deal with that in case of SYSENTER if we e.g. replaced that
>>>         jmp .Lenter_kernel
>>> with
>>>         jmp .Lrestart
>>> and added
>>> .Lrestart:
>>>       movl %ebp, (%esp)
>>>       jmp .Lenter_kernel
>>> but in case of SYSCALL it seems to be even messier...  Comments?
>>
>> Oh, hell...  Compat SYSCALL one is really buggered on syscall restarts,
>> ptrace or no ptrace.  Look: calling conventions for SYSCALL are
>>        arg1..5: ebx, ebp, edx, edi, esi.  arg6: stack
>> and after syscall restart we end up with
>>        arg1..5: ebx, ecx, edx, edi, esi.  arg6: ebp
>> so restart will instantly clobber arg2, in effect replacing it with arg6.
>>
>> And yes, adding ptrace to the mix makes things even uglier.  For one thing,
>> changes to ECX via ptrace are completely lost on the fast exit.  Not pretty,
>> and might make life painful for uml, but not for the majority of programs.
>> What's worse, combination of ptrace with restart will lose changes to arg6
>> (again, value on stack left as it was, changes to arg6 by tracer lost) *and*
>> it will lose changes to arg2 (along with arg2 itself - see above).
>>
>> Linus' Dirty Trick(tm) is not trivial to apply - with SYSCALL we *do* retain
>> the address of next insn and that's where we end up going.  IOW, SYSCALL not
>> inside vdso32 currently works (for small values of "works", due to restart
>> issues).  Playing with return elsewhere might break some userland code...
>>
>> Guys, that's *way* out of the area I'm comfortable with.
>>
>
> I don't see the point of all this hackery at all.  sysenter/sysexit
> indeed screws up some registers, but we can return on the iret path in
> the case of restart.
>
> So why do we lie to ptrace (and iret!) at all?  Why not just fill in
> pt_regs with the registers as they were (at least the
> non-clobbered-by-sysenter ones), set the actual C parameters correctly
> to contain the six arguments (in rdi, rsi, etc.), do the syscall, and
> return back to userspace without any funny business?  Is there some
> ABI reason that, once we've started lying to tracers, we have to keep
> doing so?

Gack.  Is this a holdover from the 32-bit code that shares the
argument save area with the parameters passed on the C stack?  If so,
we could just set up the argument save area honestly and pass the real
parameters in registers like 64-bit C code expects.

If the tracing and restart cases use iret to return to userspace, this
should all just work.  ptrace users shouldn't notice the overhead, and
syscall restart is presumably slow enough anyway that it wouldn't
matter.  The userspace entry code would be as simple as:

sysenter
ret

or

syscall
ret
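
To make the restart problem and the proposal concrete, here is a toy
model in Python.  It is only a sketch, not kernel code: the names User,
decode, restart_current and restart_honest are made up for illustration,
the register order follows the quoted compat SYSCALL convention, and the
hardware's own clobbering of ecx is not modeled.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class User:
    """Hypothetical snapshot of user state at the SYSCALL instruction."""
    ebx: int
    ecx: int
    edx: int
    edi: int
    esi: int
    ebp: int
    stack_top: int  # word at (%esp), where the stub leaves arg6

def decode(u):
    # Quoted SYSCALL convention: arg1..5 in ebx, ebp, edx, edi, esi; arg6 on stack.
    return (u.ebx, u.ebp, u.edx, u.edi, u.esi, u.stack_top)

def restart_current(u, tracer=None):
    """Save decoded args, reload registers from them, re-run SYSCALL."""
    args = decode(u)
    if tracer is not None:
        args = tracer(args)  # tracer edits the saved argument slots
    a1, a2, a3, a4, a5, a6 = args
    # After restart: arg1..5 in ebx, ecx, edx, edi, esi; arg6 in ebp.
    # The user stack is not touched, so stack_top keeps its old value.
    restored = replace(u, ebx=a1, ecx=a2, edx=a3, edi=a4, esi=a5, ebp=a6)
    return decode(restored)

def restart_honest(u, tracer=None):
    """Save registers as they were, restore them unchanged (iret), re-run."""
    saved = u
    if tracer is not None:
        saved = tracer(saved)  # tracer edits real registers or user memory
    return decode(saved)

user = User(ebx=1, ecx=0, edx=3, edi=4, esi=5, ebp=2, stack_top=6)
print(restart_current(user))  # arg2 is replaced by arg6
print(restart_honest(user))   # arguments survive the restart
```

In this model restart_current turns arguments (1, 2, 3, 4, 5, 6) into
(1, 6, 3, 4, 5, 6), which is the arg2 clobbering described in the quoted
text, while restart_honest returns them unchanged.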

--Andy