linux-kernel - Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re: [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

Message-ID: <20120119161127.GP7180@jl-vm1.vm.bytemark.co.uk>
Date:	Thu, 19 Jan 2012 16:11:27 +0000
From:	Jamie Lokier <jamie@...reable.org>
To:	Indan Zupancic <indan@....nu>
Cc:	Chris Evans <scarybeasts@...il.com>,
	Andi Kleen <andi@...stfloor.org>,
	Andrew Lutomirski <luto@....edu>,
	Oleg Nesterov <oleg@...hat.com>,
	Will Drewry <wad@...omium.org>, linux-kernel@...r.kernel.org,
	keescook@...omium.org, john.johansen@...onical.com,
	serge.hallyn@...onical.com, coreyb@...ux.vnet.ibm.com,
	pmoore@...hat.com, eparis@...hat.com, djm@...drot.org,
	torvalds@...ux-foundation.org, segoon@...nwall.com,
	rostedt@...dmis.org, jmorris@...ei.org, avi@...hat.com,
	penberg@...helsinki.fi, viro@...iv.linux.org.uk, mingo@...e.hu,
	akpm@...ux-foundation.org, khilman@...com, borislav.petkov@....com,
	amwang@...hat.com, ak@...ux.intel.com, eric.dumazet@...il.com,
	gregkh@...e.de, dhowells@...hat.com, daniel.lezcano@...e.fr,
	linux-fsdevel@...r.kernel.org,
	linux-security-module@...r.kernel.org, olofj@...omium.org,
	mhalcrow@...gle.com, dlaor@...hat.com,
	Roland McGrath <mcgrathr@...omium.org>
Subject: Re: Compat 32-bit syscall entry from 64-bit task!? [was: Re:
 [RFC,PATCH 1/2] seccomp_filters: system call filtering using BPF]

Indan Zupancic wrote:
> On Thu, January 19, 2012 09:16, Chris Evans wrote:
> > On Wed, Jan 18, 2012 at 4:14 PM, Indan Zupancic <indan@....nu> wrote:
> >> On Wed, January 18, 2012 22:13, Chris Evans wrote:
> >>> On Wed, Jan 18, 2012 at 4:12 AM, Indan Zupancic <indan@....nu> wrote:
> >>>> On Wed, January 18, 2012 06:43, Chris Evans wrote:
> >>>>> 2) Tracee traps
> >>>>> 2b) Tracee could take a SIGKILL here
> >>>>> 3) Tracer looks at registers; bad syscall
> >>>>> 3b) Or tracee could take a SIGKILL here
> >>>>> 4) The only way to stop the bad syscall from executing is to rewrite
> >>>>> orig_eax (PTRACE_CONT + SIGKILL only kills the process after the
> >>>>> syscall has finished)
> >>>>
> >>>> Yes, we rewrite it to -1.
> >>>>
> >>>>> 5) Disaster: the tracee took a SIGKILL so any attempt to address it by
> >>>>> pid (such as PTRACE_SETREGS) fails.
> >>>>
> >>>> I assume that if a task can execute system calls and we get ptrace events
> >>>> for that, that we can do other ptrace operations too. Are you saying that
> >>>> the kernel has this ptrace gap between SIGKILL and task exit where ptrace
> >>>> doesn't work but the task continues executing system calls? That would be
> >>>> a huge bug, but it seems very unlikely too, as the task is stopped and
> >>>> shouldn't be able to disappear till it is continued by the tracer.
> >>>>
> >>>> I mean, really? That would be stupid.
> >>
> >> Okay, I tested this scenario and you're right, we're screwed.
> >>
> >> What the hell guys?
> >
> > Steady on :) ptrace() has never been sold as a technology upon which
> > it's safe to build security solutions.
> 
> Well, that can be said of pretty much all kernel functionality.
> That is no excuse for crazy behaviour.
> 
> I more or less fixed it by turning all SIGKILLs into SIGTERMs.
> Perhaps I should use a more obscure signal instead.
> 
> >> What about other PID checks in the kernel, are they still
> >> safe if the process looks dead but is still active? Or is it a ptrace-only
> >> problem?
> >>
> >>>> If true we have to work around it by disallowing SIGKILL and just sending
> >>>> them ourselves within the jail. Meh.
> >>
> >> I guess this helps a bit. It doesn't prevent external signals, but prisoners
> >> don't have control over that.
> >
> > Well.... a prisoner may be able to play other tricks:
> > - Allocate lots of memory... kernel may start spraying around SIGKILLs
> > - Sending SIGKILL via prctl()
> 
> prctl is disallowed within our jail. Did you have PR_SET_PDEATHSIG in mind?
> But doesn't the tracer become the parent when ptracing or not for this?
> Or were you thinking about enabling SECCOMP and counting on the SIGKILL
> being process-wide instead of thread-specific?
> 
> > - Sending SIGKILL via fcntl()
> 
> I haven't written the fcntl demultiplexor yet, but I missed that fcntl
> could be used for sending signals. I knew there was wacky stuff in there,
> but didn't expect it to be that bad. Thanks.
> 
> > - Sending SIGKILL via clone()
> 
> How? And can you send it to a process other than yourself?
> 
> >
> >>
> >> Is this SIGKILL specific or is it true for all task ending signals?
> >
> > Can't remember - try it?
> 
> Tried: It's safe with SIGTERM, so I assume the others are fine too.
> I'll double check though...
> 
> >>
> >>>> How will you avoid file path races with BPF?
> >>>
> >>> There is typically no need for file-path based access control in an FTP server.
> >>> Take for example anonymous FTP, which will typically be inside a
> >>> chroot() to /var/ftp. Inside that filesystem tree -- if you can open()
> >>> it, you can have it.
> >>
> >> Ah, you count on having root access. We don't.
> >>
> >> Do you know any more crazy security destroying holes?
> >
> > Try spraying SIGCONT and / or SIGSTOP at tracees. It may be possible
> > to confuse the tracer about whether a SIGTRAP event is syscall entry
> > or exit.
> 
> Yes, heard about that weirdness before, but it's all ignored. We're
> using PTRACE_O_TRACESYSGOOD.
> 
> > Try doing an execve() that fails. May cause similar state confusion in
> > the tracer.
> 
> Our jailer pretty much ignores all signals and only handles syscalls
> and task exits. We actually check execve's return value to know if we
> have to do our stuff or not.

Take a look at the file README-linux-ptrace in recent strace Git.
(Thanks Denys!)

It describes some *really* ugly things Linux does to ptrace on execve
when there are threads. The most exciting: the return value is sent to
a different tid than the one that called execve(), and the other tids
magically disappear without notification.

You can use PTRACE_O_TRACEEXEC to see if the execve() succeeds, btw.
It has the useful side-effect of preventing the legacy behaviour of
SIGTRAP being sent as a normal queued signal after successful execve().
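
For illustration, here is a rough sketch of how a tracer might tell the
stop types apart once PTRACE_O_TRACESYSGOOD and PTRACE_O_TRACEEXEC are
both set. The names classify_stop(), count_exec_events() and the
stop_kind enum are made up for this example; they are not taken from
strace or from anyone's jailer. It traces a single-threaded child only,
so it sidesteps the threaded execve mess described above.

```c
/* Sketch only: all function and enum names here are hypothetical. */
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum stop_kind { STOP_NONE, STOP_SYSCALL, STOP_EXEC, STOP_SIGNAL };

/* Decode a waitpid() status from a tracee using the two options. */
enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NONE;               /* exited or killed */
    /* TRACESYSGOOD sets bit 7 of the stop signal on syscall stops. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80))
        return STOP_SYSCALL;
    /* TRACEEXEC reports a successful execve() as a ptrace event. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return STOP_EXEC;
    return STOP_SIGNAL;                 /* ordinary signal-delivery stop */
}

/* Run argv[0] under ptrace; return how many exec events were seen. */
int count_exec_events(char *const argv[])
{
    int status, execs = 0, sig = 0;
    long opts = PTRACE_O_TRACESYSGOOD | PTRACE_O_TRACEEXEC;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        raise(SIGSTOP);                 /* let the tracer set options */
        execv(argv[0], argv);
        _exit(127);                     /* execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0)   /* the initial SIGSTOP */
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)opts) < 0) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }
    for (;;) {
        /* Resume; re-inject a pending signal if the last stop had one. */
        if (ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig) < 0)
            break;
        if (waitpid(pid, &status, 0) < 0)
            break;
        sig = 0;
        switch (classify_stop(status)) {
        case STOP_NONE:    return execs;
        case STOP_EXEC:    execs++; break;
        case STOP_SIGNAL:  sig = WSTOPSIG(status); break;
        case STOP_SYSCALL: break;       /* entry or exit; policy goes here */
        }
    }
    return execs;
}
```

Calling count_exec_events() with an argv whose argv[0] is an existing
binary should see one exec event, and zero for a path that does not
exist, since a failed execve() never generates the event. Note that a
failing PTRACE_SYSCALL here just ends the loop; a real jailer has to
treat that case (the SIGKILL race discussed earlier) much more
carefully.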

-- Jamie