Message-ID: <20120126103157.GE18613@jl-vm1.vm.bytemark.co.uk>
Date: Thu, 26 Jan 2012 10:31:57 +0000
From: Jamie Lokier <jamie@...reable.org>
To: Indan Zupancic <indan@....nu>
Cc: Denys Vlasenko <vda.linux@...glemail.com>,
Oleg Nesterov <oleg@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Andi Kleen <andi@...stfloor.org>,
Andrew Lutomirski <luto@....edu>,
Will Drewry <wad@...omium.org>, linux-kernel@...r.kernel.org,
keescook@...omium.org, john.johansen@...onical.com,
serge.hallyn@...onical.com, coreyb@...ux.vnet.ibm.com,
pmoore@...hat.com, eparis@...hat.com, djm@...drot.org,
segoon@...nwall.com, rostedt@...dmis.org, jmorris@...ei.org,
scarybeasts@...il.com, avi@...hat.com, penberg@...helsinki.fi,
viro@...iv.linux.org.uk, mingo@...e.hu, akpm@...ux-foundation.org,
khilman@...com, borislav.petkov@....com, amwang@...hat.com,
ak@...ux.intel.com, eric.dumazet@...il.com, gregkh@...e.de,
dhowells@...hat.com, daniel.lezcano@...e.fr,
linux-fsdevel@...r.kernel.org,
linux-security-module@...r.kernel.org, olofj@...omium.org,
mhalcrow@...gle.com, dlaor@...hat.com,
Roland McGrath <mcgrathr@...omium.org>
Subject: Re: Compat 32-bit syscall entry from 64-bit task!?
Indan Zupancic wrote:
> On Thu, January 26, 2012 02:08, Jamie Lokier wrote:
> > Is it disambiguated by PTRACE_EVENT_EXEC happening before the execve
> > returns, and you knowing the TID always changes to the PID? I haven't
> > yet checked which TID gets the PTRACE_EVENT_EXEC event, but if it's
> > not the old one, perhaps that could be changed.
>
> Please don't ever change the behaviour of PTRACE_EVENT_EXEC, it's
> barely documented already, but if it ever changes it will also be
> unreliable.
>
> It's still unclear if the PTRACE_EVENT_EXEC comes before or after
> or instead of the post-execve ptrace event. I guess before, but
> can I count on that? If it is after then I get a stray weird
> execve event that messes up the system call cadence.

It should be *sent* before, because the exec steps must finish before
the execve() syscall "returns".

I'm not sure if the events are guaranteed to be received in the same
order as they are sent.

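Whichever order the two stops arrive in, a tracer can at least tell
them apart from the wait status alone.  A minimal sketch, assuming the
tracer has set PTRACE_O_TRACEEXEC and PTRACE_O_TRACESYSGOOD, and using
the status encoding described in ptrace(2).  The enum and function
names are made up for this example:

```c
#include <signal.h>      /* SIGTRAP */
#include <sys/ptrace.h>  /* PTRACE_EVENT_EXEC */
#include <sys/wait.h>    /* WIFSTOPPED, WSTOPSIG */

/* Hypothetical names, used only in this sketch. */
enum stop_kind {
    STOP_NOT_STOPPED,   /* tracee exited or was killed */
    STOP_SYSCALL,       /* syscall entry or exit stop */
    STOP_EXEC_EVENT,    /* PTRACE_EVENT_EXEC stop */
    STOP_OTHER          /* signal delivery or some other event */
};

/* Classify a status value as returned by waitpid() for a tracee. */
static enum stop_kind classify_stop(int status)
{
    if (!WIFSTOPPED(status))
        return STOP_NOT_STOPPED;

    /* Event stops report SIGTRAP, with the event number in bits 16 and up. */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return STOP_EXEC_EVENT;

    /* With PTRACE_O_TRACESYSGOOD, syscall stops report SIGTRAP | 0x80. */
    if (WSTOPSIG(status) == (SIGTRAP | 0x80) && (status >> 16) == 0)
        return STOP_SYSCALL;

    return STOP_OTHER;
}
```

This only identifies each stop.  It does not say whether a syscall stop
is an entry or an exit, so the tracer still has to keep its own count,
which is why a stray event would upset the cadence.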
> >> > Thus, minimally we need one new option, PTRACE_O_TRACE_SYSENTRY -
> >> > "on syscall entry ptrace stop, set a nonzero event value in wait status"
> >> > , and two event values: PTRACE_EVENT_SYSCALL_ENTRY (for native entry),
> >> > PTRACE_EVENT_SYSCALL_ENTRY1 for compat one.
> >>
> >> Not all code wants to receive a syscall exit event all the time, so
> >> if you add PTRACE_O_TRACE_SYSENTRY, please add PTRACE_O_TRACE_SYSEXIT
> >> too. That would pretty much halve ptrace's overhead for my use case.
> >> But this is orthogonal to the compat problem.
> >
> > I agree. I would like to ignore the exit for most syscalls but see a
> > few of them. I guess PTRACE_SETOPTIONS could be used to toggle it,
> > with some overhead.
>
> Yes, that's what I had in mind.
>
> > But in the spirit of this thread,
> > PTRACE_O_TRACE_BPF would be even better, to completely ignore
> > irrelevant syscalls :-)
>
> Yes, that's the only reason I'm interested in BPF, really.
> Most system calls are either always allowed, or always denied.
> Of the ones that need checking, most of them have file paths.
> For those I'm not interested in the post-syscall event.

Same here, though for tracing file paths rather than blocking anything.

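To make the "most syscalls are irrelevant" point concrete, here is a
rough sketch of the per-syscall decision such a filter would encode.
It is plain user-space C, not a BPF program, and PTRACE_O_TRACE_BPF is
only a proposal in this thread.  The WANT_* flags, the table contents
and the function name are assumptions made up for the example:

```c
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_* numbers for the tracer's own ABI */

/* Hypothetical flags for this sketch; not kernel API. */
enum { WANT_NONE = 0, WANT_ENTRY = 1, WANT_EXIT = 2 };

struct syscall_interest {
    long nr;    /* syscall number */
    int  want;  /* which stops the tracer cares about */
};

/* Example policy: file paths are readable at entry, so no exit stop. */
static const struct syscall_interest interest_table[] = {
    { SYS_openat, WANT_ENTRY },
    { SYS_execve, WANT_ENTRY | WANT_EXIT },  /* also want the result */
};

/* Return the WANT_* flags for a syscall, WANT_NONE if it is irrelevant. */
static int interest_for(long nr)
{
    size_t n = sizeof interest_table / sizeof interest_table[0];

    for (size_t i = 0; i < n; i++)
        if (interest_table[i].nr == nr)
            return interest_table[i].want;
    return WANT_NONE;
}
```

Today a tracer can only apply this after the stop has already happened.
The point of doing it in the kernel is to skip the stops that come back
as WANT_NONE.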
> >> > To future-proof this scheme we may reserve a few more event values
> >> > PTRACE_EVENT_SYSCALL_ENTRY2, PTRACE_EVENT_SYSCALL_ENTRY3, etc,
> >> > if we'll ever have arches with more than one non-native syscall
> >> > entry. I'm no expert, but looking at strace code, ARM may already have
> >> > more than one additional convention how to pass syscall args.
> >>
> >> Please, no! This way lies madness, just one PTRACE_EVENT_SYSCALL_ENTRY,
> >> no PTRACE_EVENT_SYSCALL_ENTRY1 or PTRACE_EVENT_SYSCALL_ENTRY2, that
> >> would be horrible. Keep arch specific stuff in arch specific areas,
> >> please don't spread it around.
> >>
> >> What was wrong with using eflags again? Is it too simple or something?
> >
> > Well it doesn't deal with the equivalent issue on ARM and PA-RISC.
>
> Those issues are not equivalent. ARM only has that OABI thing which
> is hopefully not used in practice.

I am still using OABI on some currently-sold and still-developed
devices with userspace libraries that I can't replace or rebuild.

Maybe I'm the only one, but the issue is still there.  It should be
supported in ptrace() as long as it's supported in the kernel at all.

I don't know if the PA-RISC thing is real.

But it's occurred to me that there are a lot of 32/64 archs now (I was
extracting all their syscall number tables last night), and it would
be good if there were a consistent, arch-independent way to signal
whether the syscall number is in the 32-bit or 64-bit table - or at
least, whether it is in the same ABI as the tracer gets from
<asm/unistd.h>.  That would let tracers doing simple things avoid
needing a ton of arch-specific knowledge.

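As a rough illustration of what that would buy a simple tracer: with
an ABI tag delivered alongside the syscall number, the lookup is a
table select and nothing else.  Everything below is hypothetical (the
enum, the struct, the function), and the tables are truncated to the
first four x86-64 and i386 entries just to show the same number
meaning different calls:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ABI tag; no such ptrace interface is implied to exist. */
enum syscall_abi { ABI_NATIVE_64 = 0, ABI_COMPAT_32 = 1 };

struct syscall_table {
    const char *const *names;
    size_t count;
};

/* First four entries of the x86-64 and i386 syscall tables. */
static const char *const names64[] = { "read", "write", "open", "close" };
static const char *const names32[] = { "restart_syscall", "exit", "fork", "read" };

static const struct syscall_table tables[] = {
    [ABI_NATIVE_64] = { names64, sizeof names64 / sizeof names64[0] },
    [ABI_COMPAT_32] = { names32, sizeof names32 / sizeof names32[0] },
};

/* Look up a syscall name given the ABI tag; NULL if out of range. */
static const char *syscall_name(enum syscall_abi abi, long nr)
{
    assert(abi == ABI_NATIVE_64 || abi == ABI_COMPAT_32);

    const struct syscall_table *t = &tables[abi];
    if (nr < 0 || (size_t)nr >= t->count)
        return NULL;
    return t->names[nr];
}
```

Number 3 is close in one table and read in the other, so without the
tag the tracer has to fall back on arch-specific tricks to decide which
one it is looking at.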
-- Jamie