linux-kernel - Re: Compat syscall instrumentation and return from execve issue

lists.openwall.net: Open Source and information security mailing list archives

Message-ID: <1800505568.71478.1447119541972.JavaMail.zimbra@efficios.com>
Date:	Tue, 10 Nov 2015 01:39:01 +0000 (UTC)
From:	Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To:	rostedt <rostedt@...dmis.org>
Cc:	Andy Lutomirski <luto@...capital.net>,
	Andy Lutomirski <luto@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>,
	lttng-dev <lttng-dev@...ts.lttng.org>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: Compat syscall instrumentation and return from execve issue

----- On Nov 9, 2015, at 4:12 PM, rostedt rostedt@...dmis.org wrote:

> On Mon, 9 Nov 2015 12:57:06 -0800
> Andy Lutomirski <luto@...capital.net> wrote:
> 
>> > The solution I suggested wouldn't touch any asm code. The only change
>> > would be to reserve the TS_EXECVE flag. Actually, come to think of it,
>> > we could have Mathieu's TS_ORIG_COMPAT flag, and still only have the
>> > tracepoint syscall set it, such that the matching tracepoint syscall
>> > exit would know that the initial call was COMPAT or not.
>> 
>> Someone needs to clear TS_EXECVE, though.
> 
> Well, it gets set and cleared by the syscall enter (same for
> TS_ORIG_COMPAT), and exit for that matter.
> 
> It's trivial to have a tracepoint hook added when either system call
> enter or exit tracepoints are enabled. Thus, the setting and clearing of
> the flag can be done by another callback at those tracepoints.

There is one issue with relying on the tracepoint hook on system call
enter to set the status flag (either TS_EXECVE or TS_ORIG_COMPAT).
Suppose a thread is preempted for a long time between syscall enter and
syscall exit within an execve system call, and we enable syscall tracing
during that window. The enter tracepoint never ran for that call, so we
missed setting or clearing the flag, and the syscall exit tracepoint then
reads a stale, uninitialized value.

So if we go for this kind of flag solution, we have two choices:

1) We always set/clear the TS_ORIG_COMPAT flag on system call entry, not
   just within a tracepoint that can be dynamically wired up at an
   arbitrary point in time.

2) We set/clear the TS_ORIG_COMPAT flag within the syscall entry tracepoint,
   but whenever we wire up that tracepoint, we iterate over all existing
   threads to find any that are currently running or preempted within an
   execve system call.

Option 2 is considerably more complicated, but has the upside of not
touching the flag when tracing is inactive. I'm really not sure that
avoiding the tiny overhead of setting a flag non-atomically is worth the
trouble of doing option 2, though.

> 
>> 
>> >
>> > The goal is only to make sure that the system call exit tracepoint
>> > matches the system call enter tracepoint.
>> >
>> > The system call enter would set or clear the TS_ORIG_COMPAT if the
>> > TS_COMPAT is set when entering the system call, and it would check that
>> > flag when exiting the system call.
>> 
>> This seems a bit odd, though, since we aren't very good about
>> preserving the syscall nr or the args through syscall processing.  In
>> any event, in the new improved x86 syscall code, we know what arch we
>> are just by following the control flow, so no flags should be needed.
>> Hence my suggestion of just adding an "unsigned int arch" to the
>> return slowpath.
> 
> I guess I don't understand this "unsigned int arch".
> 
> When the execve system call is called, the task is running in x86_64
> mode, and then execve changes its state to ia32 (32-bit) mode. Then on
> return, the system call exit tracepoint has the x86_64 system call
> number, but if it checks to see what state the task is in, it will see
> ia32 state, and then interpret the number as an ia32 one instead.
> 
> For example, on x86_64, execve is 59, and that number is passed to the
> system call enter tracepoint. Now on return of the system call, the
> system call exit tracepoint gets called with 59 as the system call as
> well, but if that tracepoint checks the state, it will think it's
> returning the "olduname" system call (that's 59 for ia32).
> 
> What change are you making to solve this?

I share your concern that Andy's proposal does not appear to address the
issue at hand. But I may be missing something too. Our issue is not about
knowing the current architecture when returning from the execve system
call; we already know that from is_compat_arch(). The issue is the
mismatch between the system call number that led us there, which belongs
to the entry architecture, and the current architecture when returning
from execve to userspace.

Thanks,

Mathieu

> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com