linux-kernel - Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for execveat(2)

Message-ID: <20150109233725.GA4574@brightrain.aerifal.cx>
Date:	Fri, 9 Jan 2015 18:37:25 -0500
From:	Rich Felker <dalias@...ifal.cx>
To:	Andy Lutomirski <luto@...capital.net>
Cc:	Al Viro <viro@...iv.linux.org.uk>,
	David Drysdale <drysdale@...gle.com>,
	"Michael Kerrisk (man-pages)" <mtk.manpages@...il.com>,
	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Meredydd Luff <meredydd@...atehouse.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	David Miller <davem@...emloft.net>,
	Thomas Gleixner <tglx@...utronix.de>,
	Stephen Rothwell <sfr@...b.auug.org.au>,
	Oleg Nesterov <oleg@...hat.com>,
	Ingo Molnar <mingo@...hat.com>,
	"H. Peter Anvin" <hpa@...or.com>,
	Kees Cook <keescook@...omium.org>,
	Arnd Bergmann <arnd@...db.de>,
	Christoph Hellwig <hch@...radead.org>, X86 ML <x86@...nel.org>,
	linux-arch <linux-arch@...r.kernel.org>,
	Linux API <linux-api@...r.kernel.org>,
	sparclinux@...r.kernel.org
Subject: Re: [PATCHv10 man-pages 5/5] execveat.2: initial man page for
 execveat(2)

On Fri, Jan 09, 2015 at 03:24:12PM -0800, Andy Lutomirski wrote:
> On Fri, Jan 9, 2015 at 3:12 PM, Rich Felker <dalias@...ifal.cx> wrote:
> > On Fri, Jan 09, 2015 at 10:57:43PM +0000, Al Viro wrote:
> >> On Fri, Jan 09, 2015 at 05:42:52PM -0500, Rich Felker wrote:
> >>
> >> > Here's a very simple way it could work -- it could put the O_PATH fd
> >> > on a previously-unused fd number, and put a special flag on the fd,
> >> > like FD_CLOEXEC, but that causes the kernel to close it whenever it's
> >> > opened. The pathname passed could then simply be /dev/fd/%d or
> >> > /proc/self/fd/%d, and although this is presently dependent on /proc
> >> > being mounted, virtual /dev/fd/* could someday be something completely
> >> > independent of procfs. The kernel keeps all the freedom to choose how
> >> > to pass the name to the interpreter. I'm not proposing any kernel
> >> > API/ABI lock-in and I'm with you in opposing such lock-in.
> >>
> >> Huh?  open() on procfs symlinks does *NOT* work that way - the symlink is
> >> traversed and after that point there is no information whatsoever how we
> >> got to that vfsmount/dentry pair.  I can imagine several kludges that would
> >> work, but they are unspeakably ugly, and do_last() is already far too
> >> convoluted as it is.
> >
> > I'm not sure where you're disagreeing with me. open of procfs symlinks
> > does not resolve the symlink and open the resulting pathname. They are
> > "magic symlinks" which are bound to the inode of the open file. I
> > don't see why this action, which is already special for magic
> > symlinks, can't check a flag on the magic symlink and possibly close
> > the corresponding file descriptor as part of its action.
> >
> > In any case, whether/how fexecve works with interpreters is something
> > the kernel can change without breaking userspace expectations. My goal
> > is to avoid creating any new API/ABI requirement here.
> 
> I think that, if we really want to support clean fexecve on O_CLOEXEC
> scripts some day, the right way to do it is to fix the script
> interface for real.  Have a special flag in the headers of script
> interpreters that support a new interface that says "when I'm a script
> interpreter, I expect an auxv entry AT_SCRIPT_FD with an open fd with
> CLOEXEC set".  Then we can directly exec scripts by fd, even with
> O_CLOEXEC set, without any races.

This is also acceptable, but I don't think you'd really need a special
header flag. Just pass the auxv entry unconditionally, and also pass
/dev/fd/%d or /proc/self/fd/%d in argv[]. If the interpreter supports
the auxv entry, everything works fine. If not, it still works as long
as /proc is mounted, but with a partial fd leak. (Note: the leak is not
so bad, since the interpreter would inherit a close-on-exec fd and thus
would not leak it further.)

Aside from setting up the new auxv entry, the main trick the kernel
would have to do is bypassing FD_CLOEXEC at exec time while keeping
the FD_CLOEXEC flag present on the fd after exec.

Rich