linux-kernel - Re: [PATCH v4 0/7] kernel tinification: optionally compile out splice family of syscalls (splice, vmsplice, tee and sendfile)


Message-ID: <20141125185310.GA24891@cloud>
Date:	Tue, 25 Nov 2014 10:53:10 -0800
From:	josh@...htriplett.org
To:	David Miller <davem@...emloft.net>
Cc:	rdunlap@...radead.org, pieter@...sman.nl,
	alexander.h.duyck@...el.com, viro@...iv.linux.org.uk,
	ast@...mgrid.com, akpm@...ux-foundation.org, beber@...eeweb.net,
	catalina.mocanu@...il.com, dborkman@...hat.com,
	edumazet@...gle.com, ebiederm@...ssion.com, fabf@...net.be,
	fuse-devel@...ts.sourceforge.net, geert@...ux-m68k.org,
	hughd@...gle.com, iulia.manda21@...il.com, JBeulich@...e.com,
	bfields@...ldses.org, jlayton@...chiereds.net,
	linux-api@...r.kernel.org, linux-fsdevel@...r.kernel.org,
	linux-kernel@...r.kernel.org, linux-nfs@...r.kernel.org,
	mcgrof@...e.com, mattst88@...il.com, mgorman@...e.de,
	mst@...hat.com, miklos@...redi.hu, netdev@...r.kernel.org,
	oleg@...hat.com, Paul.Durrant@...rix.com,
	paulmck@...ux.vnet.ibm.com, pefoley2@...oley.com, tgraf@...g.ch,
	therbert@...gle.com, trond.myklebust@...marydata.com,
	willemb@...gle.com, xiaoguangrong@...ux.vnet.ibm.com,
	zhenglong.cai@...c.com.cn
Subject: Re: [PATCH v4 0/7] kernel tinification: optionally compile out
 splice family of syscalls (splice, vmsplice, tee and sendfile)

On Tue, Nov 25, 2014 at 12:13:05PM -0500, David Miller wrote:
> From: Randy Dunlap <rdunlap@...radead.org>
> Date: Tue, 25 Nov 2014 08:17:58 -0800
> 
> > Is the splice family of syscalls the only one that tiny has identified
> > for optional building or can we expect similar treatment for other
> > syscalls?
> > 
> > Why will many embedded systems not need these syscalls?  You know
> > exactly what apps they run and you are positive that those apps do
> > not use splice?
> 
> I think starting to compile out system calls is a very slippery
> slope we should not begin the journey down.
> 
> This changes the forward facing interface to userspace.

It's not a "slippery slope"; it's been our standard practice for ages.
We started down that road long, long ago, when we first introduced
Kconfig and optional/modular features.  /dev/* are user-facing
interfaces, yet you can compile them out or make them modular.  /sys/*
and /proc/* are user-facing interfaces, yet you can compile part or all
of them out.  Filesystem names passed to mount are user-facing
interfaces, yet you can compile them out.  (Not just things like ext4;
think FUSE or overlayfs, which some applications will build upon and
require.)  Some prctls are optional, new syscalls like BPF or inotify or
process_vm_{read,write}v are optional, hardware interfaces are optional,
control groups are optional, containers and namespaces are optional,
checkpoint/restart is optional, KVM is optional, kprobes are optional,
kmsg is optional, /dev/port is optional, ACL support is optional, USB
support (as used by libusb) is optional, sound interfaces are optional,
GPU interfaces are optional, even futexes are optional.

For every single one of those, userspace programs or libraries may
depend on that functionality, and summarily exit if it doesn't exist,
perhaps with a warning that you need to enable options in your kernel,
or perhaps with a simple "Function not implemented" or "No such file or
directory".
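
As an illustration of userspace that copes rather than exits, here is a
minimal sketch (not taken from the patch series) of a copy routine that
tries sendfile() and falls back to read()/write() when the kernel reports
ENOSYS or EINVAL, the fallback the sendfile(2) man page suggests.  The
names copy_with_read_write(), copy_fd() and copy_demo() are hypothetical
helpers invented for this example, and the code assumes Linux with glibc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: plain read()/write() loop, used when sendfile()
 * is unavailable.  Returns the number of bytes copied, or -1 on error. */
ssize_t copy_with_read_write(int out_fd, int in_fd, size_t len)
{
    char buf[4096];
    size_t total = 0;

    while (total < len) {
        size_t want = len - total < sizeof(buf) ? len - total : sizeof(buf);
        ssize_t n = read(in_fd, buf, want);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;                      /* end of input */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical helper: prefer sendfile(); if the kernel was built without
 * it (ENOSYS) or cannot use it on these descriptors (EINVAL), finish the
 * copy with the read()/write() loop instead of giving up. */
ssize_t copy_fd(int out_fd, int in_fd, size_t len)
{
    size_t total = 0;

    assert(out_fd >= 0 && in_fd >= 0);
    while (total < len) {
        ssize_t n = sendfile(out_fd, in_fd, NULL, len - total);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno == ENOSYS || errno == EINVAL) {
                ssize_t rest = copy_with_read_write(out_fd, in_fd, len - total);

                return rest < 0 ? -1 : (ssize_t)total + rest;
            }
            return -1;
        }
        if (n == 0)
            break;                      /* end of input */
        total += (size_t)n;
    }
    return (ssize_t)total;
}

/* Hypothetical self-check: copy a short string between two temporary
 * files and verify the result.  Returns 0 on success, -1 on failure. */
int copy_demo(void)
{
    static const char msg[] = "tinified";
    const size_t len = sizeof(msg) - 1;
    char got[sizeof(msg)] = {0};
    FILE *src = tmpfile();
    FILE *dst = tmpfile();
    int ok = src && dst &&
             fwrite(msg, 1, len, src) == len &&
             fflush(src) == 0 &&
             lseek(fileno(src), 0, SEEK_SET) == 0 &&
             copy_fd(fileno(dst), fileno(src), len) == (ssize_t)len &&
             pread(fileno(dst), got, len, 0) == (ssize_t)len &&
             memcmp(got, msg, len) == 0;

    if (src)
        fclose(src);
    if (dst)
        fclose(dst);
    return ok ? 0 : -1;
}
```

A program written this way behaves the same whether or not the kernel was
built with the splice family; one that calls sendfile() unconditionally
is the kind that would fail with "Function not implemented".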

Out of the entire list above and the many more where that came from,
what makes syscalls unique?  What's wildly different between
open("/dev/foo", ...) returning an error and sys_foo returning an error?
What makes syscalls so special out of the entire list above?  We're not
breaking the ability to run old userspace on a new kernel, which *must*
be supported, and that includes not just syscalls but all user-facing
interfaces; we don't break userspace.  But we've *never* guaranteed that
you can run old userspace on a new *allnoconfig* kernel.

All of these features will remain behind CONFIG_EXPERT, and all of them
warn that you can only use them if your userspace can cope.

I've actually been thinking of introducing a new CONFIG_ALL_SYSCALLS,
under which all the "enable support for foo syscall" can live, rather
than just piling all of them directly under CONFIG_EXPERT; that option
would then repeat in very clear terms the warning that if you disable
that option and then disable specific syscalls, you need to know exactly
what your target userspace uses.  That would group together this whole
family of options, and make it clearer what the implications are.

- Josh Triplett