Message-Id: <200708011806.l71I6v9N002535@devserv.devel.redhat.com>
Date:	Wed, 1 Aug 2007 14:06:57 -0400
From:	Ulrich Drepper <drepper@...hat.com>
To:	linux-kernel@...r.kernel.org
Cc:	akpm@...ux-foundation.org
Subject: More documentation: system call how-to

How about adding the attached text to the Documentation directory?
Over the years I have had to correct one system call design problem
after another.  Other problems could no longer be corrected and we
have to live with them.  Maybe spelling out the rules explicitly will
help a bit.

I've added a few rules I could think of right now.  What should be
added as well is a rule for 64-bit parameters on 32-bit platforms.  I
leave this to the s390 people who have the biggest restrictions when
it comes to this.

Signed-off-by: Ulrich Drepper <drepper@...hat.com>

Rules for designing new system calls
------------------------------------

1. Do not use multiplexing system calls.

   A practical argument is that multiplexing invariably reduces the
   number of parameters available to the system call, since the
   sub-function code occupies one of them.  This will haunt people who
   have to care about architectures with a limited set of registers
   reserved for this purpose.

   Another aspect is that multiplexing is most likely slower.  The
   caller in most cases knows exactly which sub-function of the system
   call is needed.  If the decision about the sub-function is dynamic,
   the computation of the sub-function code could just as well be a
   computation of a system call number.  The difference lies in the
   kernel, where the multiplexing always has to happen, even if the
   required sub-function is known to the caller ahead of time.

   Adding new system calls is much cheaper: it is a word in a table.
   This is much less code and data than the switch statement or
   if-cascade needed to implement the multiplexer.

   Bad examples: sys_socketcall on x86, sys_futex, and several more
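
   The difference between the two styles can be sketched in plain C.
   This is a user-space illustration only, not kernel code: every
   demo_* name is made up for the example, and the table merely stands
   in for a real system call table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations; the names are illustrative only. */
static long demo_op_add(long a, long b) { return a + b; }
static long demo_op_sub(long a, long b) { return a - b; }

/*
 * Multiplexed style: one entry point.  The sub-function code uses up
 * a parameter slot and the switch runs on every single call.
 */
enum { DEMO_OP_ADD = 0, DEMO_OP_SUB = 1 };

long demo_mux_call(int op, long a, long b)
{
    switch (op) {
    case DEMO_OP_ADD:
        return demo_op_add(a, b);
    case DEMO_OP_SUB:
        return demo_op_sub(a, b);
    default:
        return -ENOSYS; /* unknown sub-function code, see rule #2 */
    }
}

/*
 * Separate-call style: each operation costs one word in a table and
 * all parameter slots stay available to the operation itself.
 */
typedef long (*demo_call_fn)(long, long);

enum { DEMO_NR_ADD = 0, DEMO_NR_SUB = 1, DEMO_NR_MAX };

static const demo_call_fn demo_call_table[] = {
    [DEMO_NR_ADD] = demo_op_add,
    [DEMO_NR_SUB] = demo_op_sub,
};

static_assert(sizeof(demo_call_table) / sizeof(demo_call_table[0]) == DEMO_NR_MAX,
              "one table entry per call number");

long demo_dispatch(unsigned int nr, long a, long b)
{
    if (nr >= DEMO_NR_MAX)
        return -ENOSYS; /* no such call */
    return demo_call_table[nr](a, b);
}
```

   In the second form the caller that knows which operation it wants
   selects it through the call number alone, and no extra decision is
   made after dispatch.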

2. Use of ENOSYS:

   The runtime has to be able to distinguish non-existing system calls
   due to old kernel versions from error conditions in an implemented
   system call.  This means the ENOSYS error should never be used in
   an error condition once a system call is implemented.

   Example: In sys_fallocate, if the file system does not implement the
   fallocate operation, return EOPNOTSUPP and not ENOSYS.

   There is one exception to the rule: if rule #1 is violated and a
   multiplexer system call is used, invalid sub-function codes should
   be signaled using ENOSYS.

   Example: sys_futex
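
   A minimal user-space sketch of the fallocate case follows.  The
   demo_fs_ops structure and all demo_* functions are hypothetical
   stand-ins, not the kernel's real data structures.  The point is
   only which error code is returned on which path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-file-system operations table. */
struct demo_fs_ops {
    /* NULL when the file system does not implement the operation. */
    long (*fallocate)(long long offset, long long len);
};

/*
 * Models an implemented system call.  Since the call exists, none of
 * its error paths may return -ENOSYS.
 */
long demo_sys_fallocate(const struct demo_fs_ops *ops,
                        long long offset, long long len)
{
    assert(ops != NULL);

    if (offset < 0 || len <= 0)
        return -EINVAL;
    if (ops->fallocate == NULL)
        return -EOPNOTSUPP; /* operation missing, but the call exists */
    return ops->fallocate(offset, len);
}

/* Hypothetical file system operation that always succeeds. */
static long demo_fallocate_ok(long long offset, long long len)
{
    (void)offset;
    (void)len;
    return 0;
}

/*
 * Caller-side check: only -ENOSYS means the running kernel does not
 * know the system call at all, so a fallback path should be used.
 */
int demo_kernel_lacks_call(long ret)
{
    return ret == -ENOSYS;
}
```

   With this split the runtime can cache "call not available" on the
   first ENOSYS without risking that an ordinary failure of an
   implemented call disables the fast path for good.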

3. Choose parameters for growth

   Today it no longer makes sense to implement any system call which,
   even on 32-bit machines, restricts values indicating file sizes or
   offsets to 32 bits.  64-bit values should be used throughout.

   Example: sys_fadvise64, which should have been defined from day 1
   like sys_fadvise64_64.

   Similarly, timeout granularity of seconds is no longer suitable.
   Most interfaces use nanosecond resolution, and an often-used way
   to specify such times and intervals is the timespec structure.
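
   As a sketch of what "parameters for growth" can look like, the
   following user-space C fragment uses 64-bit offsets and a timespec
   timeout.  The demo_range_request structure and the helper functions
   are invented for illustration and assume non-negative inputs.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Offsets and lengths are 64 bits wide regardless of the word size. */
static_assert(sizeof(int64_t) == 8, "offsets must be 64 bits on every ABI");

/* Hypothetical argument block for an imaginary file range operation. */
struct demo_range_request {
    int64_t offset;          /* 64-bit even on 32-bit machines */
    int64_t len;
    struct timespec timeout; /* nanosecond resolution */
};

/* Convert a non-negative millisecond count into a timespec. */
struct timespec demo_ms_to_timespec(int64_t ms)
{
    struct timespec ts;

    ts.tv_sec = (time_t)(ms / 1000);
    ts.tv_nsec = (long)((ms % 1000) * 1000000L);
    return ts;
}

/*
 * Returns nonzero if the end of the range could not be expressed with
 * a signed 32-bit offset, i.e. a 32-bit interface would have failed.
 */
int demo_needs_64bit(const struct demo_range_request *req)
{
    return req->offset + req->len > (int64_t)INT32_MAX;
}
```

   A request starting at 3 GiB is enough to exceed what a signed
   32-bit offset can describe, which is why the narrower type is not a
   safe default even on 32-bit machines.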

4. 32-bit compatibility

   Kernels for architectures like x86-64 and PPC64 have to be able to
   execute 32-bit binaries as well.  The implementation of the actual
   system calls is of course shared.  The types for the system call
   parameters and return values on 32-bit and 64-bit systems can be
   different.  This is where compatibility wrappers come in.

   These functions, usually named compat_sys_XYZ for a system call
   sys_XYZ, are only needed if a system call parameter is a pointer
   to a structure which has a different representation in 32- and
   64-bit mode.  Differences in the size of integer or pointer
   arguments do not require a compatibility wrapper.

   Example: compat_sys_utimensat, which has to convert a timespec
   structure from 32-bit to 64-bit.  See also rule #3.
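
   The shape of such a wrapper can be sketched in user-space C.  The
   structures and the demo_sys_settime call below are hypothetical and
   chosen only to show the conversion step.  A real wrapper would also
   have to copy the structure from user memory, which is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout as seen by a 32-bit binary. */
struct demo_compat_timespec {
    int32_t tv_sec;
    int32_t tv_nsec;
};

/* Hypothetical layout used by the 64-bit implementation. */
struct demo_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

static_assert(sizeof(struct demo_compat_timespec) != sizeof(struct demo_timespec64),
              "layouts differ, so a compatibility wrapper is required");

/* Widen each field; negative values are sign-extended. */
struct demo_timespec64
demo_timespec_from_compat(const struct demo_compat_timespec *cts)
{
    struct demo_timespec64 ts;

    ts.tv_sec = cts->tv_sec;
    ts.tv_nsec = cts->tv_nsec;
    return ts;
}

/* The shared implementation, written once against the 64-bit layout. */
long demo_sys_settime(const struct demo_timespec64 *ts)
{
    if (ts == NULL)
        return -EFAULT;
    if (ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000)
        return -EINVAL;
    return 0;
}

/* The wrapper only converts the argument and then calls the real one. */
long demo_compat_sys_settime(const struct demo_compat_timespec *cts)
{
    struct demo_timespec64 ts;

    if (cts == NULL)
        return -EFAULT;
    ts = demo_timespec_from_compat(cts);
    return demo_sys_settime(&ts);
}
```

   All validation stays in the shared implementation, so the 32-bit
   and 64-bit entry points cannot drift apart in behavior.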