Message-ID: <CAK7LNASEHi6e4AyqDvCH_94DQ6AVWeS8yw0Sz4nHdaB=CMVAtQ@mail.gmail.com>
Date: Sun, 14 Mar 2021 14:10:36 +0900
From: Masahiro Yamada <masahiroy@...nel.org>
To: Willy Tarreau <w@....eu>
Cc: Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
linux-api@...r.kernel.org
Subject: Re: Why is the bit size different between a syscall and its wrapper?
Willy,
Thanks for the explanation.
On Fri, Mar 12, 2021 at 12:27 PM Willy Tarreau <w@....eu> wrote:
>
> On Fri, Mar 12, 2021 at 11:48:11AM +0900, Masahiro Yamada wrote:
> > Hi.
> >
> > I think I am missing something, but
> > is there any particular reason to
> > use a different bit size between
> > a syscall and its userspace wrapper?
> >
> >
> >
> > For example, for the unshare syscall,
> >
> > unshare(2) says the parameter is int.
> >
> >
> > SYNOPSIS
> > #define _GNU_SOURCE
> > #include <sched.h>
> >
> > int unshare(int flags);
> >
> >
> >
> >
> > In the kernel, it is unsigned long.
> >
> >
> > SYSCALL_DEFINE1(unshare, unsigned long, unshare_flags)
> > {
> >         return ksys_unshare(unshare_flags);
> > }
>
> The syscalls must have a well defined interface for a given architecture.
> Thus in practice the ABI will define that arg1 goes into this register,
> arg2 into this one etc, regardless of their type (plenty of them are
> pointers for example). The long is the size of a register so it can carry
> any of the types we care about. So by defining each syscall as a function
> taking 1 to 6 fixed-size arguments you can implement about all syscalls.
>
> Regarding the libc, it has to offer an interface which is compatible with
> the standard definition of the syscalls as defined by POSIX or as commonly
> found on other OSes, and this regardless of the platform.
>
> For example look at recv(), it takes an int, a pointer, a size_t and an
> int. It requires to be defined like this for portability, but at the OS
> level all these will typically be passed as a register each.
>
You are right.
Functions defined by POSIX, such as 'recv', must stay portable across OSes.
At the syscall ABI level, we have more freedom to choose
parameter types that are convenient for the kernel.
IIUC, 'unshare' is Linux-specific, so there are
no "other OSes" to stay compatible with.
Using types that have the same width as registers
also avoids ambiguity about the upper 32 bits
of 64-bit registers. This is a benefit.
Historically, that ambiguity caused an issue:
https://nvd.nist.gov/vuln/detail/CVE-2009-0029
We no longer need to worry about this since
commit 1a94bc34768e463a93cb3751819709ab0ea80a01.
All parameters are properly sign-extended by
forcibly casting to (long).
--
Best Regards
Masahiro Yamada