Message-ID: <20200819134629.mvd4nupme7q2hmtz@wittgenstein>
Date: Wed, 19 Aug 2020 15:46:29 +0200
From: Christian Brauner <christian.brauner@...ntu.com>
To: "Eric W. Biederman" <ebiederm@...ssion.com>
Cc: Matthew Wilcox <willy@...radead.org>, peterz@...radead.org,
Christoph Hellwig <hch@...radead.org>,
linux-kernel@...r.kernel.org,
Linus Torvalds <torvalds@...ux-foundation.org>,
linux-arch@...r.kernel.org, Jonathan Corbet <corbet@....net>,
Yoshinori Sato <ysato@...rs.sourceforge.jp>,
Tony Luck <tony.luck@...el.com>,
Fenghua Yu <fenghua.yu@...el.com>,
Geert Uytterhoeven <geert@...ux-m68k.org>,
Ley Foon Tan <ley.foon.tan@...el.com>,
"David S. Miller" <davem@...emloft.net>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
x86@...nel.org, Arnd Bergmann <arnd@...db.de>,
Steven Rostedt <rostedt@...dmis.org>,
Stafford Horne <shorne@...il.com>,
Kars de Jong <jongk@...ux-m68k.org>,
Kees Cook <keescook@...omium.org>,
Greentime Hu <green.hu@...il.com>,
Mauro Carvalho Chehab <mchehab+huawei@...nel.org>,
Alexandre Chartre <alexandre.chartre@...cle.com>,
Masami Hiramatsu <mhiramat@...nel.org>,
Tom Zanussi <zanussi@...nel.org>,
Xiao Yang <yangx.jy@...fujitsu.com>, linux-doc@...r.kernel.org,
uclinux-h8-devel@...ts.sourceforge.jp, linux-ia64@...r.kernel.org,
linux-m68k@...ts.linux-m68k.org, sparclinux@...r.kernel.org,
kgdb-bugreport@...ts.sourceforge.net,
linux-kselftest@...r.kernel.org
Subject: Re: [PATCH 00/11] Introduce kernel_clone(), kill _do_fork()

On Wed, Aug 19, 2020 at 08:32:59AM -0500, Eric W. Biederman wrote:
> Matthew Wilcox <willy@...radead.org> writes:
>
> > On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
> >> On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@...radead.org wrote:
> >> > On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
> >> > > On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
> >> > > > The only remaining function callable outside of kernel/fork.c is
> >> > > > _do_fork(). It doesn't really follow the naming of kernel-internal
> >> > > > syscall helpers as Christoph rightly pointed out. Switch all callers and
> >> > > > references to kernel_clone() and remove _do_fork() once and for all.
> >> > >
> >> > > My only concern is around return type. long, int, pid_t ... can we
> >> > > choose one and stick to it? pid_t is probably the right return type
> >> > > within the kernel, despite the return type of clone3(). It'll save us
> >> > > some work if we ever go through the hassle of growing pid_t beyond 31-bit.
> >> >
> >> > We have at least the futex ABI restricting PID space to 30 bits.
> >>
> >> Ok, looking into kernel/futex.c I see
> >>
> >> pid_t pid = uval & FUTEX_TID_MASK;
> >>
> >> which is probably what this refers to and /proc/sys/kernel/threads-max
> >> is restricted to FUTEX_TID_MASK.
> >>
> >> Afaict, that doesn't block switching kernel_clone() to return pid_t. It
> >> can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at
> >> userspace. But it means that _if_ we were to change the size of pid_t
> >> we'd likely need a new futex API.
> >
> > Yes, there would be a lot of work to do to increase the size of pid_t.
> > I'd just like to not do anything to make that harder _now_. Stick to
> > using pid_t within the kernel.
>
> Just so people are aware. If you look in include/linux/threads.h you
> can see that the maximum value of PID_MAX_LIMIT limits pids to 22 bits.
>
> Further the design decisions of pids keep us densely using pids. So I
> expect it will be a while before we even come close to using 30 bits of
> pid space.

Also because it's simply annoying to have to type really large pid
numbers on the shell. Yes, that's a very privileged, developer-centric
complaint, but it matters when you have to do a quick kill -9.
Chromebook users obviously won't care how large their pids are.

Tbf, related to discussions last year, systemd now actually raises the
default pid limit (pid_max) from ~33000 to 4194304, which seems like an
ok compromise.

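For anyone who wants to check the numbers in this thread, here is a
minimal userspace sketch of the arithmetic. The constants are copied by
hand from include/uapi/linux/futex.h and include/linux/threads.h and
assume a 64-bit kernel without CONFIG_BASE_SMALL. bits_needed() and
futex_owner_tid() are made-up helper names for illustration, not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: values copied from the kernel headers so that this
 * compiles in userspace (64-bit, !CONFIG_BASE_SMALL).
 */
#define FUTEX_TID_MASK  0x3fffffffu        /* include/uapi/linux/futex.h */
#define PID_MAX_DEFAULT 0x8000             /* include/linux/threads.h */
#define PID_MAX_LIMIT   (4 * 1024 * 1024)  /* include/linux/threads.h */

/* Hypothetical helper: number of bits needed to represent v. */
static unsigned int bits_needed(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/*
 * Hypothetical helper mirroring the kernel/futex.c line quoted above:
 *	pid_t pid = uval & FUTEX_TID_MASK;
 * The top two bits of the futex word carry flags, so only the low
 * 30 bits can hold a TID.
 */
static uint32_t futex_owner_tid(uint32_t uval)
{
	return uval & FUTEX_TID_MASK;
}

/* The current upper bound on pids fits comfortably in the futex TID field. */
static_assert(PID_MAX_LIMIT == 4194304, "PID_MAX_LIMIT is 2^22");
static_assert(PID_MAX_LIMIT - 1 <= FUTEX_TID_MASK,
	      "largest possible pid must fit in FUTEX_TID_MASK");
```

So there are 8 bits of headroom between the largest pid the kernel will
hand out today (22 bits) and what the futex word can encode (30 bits).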
Christian