[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87tud1s5r9.fsf@email.froward.int.ebiederm.org>
Date: Mon, 14 Feb 2022 11:43:54 -0600
From: "Eric W. Biederman" <ebiederm@...ssion.com>
To: Solar Designer <solar@...nwall.com>
Cc: linux-kernel@...r.kernel.org, Alexey Gladkov <legion@...nel.org>,
Kees Cook <keescook@...omium.org>,
Shuah Khan <shuah@...nel.org>,
Christian Brauner <brauner@...nel.org>,
Ran Xiaokai <ran.xiaokai@....com.cn>,
Michal Koutn?? <mkoutny@...e.com>, stable@...r.kernel.org
Subject: Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling
during setuid()+execve
"Eric W. Biederman" <ebiederm@...ssion.com> writes:
> Solar Designer <solar@...nwall.com> writes:
>
>> On Thu, Feb 10, 2022 at 08:13:19PM -0600, Eric W. Biederman wrote:
>>> As of commit 2863643fb8b9 ("set_user: add capability check when
>>> rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
>>> should check RLIMIT_NPROC is buggy, as it tests the capabilites from
>>> before the credential change and not aftwards.
>>>
>>> As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
>>> ucounts") examining the rlimit is buggy as cred->ucounts has not yet
>>> been properly set in the new credential.
>>>
>>> Make the code correct and more robust moving the test to see if
>>> execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
>>> of the rest of the logic into execve() itself.
>>>
>>> As the flag only indicateds that RLIMIT_NPROC should be checked
>>> in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.
>>>
>>> Cc: stable@...r.kernel.org
>>> Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@suse.com
>>> Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@suse.com
>>> Reported-by: Michal Koutn?? <mkoutny@...e.com>
>>> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
>>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@...ssion.com>
>>
>> On one hand, this looks good.
>>
>> On the other, you asked about the Apache httpd suexec scenario in the
>> other thread, and here's what this means for it (per my code review):
>>
>> In that scenario, we have two execve(): first from httpd to suexec, then
>> from suexec to the CGI script. Previously, the limit check only
>> occurred on the setuid() call by suexec, and its effect was deferred
>> until execve() of the script. Now wouldn't it occur on both execve()
>> calls, because commit_creds() is also called on execve() (such as in
>> case the program is SUID, which suexec actually is)?
>
> Yes. Moving the check into commit_creds means that the exec after a
> suid exec will perform an RLIMIT_NPROC check and could possibly fail. I
> would call that a bug. Anything happening in execve should be checked
> and handled in execve as execve can fail.
>
> It also points out that our permission checks for increasing
> RLIMIT_NPROC are highly inconsistent.
>
> One set of permissions in fork().
> Another set of permissions in set*id() and delayed until execve.
> No permission checks for the uid change in execve.
>
> Every time I look into the previous behavior of RLIMIT_NPROC I seem
> to find issues. Currently I am planning a posting to linux-api
> so sorting out what when RLIMIT_NPROC should be enforced and how
> RLIMIT_NPROC gets accounted receives review. I am also planning a
> feature branch to deal with the historical goofiness.
>
> I really like how cleanly this patch seems to be. Unfortunately it is
> wrong.
Hmm. Maybe not as wrong as I thought. An suid execve does not change
the real user.
Still a bit wrong from a conservative change point of view because the
user namespace can change in setns and CLONE_NEWUSER which will change
the accounting now. Which with the ucount rlimit stuff changes where
things should be accounted.
I am playing with the idea of changing accounting aka (cred->ucounts &
cred->user) to only change in fork (aka clone without CLONE_THREAD) and
exec. I think that would make maintenance and cleaning all of this up
easier.
That would also remove serious complications from RLIMIT_SIGPENDING as
well.
I thought SIGPENDING was only a multi-threaded process issue but from
one signal to the next the set*id() family functions can be called.
Eric
Powered by blists - more mailing lists