Message-ID: <CAGudoHGWrf5sD+6nKjhxZdFOwDP=pArH1cEjpveYbrZ_4WNXEQ@mail.gmail.com>
Date: Sun, 26 Oct 2025 15:49:34 +0100
From: Mateusz Guzik <mjguzik@...il.com>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Alexey Gladkov <legion@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
David Howells <dhowells@...hat.com>, "Paul E. McKenney" <paulmck@...nel.org>, linux-kernel@...r.kernel.org
Subject: Re: [RFC 2/1] kill task_ucounts()->rcu_read_lock(), add __task_ucounts()
On Sun, Oct 26, 2025 at 3:36 PM Oleg Nesterov <oleg@...hat.com> wrote:
>
> On 10/26, Oleg Nesterov wrote:
> >
> > NOTE: task_ucounts() returns the pointer to another rcu-protected data,
> > struct ucounts. So it should either be used when task->real_cred and thus
> > task->real_cred->ucounts is stable (release_task, copy_process, copy_creds),
> > or it should be called under rcu_read_lock(). In both cases it is pointless
> > to take rcu_read_lock() to read the cred->ucounts pointer.
>
> So I think task_ucounts() can just do
>
> /* The caller must ensure that ->real_cred is stable or take rcu_read_lock() */
> #define task_ucounts(task) \
> rcu_dereference_check((task)->real_cred, 1)->ucounts
>
> but this removes the lockdep checks altogether.
>
> But, otoh, task_cred_xxx(t, ucounts) (or, say, task_cred_xxx(task, user_ns)) can
> hide the problem. Lockdep won't complain if, for example, we remove rcu_read_lock()
> in task_sig() around get_rlimit_value(task_ucounts(p)). So perhaps something like
> the below makes sense?
>
>
> diff --git a/include/linux/cred.h b/include/linux/cred.h
> index 89ae50ad2ace..7078159486f0 100644
> --- a/include/linux/cred.h
> +++ b/include/linux/cred.h
> @@ -347,7 +347,14 @@ DEFINE_FREE(put_cred, struct cred *, if (!IS_ERR_OR_NULL(_T)) put_cred(_T))
>
> #define task_uid(task) (task_cred_xxx((task), uid))
> #define task_euid(task) (task_cred_xxx((task), euid))
> -#define task_ucounts(task) (task_cred_xxx((task), ucounts))
> +
> +// ->real_cred must be stable
> +#define __task_ucounts(task) \
> + rcu_dereference_protected((task)->real_cred, 1)->ucounts
> +
While this indeed should be fine, it invites potential misuse, which
perhaps can be warned about. It's not a big deal and can be ignored,
but should you be willing to further massage this:

As is, this is legally callable for tasks which are still under
construction or are already dead. Maybe there is a WARN_ON to that
effect which can be trivially slapped in for the non-RCU case?

As a nit in a nit, the lack of debug-only variants of the general
kernel WARN_ON/BUG_ON discourages their frequent use, and perhaps this
could be used as an impetus to add something of the sort, or as a
justification for not bothering to add the new check. ;) I'm
definitely not going to try to add something like that, and I can't in
good conscience suggest anyone try it either.
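
The WARN_ON suggestion and the debug-only variant idea can be sketched as a userspace model. This is an illustration under stated assumptions, not kernel code. `DEBUG_WARN_ON()` is hypothetical and is not an existing kernel macro, though it is similar in spirit to the mm-specific VM_WARN_ON(). `checked_task_ucounts()` is a hypothetical checked `__task_ucounts()`. The `life` field is a simplification of however the kernel would actually tell that a task is under construction or dead.

```c
#include <assert.h>
#include <stdio.h>

/* Build-time switch for the hypothetical debug-only check. */
#ifndef DEBUG_CHECKS
#define DEBUG_CHECKS 1
#endif

static int warn_count; /* stands in for a WARN splat in the kernel log */

#if DEBUG_CHECKS
#define DEBUG_WARN_ON(cond)						\
	do {								\
		if (cond) {						\
			warn_count++;					\
			fprintf(stderr, "WARNING: %s\n", #cond);	\
		}							\
	} while (0)
#else
/* Compiled out: the condition is type-checked but never evaluated. */
#define DEBUG_WARN_ON(cond) do { (void)sizeof(cond); } while (0)
#endif

/* Hypothetical, simplified task lifecycle; the kernel tracks this differently. */
enum task_life { LIFE_NEW, LIFE_LIVE, LIFE_DEAD };

struct ucounts { long nproc; };
struct cred { struct ucounts *ucounts; };
struct task {
	enum task_life life;
	const struct cred *real_cred;
};

/*
 * Hypothetical checked __task_ucounts(): reading ->real_cred without RCU
 * is treated as legal only while the task is under construction or dead.
 */
static struct ucounts *checked_task_ucounts(const struct task *t)
{
	DEBUG_WARN_ON(t->life == LIFE_LIVE);
	return t->real_cred->ucounts;
}
```

With `DEBUG_CHECKS` set to 0 the check compiles away entirely. That is the trade-off discussed above: the check is cheap enough to sprinkle around, but it stays silent in production builds.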
tl;dr the patch LGTM, but consider the first nit. thanks. ;)
> +// needs rcu_read_lock()
> +#define task_ucounts(task) \
> + rcu_dereference((task)->real_cred)->ucounts
>
> #define current_cred_xxx(xxx) \
> ({ \
> diff --git a/kernel/cred.c b/kernel/cred.c
> index dbf6b687dc5c..edddecec82e5 100644
> --- a/kernel/cred.c
> +++ b/kernel/cred.c
> @@ -305,7 +305,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
> p->real_cred = get_cred_many(p->cred, 2);
> kdebug("share_creds(%p{%ld})",
> p->cred, atomic_long_read(&p->cred->usage));
> - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> + inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> return 0;
> }
>
> @@ -342,7 +342,7 @@ int copy_creds(struct task_struct *p, u64 clone_flags)
> #endif
>
> p->cred = p->real_cred = get_cred(new);
> - inc_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> + inc_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> return 0;
>
> error_put:
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f041f0c05ebb..80b0f1114bd3 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -252,7 +252,7 @@ void release_task(struct task_struct *p)
>
> /* don't need to get the RCU readlock here - the process is dead and
> * can't be modifying its own credentials. */
> - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> + dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
>
> pidfs_exit(p);
> cgroup_release(p);
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 3da0f08615a9..f2a6a3cd14ef 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2048,7 +2048,7 @@ __latent_entropy struct task_struct *copy_process(
> goto bad_fork_free;
>
> retval = -EAGAIN;
> - if (is_rlimit_overlimit(task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> + if (is_rlimit_overlimit(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC))) {
> if (p->real_cred->user != INIT_USER &&
> !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> goto bad_fork_cleanup_count;
> @@ -2486,7 +2486,7 @@ __latent_entropy struct task_struct *copy_process(
> bad_fork_cleanup_delayacct:
> delayacct_tsk_free(p);
> bad_fork_cleanup_count:
> - dec_rlimit_ucounts(task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> + dec_rlimit_ucounts(__task_ucounts(p), UCOUNT_RLIMIT_NPROC, 1);
> exit_creds(p);
> bad_fork_free:
> WRITE_ONCE(p->__state, TASK_DEAD);
>
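
The lockdep concern in the quoted mail can be shown with a small userspace model. The concern is that task_cred_xxx() takes and drops rcu_read_lock() internally, so a caller that uses the returned ucounts pointer without its own rcu_read_lock() goes unnoticed. This is a sketch, not kernel code, and every name in it is hypothetical:

- `model_rcu_read_lock()`/`model_rcu_read_unlock()` stand in for rcu_read_lock()/rcu_read_unlock().
- `deref_checked()` stands in for rcu_dereference() with its lockdep check.
- `check_failures` counts what would be lockdep splats.
- The `old_`, `new_` and `unchecked_` accessors model the old task_ucounts(), the patched task_ucounts() and __task_ucounts().

```c
#include <assert.h>

/* Userspace model: rcu_depth mimics rcu_read_lock() nesting. */
static int rcu_depth;
/* Stands in for lockdep's "suspicious rcu_dereference_check() usage". */
static int check_failures;

static void model_rcu_read_lock(void) { rcu_depth++; }
static void model_rcu_read_unlock(void) { rcu_depth--; }

struct ucounts { long nproc; };
struct cred { struct ucounts *ucounts; };
struct task { const struct cred *real_cred; };

/* Like rcu_dereference(): records a failure outside a read-side section. */
static const struct cred *deref_checked(const struct cred *p)
{
	if (rcu_depth == 0)
		check_failures++;
	return p;
}

/*
 * Old task_ucounts(), via task_cred_xxx(): lock, read, unlock. The check
 * always passes, even if the caller then uses the result unprotected.
 */
static struct ucounts *old_task_ucounts(const struct task *t)
{
	struct ucounts *uc;

	model_rcu_read_lock();
	uc = deref_checked(t->real_cred)->ucounts;
	model_rcu_read_unlock();
	return uc;
}

/* Patched task_ucounts(): the check runs in the caller's context. */
static struct ucounts *new_task_ucounts(const struct task *t)
{
	return deref_checked(t->real_cred)->ucounts;
}

/* Patched __task_ucounts(): no check; the caller guarantees stability. */
static struct ucounts *unchecked_task_ucounts(const struct task *t)
{
	return t->real_cred->ucounts;
}
```

Calling `old_task_ucounts()` without holding the model lock leaves `check_failures` at 0. The same call through `new_task_ucounts()` increments it. This mirrors why the patch makes callers of task_ucounts() responsible for rcu_read_lock().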